* [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]
@ 2023-07-21 17:55 Vineet Gupta
2023-07-21 18:15 ` Philipp Tomsich
2023-07-21 18:31 ` Palmer Dabbelt
0 siblings, 2 replies; 10+ messages in thread
From: Vineet Gupta @ 2023-07-21 17:55 UTC (permalink / raw)
To: gcc-patches, Manolis Tsamis
Cc: kito.cheng, Jeff Law, Palmer Dabbelt, gnu-toolchain, Vineet Gupta
DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.
void zd(double *) { *d = 0.0; }
currently:
| fmv.d.x fa5,zero
| fsd fa5,0(a0)
| ret
With patch
| sd zero,0(a0)
| ret
This came to light when testing the in-flight f-m-o patch where an ICE
was gettinh triggered due to lack of this pattern but turns out this
is an independent optimization of its own [1]
[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html
Apparently this is a regression in gcc-13, introduced by commit
ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
thus is a partial revert of that change.
Ran thru full multilib testsuite, there was 1 false failure due to
random string "lw" appearing in lto build assembler output,
which is also fixed in the patch.
gcc/Changelog:
* config/riscv/predicates.md (const_0_operand): Add back
const_double.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr110748-1.c: New Test.
* gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
patterns to avoid random string matches.
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
---
gcc/config/riscv/predicates.md | 2 +-
gcc/testsuite/gcc.target/riscv/pr110748-1.c | 10 ++++++++++
gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c | 8 ++++----
3 files changed, 15 insertions(+), 5 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 5a22c77f0cd0..9db28c2def7e 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -58,7 +58,7 @@
(match_test "INTVAL (op) + 1 != 0")))
(define_predicate "const_0_operand"
- (and (match_code "const_int,const_wide_int,const_vector")
+ (and (match_code "const_int,const_wide_int,const_double,const_vector")
(match_test "op == CONST0_RTX (GET_MODE (op))")))
(define_predicate "const_1_operand"
diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
new file mode 100644
index 000000000000..2f5bc08aae72
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target hard_float } */
+/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
+
+
+void zd(double *d) { *d = 0.0; }
+void zf(float *f) { *f = 0.0; }
+
+/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
+/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
index 1036044291e7..89eb48bed1b9 100644
--- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
+++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
@@ -18,7 +18,7 @@ d2ll (double d)
/* { dg-final { scan-assembler "th.fmv.hw.x" } } */
/* { dg-final { scan-assembler "fmv.x.w" } } */
/* { dg-final { scan-assembler "th.fmv.x.hw" } } */
-/* { dg-final { scan-assembler-not "sw" } } */
-/* { dg-final { scan-assembler-not "fld" } } */
-/* { dg-final { scan-assembler-not "fsd" } } */
-/* { dg-final { scan-assembler-not "lw" } } */
+/* { dg-final { scan-assembler-not "\tsw\t" } } */
+/* { dg-final { scan-assembler-not "\tfld\t" } } */
+/* { dg-final { scan-assembler-not "\tfsd\t" } } */
+/* { dg-final { scan-assembler-not "\tlw\t" } } */
--
2.34.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]
2023-07-21 17:55 [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748] Vineet Gupta
@ 2023-07-21 18:15 ` Philipp Tomsich
2023-07-21 18:23 ` Vineet Gupta
2023-07-21 18:31 ` Palmer Dabbelt
1 sibling, 1 reply; 10+ messages in thread
From: Philipp Tomsich @ 2023-07-21 18:15 UTC (permalink / raw)
To: Vineet Gupta
Cc: gcc-patches, Manolis Tsamis, kito.cheng, Jeff Law,
Palmer Dabbelt, gnu-toolchain
On Fri, 21 Jul 2023 at 19:56, Vineet Gupta <vineetg@rivosinc.com> wrote:
>
> DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.
>
> void zd(double *) { *d = 0.0; }
>
> currently:
>
> | fmv.d.x fa5,zero
> | fsd fa5,0(a0)
> | ret
>
> With patch
>
> | sd zero,0(a0)
> | ret
> This came to light when testing the in-flight f-m-o patch where an ICE
> was gettinh triggered due to lack of this pattern but turns out this
typo: "gettinh" -> "getting"
> is an independent optimization of its own [1]
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html
>
> Apparently this is a regression in gcc-13, introduced by commit
> ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
> thus is a partial revert of that change.
Should we add a "Fixes: "?
> Ran thru full multilib testsuite, there was 1 false failure due to
> random string "lw" appearing in lto build assembler output,
> which is also fixed in the patch.
>
> gcc/Changelog:
PR target/110748
>
> * config/riscv/predicates.md (const_0_operand): Add back
> const_double.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/pr110748-1.c: New Test.
> * gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
> patterns to avoid random string matches.
>
> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
> ---
> gcc/config/riscv/predicates.md | 2 +-
> gcc/testsuite/gcc.target/riscv/pr110748-1.c | 10 ++++++++++
> gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c | 8 ++++----
> 3 files changed, 15 insertions(+), 5 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c
>
> diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> index 5a22c77f0cd0..9db28c2def7e 100644
> --- a/gcc/config/riscv/predicates.md
> +++ b/gcc/config/riscv/predicates.md
> @@ -58,7 +58,7 @@
> (match_test "INTVAL (op) + 1 != 0")))
>
> (define_predicate "const_0_operand"
> - (and (match_code "const_int,const_wide_int,const_vector")
> + (and (match_code "const_int,const_wide_int,const_double,const_vector")
> (match_test "op == CONST0_RTX (GET_MODE (op))")))
>
> (define_predicate "const_1_operand"
> diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
> new file mode 100644
> index 000000000000..2f5bc08aae72
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
> +/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
> +
> +
> +void zd(double *d) { *d = 0.0; }
> +void zf(float *f) { *f = 0.0; }
> +
> +/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
> +/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> index 1036044291e7..89eb48bed1b9 100644
> --- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> +++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> @@ -18,7 +18,7 @@ d2ll (double d)
> /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
> /* { dg-final { scan-assembler "fmv.x.w" } } */
> /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
> -/* { dg-final { scan-assembler-not "sw" } } */
> -/* { dg-final { scan-assembler-not "fld" } } */
> -/* { dg-final { scan-assembler-not "fsd" } } */
> -/* { dg-final { scan-assembler-not "lw" } } */
> +/* { dg-final { scan-assembler-not "\tsw\t" } } */
> +/* { dg-final { scan-assembler-not "\tfld\t" } } */
> +/* { dg-final { scan-assembler-not "\tfsd\t" } } */
> +/* { dg-final { scan-assembler-not "\tlw\t" } } */
> --
> 2.34.1
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]
2023-07-21 18:15 ` Philipp Tomsich
@ 2023-07-21 18:23 ` Vineet Gupta
0 siblings, 0 replies; 10+ messages in thread
From: Vineet Gupta @ 2023-07-21 18:23 UTC (permalink / raw)
To: Philipp Tomsich
Cc: gcc-patches, Manolis Tsamis, kito.cheng, Jeff Law,
Palmer Dabbelt, gnu-toolchain
On 7/21/23 11:15, Philipp Tomsich wrote:
> On Fri, 21 Jul 2023 at 19:56, Vineet Gupta <vineetg@rivosinc.com> wrote:
>> DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.
>>
>> void zd(double *) { *d = 0.0; }
>>
>> currently:
>>
>> | fmv.d.x fa5,zero
>> | fsd fa5,0(a0)
>> | ret
>>
>> With patch
>>
>> | sd zero,0(a0)
>> | ret
>> This came to light when testing the in-flight f-m-o patch where an ICE
>> was gettinh triggered due to lack of this pattern but turns out this
> typo: "gettinh" -> "getting"
Fixed.
>> is an independent optimization of its own [1]
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html
>>
>> Apparently this is a regression in gcc-13, introduced by commit
>> ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
>> thus is a partial revert of that change.
> Should we add a "Fixes: "?
Sure. Although gcc usage of Fixes tag seems slightly different than say
linux kernel's.
>
>> Ran thru full multilib testsuite, there was 1 false failure due to
>> random string "lw" appearing in lto build assembler output,
>> which is also fixed in the patch.
>>
>> gcc/Changelog:
> PR target/110748
Added.
Thx,
-Vineet
>
>> * config/riscv/predicates.md (const_0_operand): Add back
>> const_double.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/pr110748-1.c: New Test.
>> * gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
>> patterns to avoid random string matches.
>>
>> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
>> ---
>> gcc/config/riscv/predicates.md | 2 +-
>> gcc/testsuite/gcc.target/riscv/pr110748-1.c | 10 ++++++++++
>> gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c | 8 ++++----
>> 3 files changed, 15 insertions(+), 5 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c
>>
>> diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
>> index 5a22c77f0cd0..9db28c2def7e 100644
>> --- a/gcc/config/riscv/predicates.md
>> +++ b/gcc/config/riscv/predicates.md
>> @@ -58,7 +58,7 @@
>> (match_test "INTVAL (op) + 1 != 0")))
>>
>> (define_predicate "const_0_operand"
>> - (and (match_code "const_int,const_wide_int,const_vector")
>> + (and (match_code "const_int,const_wide_int,const_double,const_vector")
>> (match_test "op == CONST0_RTX (GET_MODE (op))")))
>>
>> (define_predicate "const_1_operand"
>> diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
>> new file mode 100644
>> index 000000000000..2f5bc08aae72
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
>> @@ -0,0 +1,10 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target hard_float } */
>> +/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
>> +
>> +
>> +void zd(double *d) { *d = 0.0; }
>> +void zf(float *f) { *f = 0.0; }
>> +
>> +/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
>> +/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
>> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
>> index 1036044291e7..89eb48bed1b9 100644
>> --- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
>> +++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
>> @@ -18,7 +18,7 @@ d2ll (double d)
>> /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
>> /* { dg-final { scan-assembler "fmv.x.w" } } */
>> /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
>> -/* { dg-final { scan-assembler-not "sw" } } */
>> -/* { dg-final { scan-assembler-not "fld" } } */
>> -/* { dg-final { scan-assembler-not "fsd" } } */
>> -/* { dg-final { scan-assembler-not "lw" } } */
>> +/* { dg-final { scan-assembler-not "\tsw\t" } } */
>> +/* { dg-final { scan-assembler-not "\tfld\t" } } */
>> +/* { dg-final { scan-assembler-not "\tfsd\t" } } */
>> +/* { dg-final { scan-assembler-not "\tlw\t" } } */
>> --
>> 2.34.1
>>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]
2023-07-21 17:55 [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748] Vineet Gupta
2023-07-21 18:15 ` Philipp Tomsich
@ 2023-07-21 18:31 ` Palmer Dabbelt
2023-07-21 18:47 ` Jeff Law
` (2 more replies)
1 sibling, 3 replies; 10+ messages in thread
From: Palmer Dabbelt @ 2023-07-21 18:31 UTC (permalink / raw)
To: Vineet Gupta
Cc: gcc-patches, manolis.tsamis, Kito Cheng, jeffreyalaw,
gnu-toolchain, Vineet Gupta
On Fri, 21 Jul 2023 10:55:52 PDT (-0700), Vineet Gupta wrote:
> DF +0.0 is bitwise all zeros so int x0 store to mem can be used to optimize it.
>
> void zd(double *) { *d = 0.0; }
>
> currently:
>
> | fmv.d.x fa5,zero
> | fsd fa5,0(a0)
> | ret
>
> With patch
>
> | sd zero,0(a0)
> | ret
>
> This came to light when testing the in-flight f-m-o patch where an ICE
> was gettinh triggered due to lack of this pattern but turns out this
> is an independent optimization of its own [1]
>
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html
>
> Apparently this is a regression in gcc-13, introduced by commit
> ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
> thus is a partial revert of that change.
Given that it can ICE, we should probably backport it to 13.
> Ran thru full multilib testsuite, there was 1 false failure due to
Did you run the test with autovec? There's also a
pmode_reg_or_0_operand, some of those don't appear protected from FP
values. So we might need something like
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index cd5b19457f8..d8ce9223343 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -63,7 +63,7 @@ (define_expand "movmisalign<mode>"
(define_expand "len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
[(match_operand:VNX1_QHSD 0 "register_operand")
- (match_operand 1 "pmode_reg_or_0_operand")
+ (match_operand:P 1 "pmode_reg_or_0_operand")
(match_operand:VNX1_QHSDI 2 "register_operand")
(match_operand 3 "<VNX1_QHSD:gs_extension>")
(match_operand 4 "<VNX1_QHSD:gs_scale>")
a bunch of times, as there's a ton of them? I'm not entirely sure if that
could manifest as an actual bug, though...
> random string "lw" appearing in lto build assembler output,
> which is also fixed in the patch.
>
> gcc/Changelog:
>
> * config/riscv/predicates.md (const_0_operand): Add back
> const_double.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/pr110748-1.c: New Test.
> * gcc.target/riscv/xtheadfmv-fmv.c: Add '\t' around test
> patterns to avoid random string matches.
>
> Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
> ---
> gcc/config/riscv/predicates.md | 2 +-
> gcc/testsuite/gcc.target/riscv/pr110748-1.c | 10 ++++++++++
> gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c | 8 ++++----
> 3 files changed, 15 insertions(+), 5 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/pr110748-1.c
>
> diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> index 5a22c77f0cd0..9db28c2def7e 100644
> --- a/gcc/config/riscv/predicates.md
> +++ b/gcc/config/riscv/predicates.md
> @@ -58,7 +58,7 @@
> (match_test "INTVAL (op) + 1 != 0")))
>
> (define_predicate "const_0_operand"
> - (and (match_code "const_int,const_wide_int,const_vector")
> + (and (match_code "const_int,const_wide_int,const_double,const_vector")
> (match_test "op == CONST0_RTX (GET_MODE (op))")))
>
> (define_predicate "const_1_operand"
> diff --git a/gcc/testsuite/gcc.target/riscv/pr110748-1.c b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
> new file mode 100644
> index 000000000000..2f5bc08aae72
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/pr110748-1.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
> +/* { dg-options "-march=rv64g -mabi=lp64d -O2" } */
> +
> +
> +void zd(double *d) { *d = 0.0; }
> +void zf(float *f) { *f = 0.0; }
> +
> +/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
> +/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
IIUC the pattern to emit fmv suffers from the same bug -- it's fixed in the same
way, but I think we might be able to come up with a test for it: `fmv.d.x FREG,
x0` would be the fastest way to generate 0.0, so maybe something like
double sum(double *d) {
double sum = 0;
for (int i = 0; i < 8; ++i)
sum += d[i];
return sum;
}
would do it? That's generating the fmv on 13 for me, though, so maybe I'm
missing something?`
> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> index 1036044291e7..89eb48bed1b9 100644
> --- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> +++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
> @@ -18,7 +18,7 @@ d2ll (double d)
> /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
> /* { dg-final { scan-assembler "fmv.x.w" } } */
> /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
> -/* { dg-final { scan-assembler-not "sw" } } */
> -/* { dg-final { scan-assembler-not "fld" } } */
> -/* { dg-final { scan-assembler-not "fsd" } } */
> -/* { dg-final { scan-assembler-not "lw" } } */
> +/* { dg-final { scan-assembler-not "\tsw\t" } } */
> +/* { dg-final { scan-assembler-not "\tfld\t" } } */
> +/* { dg-final { scan-assembler-not "\tfsd\t" } } */
> +/* { dg-final { scan-assembler-not "\tlw\t" } } */
I think that autovec one is the only possible dependency that might have snuck
in, so we should be safe otherwise. Thanks!
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]
2023-07-21 18:31 ` Palmer Dabbelt
@ 2023-07-21 18:47 ` Jeff Law
2023-07-25 23:05 ` Palmer Dabbelt
2023-07-21 18:55 ` Vineet Gupta
2023-07-21 19:37 ` Vineet Gupta
2 siblings, 1 reply; 10+ messages in thread
From: Jeff Law @ 2023-07-21 18:47 UTC (permalink / raw)
To: Palmer Dabbelt, Vineet Gupta
Cc: gcc-patches, manolis.tsamis, Kito Cheng, gnu-toolchain
On 7/21/23 12:31, Palmer Dabbelt wrote:
>
> (define_expand "len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
> [(match_operand:VNX1_QHSD 0 "register_operand")
> - (match_operand 1 "pmode_reg_or_0_operand")
> + (match_operand:P 1 "pmode_reg_or_0_operand")
> (match_operand:VNX1_QHSDI 2 "register_operand")
> (match_operand 3 "<VNX1_QHSD:gs_extension>")
> (match_operand 4 "<VNX1_QHSD:gs_scale>")
>
> a bunch of times, as there's a ton of them? I'm not entirely sure if that
> could manifest as an actual bug, though...
But won't this cause (const_int 0) to no longer match because CONST_INT
nodes are modeless (VOIDmode)?
Jeff
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]
2023-07-21 18:31 ` Palmer Dabbelt
2023-07-21 18:47 ` Jeff Law
@ 2023-07-21 18:55 ` Vineet Gupta
2023-07-22 6:03 ` Jeff Law
2023-07-21 19:37 ` Vineet Gupta
2 siblings, 1 reply; 10+ messages in thread
From: Vineet Gupta @ 2023-07-21 18:55 UTC (permalink / raw)
To: Palmer Dabbelt
Cc: gcc-patches, manolis.tsamis, Kito Cheng, jeffreyalaw, gnu-toolchain
On 7/21/23 11:31, Palmer Dabbelt wrote:
> On Fri, 21 Jul 2023 10:55:52 PDT (-0700), Vineet Gupta wrote:
>> DF +0.0 is bitwise all zeros so int x0 store to mem can be used to
>> optimize it.
>>
>> void zd(double *) { *d = 0.0; }
>>
>> currently:
>>
>> | fmv.d.x fa5,zero
>> | fsd fa5,0(a0)
>> | ret
>>
>> With patch
>>
>> | sd zero,0(a0)
>> | ret
>>
>> This came to light when testing the in-flight f-m-o patch where an ICE
>> was gettinh triggered due to lack of this pattern but turns out this
>> is an independent optimization of its own [1]
>>
>> [1] https://gcc.gnu.org/pipermail/gcc-patches/2023-July/624857.html
>>
>> Apparently this is a regression in gcc-13, introduced by commit
>> ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
>> thus is a partial revert of that change.
>
> Given that it can ICE, we should probably backport it to 13.
FWIW ICE is on an in-flight for-gcc-14 patch, not something in tree
already. And this will merge ahead of that.
I'm fine with backport though.
>
>> Ran thru full multilib testsuite, there was 1 false failure due to
>
> Did you run the test with autovec?
I have standard 32/64 mutlilibs, but no 'v' in arch so autovec despite
being enabled at -O2 and above will not kick in.
I think we should add a 'v' multilib.
> There's also a pmode_reg_or_0_operand, some of those don't appear
> protected from FP values. So we might need something like
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index cd5b19457f8..d8ce9223343 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -63,7 +63,7 @@ (define_expand "movmisalign<mode>"
>
> (define_expand "len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
> [(match_operand:VNX1_QHSD 0 "register_operand")
> - (match_operand 1 "pmode_reg_or_0_operand")
> + (match_operand:P 1 "pmode_reg_or_0_operand")
> (match_operand:VNX1_QHSDI 2 "register_operand")
> (match_operand 3 "<VNX1_QHSD:gs_extension>")
> (match_operand 4 "<VNX1_QHSD:gs_scale>")
>
> a bunch of times, as there's a ton of them? I'm not entirely sure if
> that
> could manifest as an actual bug, though...
What does 'P' do here ?
>> +
>> +void zd(double *d) { *d = 0.0; }
>> +void zf(float *f) { *f = 0.0; }
>> +
>> +/* { dg-final { scan-assembler-not "\tfmv\\.d\\.x\t" } } */
>> +/* { dg-final { scan-assembler-not "\tfmv\\.s\\.x\t" } } */
>
> IIUC the pattern to emit fmv suffers from the same bug -- it's fixed
> in the same
> way, but I think we might be able to come up with a test for it:
> `fmv.d.x FREG,
> x0` would be the fastest way to generate 0.0, so maybe something like
>
> double sum(double *d) {
> double sum = 0;
> for (int i = 0; i < 8; ++i)
> sum += d[i];
> return sum;
> }
>
> would do it? That's generating the fmv on 13 for me, though, so maybe
> I'm
> missing something?`
I need to unpack this first :-)
>
>> diff --git a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
>> b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
>> index 1036044291e7..89eb48bed1b9 100644
>> --- a/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
>> +++ b/gcc/testsuite/gcc.target/riscv/xtheadfmv-fmv.c
>> @@ -18,7 +18,7 @@ d2ll (double d)
>> /* { dg-final { scan-assembler "th.fmv.hw.x" } } */
>> /* { dg-final { scan-assembler "fmv.x.w" } } */
>> /* { dg-final { scan-assembler "th.fmv.x.hw" } } */
>> -/* { dg-final { scan-assembler-not "sw" } } */
>> -/* { dg-final { scan-assembler-not "fld" } } */
>> -/* { dg-final { scan-assembler-not "fsd" } } */
>> -/* { dg-final { scan-assembler-not "lw" } } */
>> +/* { dg-final { scan-assembler-not "\tsw\t" } } */
>> +/* { dg-final { scan-assembler-not "\tfld\t" } } */
>> +/* { dg-final { scan-assembler-not "\tfsd\t" } } */
>> +/* { dg-final { scan-assembler-not "\tlw\t" } } */
>
> I think that autovec one is the only possible dependency that might
> have snuck
> in, so we should be safe otherwise. Thanks!
I'm not sure if this specific comment is related to the xthead test or
continuation of above.
For xthead it is real issue since I saw a random "lw" in lto assembler
output.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]
2023-07-21 18:31 ` Palmer Dabbelt
2023-07-21 18:47 ` Jeff Law
2023-07-21 18:55 ` Vineet Gupta
@ 2023-07-21 19:37 ` Vineet Gupta
2 siblings, 0 replies; 10+ messages in thread
From: Vineet Gupta @ 2023-07-21 19:37 UTC (permalink / raw)
To: Palmer Dabbelt
Cc: gcc-patches, manolis.tsamis, Kito Cheng, jeffreyalaw, gnu-toolchain
On 7/21/23 11:31, Palmer Dabbelt wrote:
>
> IIUC the pattern to emit fmv suffers from the same bug -- it's fixed
> in the same
> way, but I think we might be able to come up with a test for it:
> `fmv.d.x FREG,
> x0` would be the fastest way to generate 0.0, so maybe something like
>
> double sum(double *d) {
> double sum = 0;
> for (int i = 0; i < 8; ++i)
> sum += d[i];
> return sum;
> }
>
> would do it? That's generating the fmv on 13 for me, though, so maybe
> I'm
> missing something?`
I don't think we can avoid FMV in this case
fmv.d.x fa0,zero #1
addi a5,a0,64
.L2:
fld fa5,0(a0)
addi a0,a0,8
fadd.d fa0,fa0,fa5 #2
bne a0,a5,.L2
ret
In #1, the zero needs to be setup in FP reg (possible using FMV), since
in #2 it will be used for FP math.
If we change ur test slightly,
double zadd(double *d) {
double sum = 0.0;
for (int i = 0; i < 8; ++i)
d[i] = sum;
return sum;
}
We still get the optimal code for writing to FP 0. The last FMV is
unavoidable as we need an FP return reg.
addi a5,a0,64
.L2:
sd zero,0(a0)
addi a0,a0,8
bne a0,a5,.L2
fmv.d.x fa0,zero
ret
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]
2023-07-21 18:55 ` Vineet Gupta
@ 2023-07-22 6:03 ` Jeff Law
0 siblings, 0 replies; 10+ messages in thread
From: Jeff Law @ 2023-07-22 6:03 UTC (permalink / raw)
To: Vineet Gupta, Palmer Dabbelt
Cc: gcc-patches, manolis.tsamis, Kito Cheng, gnu-toolchain
On 7/21/23 12:55, Vineet Gupta wrote:
>>>
>>> Apparently this is a regression in gcc-13, introduced by commit
>>> ef85d150b5963 ("RISC-V: Enable TARGET_SUPPORTS_WIDE_INT") and the fix
>>> thus is a partial revert of that change.
>>
>> Given that it can ICE, we should probably backport it to 13.
>
> FWIW ICE is on an in-flight for-gcc-14 patch, not something in tree
> already. And this will merge ahead of that.
> I'm fine with backport though.
It's latent on the gcc-13 branch from an ICE standpoint and would
probably stay that way -- triggering would require something to
re-recognize the pattern after register allocation.
I won't object to it going into gcc-13. No strong opinions either way,
happy to go with consensus opinion.
>> There's also a pmode_reg_or_0_operand, some of those don't appear
>> protected from FP values. So we might need something like
>>
>> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
>> index cd5b19457f8..d8ce9223343 100644
>> --- a/gcc/config/riscv/autovec.md
>> +++ b/gcc/config/riscv/autovec.md
>> @@ -63,7 +63,7 @@ (define_expand "movmisalign<mode>"
>>
>> (define_expand "len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
>> [(match_operand:VNX1_QHSD 0 "register_operand")
>> - (match_operand 1 "pmode_reg_or_0_operand")
>> + (match_operand:P 1 "pmode_reg_or_0_operand")
>> (match_operand:VNX1_QHSDI 2 "register_operand")
>> (match_operand 3 "<VNX1_QHSD:gs_extension>")
>> (match_operand 4 "<VNX1_QHSD:gs_scale>")
>>
>> a bunch of times, as there's a ton of them? I'm not entirely sure if
>> that
>> could manifest as an actual bug, though...
>
> What does 'P' do here ?
It will force the operand to match the P mode iterator, which should be
DImode for rv64 and SImode for rv32. I'm not sure this is wise as I
think it'll end up rejecting (const_int 0) because CONST_INT nodes do
not have a mode.
Jeff
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]
2023-07-21 18:47 ` Jeff Law
@ 2023-07-25 23:05 ` Palmer Dabbelt
2023-07-26 3:09 ` Jeff Law
0 siblings, 1 reply; 10+ messages in thread
From: Palmer Dabbelt @ 2023-07-25 23:05 UTC (permalink / raw)
To: gcc-patches
Cc: Vineet Gupta, gcc-patches, manolis.tsamis, Kito Cheng, gnu-toolchain
On Fri, 21 Jul 2023 11:47:58 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
> On 7/21/23 12:31, Palmer Dabbelt wrote:
>> (define_expand "len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
>> [(match_operand:VNX1_QHSD 0 "register_operand")
>> - (match_operand 1 "pmode_reg_or_0_operand")
>> + (match_operand:P 1 "pmode_reg_or_0_operand")
>> (match_operand:VNX1_QHSDI 2 "register_operand")
>> (match_operand 3 "<VNX1_QHSD:gs_extension>")
>> (match_operand 4 "<VNX1_QHSD:gs_scale>")
>>
>> a bunch of times, as there's a ton of them? I'm not entirely sure if that
>> could manifest as an actual bug, though...
> But won't this cause (const_int 0) to no longer match because CONST_INT
> nodes are modeless (VOIDmode)?
I poked around a bit and I'm not actually sure, I'm kind of lost on the docs
here. IIUC we're eliding the VOIDmode in the predicate correctly
(define_predicate "const_0_operand"
(and (match_code "const_int,const_wide_int,const_vector")
(match_test "op == CONST0_RTX (GET_MODE (op))")))
so we're OK there, otherwise we'd presumably have similar problems with
expanders like
(define_expand "subsi3"
[(set (match_operand:SI 0 "register_operand" "= r")
(minus:SI (match_operand:SI 1 "reg_or_0_operand" " rJ")
(match_operand:SI 2 "register_operand" " r")))]
""
which we have a few of -- though it'd be kind of a silent failure, as
presumably we'd just end up with some more move-x0s emitted?
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748]
2023-07-25 23:05 ` Palmer Dabbelt
@ 2023-07-26 3:09 ` Jeff Law
0 siblings, 0 replies; 10+ messages in thread
From: Jeff Law @ 2023-07-26 3:09 UTC (permalink / raw)
To: Palmer Dabbelt, gcc-patches
Cc: Vineet Gupta, manolis.tsamis, Kito Cheng, gnu-toolchain
On 7/25/23 17:05, Palmer Dabbelt wrote:
> On Fri, 21 Jul 2023 11:47:58 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
>> On 7/21/23 12:31, Palmer Dabbelt wrote:
>>> (define_expand "len_mask_gather_load<VNX1_QHSD:mode><VNX1_QHSDI:mode>"
>>> [(match_operand:VNX1_QHSD 0 "register_operand")
>>> - (match_operand 1 "pmode_reg_or_0_operand")
>>> + (match_operand:P 1 "pmode_reg_or_0_operand")
>>> (match_operand:VNX1_QHSDI 2 "register_operand")
>>> (match_operand 3 "<VNX1_QHSD:gs_extension>")
>>> (match_operand 4 "<VNX1_QHSD:gs_scale>")
>>>
>>> a bunch of times, as there's a ton of them? I'm not entirely sure if
>>> that
>>> could manifest as an actual bug, though...
>> But won't this cause (const_int 0) to no longer match because CONST_INT
>> nodes are modeless (VOIDmode)?
>
> I poked around a bit and I'm not actually sure, I'm kind of lost on the
> docs
> here. IIUC we're eliding the VOIDmode in the predicate correctly
>
> (define_predicate "const_0_operand"
> (and (match_code "const_int,const_wide_int,const_vector")
> (match_test "op == CONST0_RTX (GET_MODE (op))")))
>
> so we're OK there, otherwise we'd presumably have similar problems with
> expanders like
>
> (define_expand "subsi3"
> [(set (match_operand:SI 0 "register_operand" "= r")
> (minus:SI (match_operand:SI 1 "reg_or_0_operand" " rJ")
> (match_operand:SI 2 "register_operand" " r")))]
> ""
>
> which we have a few of -- though it'd be kind of a silent failure, as
> presumably we'd just end up with some more move-x0s emitted?
It's a bit messy to say the least. However, we can look at other ports
and after doing so I'm less sure my concern is valid.
Take the typical movXX pattern or expander. Both operands have a mode,
so things like CONST_INT must be passing through, even though they're
VOIDmode.
So it's probably a non-issue.
jeff
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2023-07-26 3:09 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-21 17:55 [PATCH] RISC-V: optim const DF +0.0 store to mem [PR/110748] Vineet Gupta
2023-07-21 18:15 ` Philipp Tomsich
2023-07-21 18:23 ` Vineet Gupta
2023-07-21 18:31 ` Palmer Dabbelt
2023-07-21 18:47 ` Jeff Law
2023-07-25 23:05 ` Palmer Dabbelt
2023-07-26 3:09 ` Jeff Law
2023-07-21 18:55 ` Vineet Gupta
2023-07-22 6:03 ` Jeff Law
2023-07-21 19:37 ` Vineet Gupta
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).