* [PATCH][GCC 12] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]
@ 2024-02-12 17:11 Alex Coplan
2024-02-14 11:18 ` Richard Sandiford
0 siblings, 1 reply; 4+ messages in thread
From: Alex Coplan @ 2024-02-12 17:11 UTC (permalink / raw)
To: gcc-patches; +Cc: Richard Sandiford, Kyrylo Tkachov, Richard Earnshaw
[-- Attachment #1: Type: text/plain, Size: 2929 bytes --]
This is a backport of the GCC 13 fix for PR111677 to the GCC 12 branch.
The only part of the patch that isn't a straight cherry-pick is due to
the TX iterator lacking TDmode for GCC 12, so this version adjusts
TX_V16QI accordingly.
Bootstrapped/regtested on aarch64-linux-gnu, the only changes in the
testsuite I saw were in
gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c where the dg-output
"READ of size 4 [...]" check appears to be flaky on the GCC 12 branch
since libhwasan gained the short granule tag feature, I've requested a
backport of the following patch (committed as
r13-100-g3771486daa1e904ceae6f3e135b28e58af33849f) which should fix that
(independent) issue for GCC 12:
https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645278.html
OK for the GCC 12 branch?
Thanks,
Alex
-- >8 --
The PR shows us ICEing due to an unrecognizable TFmode save emitted by
aarch64_process_components. The problem is that for T{I,F,D}mode we
conservatively require mems to be in range for x-register ldp/stp. That
is because (at least for TImode) it can be allocated to both GPRs and
FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is
a q-register load/store.
As Richard pointed out in the PR, aarch64_get_separate_components
already checks that the offsets are suitable for a single load, so we
just need to choose a mode in aarch64_reg_save_mode that gives the full
q-register range. In this patch, we choose V16QImode as an alternative
16-byte "bag-of-bits" mode that doesn't have the artificial range
restrictions imposed on T{I,F,D}mode.
Unlike for GCC 14 we need additional handling in the load/store pair
code as various cases are not expecting to see V16QImode (particularly
the writeback patterns, but also aarch64_gen_load_pair).
gcc/ChangeLog:
PR target/111677
* config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use
V16QImode for the full 16-byte FPR saves in the vector PCS case.
(aarch64_gen_storewb_pair): Handle V16QImode.
(aarch64_gen_loadwb_pair): Likewise.
(aarch64_gen_load_pair): Likewise.
* config/aarch64/aarch64.md (loadwb_pair<TX:mode>_<P:mode>):
Rename to ...
(loadwb_pair<TX_V16QI:mode>_<P:mode>): ... this, extending to
V16QImode.
(storewb_pair<TX:mode>_<P:mode>): Rename to ...
(storewb_pair<TX_V16QI:mode>_<P:mode>): ... this, extending to
V16QImode.
* config/aarch64/iterators.md (TX_V16QI): New.
gcc/testsuite/ChangeLog:
PR target/111677
* gcc.target/aarch64/torture/pr111677.c: New test.
(cherry picked from commit 2bd8264a131ee1215d3bc6181722f9d30f5569c3)
---
gcc/config/aarch64/aarch64.cc | 13 ++++++-
gcc/config/aarch64/aarch64.md | 35 ++++++++++---------
gcc/config/aarch64/iterators.md | 3 ++
.../gcc.target/aarch64/torture/pr111677.c | 28 +++++++++++++++
4 files changed, 61 insertions(+), 18 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
[-- Attachment #2: 0001-GCC-12-aarch64-Avoid-out-of-range-shrink-wrapped-sav.patch --]
[-- Type: text/x-patch, Size: 5874 bytes --]
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 3bccd96a23d..2bbba323770 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -4135,7 +4135,7 @@ aarch64_reg_save_mode (unsigned int regno)
case ARM_PCS_SIMD:
/* The vector PCS saves the low 128 bits (which is the full
register on non-SVE targets). */
- return TFmode;
+ return V16QImode;
case ARM_PCS_SVE:
/* Use vectors of DImode for registers that need frame
@@ -8602,6 +8602,10 @@ aarch64_gen_storewb_pair (machine_mode mode, rtx base, rtx reg, rtx reg2,
return gen_storewb_pairtf_di (base, base, reg, reg2,
GEN_INT (-adjustment),
GEN_INT (UNITS_PER_VREG - adjustment));
+ case E_V16QImode:
+ return gen_storewb_pairv16qi_di (base, base, reg, reg2,
+ GEN_INT (-adjustment),
+ GEN_INT (UNITS_PER_VREG - adjustment));
default:
gcc_unreachable ();
}
@@ -8647,6 +8651,10 @@ aarch64_gen_loadwb_pair (machine_mode mode, rtx base, rtx reg, rtx reg2,
case E_TFmode:
return gen_loadwb_pairtf_di (base, base, reg, reg2, GEN_INT (adjustment),
GEN_INT (UNITS_PER_VREG));
+ case E_V16QImode:
+ return gen_loadwb_pairv16qi_di (base, base, reg, reg2,
+ GEN_INT (adjustment),
+ GEN_INT (UNITS_PER_VREG));
default:
gcc_unreachable ();
}
@@ -8730,6 +8738,9 @@ aarch64_gen_load_pair (machine_mode mode, rtx reg1, rtx mem1, rtx reg2,
case E_V4SImode:
return gen_load_pairv4siv4si (reg1, mem1, reg2, mem2);
+ case E_V16QImode:
+ return gen_load_pairv16qiv16qi (reg1, mem1, reg2, mem2);
+
default:
gcc_unreachable ();
}
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index fb100bdf6b3..99f185718c9 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1874,17 +1874,18 @@ (define_insn "loadwb_pair<GPF:mode>_<P:mode>"
[(set_attr "type" "neon_load1_2reg")]
)
-(define_insn "loadwb_pair<TX:mode>_<P:mode>"
+(define_insn "loadwb_pair<TX_V16QI:mode>_<P:mode>"
[(parallel
[(set (match_operand:P 0 "register_operand" "=k")
- (plus:P (match_operand:P 1 "register_operand" "0")
- (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
- (set (match_operand:TX 2 "register_operand" "=w")
- (mem:TX (match_dup 1)))
- (set (match_operand:TX 3 "register_operand" "=w")
- (mem:TX (plus:P (match_dup 1)
+ (plus:P (match_operand:P 1 "register_operand" "0")
+ (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
+ (set (match_operand:TX_V16QI 2 "register_operand" "=w")
+ (mem:TX_V16QI (match_dup 1)))
+ (set (match_operand:TX_V16QI 3 "register_operand" "=w")
+ (mem:TX_V16QI (plus:P (match_dup 1)
(match_operand:P 5 "const_int_operand" "n"))))])]
- "TARGET_SIMD && INTVAL (operands[5]) == GET_MODE_SIZE (<TX:MODE>mode)"
+ "TARGET_SIMD
+ && known_eq (INTVAL (operands[5]), GET_MODE_SIZE (<TX_V16QI:MODE>mode))"
"ldp\\t%q2, %q3, [%1], %4"
[(set_attr "type" "neon_ldp_q")]
)
@@ -1923,20 +1924,20 @@ (define_insn "storewb_pair<GPF:mode>_<P:mode>"
[(set_attr "type" "neon_store1_2reg<q>")]
)
-(define_insn "storewb_pair<TX:mode>_<P:mode>"
+(define_insn "storewb_pair<TX_V16QI:mode>_<P:mode>"
[(parallel
[(set (match_operand:P 0 "register_operand" "=&k")
- (plus:P (match_operand:P 1 "register_operand" "0")
- (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
- (set (mem:TX (plus:P (match_dup 0)
+ (plus:P (match_operand:P 1 "register_operand" "0")
+ (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
+ (set (mem:TX_V16QI (plus:P (match_dup 0)
(match_dup 4)))
- (match_operand:TX 2 "register_operand" "w"))
- (set (mem:TX (plus:P (match_dup 0)
+ (match_operand:TX_V16QI 2 "register_operand" "w"))
+ (set (mem:TX_V16QI (plus:P (match_dup 0)
(match_operand:P 5 "const_int_operand" "n")))
- (match_operand:TX 3 "register_operand" "w"))])]
+ (match_operand:TX_V16QI 3 "register_operand" "w"))])]
"TARGET_SIMD
- && INTVAL (operands[5])
- == INTVAL (operands[4]) + GET_MODE_SIZE (<TX:MODE>mode)"
+ && known_eq (INTVAL (operands[5]),
+ INTVAL (operands[4]) + GET_MODE_SIZE (<TX_V16QI:MODE>mode))"
"stp\\t%q2, %q3, [%0, %4]!"
[(set_attr "type" "neon_stp_q")]
)
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 26a840d7fe9..d49e37893df 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -303,6 +303,9 @@ (define_mode_iterator VS [V2SI V4SI])
(define_mode_iterator TX [TI TF])
+;; TX plus V16QImode.
+(define_mode_iterator TX_V16QI [TI TF V16QI])
+
;; Advanced SIMD opaque structure modes.
(define_mode_iterator VSTRUCT [OI CI XI])
diff --git a/gcc/testsuite/gcc.target/aarch64/torture/pr111677.c b/gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
new file mode 100644
index 00000000000..6bb640c42c0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fopenmp } */
+/* { dg-options "-ffast-math -fstack-protector-strong -fopenmp" } */
+typedef struct {
+ long size_z;
+ int width;
+} dt_bilateral_t;
+typedef float dt_aligned_pixel_t[4];
+#pragma omp declare simd
+void dt_bilateral_splat(dt_bilateral_t *b) {
+ float *buf;
+ long offsets[8];
+ for (; b;) {
+ int firstrow;
+ for (int j = firstrow; j; j++)
+ for (int i; i < b->width; i++) {
+ dt_aligned_pixel_t contrib;
+ for (int k = 0; k < 4; k++)
+ buf[offsets[k]] += contrib[k];
+ }
+ float *dest;
+ for (int j = (long)b; j; j++) {
+ float *src = (float *)b->size_z;
+ for (int i = 0; i < (long)b; i++)
+ dest[i] += src[i];
+ }
+ }
+}
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH][GCC 12] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]
2024-02-12 17:11 [PATCH][GCC 12] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677] Alex Coplan
@ 2024-02-14 11:18 ` Richard Sandiford
2024-02-15 16:06 ` Alex Coplan
0 siblings, 1 reply; 4+ messages in thread
From: Richard Sandiford @ 2024-02-14 11:18 UTC (permalink / raw)
To: Alex Coplan; +Cc: gcc-patches, Kyrylo Tkachov, Richard Earnshaw
Alex Coplan <alex.coplan@arm.com> writes:
> This is a backport of the GCC 13 fix for PR111677 to the GCC 12 branch.
> The only part of the patch that isn't a straight cherry-pick is due to
> the TX iterator lacking TDmode for GCC 12, so this version adjusts
> TX_V16QI accordingly.
>
> Bootstrapped/regtested on aarch64-linux-gnu, the only changes in the
> testsuite I saw were in
> gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c where the dg-output
> "READ of size 4 [...]" check appears to be flaky on the GCC 12 branch
> since libhwasan gained the short granule tag feature, I've requested a
> backport of the following patch (committed as
> r13-100-g3771486daa1e904ceae6f3e135b28e58af33849f) which should fix that
> (independent) issue for GCC 12:
> https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645278.html
>
> OK for the GCC 12 branch?
OK, thanks.
Richard
> Thanks,
> Alex
>
> -- >8 --
>
> The PR shows us ICEing due to an unrecognizable TFmode save emitted by
> aarch64_process_components. The problem is that for T{I,F,D}mode we
> conservatively require mems to be in range for x-register ldp/stp. That
> is because (at least for TImode) it can be allocated to both GPRs and
> FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is
> a q-register load/store.
>
> As Richard pointed out in the PR, aarch64_get_separate_components
> already checks that the offsets are suitable for a single load, so we
> just need to choose a mode in aarch64_reg_save_mode that gives the full
> q-register range. In this patch, we choose V16QImode as an alternative
> 16-byte "bag-of-bits" mode that doesn't have the artificial range
> restrictions imposed on T{I,F,D}mode.
>
> Unlike for GCC 14 we need additional handling in the load/store pair
> code as various cases are not expecting to see V16QImode (particularly
> the writeback patterns, but also aarch64_gen_load_pair).
>
> gcc/ChangeLog:
>
> PR target/111677
> * config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use
> V16QImode for the full 16-byte FPR saves in the vector PCS case.
> (aarch64_gen_storewb_pair): Handle V16QImode.
> (aarch64_gen_loadwb_pair): Likewise.
> (aarch64_gen_load_pair): Likewise.
> * config/aarch64/aarch64.md (loadwb_pair<TX:mode>_<P:mode>):
> Rename to ...
> (loadwb_pair<TX_V16QI:mode>_<P:mode>): ... this, extending to
> V16QImode.
> (storewb_pair<TX:mode>_<P:mode>): Rename to ...
> (storewb_pair<TX_V16QI:mode>_<P:mode>): ... this, extending to
> V16QImode.
> * config/aarch64/iterators.md (TX_V16QI): New.
>
> gcc/testsuite/ChangeLog:
>
> PR target/111677
> * gcc.target/aarch64/torture/pr111677.c: New test.
>
> (cherry picked from commit 2bd8264a131ee1215d3bc6181722f9d30f5569c3)
> ---
> gcc/config/aarch64/aarch64.cc | 13 ++++++-
> gcc/config/aarch64/aarch64.md | 35 ++++++++++---------
> gcc/config/aarch64/iterators.md | 3 ++
> .../gcc.target/aarch64/torture/pr111677.c | 28 +++++++++++++++
> 4 files changed, 61 insertions(+), 18 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 3bccd96a23d..2bbba323770 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -4135,7 +4135,7 @@ aarch64_reg_save_mode (unsigned int regno)
> case ARM_PCS_SIMD:
> /* The vector PCS saves the low 128 bits (which is the full
> register on non-SVE targets). */
> - return TFmode;
> + return V16QImode;
>
> case ARM_PCS_SVE:
> /* Use vectors of DImode for registers that need frame
> @@ -8602,6 +8602,10 @@ aarch64_gen_storewb_pair (machine_mode mode, rtx base, rtx reg, rtx reg2,
> return gen_storewb_pairtf_di (base, base, reg, reg2,
> GEN_INT (-adjustment),
> GEN_INT (UNITS_PER_VREG - adjustment));
> + case E_V16QImode:
> + return gen_storewb_pairv16qi_di (base, base, reg, reg2,
> + GEN_INT (-adjustment),
> + GEN_INT (UNITS_PER_VREG - adjustment));
> default:
> gcc_unreachable ();
> }
> @@ -8647,6 +8651,10 @@ aarch64_gen_loadwb_pair (machine_mode mode, rtx base, rtx reg, rtx reg2,
> case E_TFmode:
> return gen_loadwb_pairtf_di (base, base, reg, reg2, GEN_INT (adjustment),
> GEN_INT (UNITS_PER_VREG));
> + case E_V16QImode:
> + return gen_loadwb_pairv16qi_di (base, base, reg, reg2,
> + GEN_INT (adjustment),
> + GEN_INT (UNITS_PER_VREG));
> default:
> gcc_unreachable ();
> }
> @@ -8730,6 +8738,9 @@ aarch64_gen_load_pair (machine_mode mode, rtx reg1, rtx mem1, rtx reg2,
> case E_V4SImode:
> return gen_load_pairv4siv4si (reg1, mem1, reg2, mem2);
>
> + case E_V16QImode:
> + return gen_load_pairv16qiv16qi (reg1, mem1, reg2, mem2);
> +
> default:
> gcc_unreachable ();
> }
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index fb100bdf6b3..99f185718c9 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -1874,17 +1874,18 @@ (define_insn "loadwb_pair<GPF:mode>_<P:mode>"
> [(set_attr "type" "neon_load1_2reg")]
> )
>
> -(define_insn "loadwb_pair<TX:mode>_<P:mode>"
> +(define_insn "loadwb_pair<TX_V16QI:mode>_<P:mode>"
> [(parallel
> [(set (match_operand:P 0 "register_operand" "=k")
> - (plus:P (match_operand:P 1 "register_operand" "0")
> - (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
> - (set (match_operand:TX 2 "register_operand" "=w")
> - (mem:TX (match_dup 1)))
> - (set (match_operand:TX 3 "register_operand" "=w")
> - (mem:TX (plus:P (match_dup 1)
> + (plus:P (match_operand:P 1 "register_operand" "0")
> + (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
> + (set (match_operand:TX_V16QI 2 "register_operand" "=w")
> + (mem:TX_V16QI (match_dup 1)))
> + (set (match_operand:TX_V16QI 3 "register_operand" "=w")
> + (mem:TX_V16QI (plus:P (match_dup 1)
> (match_operand:P 5 "const_int_operand" "n"))))])]
> - "TARGET_SIMD && INTVAL (operands[5]) == GET_MODE_SIZE (<TX:MODE>mode)"
> + "TARGET_SIMD
> + && known_eq (INTVAL (operands[5]), GET_MODE_SIZE (<TX_V16QI:MODE>mode))"
> "ldp\\t%q2, %q3, [%1], %4"
> [(set_attr "type" "neon_ldp_q")]
> )
> @@ -1923,20 +1924,20 @@ (define_insn "storewb_pair<GPF:mode>_<P:mode>"
> [(set_attr "type" "neon_store1_2reg<q>")]
> )
>
> -(define_insn "storewb_pair<TX:mode>_<P:mode>"
> +(define_insn "storewb_pair<TX_V16QI:mode>_<P:mode>"
> [(parallel
> [(set (match_operand:P 0 "register_operand" "=&k")
> - (plus:P (match_operand:P 1 "register_operand" "0")
> - (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
> - (set (mem:TX (plus:P (match_dup 0)
> + (plus:P (match_operand:P 1 "register_operand" "0")
> + (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
> + (set (mem:TX_V16QI (plus:P (match_dup 0)
> (match_dup 4)))
> - (match_operand:TX 2 "register_operand" "w"))
> - (set (mem:TX (plus:P (match_dup 0)
> + (match_operand:TX_V16QI 2 "register_operand" "w"))
> + (set (mem:TX_V16QI (plus:P (match_dup 0)
> (match_operand:P 5 "const_int_operand" "n")))
> - (match_operand:TX 3 "register_operand" "w"))])]
> + (match_operand:TX_V16QI 3 "register_operand" "w"))])]
> "TARGET_SIMD
> - && INTVAL (operands[5])
> - == INTVAL (operands[4]) + GET_MODE_SIZE (<TX:MODE>mode)"
> + && known_eq (INTVAL (operands[5]),
> + INTVAL (operands[4]) + GET_MODE_SIZE (<TX_V16QI:MODE>mode))"
> "stp\\t%q2, %q3, [%0, %4]!"
> [(set_attr "type" "neon_stp_q")]
> )
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 26a840d7fe9..d49e37893df 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -303,6 +303,9 @@ (define_mode_iterator VS [V2SI V4SI])
>
> (define_mode_iterator TX [TI TF])
>
> +;; TX plus V16QImode.
> +(define_mode_iterator TX_V16QI [TI TF V16QI])
> +
> ;; Advanced SIMD opaque structure modes.
> (define_mode_iterator VSTRUCT [OI CI XI])
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/torture/pr111677.c b/gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
> new file mode 100644
> index 00000000000..6bb640c42c0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target fopenmp } */
> +/* { dg-options "-ffast-math -fstack-protector-strong -fopenmp" } */
> +typedef struct {
> + long size_z;
> + int width;
> +} dt_bilateral_t;
> +typedef float dt_aligned_pixel_t[4];
> +#pragma omp declare simd
> +void dt_bilateral_splat(dt_bilateral_t *b) {
> + float *buf;
> + long offsets[8];
> + for (; b;) {
> + int firstrow;
> + for (int j = firstrow; j; j++)
> + for (int i; i < b->width; i++) {
> + dt_aligned_pixel_t contrib;
> + for (int k = 0; k < 4; k++)
> + buf[offsets[k]] += contrib[k];
> + }
> + float *dest;
> + for (int j = (long)b; j; j++) {
> + float *src = (float *)b->size_z;
> + for (int i = 0; i < (long)b; i++)
> + dest[i] += src[i];
> + }
> + }
> +}
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH][GCC 12] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]
2024-02-14 11:18 ` Richard Sandiford
@ 2024-02-15 16:06 ` Alex Coplan
2024-02-20 10:05 ` Richard Sandiford
0 siblings, 1 reply; 4+ messages in thread
From: Alex Coplan @ 2024-02-15 16:06 UTC (permalink / raw)
To: gcc-patches, Kyrylo Tkachov, Richard Earnshaw, richard.sandiford
On 14/02/2024 11:18, Richard Sandiford wrote:
> Alex Coplan <alex.coplan@arm.com> writes:
> > This is a backport of the GCC 13 fix for PR111677 to the GCC 12 branch.
> > The only part of the patch that isn't a straight cherry-pick is due to
> > the TX iterator lacking TDmode for GCC 12, so this version adjusts
> > TX_V16QI accordingly.
> >
> > Bootstrapped/regtested on aarch64-linux-gnu, the only changes in the
> > testsuite I saw were in
> > gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c where the dg-output
> > "READ of size 4 [...]" check appears to be flaky on the GCC 12 branch
> > since libhwasan gained the short granule tag feature, I've requested a
> > backport of the following patch (committed as
> > r13-100-g3771486daa1e904ceae6f3e135b28e58af33849f) which should fix that
> > (independent) issue for GCC 12:
> > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645278.html
> >
> > OK for the GCC 12 branch?
>
> OK, thanks.
Thanks. The patch cherry-picks cleanly on the GCC 11 branch, and
bootstraps/regtests OK there. Is it OK for GCC 11 too, even though the
issue is latent there (at least for the testcase in the patch)?
Alex
>
> Richard
>
> > Thanks,
> > Alex
> >
> > -- >8 --
> >
> > The PR shows us ICEing due to an unrecognizable TFmode save emitted by
> > aarch64_process_components. The problem is that for T{I,F,D}mode we
> > conservatively require mems to be in range for x-register ldp/stp. That
> > is because (at least for TImode) it can be allocated to both GPRs and
> > FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is
> > a q-register load/store.
> >
> > As Richard pointed out in the PR, aarch64_get_separate_components
> > already checks that the offsets are suitable for a single load, so we
> > just need to choose a mode in aarch64_reg_save_mode that gives the full
> > q-register range. In this patch, we choose V16QImode as an alternative
> > 16-byte "bag-of-bits" mode that doesn't have the artificial range
> > restrictions imposed on T{I,F,D}mode.
> >
> > Unlike for GCC 14 we need additional handling in the load/store pair
> > code as various cases are not expecting to see V16QImode (particularly
> > the writeback patterns, but also aarch64_gen_load_pair).
> >
> > gcc/ChangeLog:
> >
> > PR target/111677
> > * config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use
> > V16QImode for the full 16-byte FPR saves in the vector PCS case.
> > (aarch64_gen_storewb_pair): Handle V16QImode.
> > (aarch64_gen_loadwb_pair): Likewise.
> > (aarch64_gen_load_pair): Likewise.
> > * config/aarch64/aarch64.md (loadwb_pair<TX:mode>_<P:mode>):
> > Rename to ...
> > (loadwb_pair<TX_V16QI:mode>_<P:mode>): ... this, extending to
> > V16QImode.
> > (storewb_pair<TX:mode>_<P:mode>): Rename to ...
> > (storewb_pair<TX_V16QI:mode>_<P:mode>): ... this, extending to
> > V16QImode.
> > * config/aarch64/iterators.md (TX_V16QI): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/111677
> > * gcc.target/aarch64/torture/pr111677.c: New test.
> >
> > (cherry picked from commit 2bd8264a131ee1215d3bc6181722f9d30f5569c3)
> > ---
> > gcc/config/aarch64/aarch64.cc | 13 ++++++-
> > gcc/config/aarch64/aarch64.md | 35 ++++++++++---------
> > gcc/config/aarch64/iterators.md | 3 ++
> > .../gcc.target/aarch64/torture/pr111677.c | 28 +++++++++++++++
> > 4 files changed, 61 insertions(+), 18 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
> >
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index 3bccd96a23d..2bbba323770 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -4135,7 +4135,7 @@ aarch64_reg_save_mode (unsigned int regno)
> > case ARM_PCS_SIMD:
> > /* The vector PCS saves the low 128 bits (which is the full
> > register on non-SVE targets). */
> > - return TFmode;
> > + return V16QImode;
> >
> > case ARM_PCS_SVE:
> > /* Use vectors of DImode for registers that need frame
> > @@ -8602,6 +8602,10 @@ aarch64_gen_storewb_pair (machine_mode mode, rtx base, rtx reg, rtx reg2,
> > return gen_storewb_pairtf_di (base, base, reg, reg2,
> > GEN_INT (-adjustment),
> > GEN_INT (UNITS_PER_VREG - adjustment));
> > + case E_V16QImode:
> > + return gen_storewb_pairv16qi_di (base, base, reg, reg2,
> > + GEN_INT (-adjustment),
> > + GEN_INT (UNITS_PER_VREG - adjustment));
> > default:
> > gcc_unreachable ();
> > }
> > @@ -8647,6 +8651,10 @@ aarch64_gen_loadwb_pair (machine_mode mode, rtx base, rtx reg, rtx reg2,
> > case E_TFmode:
> > return gen_loadwb_pairtf_di (base, base, reg, reg2, GEN_INT (adjustment),
> > GEN_INT (UNITS_PER_VREG));
> > + case E_V16QImode:
> > + return gen_loadwb_pairv16qi_di (base, base, reg, reg2,
> > + GEN_INT (adjustment),
> > + GEN_INT (UNITS_PER_VREG));
> > default:
> > gcc_unreachable ();
> > }
> > @@ -8730,6 +8738,9 @@ aarch64_gen_load_pair (machine_mode mode, rtx reg1, rtx mem1, rtx reg2,
> > case E_V4SImode:
> > return gen_load_pairv4siv4si (reg1, mem1, reg2, mem2);
> >
> > + case E_V16QImode:
> > + return gen_load_pairv16qiv16qi (reg1, mem1, reg2, mem2);
> > +
> > default:
> > gcc_unreachable ();
> > }
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index fb100bdf6b3..99f185718c9 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -1874,17 +1874,18 @@ (define_insn "loadwb_pair<GPF:mode>_<P:mode>"
> > [(set_attr "type" "neon_load1_2reg")]
> > )
> >
> > -(define_insn "loadwb_pair<TX:mode>_<P:mode>"
> > +(define_insn "loadwb_pair<TX_V16QI:mode>_<P:mode>"
> > [(parallel
> > [(set (match_operand:P 0 "register_operand" "=k")
> > - (plus:P (match_operand:P 1 "register_operand" "0")
> > - (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
> > - (set (match_operand:TX 2 "register_operand" "=w")
> > - (mem:TX (match_dup 1)))
> > - (set (match_operand:TX 3 "register_operand" "=w")
> > - (mem:TX (plus:P (match_dup 1)
> > + (plus:P (match_operand:P 1 "register_operand" "0")
> > + (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
> > + (set (match_operand:TX_V16QI 2 "register_operand" "=w")
> > + (mem:TX_V16QI (match_dup 1)))
> > + (set (match_operand:TX_V16QI 3 "register_operand" "=w")
> > + (mem:TX_V16QI (plus:P (match_dup 1)
> > (match_operand:P 5 "const_int_operand" "n"))))])]
> > - "TARGET_SIMD && INTVAL (operands[5]) == GET_MODE_SIZE (<TX:MODE>mode)"
> > + "TARGET_SIMD
> > + && known_eq (INTVAL (operands[5]), GET_MODE_SIZE (<TX_V16QI:MODE>mode))"
> > "ldp\\t%q2, %q3, [%1], %4"
> > [(set_attr "type" "neon_ldp_q")]
> > )
> > @@ -1923,20 +1924,20 @@ (define_insn "storewb_pair<GPF:mode>_<P:mode>"
> > [(set_attr "type" "neon_store1_2reg<q>")]
> > )
> >
> > -(define_insn "storewb_pair<TX:mode>_<P:mode>"
> > +(define_insn "storewb_pair<TX_V16QI:mode>_<P:mode>"
> > [(parallel
> > [(set (match_operand:P 0 "register_operand" "=&k")
> > - (plus:P (match_operand:P 1 "register_operand" "0")
> > - (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
> > - (set (mem:TX (plus:P (match_dup 0)
> > + (plus:P (match_operand:P 1 "register_operand" "0")
> > + (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
> > + (set (mem:TX_V16QI (plus:P (match_dup 0)
> > (match_dup 4)))
> > - (match_operand:TX 2 "register_operand" "w"))
> > - (set (mem:TX (plus:P (match_dup 0)
> > + (match_operand:TX_V16QI 2 "register_operand" "w"))
> > + (set (mem:TX_V16QI (plus:P (match_dup 0)
> > (match_operand:P 5 "const_int_operand" "n")))
> > - (match_operand:TX 3 "register_operand" "w"))])]
> > + (match_operand:TX_V16QI 3 "register_operand" "w"))])]
> > "TARGET_SIMD
> > - && INTVAL (operands[5])
> > - == INTVAL (operands[4]) + GET_MODE_SIZE (<TX:MODE>mode)"
> > + && known_eq (INTVAL (operands[5]),
> > + INTVAL (operands[4]) + GET_MODE_SIZE (<TX_V16QI:MODE>mode))"
> > "stp\\t%q2, %q3, [%0, %4]!"
> > [(set_attr "type" "neon_stp_q")]
> > )
> > diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> > index 26a840d7fe9..d49e37893df 100644
> > --- a/gcc/config/aarch64/iterators.md
> > +++ b/gcc/config/aarch64/iterators.md
> > @@ -303,6 +303,9 @@ (define_mode_iterator VS [V2SI V4SI])
> >
> > (define_mode_iterator TX [TI TF])
> >
> > +;; TX plus V16QImode.
> > +(define_mode_iterator TX_V16QI [TI TF V16QI])
> > +
> > ;; Advanced SIMD opaque structure modes.
> > (define_mode_iterator VSTRUCT [OI CI XI])
> >
> > diff --git a/gcc/testsuite/gcc.target/aarch64/torture/pr111677.c b/gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
> > new file mode 100644
> > index 00000000000..6bb640c42c0
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
> > @@ -0,0 +1,28 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target fopenmp } */
> > +/* { dg-options "-ffast-math -fstack-protector-strong -fopenmp" } */
> > +typedef struct {
> > + long size_z;
> > + int width;
> > +} dt_bilateral_t;
> > +typedef float dt_aligned_pixel_t[4];
> > +#pragma omp declare simd
> > +void dt_bilateral_splat(dt_bilateral_t *b) {
> > + float *buf;
> > + long offsets[8];
> > + for (; b;) {
> > + int firstrow;
> > + for (int j = firstrow; j; j++)
> > + for (int i; i < b->width; i++) {
> > + dt_aligned_pixel_t contrib;
> > + for (int k = 0; k < 4; k++)
> > + buf[offsets[k]] += contrib[k];
> > + }
> > + float *dest;
> > + for (int j = (long)b; j; j++) {
> > + float *src = (float *)b->size_z;
> > + for (int i = 0; i < (long)b; i++)
> > + dest[i] += src[i];
> > + }
> > + }
> > +}
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH][GCC 12] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677]
2024-02-15 16:06 ` Alex Coplan
@ 2024-02-20 10:05 ` Richard Sandiford
0 siblings, 0 replies; 4+ messages in thread
From: Richard Sandiford @ 2024-02-20 10:05 UTC (permalink / raw)
To: Alex Coplan; +Cc: gcc-patches, Kyrylo Tkachov, Richard Earnshaw
Alex Coplan <alex.coplan@arm.com> writes:
> On 14/02/2024 11:18, Richard Sandiford wrote:
>> Alex Coplan <alex.coplan@arm.com> writes:
>> > This is a backport of the GCC 13 fix for PR111677 to the GCC 12 branch.
>> > The only part of the patch that isn't a straight cherry-pick is due to
>> > the TX iterator lacking TDmode for GCC 12, so this version adjusts
>> > TX_V16QI accordingly.
>> >
>> > Bootstrapped/regtested on aarch64-linux-gnu, the only changes in the
>> > testsuite I saw were in
>> > gcc/testsuite/c-c++-common/hwasan/large-aligned-1.c where the dg-output
>> > "READ of size 4 [...]" check appears to be flaky on the GCC 12 branch
>> > since libhwasan gained the short granule tag feature, I've requested a
>> > backport of the following patch (committed as
>> > r13-100-g3771486daa1e904ceae6f3e135b28e58af33849f) which should fix that
>> > (independent) issue for GCC 12:
>> > https://gcc.gnu.org/pipermail/gcc-patches/2024-February/645278.html
>> >
>> > OK for the GCC 12 branch?
>>
>> OK, thanks.
>
> Thanks. The patch cherry-picks cleanly on the GCC 11 branch, and
> bootstraps/regtests OK there. Is it OK for GCC 11 too, even though the
> issue is latent there (at least for the testcase in the patch)?
Yeah, I think we should apply it there too, since the bug is likely
to manifest with other testcases.
Richard
> Alex
>
>>
>> Richard
>>
>> > Thanks,
>> > Alex
>> >
>> > -- >8 --
>> >
>> > The PR shows us ICEing due to an unrecognizable TFmode save emitted by
>> > aarch64_process_components. The problem is that for T{I,F,D}mode we
>> > conservatively require mems to be in range for x-register ldp/stp. That
>> > is because (at least for TImode) it can be allocated to both GPRs and
>> > FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is
>> > a q-register load/store.
>> >
>> > As Richard pointed out in the PR, aarch64_get_separate_components
>> > already checks that the offsets are suitable for a single load, so we
>> > just need to choose a mode in aarch64_reg_save_mode that gives the full
>> > q-register range. In this patch, we choose V16QImode as an alternative
>> > 16-byte "bag-of-bits" mode that doesn't have the artificial range
>> > restrictions imposed on T{I,F,D}mode.
>> >
>> > Unlike for GCC 14 we need additional handling in the load/store pair
>> > code as various cases are not expecting to see V16QImode (particularly
>> > the writeback patterns, but also aarch64_gen_load_pair).
>> >
>> > gcc/ChangeLog:
>> >
>> > PR target/111677
>> > * config/aarch64/aarch64.cc (aarch64_reg_save_mode): Use
>> > V16QImode for the full 16-byte FPR saves in the vector PCS case.
>> > (aarch64_gen_storewb_pair): Handle V16QImode.
>> > (aarch64_gen_loadwb_pair): Likewise.
>> > (aarch64_gen_load_pair): Likewise.
>> > * config/aarch64/aarch64.md (loadwb_pair<TX:mode>_<P:mode>):
>> > Rename to ...
>> > (loadwb_pair<TX_V16QI:mode>_<P:mode>): ... this, extending to
>> > V16QImode.
>> > (storewb_pair<TX:mode>_<P:mode>): Rename to ...
>> > (storewb_pair<TX_V16QI:mode>_<P:mode>): ... this, extending to
>> > V16QImode.
>> > * config/aarch64/iterators.md (TX_V16QI): New.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> > PR target/111677
>> > * gcc.target/aarch64/torture/pr111677.c: New test.
>> >
>> > (cherry picked from commit 2bd8264a131ee1215d3bc6181722f9d30f5569c3)
>> > ---
>> > gcc/config/aarch64/aarch64.cc | 13 ++++++-
>> > gcc/config/aarch64/aarch64.md | 35 ++++++++++---------
>> > gcc/config/aarch64/iterators.md | 3 ++
>> > .../gcc.target/aarch64/torture/pr111677.c | 28 +++++++++++++++
>> > 4 files changed, 61 insertions(+), 18 deletions(-)
>> > create mode 100644 gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
>> >
>> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
>> > index 3bccd96a23d..2bbba323770 100644
>> > --- a/gcc/config/aarch64/aarch64.cc
>> > +++ b/gcc/config/aarch64/aarch64.cc
>> > @@ -4135,7 +4135,7 @@ aarch64_reg_save_mode (unsigned int regno)
>> > case ARM_PCS_SIMD:
>> > /* The vector PCS saves the low 128 bits (which is the full
>> > register on non-SVE targets). */
>> > - return TFmode;
>> > + return V16QImode;
>> >
>> > case ARM_PCS_SVE:
>> > /* Use vectors of DImode for registers that need frame
>> > @@ -8602,6 +8602,10 @@ aarch64_gen_storewb_pair (machine_mode mode, rtx base, rtx reg, rtx reg2,
>> > return gen_storewb_pairtf_di (base, base, reg, reg2,
>> > GEN_INT (-adjustment),
>> > GEN_INT (UNITS_PER_VREG - adjustment));
>> > + case E_V16QImode:
>> > + return gen_storewb_pairv16qi_di (base, base, reg, reg2,
>> > + GEN_INT (-adjustment),
>> > + GEN_INT (UNITS_PER_VREG - adjustment));
>> > default:
>> > gcc_unreachable ();
>> > }
>> > @@ -8647,6 +8651,10 @@ aarch64_gen_loadwb_pair (machine_mode mode, rtx base, rtx reg, rtx reg2,
>> > case E_TFmode:
>> > return gen_loadwb_pairtf_di (base, base, reg, reg2, GEN_INT (adjustment),
>> > GEN_INT (UNITS_PER_VREG));
>> > + case E_V16QImode:
>> > + return gen_loadwb_pairv16qi_di (base, base, reg, reg2,
>> > + GEN_INT (adjustment),
>> > + GEN_INT (UNITS_PER_VREG));
>> > default:
>> > gcc_unreachable ();
>> > }
>> > @@ -8730,6 +8738,9 @@ aarch64_gen_load_pair (machine_mode mode, rtx reg1, rtx mem1, rtx reg2,
>> > case E_V4SImode:
>> > return gen_load_pairv4siv4si (reg1, mem1, reg2, mem2);
>> >
>> > + case E_V16QImode:
>> > + return gen_load_pairv16qiv16qi (reg1, mem1, reg2, mem2);
>> > +
>> > default:
>> > gcc_unreachable ();
>> > }
>> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
>> > index fb100bdf6b3..99f185718c9 100644
>> > --- a/gcc/config/aarch64/aarch64.md
>> > +++ b/gcc/config/aarch64/aarch64.md
>> > @@ -1874,17 +1874,18 @@ (define_insn "loadwb_pair<GPF:mode>_<P:mode>"
>> > [(set_attr "type" "neon_load1_2reg")]
>> > )
>> >
>> > -(define_insn "loadwb_pair<TX:mode>_<P:mode>"
>> > +(define_insn "loadwb_pair<TX_V16QI:mode>_<P:mode>"
>> > [(parallel
>> > [(set (match_operand:P 0 "register_operand" "=k")
>> > - (plus:P (match_operand:P 1 "register_operand" "0")
>> > - (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
>> > - (set (match_operand:TX 2 "register_operand" "=w")
>> > - (mem:TX (match_dup 1)))
>> > - (set (match_operand:TX 3 "register_operand" "=w")
>> > - (mem:TX (plus:P (match_dup 1)
>> > + (plus:P (match_operand:P 1 "register_operand" "0")
>> > + (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
>> > + (set (match_operand:TX_V16QI 2 "register_operand" "=w")
>> > + (mem:TX_V16QI (match_dup 1)))
>> > + (set (match_operand:TX_V16QI 3 "register_operand" "=w")
>> > + (mem:TX_V16QI (plus:P (match_dup 1)
>> > (match_operand:P 5 "const_int_operand" "n"))))])]
>> > - "TARGET_SIMD && INTVAL (operands[5]) == GET_MODE_SIZE (<TX:MODE>mode)"
>> > + "TARGET_SIMD
>> > + && known_eq (INTVAL (operands[5]), GET_MODE_SIZE (<TX_V16QI:MODE>mode))"
>> > "ldp\\t%q2, %q3, [%1], %4"
>> > [(set_attr "type" "neon_ldp_q")]
>> > )
>> > @@ -1923,20 +1924,20 @@ (define_insn "storewb_pair<GPF:mode>_<P:mode>"
>> > [(set_attr "type" "neon_store1_2reg<q>")]
>> > )
>> >
>> > -(define_insn "storewb_pair<TX:mode>_<P:mode>"
>> > +(define_insn "storewb_pair<TX_V16QI:mode>_<P:mode>"
>> > [(parallel
>> > [(set (match_operand:P 0 "register_operand" "=&k")
>> > - (plus:P (match_operand:P 1 "register_operand" "0")
>> > - (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
>> > - (set (mem:TX (plus:P (match_dup 0)
>> > + (plus:P (match_operand:P 1 "register_operand" "0")
>> > + (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
>> > + (set (mem:TX_V16QI (plus:P (match_dup 0)
>> > (match_dup 4)))
>> > - (match_operand:TX 2 "register_operand" "w"))
>> > - (set (mem:TX (plus:P (match_dup 0)
>> > + (match_operand:TX_V16QI 2 "register_operand" "w"))
>> > + (set (mem:TX_V16QI (plus:P (match_dup 0)
>> > (match_operand:P 5 "const_int_operand" "n")))
>> > - (match_operand:TX 3 "register_operand" "w"))])]
>> > + (match_operand:TX_V16QI 3 "register_operand" "w"))])]
>> > "TARGET_SIMD
>> > - && INTVAL (operands[5])
>> > - == INTVAL (operands[4]) + GET_MODE_SIZE (<TX:MODE>mode)"
>> > + && known_eq (INTVAL (operands[5]),
>> > + INTVAL (operands[4]) + GET_MODE_SIZE (<TX_V16QI:MODE>mode))"
>> > "stp\\t%q2, %q3, [%0, %4]!"
>> > [(set_attr "type" "neon_stp_q")]
>> > )
>> > diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
>> > index 26a840d7fe9..d49e37893df 100644
>> > --- a/gcc/config/aarch64/iterators.md
>> > +++ b/gcc/config/aarch64/iterators.md
>> > @@ -303,6 +303,9 @@ (define_mode_iterator VS [V2SI V4SI])
>> >
>> > (define_mode_iterator TX [TI TF])
>> >
>> > +;; TX plus V16QImode.
>> > +(define_mode_iterator TX_V16QI [TI TF V16QI])
>> > +
>> > ;; Advanced SIMD opaque structure modes.
>> > (define_mode_iterator VSTRUCT [OI CI XI])
>> >
>> > diff --git a/gcc/testsuite/gcc.target/aarch64/torture/pr111677.c b/gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
>> > new file mode 100644
>> > index 00000000000..6bb640c42c0
>> > --- /dev/null
>> > +++ b/gcc/testsuite/gcc.target/aarch64/torture/pr111677.c
>> > @@ -0,0 +1,28 @@
>> > +/* { dg-do compile } */
>> > +/* { dg-require-effective-target fopenmp } */
>> > +/* { dg-options "-ffast-math -fstack-protector-strong -fopenmp" } */
>> > +typedef struct {
>> > + long size_z;
>> > + int width;
>> > +} dt_bilateral_t;
>> > +typedef float dt_aligned_pixel_t[4];
>> > +#pragma omp declare simd
>> > +void dt_bilateral_splat(dt_bilateral_t *b) {
>> > + float *buf;
>> > + long offsets[8];
>> > + for (; b;) {
>> > + int firstrow;
>> > + for (int j = firstrow; j; j++)
>> > + for (int i; i < b->width; i++) {
>> > + dt_aligned_pixel_t contrib;
>> > + for (int k = 0; k < 4; k++)
>> > + buf[offsets[k]] += contrib[k];
>> > + }
>> > + float *dest;
>> > + for (int j = (long)b; j; j++) {
>> > + float *src = (float *)b->size_z;
>> > + for (int i = 0; i < (long)b; i++)
>> > + dest[i] += src[i];
>> > + }
>> > + }
>> > +}
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-02-20 10:05 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-12 17:11 [PATCH][GCC 12] aarch64: Avoid out-of-range shrink-wrapped saves [PR111677] Alex Coplan
2024-02-14 11:18 ` Richard Sandiford
2024-02-15 16:06 ` Alex Coplan
2024-02-20 10:05 ` Richard Sandiford
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).