public inbox for gcc-bugs@sourceware.org
* [Bug target/106340] New: flag set from SVE svwhilelt intrinsic not reused in loop
From: yyc1992 at gmail dot com @ 2022-07-18 13:01 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106340
Bug ID: 106340
Summary: flag set from SVE svwhilelt intrinsic not reused in loop
Product: gcc
Version: 12.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: yyc1992 at gmail dot com
Target Milestone: ---
I'm experimenting with manually writing vector-length-agnostic (VLA) loops and
trying to match the assembly code I expect, i.e. what the autovectorizer
produces. One of the main areas where I can't get this to work is setting the
loop predicate with the svwhilelt intrinsics. The instruction they correspond
to (whilelo, for unsigned operands) sets the condition flags, which can be used
directly to terminate the loop. Indeed, with the autovectorizer this is exactly
what happens:
```
#include <cstddef>
#include <cstdint>

void set1(uint32_t *__restrict__ out, size_t m)
{
    for (size_t i = 0; i < m; i++) {
        out[i] = 1;
    }
}
```
compiles to
```
        cbz     x1, .L1
        mov     x2, 0
        cntw    x3
        whilelo p0.s, xzr, x1
        mov     z0.s, #1
        .p2align 3,,7
.L3:
        st1w    z0.s, p0, [x0, x2, lsl 2]
        add     x2, x2, x3
        whilelo p0.s, x2, x1
        b.any   .L3
.L1:
        ret
```
(Here I believe the flags set by the whilelo in the loop header could also be
used for the initial branch instead of the `cbz`, but that doesn't save much in
this case.)
However, no matter how I try to replicate this in hand-written code using the
SVE intrinsics, an additional cmp instruction is always generated. The closest
I can get is by replicating the structure of the auto-vectorized loop as
closely as possible:
```
#include <arm_sve.h>
#include <cstddef>

void set2(uint32_t *__restrict__ out, size_t m)
{
    auto svelen = svcntw();
    auto v = svdup_u32(1);
    if (m != 0) {
        auto pg = svwhilelt_b32(0ul, m);
        for (size_t i = 0; i < m; i += svelen, pg = svwhilelt_b32(i, m)) {
            svst1(pg, &out[i], v);
        }
    }
}
```
which is compiled to
```
        cbz     x1, .L9
        mov     x2, 0
        cntw    x3
        whilelo p0.s, xzr, x1
        mov     z0.s, #1
        .p2align 3,,7
.L11:
        st1w    z0.s, p0, [x0, x2, lsl 2]
        add     x2, x2, x3
        whilelo p0.s, x2, x1
        cmp     x1, x2
        bhi     .L11
.L9:
        ret
```
which is identical down to the register allocation, except that the `b.any`
following the `whilelo` instruction is replaced with a separate comparison and
branch (`cmp` + `bhi`).
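The extra `cmp` is redundant because, for an unsigned `whilelo`, lane 0 is active exactly when `i < m` and the active lanes always form a prefix, so "some lane is active" (what `b.any` tests) is the same condition as `i < m`. Below is a hypothetical scalar model of that behaviour, assuming the instruction semantics just described; `whilelo_model`, `exit_tests_agree` and the lane count `vl` (standing in for `svcntw()`) are made up for illustration and are not part of the ACLE.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of SVE WHILELO for `vl` lanes
// (vl stands in for svcntw()); not part of any real API.
struct WhileloResult {
    std::vector<bool> lanes; // predicate bits, lane 0 first
    bool any;                // Z clear: some lane active (tested by b.any)
    bool first;              // N set: lane 0 active (tested by b.first)
};

WhileloResult whilelo_model(uint64_t i, uint64_t m, unsigned vl)
{
    assert(vl > 0 && "vector length must be positive");
    WhileloResult r{std::vector<bool>(vl, false), false, false};
    for (unsigned k = 0; k < vl; ++k) {
        // Stop at the first lane that fails i + k < m: later lanes stay
        // inactive, so the active lanes always form a prefix.
        if (i + k >= m)
            break;
        r.lanes[k] = true;
    }
    for (bool lane : r.lanes)
        r.any = r.any || lane;
    r.first = r.lanes[0];
    return r;
}

// The scalar exit test in set2 (`i < m`) and the flag test used by the
// autovectorized loop (`b.any` after whilelo) always agree.
bool exit_tests_agree(uint64_t i, uint64_t m, unsigned vl)
{
    return whilelo_model(i, m, vl).any == (i < m);
}
```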
* [Bug target/106340] flag set from SVE svwhilelt intrinsic not reused in loop
From: yyc1992 at gmail dot com @ 2022-07-18 13:13 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106340
--- Comment #1 from Yichao Yu <yyc1992 at gmail dot com> ---
Also note that the code above was tweaked to match the final assembly as
closely as possible. For a complete fix, I'd expect the loop transformations
applied to ordinary loops (e.g. loop rotation) to move the whilelt as well, so
that source code like the following generates essentially the same code:
```
#include <arm_sve.h>
#include <cstddef>

void set3(uint32_t *__restrict__ out, size_t m)
{
    auto svelen = svcntw();
    auto v = svdup_u32(1);
    for (size_t i = 0; i < m; i += svelen) {
        auto pg = svwhilelt_b32(i, m);
        svst1(pg, &out[i], v);
    }
}
```
Currently, the cmp is copied into the loop header and moved to the end of the
loop body, but the whilelt that should be paired with it is not moved, so the
flags it sets can't be used directly for the loop branch.
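The rotation described in Comment #1 can be pictured at the source level. The sketch below is a hypothetical scalar rendering (not actual GCC output) of set3 after rotation: the `i < m` test is duplicated into the header and the latch, while the predicate computation stays at the top of the body. `set3_rotated` and the lane-count parameter `vl` (standing in for `svcntw()`) are made up for illustration; the inner loop stands in for the whilelt predicate plus the masked store.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar rendering of set3 after loop rotation.
// `vl` stands in for svcntw(); the inner loop models the predicated store.
void set3_rotated(uint32_t *__restrict__ out, size_t m, size_t vl)
{
    assert(vl > 0 && "vector length must be positive");
    size_t i = 0;
    if (i < m) {                  // copy of the exit test in the loop header
        do {
            // The predicate (lanes with i + k < m) is still computed here, at
            // the top of the body, rather than next to the latch test below.
            for (size_t k = 0; k < vl && i + k < m; ++k)
                out[i + k] = 1;
            i += vl;
        } while (i < m);          // copy of the exit test at the end of the body
    }
}
```

The shape the reporter is after is the one set2 spells out by hand: the whilelt computed right before the latch test, so its flags can drive the branch.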
* [Bug target/106340] flag set from SVE svwhilelt intrinsic not reused in loop
From: yyc1992 at gmail dot com @ 2022-07-20 14:35 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106340
Yichao Yu <yyc1992 at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |INVALID
Status|UNCONFIRMED |RESOLVED
--- Comment #2 from Yichao Yu <yyc1992 at gmail dot com> ---
Over at the LLVM bug report, it was pointed out to me that the standard pattern
is to branch based on the ptest intrinsics. These match the flag setting of the
whilelt family of instructions better, and GCC is already able to omit the
ptest instruction in such cases.
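A minimal sketch of the ptest-based loop, assuming this is the shape meant in Comment #2 (the exact code from the LLVM discussion isn't quoted here). The loop is controlled by `svptest_first(svptrue_b32(), pg)` instead of the scalar `i < m`; because the active lanes of a whilelt predicate always form a prefix, `svptest_any` would be equivalent here. `set4` is a hypothetical name, and the `#else` branch is only a portable fallback so the example also builds on targets without SVE. Whether the `ptest` is actually folded into the `whilelo` flags depends on the compiler version; Comment #2 reports that GCC already does this.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical ptest-controlled version of set1/set2/set3.
void set4(uint32_t *__restrict__ out, size_t m)
{
#if defined(__ARM_FEATURE_SVE)
    const svuint32_t v = svdup_n_u32(1);
    uint64_t i = 0;
    svbool_t pg = svwhilelt_b32_u64(i, m);
    // Branch on the predicate itself (lane 0 active) rather than on i < m.
    while (svptest_first(svptrue_b32(), pg)) {
        svst1_u32(pg, &out[i], v);
        i += svcntw();
        pg = svwhilelt_b32_u64(i, m);
    }
#else
    // Portable fallback so the sketch also builds on non-SVE targets.
    for (size_t k = 0; k < m; ++k)
        out[k] = 1;
#endif
}
```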