public inbox for gcc-bugs@sourceware.org
* [Bug target/113517] New: vector SLP cost model should be improved
@ 2024-01-20 5:42 pinskia at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2024-01-20 5:42 UTC
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113517
Bug ID: 113517
Summary: vector SLP cost model should be improved
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: enhancement
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: aarch64
Take the following function:
```
void f(short *a, signed int *b)
{
  int sum = 0;
  sum += a[0];
  sum += a[1];
  sum += a[2];
  sum += a[3];
  *b = sum;
}
```
Right now, with the default cost model, this produces:
```
ldrsh w3, [x0]
ldrsh w4, [x0, 2]
ldrsh w2, [x0, 4]
add w3, w3, w4
ldrsh w0, [x0, 6]
add w2, w2, w3
add w0, w0, w2
str w0, [x1]
```
But with the cost model disabled, we get:
```
ldr d31, [x0]
saddlv s31, v31.4h
str s31, [x1]
ret
```
(Note that this is better code generation than what LLVM produces, since LLVM
uses sshll/addv.)
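To make the equivalence between the two sequences concrete, here is a minimal sketch (not part of the original report) that writes the vector form by hand. The names `f_scalar`, `f_vector`, and `check_equivalence` are hypothetical. On AArch64 it uses the `vld1_s16` and `vaddlv_s16` intrinsics from `arm_neon.h`, which would be expected to map to the `ldr d` / `saddlv` pair shown above; on other targets it falls back to the scalar code so the file still builds.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Scalar reference: the same computation as f() in the report. */
static void f_scalar(const int16_t *a, int32_t *b)
{
  int32_t sum = 0;
  for (int i = 0; i < 4; i++)
    sum += a[i];            /* each 16-bit element is sign-extended first */
  *b = sum;
}

/* Hand-written vector form: one 64-bit load, one widening
   add-across-lanes, one store.  Hypothetical helper for illustration. */
static void f_vector(const int16_t *a, int32_t *b)
{
#if defined(__aarch64__)
  int16x4_t v = vld1_s16(a);   /* load four 16-bit lanes */
  *b = vaddlv_s16(v);          /* signed widening sum of the four lanes */
#else
  f_scalar(a, b);              /* fallback on non-AArch64 targets */
#endif
}

/* Checks that both forms agree, including a case whose sum does not
   fit in 16 bits, which is why the widening add is needed. */
static void check_equivalence(void)
{
  const int16_t big[4] = { 32767, 32767, 32767, 32767 };
  const int16_t mixed[4] = { -32768, 1, -2, 3 };
  int32_t s, v;

  f_scalar(big, &s);
  f_vector(big, &v);
  assert(s == v);

  f_scalar(mixed, &s);
  f_vector(mixed, &v);
  assert(s == v);
}
```

The widening matters: four `short` values can sum to more than a 16-bit lane can hold, so a plain `addv` on `.4h` would not be a valid replacement without a prior extension.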
For most cores, one vector (FP/SIMD register) load, one vector instruction, and
one vector store are cheaper than 4 scalar loads, 3 scalar adds, and one scalar
store. This is true even on ThunderX 1 and Cortex-A57.