* [Bug tree-optimization/113678] New: SLP misses up vec_concat
@ 2024-01-31 2:33 pinskia at gcc dot gnu.org
2024-01-31 8:32 ` [Bug tree-optimization/113678] " rguenth at gcc dot gnu.org
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-01-31 2:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113678
Bug ID: 113678
Summary: SLP misses up vec_concat
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: pinskia at gcc dot gnu.org
Target Milestone: ---
Target: x86_64
Take:
```
void f(char *a, char *b)
{
  int b0 = b[0];
  int b1 = b[1];
  int b2 = b[2];
  int b3 = b[3];
  int b4 = 0;
  int b5 = 0;
  int b6 = 0;
  int b7 = 0;
  a[0] = b0;
  a[1] = b1;
  a[2] = b2;
  a[3] = b3;
#if 0
  asm("":::"memory");
#endif
  a[4] = b0;
  a[5] = b1;
  a[6] = b2;
  a[7] = b3;
}
```
On x86_64 we get some mess because SLP decides to do this:
```
_1 = *b_6(D);
_2 = MEM[(char *)b_6(D) + 1B];
_3 = MEM[(char *)b_6(D) + 2B];
_4 = MEM[(char *)b_6(D) + 3B];
_16 = {_1, _2, _3, _4, _1, _2, _3, _4};
```
But this could be done as 2 stores (if we change the `#if 0` to `#if 1` we
get the better code):
```
vect__1.5_18 = MEM <vector(4) char> [(char *)b_6(D)];
MEM <vector(4) char> [(char *)a_7(D)] = vect__1.5_18;
MEM <vector(4) char> [(char *)a_7(D) + 4B] = vect__1.5_18;
```
Or we could even get a single store, as LLVM does:
```
movd xmm0, dword ptr [rsi] # xmm0 = mem[0],zero,zero,zero
pshufd xmm0, xmm0, 0 # xmm0 = xmm0[0,0,0,0]
movq qword ptr [rdi], xmm0
ret
```
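For reference, the two better variants described above can be written out by hand in C. This is only a sketch of the intended result, not what the vectorizer emits: the names `f_scalar`, `f_two_stores` and `f_one_store` are hypothetical, and `memcpy` stands in for the vector(4) and 8-byte memory accesses.

```c
#include <string.h>

/* Scalar reference: the testcase from this report without the unused locals. */
void f_scalar(char *a, char *b)
{
  int b0 = b[0], b1 = b[1], b2 = b[2], b3 = b[3];
  a[0] = b0; a[1] = b1; a[2] = b2; a[3] = b3;
  a[4] = b0; a[5] = b1; a[6] = b2; a[7] = b3;
}

/* Hypothetical "2 stores" form: one 4-byte load, stored twice. */
void f_two_stores(char *a, char *b)
{
  char v[4];
  memcpy(v, b, 4);       /* stands in for the vector(4) char load */
  memcpy(a, v, 4);       /* store to a[0..3] */
  memcpy(a + 4, v, 4);   /* store to a[4..7] */
}

/* Hypothetical "1 store" form: duplicate the 4 bytes, then one 8-byte store. */
void f_one_store(char *a, char *b)
{
  char v[8];
  memcpy(v, b, 4);       /* load b[0..3] into the low half */
  memcpy(v + 4, v, 4);   /* duplicate into the high half (no overlap) */
  memcpy(a, v, 8);       /* single 8-byte store */
}
```

All loads from `b` happen before any store to `a` in the original, so copying through a local buffer keeps the same behaviour even when `a` and `b` overlap.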
From: rguenth at gcc dot gnu.org @ 2024-01-31 8:32 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113678
Richard Biener <rguenth at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Last reconfirmed| |2024-01-31
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think the SLP tree we discover is sound:
t2.c:11:14: note: node 0x5db76f0 (max_nunits=8, refcnt=2) vector(8) char
t2.c:11:14: note: op template: *a_7(D) = _1;
t2.c:11:14: note: stmt 0 *a_7(D) = _1;
t2.c:11:14: note: stmt 1 MEM[(char *)a_7(D) + 1B] = _2;
t2.c:11:14: note: stmt 2 MEM[(char *)a_7(D) + 2B] = _3;
t2.c:11:14: note: stmt 3 MEM[(char *)a_7(D) + 3B] = _4;
t2.c:11:14: note: stmt 4 MEM[(char *)a_7(D) + 4B] = _1;
t2.c:11:14: note: stmt 5 MEM[(char *)a_7(D) + 5B] = _2;
t2.c:11:14: note: stmt 6 MEM[(char *)a_7(D) + 6B] = _3;
t2.c:11:14: note: stmt 7 MEM[(char *)a_7(D) + 7B] = _4;
t2.c:11:14: note: children 0x5db7778
t2.c:11:14: note: node 0x5db7778 (max_nunits=8, refcnt=2) vector(8) char
t2.c:11:14: note: op template: _1 = *b_6(D);
t2.c:11:14: note: stmt 0 _1 = *b_6(D);
t2.c:11:14: note: stmt 1 _2 = MEM[(char *)b_6(D) + 1B];
t2.c:11:14: note: stmt 2 _3 = MEM[(char *)b_6(D) + 2B];
t2.c:11:14: note: stmt 3 _4 = MEM[(char *)b_6(D) + 3B];
t2.c:11:14: note: stmt 4 _1 = *b_6(D);
t2.c:11:14: note: stmt 5 _2 = MEM[(char *)b_6(D) + 1B];
t2.c:11:14: note: stmt 6 _3 = MEM[(char *)b_6(D) + 2B];
t2.c:11:14: note: stmt 7 _4 = MEM[(char *)b_6(D) + 3B];
t2.c:11:14: note: load permutation { 0 1 2 3 0 1 2 3 }
The issue is, as so often:
t2.c:11:14: note: ==> examining statement: _1 = *b_6(D);
t2.c:11:14: missed: BB vectorization with gaps at the end of a load is not supported
t2.c:3:19: missed: not vectorized: relevant stmt not supported: _1 = *b_6(D);
t2.c:11:14: note: Building vector operands of 0x5db7778 from scalars instead
where we do little beyond ad-hoc handling of those "out-of-bound" accesses.
The obvious choice here would be a single vector(4) load instead.
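As a rough illustration of that suggestion, the load node can be modelled in plain C as a 4-element load followed by the load permutation from the dump. This is a sketch under assumptions, not vectorizer code: `load_and_permute`, `load_perm`, `NLOAD` and `NLANES` are hypothetical names, and the loops stand in for vector operations.

```c
#include <stddef.h>

enum { NLOAD = 4, NLANES = 8 };

/* The load permutation reported for node 0x5db7778. */
static const unsigned char load_perm[NLANES] = { 0, 1, 2, 3, 0, 1, 2, 3 };

/* Hypothetical model: read only b[0..3], then permute into 8 lanes. */
void load_and_permute(const char *b, char out[NLANES])
{
  char loaded[NLOAD];

  /* Stands in for a single vector(4) load; b[4..7] is never touched. */
  for (size_t i = 0; i < NLOAD; i++)
    loaded[i] = b[i];

  /* Apply the permutation to fill the vector(8) operand. */
  for (size_t i = 0; i < NLANES; i++)
    out[i] = loaded[load_perm[i]];
}
```

Every index in the permutation is below 4, so a load of 8 elements would read four bytes the scalar code never accesses, which appears to be the gap the dump complains about.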
From: pinskia at gcc dot gnu.org @ 2024-02-06 17:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113678
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Noticed the same with:
```
void f(unsigned char *a, unsigned char *b, unsigned char *c)
{
  unsigned char t[8];
  t[0] = a[0];
  t[1] = a[1];
  t[2] = a[2];
  t[3] = a[3];
  t[4] = b[0];
  t[5] = b[1];
  t[6] = b[2];
  t[7] = b[3];
  c[0] = t[0];
  c[1] = t[1];
  c[2] = t[2];
  c[3] = t[3];
  c[4] = t[4];
  c[5] = t[5];
  c[6] = t[6];
  c[7] = t[7];
}
```
Adding `-fno-tree-vectorize` even gives the best code.
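What this second testcase computes is a concatenation of two 4-byte pieces, which can be written by hand as below. This is a sketch only: `concat_scalar` and `concat_by_hand` are hypothetical names, `memcpy` stands in for the 4-byte loads and the 8-byte store, and nothing here claims to match the code GCC generates with or without the flag.

```c
#include <string.h>

/* Scalar reference, equivalent to the testcase above. */
void concat_scalar(unsigned char *a, unsigned char *b, unsigned char *c)
{
  unsigned char t[8];
  for (int i = 0; i < 4; i++)
    {
      t[i] = a[i];
      t[i + 4] = b[i];
    }
  for (int i = 0; i < 8; i++)
    c[i] = t[i];
}

/* Hypothetical hand-written form: two 4-byte loads, one 8-byte store. */
void concat_by_hand(unsigned char *a, unsigned char *b, unsigned char *c)
{
  unsigned char t[8];
  memcpy(t, a, 4);       /* low half from a[0..3] */
  memcpy(t + 4, b, 4);   /* high half from b[0..3] */
  memcpy(c, t, 8);       /* single store of the concatenated value */
}
```

Both versions finish reading `a` and `b` before writing `c`, so they agree even if `c` overlaps an input.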
From: pinskia at gcc dot gnu.org @ 2024-02-06 17:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113678
--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note the SLP that happens in connection with the loop vectorizer actually does
a decent job ...