[Bug tree-optimization/113678] New: SLP misses up vec_concat

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/113678] New: SLP misses up vec_concat
@ 2024-01-31  2:33 pinskia at gcc dot gnu.org
  2024-01-31  8:32 ` [Bug tree-optimization/113678] " rguenth at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-01-31  2:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113678

            Bug ID: 113678
           Summary: SLP misses up vec_concat
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: x86_64

Take:
```
void f(char *a, char *b)
{
        int b0 = b[0];
        int b1 = b[1];
        int b2 = b[2];
        int b3 = b[3];
        int b4 = 0;
        int b5 = 0;
        int b6 = 0;
        int b7 = 0;
        a[0] = b0;
        a[1] = b1;
        a[2] = b2;
        a[3] = b3;
#if 0
        asm("":::"memory");
#endif
        a[4] = b0;
        a[5] = b1;
        a[6] = b2;
        a[7] = b3;
}
```

On x86_64 we get some mess because SLP decides to do this:
```
  _1 = *b_6(D);
  _2 = MEM[(char *)b_6(D) + 1B];
  _3 = MEM[(char *)b_6(D) + 2B];
  _4 = MEM[(char *)b_6(D) + 3B];
  _16 = {_1, _2, _3, _4, _1, _2, _3, _4};
```

But this could be done as 2 stores (changing the `#if 0` to `#if 1` gives
the better code):
```
  vect__1.5_18 = MEM <vector(4) char> [(char *)b_6(D)];
  MEM <vector(4) char> [(char *)a_7(D)] = vect__1.5_18;
  MEM <vector(4) char> [(char *)a_7(D) + 4B] = vect__1.5_18;
```
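For illustration only, the two-store form above corresponds roughly to the following hand-written C. This is a sketch, not what the vectorizer emits: the name `f_two_stores` is hypothetical, and `memcpy` on a 4-byte buffer stands in for the `vector(4) char` load and stores.

```c
#include <string.h>

/* Hypothetical hand-written equivalent of the two-store form:
   one 4-byte load from b, reused for two 4-byte stores to a. */
void f_two_stores(char *a, const char *b)
{
    char v[4];
    memcpy(v, b, sizeof v);      /* stands in for the single vector(4) char load */
    memcpy(a, v, sizeof v);      /* store to a[0..3] */
    memcpy(a + 4, v, sizeof v);  /* store to a[4..7] */
}
```

As in the testcase, all four bytes of `b` are read before anything is written to `a`, so the result is the same even if the two pointers overlap.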

Or we could even get a single store, as LLVM does:
```
        movd    xmm0, dword ptr [rsi]           # xmm0 = mem[0],zero,zero,zero
        pshufd  xmm0, xmm0, 0                   # xmm0 = xmm0[0,0,0,0]
        movq    qword ptr [rdi], xmm0
        ret
```
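The same single-store idea can be sketched in portable C: load the 4 bytes once, duplicate them into both halves of a 64-bit value, and store 8 bytes. The name `f_one_store` is hypothetical, and this only mirrors the shape of the LLVM output (one load, one broadcast, one store); it is not a claim about how either compiler gets there.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical single-store variant: one 4-byte load, one 8-byte store. */
void f_one_store(char *a, const char *b)
{
    uint32_t lo;
    uint64_t both;

    memcpy(&lo, b, sizeof lo);          /* one 4-byte load (cf. movd) */
    both = ((uint64_t)lo << 32) | lo;   /* same 4 bytes in both halves (cf. pshufd) */
    memcpy(a, &both, sizeof both);      /* one 8-byte store (cf. movq) */
}
```

Because both 32-bit halves hold the same value, the byte pattern written is the 4 source bytes repeated twice regardless of endianness.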

* [Bug tree-optimization/113678] SLP misses up vec_concat
  2024-01-31  2:33 [Bug tree-optimization/113678] New: SLP misses up vec_concat pinskia at gcc dot gnu.org
@ 2024-01-31  8:32 ` rguenth at gcc dot gnu.org
  2024-02-06 17:14 ` pinskia at gcc dot gnu.org
  2024-02-06 17:46 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-01-31  8:32 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113678

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-01-31
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think the SLP tree we discover is sound:

t2.c:11:14: note:   node 0x5db76f0 (max_nunits=8, refcnt=2) vector(8) char
t2.c:11:14: note:   op template: *a_7(D) = _1;
t2.c:11:14: note:       stmt 0 *a_7(D) = _1;
t2.c:11:14: note:       stmt 1 MEM[(char *)a_7(D) + 1B] = _2;
t2.c:11:14: note:       stmt 2 MEM[(char *)a_7(D) + 2B] = _3;
t2.c:11:14: note:       stmt 3 MEM[(char *)a_7(D) + 3B] = _4;
t2.c:11:14: note:       stmt 4 MEM[(char *)a_7(D) + 4B] = _1;
t2.c:11:14: note:       stmt 5 MEM[(char *)a_7(D) + 5B] = _2;
t2.c:11:14: note:       stmt 6 MEM[(char *)a_7(D) + 6B] = _3;
t2.c:11:14: note:       stmt 7 MEM[(char *)a_7(D) + 7B] = _4;
t2.c:11:14: note:       children 0x5db7778
t2.c:11:14: note:   node 0x5db7778 (max_nunits=8, refcnt=2) vector(8) char
t2.c:11:14: note:   op template: _1 = *b_6(D);
t2.c:11:14: note:       stmt 0 _1 = *b_6(D);
t2.c:11:14: note:       stmt 1 _2 = MEM[(char *)b_6(D) + 1B];
t2.c:11:14: note:       stmt 2 _3 = MEM[(char *)b_6(D) + 2B];
t2.c:11:14: note:       stmt 3 _4 = MEM[(char *)b_6(D) + 3B];
t2.c:11:14: note:       stmt 4 _1 = *b_6(D);
t2.c:11:14: note:       stmt 5 _2 = MEM[(char *)b_6(D) + 1B];
t2.c:11:14: note:       stmt 6 _3 = MEM[(char *)b_6(D) + 2B];
t2.c:11:14: note:       stmt 7 _4 = MEM[(char *)b_6(D) + 3B];
t2.c:11:14: note:       load permutation { 0 1 2 3 0 1 2 3 }

The issue is, as so often:

t2.c:11:14: note:   ==> examining statement: _1 = *b_6(D);
t2.c:11:14: missed:   BB vectorization with gaps at the end of a load is not supported
t2.c:3:19: missed:   not vectorized: relevant stmt not supported: _1 = *b_6(D);
t2.c:11:14: note:   Building vector operands of 0x5db7778 from scalars instead

where we do little beyond ad-hoc handling of those "out-of-bound"
accesses.  The obvious choice here would be to do a single vector(4)
load instead.
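
A plain-C model of the load permutation from the dump may help show why a vector(4) load is enough. This is a sketch only: the names `load_perm`, `bytes_needed` and `apply_load_perm` are hypothetical and do not correspond to anything in the vectorizer. The permutation `{ 0 1 2 3 0 1 2 3 }` never references an index above 3, so only 4 bytes of `b` are actually needed to fill all 8 lanes.

```c
#include <stddef.h>

/* Load permutation taken from the SLP dump: lane i reads element load_perm[i]. */
static const unsigned char load_perm[8] = { 0, 1, 2, 3, 0, 1, 2, 3 };

/* Number of source bytes the permutation can touch (highest index + 1). */
size_t bytes_needed(void)
{
    size_t max = 0;
    for (size_t i = 0; i < 8; i++)
        if (load_perm[i] > max)
            max = load_perm[i];
    return max + 1;
}

/* Fill 8 lanes from a 4-byte load followed by the permutation. */
void apply_load_perm(char out[8], const char *b)
{
    char v4[4];
    for (size_t i = 0; i < 4; i++)
        v4[i] = b[i];                 /* stands in for one vector(4) char load */
    for (size_t i = 0; i < 8; i++)
        out[i] = v4[load_perm[i]];    /* permute into the 8-lane result */
}
```

Here `bytes_needed()` evaluates to 4, which is the width of the load suggested above; a full 8-byte load would read 4 bytes past the group.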

* [Bug tree-optimization/113678] SLP misses up vec_concat
  2024-01-31  2:33 [Bug tree-optimization/113678] New: SLP misses up vec_concat pinskia at gcc dot gnu.org
  2024-01-31  8:32 ` [Bug tree-optimization/113678] " rguenth at gcc dot gnu.org
@ 2024-02-06 17:14 ` pinskia at gcc dot gnu.org
  2024-02-06 17:46 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-02-06 17:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113678

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Noticed the same with:
```
void f(unsigned char *a, unsigned char *b, unsigned char *c)
{
  unsigned char t[8];
  t[0] = a[0];
  t[1] = a[1];
  t[2] = a[2];
  t[3] = a[3];
  t[4] = b[0];
  t[5] = b[1];
  t[6] = b[2];
  t[7] = b[3];
  c[0] = t[0];
  c[1] = t[1];
  c[2] = t[2];
  c[3] = t[3];
  c[4] = t[4];
  c[5] = t[5];
  c[6] = t[6];
  c[7] = t[7];
}
```

Adding `-fno-tree-vectorize` even gives the best code.
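
For reference, the testcase is semantically two 4-byte loads followed by 8 bytes of stores. A hand-written sketch of that (the name `f_copy` is hypothetical, and this is not a claim about the exact code GCC emits with or without `-fno-tree-vectorize`):

```c
#include <string.h>

/* Hypothetical equivalent of the testcase: gather 4 bytes from a and
   4 bytes from b into a temporary, then write all 8 bytes to c. */
void f_copy(unsigned char *a, unsigned char *b, unsigned char *c)
{
    unsigned char t[8];
    memcpy(t, a, 4);      /* t[0..3] = a[0..3] */
    memcpy(t + 4, b, 4);  /* t[4..7] = b[0..3] */
    memcpy(c, t, 8);      /* c[0..7] = t[0..7] */
}
```

As in the original, every read of `a` and `b` completes before the first write to `c`, so the behaviour matches even when the pointers alias.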

* [Bug tree-optimization/113678] SLP misses up vec_concat
  2024-01-31  2:33 [Bug tree-optimization/113678] New: SLP misses up vec_concat pinskia at gcc dot gnu.org
  2024-01-31  8:32 ` [Bug tree-optimization/113678] " rguenth at gcc dot gnu.org
  2024-02-06 17:14 ` pinskia at gcc dot gnu.org
@ 2024-02-06 17:46 ` pinskia at gcc dot gnu.org
  2 siblings, 0 replies; 4+ messages in thread
From: pinskia at gcc dot gnu.org @ 2024-02-06 17:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113678

--- Comment #3 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note the SLP that happens in connection with the loop vectorizer actually does
a decent job ...
