public inbox for gcc-bugs@sourceware.org
* [Bug target/102055] New: full 128byte swap using __builtin_shuffle should produce rev64 followed by ext
@ 2021-08-25  7:16 pinskia at gcc dot gnu.org
  2021-08-25  7:20 ` [Bug target/102055] " pinskia at gcc dot gnu.org
  2024-01-26  0:08 ` pinskia at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: pinskia at gcc dot gnu.org @ 2021-08-25  7:16 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102055

            Bug ID: 102055
           Summary: full 128byte swap using __builtin_shuffle should
                    produce rev64 followed by ext
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64-*-*

Take:
#define vector __attribute__((vector_size(16)))

vector char g(vector char a)
{
    return __builtin_shuffle(a, (vector char){15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0});
}

vector char g1(vector char a)
{
    vector char t = __builtin_shuffle(a, (vector char){7,6,5,4,3,2,1,0,15,14,13,12,11,10,9,8});
    vector long long t1 = (vector long long)t;
    t1 = __builtin_shuffle(t1, (vector long long){1,0});
    return (vector char)t1;
}

The first case uses ldr/tbl, but it can really be done in two steps: rev64
followed by ext.
        rev64   v0.16b, v0.16b
        ext     v0.16b, v0.16b, v0.16b, #8
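
As a sanity check on the two-instruction sequence above, here is a minimal scalar sketch in portable C. It models REV64 on a .16b register as a byte reversal within each 64-bit half, and EXT with the same register as both sources and #8 as a swap of the two halves. The helper names (model_rev64_16b, model_ext_16b, rev64_ext_is_full_reverse) are hypothetical and only illustrate the instruction semantics; they are not part of GCC or of any Arm header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of REV64 Vd.16b, Vn.16b: reverse the bytes inside each
   64-bit half of the 128-bit register. */
static void model_rev64_16b(uint8_t dst[16], const uint8_t src[16])
{
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = src[(i & 8) | (7 - (i & 7))];
    memcpy(dst, tmp, 16);
}

/* Scalar model of EXT Vd.16b, Vn.16b, Vm.16b, #imm: take 16 consecutive
   bytes, starting at byte imm, from the pair n (low) followed by m (high). */
static void model_ext_16b(uint8_t dst[16], const uint8_t n[16],
                          const uint8_t m[16], unsigned imm)
{
    uint8_t tmp[16];
    assert(imm < 16);
    for (unsigned i = 0; i < 16; i++) {
        unsigned j = i + imm;
        tmp[i] = (j < 16) ? n[j] : m[j - 16];
    }
    memcpy(dst, tmp, 16);
}

/* Returns 1 if rev64 followed by ext #8 matches a full 16-byte reversal. */
static int rev64_ext_is_full_reverse(void)
{
    uint8_t in[16], out[16];
    for (int i = 0; i < 16; i++)
        in[i] = (uint8_t)(0xA0 + i);

    model_rev64_16b(out, in);        /* rev64 v0.16b, v0.16b */
    model_ext_16b(out, out, out, 8); /* ext v0.16b, v0.16b, v0.16b, #8 */

    for (int i = 0; i < 16; i++)
        if (out[i] != in[15 - i])
            return 0;
    return 1;
}
```

Calling rev64_ext_is_full_reverse() from a test driver should return 1. The model only covers the data movement; it says nothing about latency or throughput on any particular core.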


* [Bug target/102055] full 128byte swap using __builtin_shuffle should produce rev64 followed by ext
From: pinskia at gcc dot gnu.org @ 2021-08-25  7:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102055

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=102056

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note that if PR 102056 is implemented, both cases will become the same as g.


* [Bug target/102055] full 128byte swap using __builtin_shuffle should produce rev64 followed by ext
From: pinskia at gcc dot gnu.org @ 2024-01-26  0:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102055

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Whether ldr/tbl or rev64/ext is the better choice is questionable and depends
on whether we are inside a loop. Inside a loop with enough registers available,
TBL is better on many (though not all) micro-architectures, as it has similar
latency to rev64.

Though I should note that clang/LLVM implements it as rev64/ext.

E.g.:
```

#define vector __attribute__((vector_size(16)))

vector char g(vector char a)
{
    return __builtin_shufflevector(a, a, 15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0);
}

vector char g1(vector char a)
{
    vector char t = __builtin_shufflevector(a, a, 7,6,5,4,3,2,1,0,15,14,13,12,11,10,9,8);
    vector long long t1 = (vector long long)t;
    t1 = __builtin_shufflevector(t1,t1, 1,0);
    return (vector char)t1;
}
```

Produces:
```
        rev64   v0.16b, v0.16b
        ext     v0.16b, v0.16b, v0.16b, #8
```

For both.
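
To make the loop case from this comment concrete, here is a minimal sketch of the same reversal applied to an array of 16-byte blocks. It assumes GCC, because it uses __builtin_shuffle from the report. The function name reverse_blocks is hypothetical. The index vector is loop-invariant, which is presumably why the TBL form can pay off in a loop: the index would only need to be loaded once and kept in a register. Whether a given GCC version and option set actually hoists that load is not something this sketch demonstrates.

```c
#include <assert.h>
#include <stddef.h>

#define vector __attribute__((vector_size(16)))

/* Reverse the 16 bytes of every block in place (GCC vector extension). */
void reverse_blocks(vector char *blocks, size_t n)
{
    /* Loop-invariant index vector: with an ldr/tbl lowering this is the
       value that would be loaded once and reused on every iteration. */
    const vector char idx = {15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0};

    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        blocks[i] = __builtin_shuffle(blocks[i], idx);
}
```

Compiling this for aarch64 at -O2 and comparing the loop body against the standalone g above is one way to see which sequence a given compiler picks in each context.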
