public inbox for gcc-bugs@sourceware.org
* [Bug rtl-optimization/107563] New: __builtin_shufflevector fails to use pshufb and pshufd instructions under default x86_64 compilation toggle which is the sse2 one
From: unlvsur at live dot com @ 2022-11-07 22:56 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563
Bug ID: 107563
Summary: __builtin_shufflevector fails to use pshufb and pshufd
instructions under default x86_64 compilation toggle
which is the sse2 one
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: unlvsur at live dot com
Target Milestone: ---
#if defined(__SSE2__)
using temp_vec_type [[__gnu__::__vector_size__ (16)]] = char;
void foo(temp_vec_type& v) noexcept
{
v=__builtin_shufflevector(v,v,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0);
}
#endif
g++ -S pq.cc -Ofast
shows that SSE2 is enabled by default, but the generated code uses neither pshufb
https://www.felixcloutier.com/x86/pshufb
nor pshufd
https://www.felixcloutier.com/x86/pshufd
whereas g++ -S pq.cc -Ofast -msse4.2 generates them correctly. This looks like a bug.
* [Bug target/107563] __builtin_shufflevector fails to use pshufb and pshufd instructions under default x86_64 compilation toggle which is the sse2 one
From: unlvsur at live dot com @ 2022-11-07 23:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563
--- Comment #1 from cqwrteur <unlvsur at live dot com> ---
see
https://godbolt.org/z/1aM57z7jn
vs
https://godbolt.org/z/b356qzrMY
While clang does the right thing here
https://godbolt.org/z/hnfrnb694
* [Bug target/107563] __builtin_shufflevector fails to pshufd instructions under default x86_64 compilation toggle which is the sse2 one
From: unlvsur at live dot com @ 2022-11-07 23:05 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563
--- Comment #2 from cqwrteur <unlvsur at live dot com> ---
(In reply to cqwrteur from comment #0)
> #if defined(__SSE2__)
>
> using temp_vec_type [[__gnu__::__vector_size__ (16)]] = char;
> void foo(temp_vec_type& v) noexcept
> {
> v=__builtin_shufflevector(v,v,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0);
> }
>
> #endif
>
> g++ -S pq.cc -Ofast
> proves sse2 is enabled by default, but it does not call
> https://www.felixcloutier.com/x86/pshufb
> neither
> https://www.felixcloutier.com/x86/pshufd
>
> while g++ -S pq.cc -Ofast -msse4.2 will generate them correctly. Which is
> buggy
Sorry, pshufb is SSE3, but pshufd is SSE2, so it could be used to generate the
right code.
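For reference, the 16-byte reversal from comment #0 can be written with SSE2 instructions alone. The following is only a minimal sketch using the standard `<emmintrin.h>` intrinsics; the helper name `reverse_bytes_sse2` and the `bytes16` alias are made up for illustration, and this is not necessarily the sequence that GCC or Clang would pick.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <array>
#include <cstdint>

using bytes16 = std::array<std::uint8_t, 16>;

// Hypothetical helper: reverse all 16 bytes using only SSE2 instructions.
inline bytes16 reverse_bytes_sse2(const bytes16& in)
{
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in.data()));
    // Swap the two bytes inside every 16-bit lane (psrlw / psllw / por).
    v = _mm_or_si128(_mm_srli_epi16(v, 8), _mm_slli_epi16(v, 8));
    // Reverse the four low and the four high 16-bit lanes (pshuflw / pshufhw).
    v = _mm_shufflelo_epi16(v, 0x1B);
    v = _mm_shufflehi_epi16(v, 0x1B);
    // Swap the two 64-bit halves (pshufd).
    v = _mm_shuffle_epi32(v, 0x4E);
    bytes16 out;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), v);
    return out;
}
```

The point of the sketch is that pshufd alone is not enough for a byte-granular reversal; it has to be combined with word shuffles and a shift/or step.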
* [Bug target/107563] __builtin_shufflevector fails to pshufd instructions under default x86_64 compilation toggle which is the sse2 one
From: unlvsur at live dot com @ 2022-11-08 0:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563
--- Comment #3 from cqwrteur <unlvsur at live dot com> ---
(In reply to cqwrteur from comment #2)
> (In reply to cqwrteur from comment #0)
> > #if defined(__SSE2__)
> >
> > using temp_vec_type [[__gnu__::__vector_size__ (16)]] = char;
> > void foo(temp_vec_type& v) noexcept
> > {
> > v=__builtin_shufflevector(v,v,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0);
> > }
> >
> > #endif
> >
> > g++ -S pq.cc -Ofast
> > proves sse2 is enabled by default, but it does not call
> > https://www.felixcloutier.com/x86/pshufb
> > neither
> > https://www.felixcloutier.com/x86/pshufd
> >
> > while g++ -S pq.cc -Ofast -msse4.2 will generate them correctly. Which is
> > buggy
>
> pshufb is sse3 sorry. but pshufd is sse2. It can be used for generating the
> right instruction.
https://godbolt.org/z/6baWWoE4e
BTW, -msse3 does not use pshufb either. I do not know why.
* [Bug target/107563] __builtin_shufflevector fails to pshufd instructions under default x86_64 compilation toggle which is the sse2 one
From: crazylht at gmail dot com @ 2022-11-08 3:23 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563
Hongtao.liu <crazylht at gmail dot com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |crazylht at gmail dot com
--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
>
> https://godbolt.org/z/6baWWoE4e
> BTW. -msse3 does not use pshufb either. i do not know why
It should be -mssse3.
* [Bug target/107563] __builtin_shufflevector fails to pshufd instructions under default x86_64 compilation toggle which is the sse2 one
From: crazylht at gmail dot com @ 2022-11-08 3:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563
--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---
pshufd only handles permutations of whole 32-bit elements, such as
void foo1(temp_vec_type& v) noexcept
{
v=__builtin_shufflevector(v,v,12,13,14,15,8,9,10,11,4,5,6,7,0,1,2,3);
}
That is not the case in #c0.
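As an illustration, the index list in foo1 reverses the four 32-bit elements while keeping the bytes inside each element in order, which is exactly what a single pshufd can express. A minimal sketch with the `_mm_shuffle_epi32` intrinsic follows; the helper name `reverse_dwords` and the `bytes16` alias are assumptions made for this example.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics, also provides _MM_SHUFFLE
#include <array>
#include <cstdint>

using bytes16 = std::array<std::uint8_t, 16>;

// Hypothetical helper: reverse the four 32-bit lanes with one pshufd.
// Bytes inside each lane keep their relative order.
inline bytes16 reverse_dwords(const bytes16& in)
{
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in.data()));
    // Destination lane 0 takes source lane 3, lane 1 takes 2, and so on.
    v = _mm_shuffle_epi32(v, _MM_SHUFFLE(0, 1, 2, 3));
    bytes16 out;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), v);
    return out;
}
```

The byte reversal in comment #0 additionally reorders the bytes within each 32-bit element, so it cannot be done by pshufd on its own.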
* [Bug target/107563] __builtin_shufflevector fails to pshufd instructions under default x86_64 compilation toggle which is the sse2 one
From: unlvsur at live dot com @ 2022-11-08 6:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563
--- Comment #7 from cqwrteur <unlvsur at live dot com> ---
(In reply to Hongtao.liu from comment #6)
> Shufd only handles
>
> void foo1(temp_vec_type& v) noexcept
> {
> v=__builtin_shufflevector(v,v,12,13,14,15,8,9,10,11,4,5,6,7,0,1,2,3);
> }
>
> Not the case in #c0.
I am using it for a byte swap.
Actually, clang has a solution:
using x86_64_v4si [[__gnu__::__vector_size__ (16)]] = int;
using x86_64_v16qi [[__gnu__::__vector_size__ (16)]] = char;
using x86_64_v8hi [[__gnu__::__vector_size__ (16)]] = short;
constexpr x86_64_v16qi zero{};
if constexpr(sizeof(T)==8)
{
    auto res0{__builtin_ia32_punpcklbw128(temp_vec,zero)};
    auto res1{__builtin_ia32_pshufd((x86_64_v4si)res0,78)};
    auto res2{__builtin_ia32_pshuflw((x86_64_v8hi)res1,27)};
    auto res3{__builtin_ia32_pshufhw(res2,27)};
    auto res4{__builtin_ia32_punpckhbw128(temp_vec,zero)};
    auto res5{__builtin_ia32_pshufd((x86_64_v4si)res4,78)};
    auto res6{__builtin_ia32_pshuflw((x86_64_v8hi)res5,27)};
    auto res7{__builtin_ia32_pshufhw(res6,27)};
    temp_vec=__builtin_ia32_packuswb128(res3,res7);
}
else if constexpr(sizeof(T)==4)
{
    auto res0{__builtin_ia32_punpcklbw128(temp_vec,zero)};
    auto res2{__builtin_ia32_pshuflw((x86_64_v8hi)res0,27)};
    auto res3{__builtin_ia32_pshufhw(res2,27)};
    auto res4{__builtin_ia32_punpckhbw128(temp_vec,zero)};
    auto res6{__builtin_ia32_pshuflw((x86_64_v8hi)res4,27)};
    auto res7{__builtin_ia32_pshufhw(res6,27)};
    temp_vec=__builtin_ia32_packuswb128(res3,res7);
}
else if constexpr(sizeof(T)==2)
{
    using x86_64_v8hu [[__gnu__::__vector_size__ (16)]] = unsigned short;
    auto res0{(x86_64_v8hu)temp_vec};
    temp_vec=(x86_64_v16qi)((res0>>8)|(res0<<8));
}
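The sizeof(T)==8 branch above can also be written with the portable `<emmintrin.h>` intrinsics instead of the `__builtin_ia32_*` builtins. This is a sketch only, under the assumption that the intrinsics map one-to-one onto the builtins used in the snippet; the names `reverse_words`, `bswap64x2_sse2` and `bytes16` are hypothetical.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <array>
#include <cstdint>

using bytes16 = std::array<std::uint8_t, 16>;

// Reverse the eight 16-bit lanes of v (each lane holds one zero-extended byte).
inline __m128i reverse_words(__m128i v)
{
    v = _mm_shuffle_epi32(v, 78);        // pshufd: swap the 64-bit halves
    v = _mm_shufflelo_epi16(v, 27);      // pshuflw: reverse the low four words
    return _mm_shufflehi_epi16(v, 27);   // pshufhw: reverse the high four words
}

// Hypothetical helper: byte-swap both 64-bit elements of a 16-byte vector.
inline bytes16 bswap64x2_sse2(const bytes16& in)
{
    const __m128i zero = _mm_setzero_si128();
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in.data()));
    // Widen bytes 0..7 and bytes 8..15 to words, then reverse each group.
    const __m128i lo = reverse_words(_mm_unpacklo_epi8(v, zero));
    const __m128i hi = reverse_words(_mm_unpackhi_epi8(v, zero));
    // packuswb narrows the words back to bytes; values are 0..255, so no saturation occurs.
    const __m128i packed = _mm_packus_epi16(lo, hi);
    bytes16 out;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), packed);
    return out;
}
```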
* [Bug target/107563] __builtin_shufflevector fails to pshufd instructions under default x86_64 compilation toggle which is the sse2 one
From: unlvsur at live dot com @ 2022-11-08 6:11 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563
--- Comment #8 from cqwrteur <unlvsur at live dot com> ---
for sse2 to do the __builtin_convertvector job yeah
* [Bug target/107563] __builtin_shufflevector fails to pshufd instructions under default x86_64 compilation toggle which is the sse2 one
From: unlvsur at live dot com @ 2022-11-08 6:14 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563
--- Comment #9 from cqwrteur <unlvsur at live dot com> ---
(In reply to cqwrteur from comment #8)
> for sse2 to do the __builtin_convertvector job yeah
https://godbolt.org/z/dsf3WK58E
using temp_vec_type [[__gnu__::__vector_size__ (16)]] = char;
void foo4(temp_vec_type& v) noexcept
{
v=__builtin_shufflevector(v,v,1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14);
}
This is even more interesting.
foo4(char __vector(16)&): # @foo4(char __vector(16)&)
movdqa (%rdi), %xmm0
movdqa %xmm0, %xmm1
psrlw $8, %xmm1
psllw $8, %xmm0
por %xmm1, %xmm0
movdqa %xmm0, (%rdi)
retq
Clang generates this, using two word shifts and por (in effect rotating each 16-bit lane by 8 bits).
* [Bug target/107563] __builtin_shufflevector fails to pshufd instructions under default x86_64 compilation toggle which is the sse2 one
From: crazylht at gmail dot com @ 2022-11-08 8:28 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563
--- Comment #10 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to cqwrteur from comment #9)
> (In reply to cqwrteur from comment #8)
> > for sse2 to do the __builtin_convertvector job yeah
>
> https://godbolt.org/z/dsf3WK58E
>
> using temp_vec_type [[__gnu__::__vector_size__ (16)]] = char;
> void foo4(temp_vec_type& v) noexcept
> {
> v=__builtin_shufflevector(v,v,1,0,3,2,5,4,7,6,9,8,11,10,13,12,15,14);
> }
>
> This is even more interesting.
>
> foo4(char __vector(16)&): # @foo4(char __vector(16)&)
> movdqa (%rdi), %xmm0
> movdqa %xmm0, %xmm1
> psrlw $8, %xmm1
> psllw $8, %xmm0
> por %xmm1, %xmm0
> movdqa %xmm0, (%rdi)
> retq
>
> clang generates this. by using ror and or
This is an interesting case; the same applies to psrld/psrlq + pslld/psllq + por.
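To make the wider variant concrete: rotating each 32-bit lane by 16 bits swaps its two 16-bit halves, which corresponds to the byte shuffle 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13. Below is a minimal sketch using the GNU vector extensions; the names `v4su`, `swap_halfwords` and `bytes16` are assumptions for this example, and whether a given compiler actually emits psrld / pslld / por for it is not verified here.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

using bytes16 = std::array<std::uint8_t, 16>;
// GNU vector extension type: four unsigned 32-bit lanes.
typedef std::uint32_t v4su __attribute__((vector_size(16)));

// Hypothetical helper: rotate every 32-bit lane by 16 bits,
// i.e. swap the two 16-bit halves of each lane.
inline bytes16 swap_halfwords(const bytes16& in)
{
    v4su w;
    std::memcpy(&w, in.data(), sizeof w);
    w = (w >> 16) | (w << 16);  // logical shifts, since the lanes are unsigned
    bytes16 out;
    std::memcpy(out.data(), &w, sizeof w);
    return out;
}
```

The 64-bit case follows the same pattern with a shift count of 32 on two 64-bit lanes.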
* [Bug target/107563] __builtin_shufflevector fails to pshufd instructions under default x86_64 compilation toggle which is the sse2 one
From: cvs-commit at gcc dot gnu.org @ 2024-05-15 4:47 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563
--- Comment #11 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:a71f90c5a7ae2942083921033cb23dcd63e70525
commit r15-499-ga71f90c5a7ae2942083921033cb23dcd63e70525
Author: Levy Hsu <admin@levyhsu.com>
Date: Thu May 9 16:50:56 2024 +0800
x86: Add 3-instruction subroutine vector shift for V16QI in
ix86_expand_vec_perm_const_1 [PR107563]
Hi All
We've introduced a new subroutine in ix86_expand_vec_perm_const_1
to optimize vector shifting for the V16QI type on x86.
This patch uses a three-instruction sequence psrlw, psllw, and por
to handle specific vector shuffle operations more efficiently.
The change aims to improve assembly code generation for configurations
supporting SSE2.
Bootstrapped and tested on x86_64-linux-gnu, OK for trunk?
Best
Levy
gcc/ChangeLog:
PR target/107563
* config/i386/i386-expand.cc (expand_vec_perm_psrlw_psllw_por): New
subroutine.
(ix86_expand_vec_perm_const_1): Call
expand_vec_perm_psrlw_psllw_por.
gcc/testsuite/ChangeLog:
PR target/107563
* g++.target/i386/pr107563-a.C: New test.
* g++.target/i386/pr107563-b.C: New test.
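As a rough illustration of the kind of permutation such a subroutine has to recognize, the check below tests whether a 16-element byte permutation swaps every adjacent pair of bytes, which is the pattern that psrlw / psllw / por can implement. This is a standalone sketch and not the code from i386-expand.cc; the function name `is_adjacent_byte_swap` is hypothetical.

```cpp
#include <array>
#include <cstddef>

// Hypothetical check: does perm map element i to element i^1 for every i,
// i.e. 1,0,3,2,5,4,...? That is a byte swap within each 16-bit lane.
inline bool is_adjacent_byte_swap(const std::array<unsigned, 16>& perm)
{
    for (std::size_t i = 0; i < perm.size(); ++i)
        if (perm[i] != (i ^ 1u))
            return false;
    return true;
}
```

The foo4 selector from comment #9 passes this check, while the full byte reversal from comment #0 does not.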
* [Bug target/107563] __builtin_shufflevector fails to pshufd instructions under default x86_64 compilation toggle which is the sse2 one
From: admin at levyhsu dot com @ 2024-05-18 15:17 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107563
--- Comment #12 from Levy Hsu <admin at levyhsu dot com> ---
switch (d->vmode)
  {
  case E_V8QImode:
    if (!TARGET_MMX_WITH_SSE)
      return false;
    mode = V4HImode;
    gen_shr = gen_ashrv4hi3; /* should be gen_lshrv4hi3 */
    gen_shl = gen_ashlv4hi3;
    gen_or = gen_iorv4hi3;
    break;
  case E_V16QImode:
    mode = V8HImode;
    gen_shr = gen_vlshrv8hi3;
    gen_shl = gen_vashlv8hi3;
    gen_or = gen_iorv8hi3;
    break;
  default:
    return false;
  }
Clearly, for V8QImode it should be gen_lshrv4hi3 rather than gen_ashrv4hi3.
I mistakenly used gen_ashrv4hi3 because of the similar names and did not
catch it. gen_lshrv4hi3 is the logical right shift that is needed here.
I will send a patch soon.
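To show why the arithmetic shift gives wrong results, here is a scalar sketch of what happens to a single 16-bit lane. The function names are made up for illustration. It assumes the usual GCC behavior (guaranteed only from C++20 on) that converting to int16_t wraps and that right-shifting a negative value replicates the sign bit.

```cpp
#include <cstdint>

// Byte swap of one 16-bit lane with a logical right shift (correct).
inline std::uint16_t swap_logical(std::uint16_t x)
{
    return static_cast<std::uint16_t>((x >> 8) | (x << 8));
}

// Same, but with an arithmetic right shift on the signed value.
// When bit 15 is set, the sign bit is smeared into the upper byte of the
// shifted term and corrupts the result after the OR.
inline std::uint16_t swap_arithmetic(std::uint16_t x)
{
    const auto s = static_cast<std::int16_t>(x);        // assumed to wrap
    const auto hi = static_cast<std::uint16_t>(s >> 8); // assumed arithmetic shift
    return static_cast<std::uint16_t>(hi | (x << 8));
}
```

Both variants agree as long as the top bit of the lane is clear, which is probably why the mistake is easy to miss in testing.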