public inbox for gcc-bugs@sourceware.org
From: "john_platts at hotmail dot com" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/105354] New: __builtin_shuffle for alignr generates suboptimal code unless SSSE3 is enabled
Date: Fri, 22 Apr 2022 21:46:02 +0000
Message-ID: <bug-105354-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105354

            Bug ID: 105354
           Summary: __builtin_shuffle for alignr generates suboptimal code unless SSSE3 is enabled
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: john_platts at hotmail dot com
  Target Milestone: ---

The code below generates suboptimal code when SSE2 is enabled but SSSE3 is not:

#include <cstdint>

typedef std::uint8_t Simd128U8VectT __attribute__((__vector_size__(16)));

template<int RotateAmt>
static inline Simd128U8VectT RotateRightByByteAmt(Simd128U8VectT vect) noexcept {
    constexpr int NormalizedRotateAmt = RotateAmt & 15;
    if constexpr(NormalizedRotateAmt == 0)
        return vect;
    else
        return __builtin_shuffle(vect, vect, (Simd128U8VectT){
            NormalizedRotateAmt,      NormalizedRotateAmt + 1,
            NormalizedRotateAmt + 2,  NormalizedRotateAmt + 3,
            NormalizedRotateAmt + 4,  NormalizedRotateAmt + 5,
            NormalizedRotateAmt + 6,  NormalizedRotateAmt + 7,
            NormalizedRotateAmt + 8,  NormalizedRotateAmt + 9,
            NormalizedRotateAmt + 10, NormalizedRotateAmt + 11,
            NormalizedRotateAmt + 12, NormalizedRotateAmt + 13,
            NormalizedRotateAmt + 14, NormalizedRotateAmt + 15 });
}

auto func1(Simd128U8VectT vect) noexcept {
    return RotateRightByByteAmt<5>(vect);
}

With -O2 -mssse3, GCC 11 generates:

func1(unsigned char __vector(16)):
        palignr xmm0, xmm0, 5
        ret

With -O2 but without -mssse3, GCC 11 generates the following on 64-bit x86:

func1(unsigned char __vector(16)):
        sub     rsp, 144
        movd    ecx, xmm0
        movaps  XMMWORD PTR [rsp+8], xmm0
        movzx   edx, BYTE PTR [rsp+20]
        movzx   ecx, cl
        movaps  XMMWORD PTR [rsp+24], xmm0
        movzx   eax, BYTE PTR [rsp+35]
        sal     rdx, 8
        movaps  XMMWORD PTR [rsp+40], xmm0
        or      rdx, rax
        movzx   eax, BYTE PTR [rsp+50]
        movaps  XMMWORD PTR [rsp+56], xmm0
        sal     rdx, 8
        movaps  XMMWORD PTR [rsp+72], xmm0
        or      rdx, rax
        movzx   eax, BYTE PTR [rsp+65]
        movaps  XMMWORD PTR [rsp+88], xmm0
        sal     rdx, 8
        movaps  XMMWORD PTR [rsp+104], xmm0
        or      rdx, rax
        movzx   eax, BYTE PTR [rsp+80]
        movaps  XMMWORD PTR [rsp-104], xmm0
        sal     rdx, 8
        movaps  XMMWORD PTR [rsp-88], xmm0
        movzx   edi, BYTE PTR [rsp-85]
        or      rdx, rax
        movzx   eax, BYTE PTR [rsp+95]
        movaps  XMMWORD PTR [rsp-72], xmm0
        sal     rdx, 8
        movaps  XMMWORD PTR [rsp-56], xmm0
        or      rdx, rax
        movzx   eax, BYTE PTR [rsp+110]
        movaps  XMMWORD PTR [rsp-40], xmm0
        sal     rdx, 8
        movaps  XMMWORD PTR [rsp-24], xmm0
        or      rdx, rax
        movzx   eax, BYTE PTR [rsp-100]
        movaps  XMMWORD PTR [rsp+120], xmm0
        movzx   esi, BYTE PTR [rsp+125]
        movaps  XMMWORD PTR [rsp-8], xmm0
        sal     rdx, 8
        sal     rax, 8
        or      rdx, rsi
        or      rax, rdi
        movzx   edi, BYTE PTR [rsp-70]
        sal     rax, 8
        or      rax, rdi
        movzx   edi, BYTE PTR [rsp-55]
        sal     rax, 8
        or      rax, rdi
        sal     rax, 8
        or      rax, rcx
        movzx   ecx, BYTE PTR [rsp-25]
        sal     rax, 8
        or      rax, rcx
        movzx   ecx, BYTE PTR [rsp-10]
        sal     rax, 8
        or      rax, rcx
        movzx   ecx, BYTE PTR [rsp+5]
        mov     QWORD PTR [rsp-120], rdx
        sal     rax, 8
        or      rax, rcx
        mov     QWORD PTR [rsp-112], rax
        movdqa  xmm0, XMMWORD PTR [rsp-120]
        add     rsp, 144
        ret

A much better implementation on 64-bit x86 with SSE2 but without SSSE3 needs only two byte shifts and an OR:

func1(unsigned char __vector(16)):
        movdqa  xmm1, xmm0
        psrldq  xmm1, 5
        pslldq  xmm0, 11
        por     xmm0, xmm1
        ret
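Until the compiler emits this sequence for __builtin_shuffle, the psrldq/pslldq/por rotation can be written by hand with SSE2 intrinsics. The sketch below assumes an x86-64 target with <emmintrin.h>; RotateRightBytesSse2, RotateMatchesScalar and CheckRotations are hypothetical names, not from the report. It keeps the semantics of RotateRightByByteAmt: result byte i is input byte (i + N) mod 16, where N = RotateAmt & 15.

```cpp
#include <emmintrin.h>  // SSE2: _mm_srli_si128, _mm_slli_si128, _mm_or_si128
#include <cassert>
#include <cstdint>

// Hypothetical SSE2-only equivalent of RotateRightByByteAmt from the report.
template <int RotateAmt>
static inline __m128i RotateRightBytesSse2(__m128i v) noexcept {
    constexpr int N = RotateAmt & 15;
    if constexpr (N == 0) {
        return v;  // rotating by 0 (or a multiple of 16) is the identity
    } else {
        // psrldq: result byte i = v byte i+N; the top N bytes become zero.
        __m128i lo = _mm_srli_si128(v, N);
        // pslldq: v bytes 0..N-1 move to bytes 16-N..15; the rest become zero.
        __m128i hi = _mm_slli_si128(v, 16 - N);
        // por: the two halves do not overlap, so OR yields the rotation.
        return _mm_or_si128(lo, hi);
    }
}

// Hypothetical test helper: compares the SSE2 sequence with the scalar meaning
// of the shuffle mask {N, N+1, ..., N+15} applied to (vect, vect).
template <int RotateAmt>
bool RotateMatchesScalar() {
    alignas(16) std::uint8_t in[16];
    alignas(16) std::uint8_t out[16];
    for (int i = 0; i < 16; ++i) in[i] = static_cast<std::uint8_t>(0xA0 + i);

    __m128i v = _mm_load_si128(reinterpret_cast<const __m128i*>(in));
    _mm_store_si128(reinterpret_cast<__m128i*>(out),
                    RotateRightBytesSse2<RotateAmt>(v));

    constexpr int N = RotateAmt & 15;
    for (int i = 0; i < 16; ++i)
        if (out[i] != in[(i + N) % 16]) return false;
    return true;
}

// Quick self-check over several amounts, including 0 and an amount that
// wraps (21 & 15 == 5).
inline void CheckRotations() {
    assert(RotateMatchesScalar<0>());
    assert(RotateMatchesScalar<1>());
    assert(RotateMatchesScalar<5>());
    assert(RotateMatchesScalar<15>());
    assert(RotateMatchesScalar<21>());
}
```

Each intrinsic corresponds to a single SSE2 instruction, so with optimization enabled GCC is expected to produce something close to the four-instruction sequence above for RotateRightBytesSse2<5>, although register allocation may differ.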
Thread overview: 8+ messages
  2022-04-22 21:46 john_platts at hotmail dot com [this message]
  2022-04-22 21:52 [Bug target/105354] pinskia at gcc dot gnu.org
  2022-04-24  3:38 crazylht at gmail dot com
  2022-04-24  3:57 crazylht at gmail dot com
  2022-04-28  1:34 crazylht at gmail dot com
  2022-05-09 13:23 cvs-commit at gcc dot gnu.org
  2022-05-09 13:27 crazylht at gmail dot com
  2023-05-11 13:21 chfast at gmail dot com