From: "peter at cordes dot ca" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP
Date: Mon, 25 Oct 2021 21:44:09 +0000
Message-ID: <bug-102494-4-hdA7DQ6Jy2@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-102494-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494

Peter Cordes <peter at cordes dot ca> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #10 from Peter Cordes <peter at cordes dot ca> ---
Current trunk with -fopenmp is still not good: https://godbolt.org/z/b3jjhcvTa

It's still doing two separate sign extensions, then two stores followed by a wider reload (a store-forwarding stall):

-O3 -march=skylake -fopenmp
simde_vaddlv_s8:
        push    rbp
        vpmovsxbw       xmm2, xmm0
        vpsrlq  xmm0, xmm0, 32
        mov     rbp, rsp
        vpmovsxbw       xmm3, xmm0
        and     rsp, -32
        vmovq   QWORD PTR [rsp-16], xmm2
        vmovq   QWORD PTR [rsp-8], xmm3
        vmovdqa xmm4, XMMWORD PTR [rsp-16]
        ...     (then asm using byte-shifts)

That includes sequences like

        movdqa  xmm1, xmm0
        psrldq  xmm1, 4

instead of pshufd (which copies and shuffles in a single instruction). pshufd is an option here because the high garbage can be ignored.

And ARM64 goes scalar.

----

Current trunk *without* -fopenmp produces decent asm: https://godbolt.org/z/h1KEKPTW9

For ARM64 we've been making good asm since GCC 10.x (vs. scalar in 9.3):

simde_vaddlv_s8:
        sxtl    v0.8h, v0.8b
        addv    h0, v0.8h
        umov    w0, v0.h[0]
        ret

x86-64 gcc -O3 -march=skylake
simde_vaddlv_s8:
        vpmovsxbw       xmm1, xmm0
        vpsrlq  xmm0, xmm0, 32
        vpmovsxbw       xmm0, xmm0
        vpaddw  xmm0, xmm1, xmm0
        vpsrlq  xmm1, xmm0, 32
        vpaddw  xmm0, xmm0, xmm1
        vpsrlq  xmm1, xmm0, 16
        vpaddw  xmm0, xmm0, xmm1
        vpextrw eax, xmm0, 0
        ret

That's pretty good. However, VMOVD eax, xmm0 would be more efficient than VPEXTRW when we don't need to avoid high garbage, and here we don't, because it's just a return value.
VPEXTRW zero-extends into RAX, so it isn't directly helpful if we need to sign-extend to 32 or 64 bits for some reason; we'd still need a scalar movsx. Alternatively, with BMI2, go scalar before the last shift / VPADDW step, e.g.

        ...
        vmovd   eax, xmm0
        rorx    edx, eax, 16
        add     eax, edx