public inbox for gcc-bugs@sourceware.org
From: "peter at cordes dot ca" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP
Date: Mon, 25 Oct 2021 22:00:55 +0000
Message-ID: <bug-102494-4-owy5PtJjUn@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-102494-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494

--- Comment #11 from Peter Cordes <peter at cordes dot ca> ---
Also, horizontal byte sums are generally best done with VPSADBW against a
zero vector, even if that means some fiddling to flip to unsigned first and
then undo the bias.

simde_vaddlv_s8:
 vpxor    xmm0, xmm0, .LC0[rip]  # set1_epi8(0x80) flip to unsigned 0..255 range
 vpxor    xmm1, xmm1, xmm1       # zero vector
 vpsadbw  xmm0, xmm0, xmm1       # horizontal byte sum within each 64-bit half
 vmovd    eax, xmm0              # we only wanted the low half anyway
 sub      eax, 8 * 128           # subtract the bias we added earlier by flipping sign bits
 ret

This is so much shorter we'd still be ahead if we generated the vector
constant on the fly instead of loading it. (3 instructions: vpcmpeqd
same,same / vpabsb / vpslld by 7. Or pcmpeqd / psllw 8 / packsswb same,same
to saturate to -128.)

If we had wanted a 128-bit (16 byte) vector sum, we'd need

 ...
 vpsadbw ...

 vpshufd  xmm1, xmm0, 0xfe       # shuffle upper 64 bits to the bottom
 vpaddd   xmm0, xmm0, xmm1
 vmovd    eax, xmm0
 sub      eax, 16 * 128

Works efficiently with only SSE2.

Actually with AVX, we should unpack the top half with VPUNPCKHQDQ to save a
byte (no immediate operand), since the non-destructive three-operand VEX form
means we don't need PSHUFD's copy-and-shuffle.

Or movd / pextrw / scalar add, but that's more uops: pextrw is 2 on its own.
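For readers who prefer intrinsics, the sequences above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the function names (`hsum_low8_epi8`, `hsum_epi8`, `make_sign_bit_bytes`) are hypothetical and do not come from the bug report or from SIMDe, and a compiler is free to choose different instructions than the hand-written asm (for example, it will likely constant-fold the on-the-fly constant back into a load).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: sum of the low 8 signed bytes of v. */
static int hsum_low8_epi8(__m128i v)
{
    /* Flip the sign bits: signed -128..127 becomes unsigned 0..255 (value + 128). */
    __m128i u = _mm_xor_si128(v, _mm_set1_epi8(-128));
    /* PSADBW against zero sums the bytes within each 64-bit half. */
    __m128i sad = _mm_sad_epu8(u, _mm_setzero_si128());
    /* Take the low half and remove the bias of 8 * 128. */
    return _mm_cvtsi128_si32(sad) - 8 * 128;
}

/* Hypothetical helper: sum of all 16 signed bytes of v. */
static int hsum_epi8(__m128i v)
{
    __m128i u = _mm_xor_si128(v, _mm_set1_epi8(-128));
    __m128i sad = _mm_sad_epu8(u, _mm_setzero_si128());
    /* Bring the upper 64-bit partial sum down (PUNPCKHQDQ) and add. */
    __m128i hi = _mm_unpackhi_epi64(sad, sad);
    __m128i sum = _mm_add_epi32(sad, hi);
    return _mm_cvtsi128_si32(sum) - 16 * 128;
}

/* Hypothetical helper: build set1_epi8(0x80) without a memory load,
   following the SSE2-only pcmpeqd / psllw 8 / packsswb recipe. */
static __m128i make_sign_bit_bytes(void)
{
    __m128i z = _mm_setzero_si128();
    __m128i ones = _mm_cmpeq_epi32(z, z);   /* all bits set */
    __m128i w = _mm_slli_epi16(ones, 8);    /* each word = 0xFF00 = -256 */
    return _mm_packs_epi16(w, w);           /* signed saturation to -128 */
}
```

The bias correction is just the number of bytes summed times 128, which is why the 8-byte and 16-byte versions subtract 8 * 128 and 16 * 128 respectively.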