public inbox for gcc-bugs@sourceware.org
From: "peter at cordes dot ca" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/102494] Failure to optimize vector reduction properly especially when using OpenMP
Date: Mon, 25 Oct 2021 22:00:55 +0000
Message-ID: <bug-102494-4-owy5PtJjUn@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-102494-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102494

--- Comment #11 from Peter Cordes <peter at cordes dot ca> ---
Also, horizontal byte sums are generally best done with VPSADBW against a zero
vector, even if that means some fiddling to flip to unsigned first and then
undo the bias. (PSADBW only operates on unsigned bytes, so signed input needs
that extra step.)

simde_vaddlv_s8:
 vpxor    xmm0, xmm0, .LC0[rip]  # set1_epi8(0x80): flip to unsigned 0..255 range
 vpxor    xmm1, xmm1, xmm1       # zero vector
 vpsadbw  xmm0, xmm0, xmm1       # horizontal byte sum within each 64-bit half
 vmovd    eax, xmm0              # we only wanted the low half anyway
 sub      eax, 8 * 128           # subtract the bias we added earlier by flipping sign bits
 ret

This is so much shorter that we'd still come out ahead if we generated the vector
constant on the fly instead of loading it: 3 instructions, vpcmpeqd same,same /
vpabsb / vpslld by 7, or pcmpeqd same,same / psllw by 8 / packsswb same,same to
saturate each word to -128.

If we had wanted a 128-bit (16-byte) vector sum, we'd need:

  ...
  vpsadbw ...

  vpshufd  xmm1, xmm0, 0xfe     # shuffle upper 64 bits to the bottom
  vpaddd   xmm0, xmm0, xmm1     # add the two 64-bit partial sums
  vmovd    eax, xmm0
  sub      eax, 16 * 128        # undo the bias: 16 bytes * 128

This works efficiently with only SSE2. With AVX, though, we should bring the top
half down with VPUNPCKHQDQ instead, saving a byte (no immediate operand): the
non-destructive 3-operand VEX form means we don't need PSHUFD's copy-and-shuffle.

Alternatively, movd / pextrw / scalar add, but that's more uops: pextrw alone is 2.
