From: "jakub at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/107748] [13 Regression] Isn't _mm_cvtsbh_ss incorrect?
Date: Fri, 18 Nov 2022 11:46:40 +0000
Message-ID: <bug-107748-4-BwAk8vg0CF@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-107748-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107748

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Hongtao.liu from comment #2)
> float
> _mm_cvtsbh_ss (__bf16 __A)
> {
>   union{ float sf; __bf16 bf[2];} __tmp;
>   __tmp.sf = 0.0f;
>   __tmp.bf[1] = __A;
>   return __tmp.sf;
> }
> 
> Looks like gcc can optimize it to
> 
> _mm_cvtsbh_ss(bool _Accum):
>         movd    %xmm0, %eax
>         sall    $16, %eax
>         movd    %eax, %xmm0
>         ret

That is an option too, but please uglify the sf and bf identifiers above with
a __ prefix.
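
For reference, the quoted movd/sall $16/movd sequence just places the 16
bf16 bits in the upper half of a 32-bit float pattern.  A minimal portable
sketch of that, assuming the bf16 value is passed around as raw uint16_t bits
(the helper name bf16_bits_to_float is made up for illustration, it is not
the intrinsic from the header):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of any GCC header: widen raw bf16 bits
   to float by putting them in the high 16 bits and zeroing the low 16.  */
static float
bf16_bits_to_float (uint16_t bits)
{
  /* Cast before shifting so the shift is done in unsigned 32-bit arithmetic.  */
  uint32_t widened = (uint32_t) bits << 16;
  float result;
  /* memcpy is the portable way to reinterpret the bit pattern.  */
  memcpy (&result, &widened, sizeof result);
  return result;
}
```

E.g. bf16_bits_to_float (0x3F80) gives 1.0f, since 0x3F800000 is the float
encoding of 1.0.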
Also, not just for this but more importantly for the __bf16 -> float
conversions gcc emits for -ffast-math or for cstorebf4 or cbranchcc4, it would
be nice if we optimized those so that, when the source and destination are
both in SSE registers, we don't move from SSE to GPR, shift, and move back
from GPR to SSE.  Instead we could do it through some permutation of the SSE
register that just pretends it is V*HImode, moves the first element to the
second and zeros the first (and perhaps all elements above the second too, or
not, whatever is faster).
Dunno if it could be done as a peephole2, or something different.
Just try:
__attribute__((optimize ("fast-math")))
float foo (__bf16 x) { return x; }
int bar (__bf16 x, __bf16 y) { return x == y; }
void baz (void);
void qux (__bf16 x, __bf16 y) { if (x == y) baz (); }
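
To illustrate the SSE-only idea from the paragraph above, here is a rough
sketch with SSE2 intrinsics, assuming the bf16 bits sit in the lowest 16-bit
lane of an XMM register.  The helper name is made up, and this only shows one
possible permutation (punpcklwd with a zero register); it says nothing about
what pattern or peephole gcc should actually use:

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: treat V as eight 16-bit lanes, move lane 0 into
   lane 1 and zero lane 0, then read the low 32 bits as a float.
   No round trip through a general purpose register is needed.  */
static float
bf16_lane_to_float_sse2 (__m128i v)
{
  /* Interleave zero lanes with the low lanes of V: 0, v0, 0, v1, ...  */
  __m128i widened = _mm_unpacklo_epi16 (_mm_setzero_si128 (), v);
  return _mm_cvtss_f32 (_mm_castsi128_ps (widened));
}
```

This variant also rewrites the lanes above the second one, which is one of
the two choices left open above.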
Oh, and one more thing, for -mavx512bf16 -mavx512vl -ffast-math it would be
nice to use the AVX512BF16 instruction for float -> __bf16 conversions rather
than the library routine.  But that instruction doesn't handle sNaNs properly
and flushes subnormals to 0, so I think we shouldn't do it if
HONOR_NANS (BFmode) or !flag_unsafe_math_optimizations.
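
For comparison, a sketch of what a software float -> bf16 conversion with
round-to-nearest-even can look like in portable C.  This is an illustration
on raw bits with a made-up helper name, not the actual library routine; it
quiets NaNs and keeps subnormal results, the two things mentioned above as
problems with the hardware instruction:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: narrow a float to raw bf16 bits, rounding to
   nearest with ties to even.  */
static uint16_t
float_to_bf16_rne (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);

  /* NaN: keep sign and upper payload bits, and set the quiet bit so a
     signaling NaN does not turn into infinity when truncated.  */
  if ((bits & 0x7fffffffu) > 0x7f800000u)
    return (uint16_t) ((bits >> 16) | 0x0040u);

  /* Add 0x7fff plus the lowest kept bit, so exact ties round to even.
     A carry into the exponent correctly rounds up to the next binade
     or to infinity.  */
  uint32_t bias = 0x7fffu + ((bits >> 16) & 1u);
  return (uint16_t) ((bits + bias) >> 16);
}
```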
