public inbox for gcc-bugs@sourceware.org
* [Bug target/101311] New: GCC refuses to use SSE registers to carry out an explicit XOR on a float.
@ 2021-07-03 21:00 the4naves at gmail dot com
2021-07-03 21:08 ` [Bug target/101311] " the4naves at gmail dot com
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: the4naves at gmail dot com @ 2021-07-03 21:00 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101311
Bug ID: 101311
Summary: GCC refuses to use SSE registers to carry out an
explicit XOR on a float.
Product: gcc
Version: 11.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: the4naves at gmail dot com
Target Milestone: ---
// ---------------
int func(float a, float b) {
    float tmp = a * b;
    *reinterpret_cast<int*>(&tmp) ^= 0x80000000;
    return tmp;
}
int main() {
    return func(2, 4);
}
// ---------------
Compiling this with `g++ test.cpp -O3 -Wall -Wextra -fno-strict-aliasing
-fwrapv -fno-aggressive-loop-optimizations -fsanitize=undefined` (removing the
various strict flags achieves the same thing) gives no compile warnings and
successfully returns `248` (-8) when run.
Looking at the assembly for `func`, GCC generates:
# ---------------
mulss xmm0, xmm1
mov eax, -2147483648
movd DWORD PTR [rsp-20], xmm0
add eax, DWORD PTR [rsp-20]
movd xmm0, eax
cvttss2si eax, xmm0
ret
# ---------------
I find a couple of things odd with this:
- Memory is used as a temporary buffer. There shouldn't be any latency
between the write and read due to store forwarding, but that cache line is
going to have to be written to memory at some point.
- Necessitating the previous point, GCC uses eax to carry out the XOR,
requiring a move to and from the register.
- GCC seems to favor an `add` instead of `xor`. I've seen it mentioned that
an add should be slightly faster due to consecutive instructions not being
blocked in the pipeline, but I don't see why a `xor` would be (don't quote me
on this though).
Replacing the explicit XOR with a negation (`tmp = -tmp`) generates much more
sensible assembly (.LC0 contains the xor constant):
# ---------------
mulss xmm0, xmm1
xorps xmm0, XMMWORD PTR .LC0[rip]
cvttss2si eax, xmm0
ret
# ---------------
To be fair, in my example, negation is easily just the better method, but it
seems silly that GCC goes to such lengths in the first snippet as to not use
`xorps` (which as far as I can tell is just as fast as `add`). It looks like
maybe GCC is confused by the cast to int (and as such doesn't want to use the
xmm regs)?
Exact version is 11.1.0 under x86_64-linux-gnu, but I was able to reproduce
this as far back as 4.9.
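For comparison, here is a minimal sketch of the two spellings side by side. It assumes a 32-bit IEEE 754 `float` and uses `std::memcpy` rather than `reinterpret_cast`, so it does not depend on `-fno-strict-aliasing`. The names `flip_sign_xor`, `func_xor` and `func_neg` are made up for illustration, and I have not checked whether GCC generates the same assembly for the `memcpy` form as for the cast above.

```cpp
#include <cstdint>
#include <cstring>

// Flip the sign bit of a float by XORing its bit pattern.
// Assumption: float is a 32-bit IEEE 754 value (checked by the static_assert).
inline float flip_sign_xor(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // read the bit pattern
    bits ^= 0x80000000u;                   // toggle bit 31, the sign bit
    std::memcpy(&x, &bits, sizeof x);      // write it back
    return x;
}

// Same shape as func() in the report, with the explicit XOR.
inline int func_xor(float a, float b) {
    return static_cast<int>(flip_sign_xor(a * b));
}

// The negation variant, for which the report shows an xorps.
inline int func_neg(float a, float b) {
    return static_cast<int>(-(a * b));
}
```

Both `func_xor(2, 4)` and `func_neg(2, 4)` evaluate to -8, which is why one would hope for the same code from either form.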
* [Bug target/101311] GCC refuses to use SSE registers to carry out an explicit XOR on a float.
From: the4naves at gmail dot com @ 2021-07-03 21:08 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101311
--- Comment #1 from Josh Nave <the4naves at gmail dot com> ---
Additionally, other instructions could have been used (such as `pxor`) which
are less float-centric (and maybe faster?).
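A rough sketch of the two options using SSE intrinsics, for x86 targets with SSE2 only. The function names are hypothetical. `_mm_xor_ps` is the intrinsic that corresponds to `xorps` and `_mm_xor_si128` the one that corresponds to `pxor`, but the compiler is free to pick a different instruction when it optimizes, so treat this as an illustration of the operation rather than a guarantee about the generated code.

```cpp
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics; also pulls in the SSE ones

// Sign flip in the floating-point domain (corresponds to xorps).
inline float flip_sign_xorps(float x) {
    const __m128 mask = _mm_set_ss(-0.0f);  // lane 0 holds 0x80000000
    return _mm_cvtss_f32(_mm_xor_ps(_mm_set_ss(x), mask));
}

// The same sign flip in the integer domain (corresponds to pxor).
inline float flip_sign_pxor(float x) {
    const __m128i mask = _mm_cvtsi32_si128(INT32_MIN);     // 0x80000000 in lane 0
    const __m128i bits = _mm_castps_si128(_mm_set_ss(x));  // reinterpret, no conversion
    return _mm_cvtss_f32(_mm_castsi128_ps(_mm_xor_si128(bits, mask)));
}
#endif
```

Either way the value never has to leave the xmm register file, which is the point of the report.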
* [Bug target/101311] GCC refuses to use SSE registers to carry out an explicit XOR on a float.
From: pinskia at gcc dot gnu.org @ 2021-07-03 21:27 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101311
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
           Severity|normal                      |enhancement
   Last reconfirmed|                            |2021-07-03
     Ever confirmed|0                           |1
--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So for aarch64 it is xor:
(insn 7 4 8 2 (set (reg:SF 101)
(mult:SF (reg/v:SF 98 [ a ])
(reg/v:SF 99 [ b ]))) "t66.c":2:19 966 {mulsf3}
(expr_list:REG_DEAD (reg/v:SF 99 [ b ])
(expr_list:REG_DEAD (reg/v:SF 98 [ a ])
(nil))))
(insn 8 7 9 2 (set (reg:SI 102)
(xor:SI (subreg:SI (reg:SF 101) 0)
(const_int -2147483648 [0xffffffff80000000]))) "t66.c":3:35 490
{xorsi3}
(expr_list:REG_DEAD (reg:SF 101)
(nil)))
But on x86_64 it is plus:
(insn 8 7 9 2 (parallel [
(set (reg:SI 92)
(plus:SI (subreg:SI (reg:SF 91) 0)
(const_int -2147483648 [0xffffffff80000000])))
(clobber (reg:CC 17 flags))
]) "t87.c":3:35 207 {*addsi_1}
(expr_list:REG_DEAD (reg:SF 91)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil))))
It is xor until fwprop1 on x86_64.
Where it changes:
(insn 8 7 9 2 (parallel [
(set (reg:SI 92)
(xor:SI (subreg:SI (reg:SF 91) 0)
(const_int -2147483648 [0xffffffff80000000])))
(clobber (reg:CC 17 flags))
]) "t87.c":3:35 529 {*xorsi_1}
(nil))
into:
(insn 8 7 9 2 (parallel [
(set (reg:SI 92)
(plus:SI (subreg:SI (reg:SF 91) 0)
(const_int -2147483648 [0xffffffff80000000])))
(clobber (reg:CC 17 flags))
]) "t87.c":3:35 207 {*addsi_1}
(expr_list:REG_DEAD (reg:SF 91)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil))))
But even if I change 0x80000000 to 0x80000001 (to force it to stay XOR), I
still don't get the SSE instruction.
Note aarch64 gets it right though:
.cfi_startproc
fmul s0, s0, s1
movi v1.2s, 0x80, lsl 24
eor v0.8b, v0.8b, v1.8b
fcvtzs w0, s0
ret
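As a side note on why the XOR can become a PLUS at all: 0x80000000 has only bit 31 set, and in 32-bit arithmetic any carry out of bit 31 is discarded, so adding the constant toggles that bit exactly as XOR does. With 0x80000001 the low bit can carry into bit 1, so the two operations no longer agree. A small sketch (the helper names are made up for illustration):

```cpp
#include <cstdint>

constexpr std::uint32_t xor_const(std::uint32_t x, std::uint32_t c) { return x ^ c; }
constexpr std::uint32_t add_const(std::uint32_t x, std::uint32_t c) { return x + c; }  // wraps mod 2^32

// 0x41000000 is the bit pattern of 8.0f, 0xC1000000 that of -8.0f.
static_assert(xor_const(0x41000000u, 0x80000000u) == 0xC1000000u, "xor sets the sign bit");
static_assert(add_const(0x41000000u, 0x80000000u) == 0xC1000000u, "add gives the same result");
static_assert(add_const(0xC1000000u, 0x80000000u) == 0x41000000u, "carry out of bit 31 is dropped");

// With 0x80000001 the results differ as soon as bit 0 of x is set.
static_assert(xor_const(0x41000001u, 0x80000001u) == 0xC1000000u, "xor clears bit 0");
static_assert(add_const(0x41000001u, 0x80000001u) == 0xC1000002u, "add carries into bit 1");
```

So the rewrite is safe for the sign-bit constant only, which matches the observation that 0x80000001 keeps the insn as XOR.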
* [Bug target/101311] GCC refuses to use SSE registers to carry out an explicit XOR on a float.
From: rguenth at gcc dot gnu.org @ 2021-07-05 6:33 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101311
--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
float func(float a)
{
  union { float f; unsigned u; } u;
  u.f = a;
  u.u ^= 0x80000000;
  return u.f;
}
is the example w/o TBAA issue and down to the point WRT refusing negate via
XOR.
movd %xmm0, %eax
addl $-2147483648, %eax
movd %eax, %xmm0
ret
The core issue (besides transforming XOR to ADD) is that *xorsi_1 does not
have a xmm alternative:
(insn 7 6 11 2 (parallel [
(set (subreg:SI (reg:SF 84 [ <retval> ]) 0)
(xor:SI (subreg:SI (reg:SF 88) 0)
(const_int -2147483647 [0xffffffff80000001])))
(clobber (reg:CC 17 flags))
]) "t.c":6:12 529 {*xorsi_1}
and thus we are forced to allocate a GPR. The XOR to ADD is done via
the *lea<mode>_general_4 splitters I think. Not sure if adding xmm
alternatives really makes sense but STV doesn't consider the above since
there are subregs involved already and it checks for REG_P instead of
REG_OR_SUBREG_P.
It might be an idea to specifically split the subreg case into yet another
insn variant ...
* [Bug target/101311] GCC refuses to use SSE registers to carry out an explicit XOR on a float.
From: pinskia at gcc dot gnu.org @ 2021-11-28 6:49 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101311
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|NEW                         |RESOLVED
--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Dup of bug 98962.
*** This bug has been marked as a duplicate of bug 98962 ***