public inbox for gcc-bugs@sourceware.org
* [Bug target/101311] New: GCC refuses to use SSE registers to carry out an explicit XOR on a float.
@ 2021-07-03 21:00 the4naves at gmail dot com
  2021-07-03 21:08 ` [Bug target/101311] " the4naves at gmail dot com
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: the4naves at gmail dot com @ 2021-07-03 21:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101311

            Bug ID: 101311
           Summary: GCC refuses to use SSE registers to carry out an
                    explicit XOR on a float.
           Product: gcc
           Version: 11.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: the4naves at gmail dot com
  Target Milestone: ---

// ---------------
int func(float a, float b) {
    float tmp = a * b;
    *reinterpret_cast<int*>(&tmp) ^= 0x80000000;

    return tmp;
}

int main() {
    return func(2, 4);
}
// ---------------

Compiling this with `g++ test.cpp -O3 -Wall -Wextra -fno-strict-aliasing
-fwrapv -fno-aggressive-loop-optimizations -fsanitize=undefined` (removing the
various strict flags achieves the same thing) gives no compile warnings and
successfully returns `248` (-8) when run.
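
For anyone puzzled by the `248`: `func` returns -8, and on Linux only the low 8 bits of the value returned from `main` reach the shell as the exit status. A small sketch of that arithmetic follows; `exit_status_byte` is a name invented here for illustration and is not part of the test case.

```cpp
// Hypothetical helper (not from the bug report): the low 8 bits of a
// return code, which is what a POSIX shell reports as the exit status.
constexpr unsigned exit_status_byte(int rc) {
    // Converting to unsigned is well defined (modulo 2^32), so the mask
    // behaves the same regardless of the sign of rc.
    return static_cast<unsigned>(rc) & 0xFFu;
}

static_assert(exit_status_byte(-8) == 248u, "-8 shows up as 248");
```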

Looking at the assembly for `func`, GCC generates:
# ---------------
mulss      xmm0, xmm1
mov        eax, -2147483648
movd       DWORD PTR [rsp-20], xmm0
add        eax, DWORD PTR [rsp-20]
movd       xmm0, eax
cvttss2si  eax, xmm0
ret
# ---------------

I find a couple of things odd with this:
  - Memory is used as a temporary buffer. There shouldn't be any latency
between the write and read due to store forwarding, but that cache line is
going to have to be written to memory at some point.
  - Necessitating the previous point, GCC uses eax to carry out the XOR,
requiring a move to and from the register.
  - GCC seems to favor an `add` instead of `xor`. I've seen it mentioned that
an add should be slightly faster due to consecutive instructions not being
blocked in the pipeline, but I don't see why a `xor` would be (don't quote me
on this though).
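
On the `add` vs `xor` point: for the constant 0x80000000 the two give the same result in 32-bit modular arithmetic, since only bit 31 is touched and any carry out of it is discarded. A minimal sketch of that equivalence is below. The function names are made up for illustration, and it says nothing about which instruction is faster.

```cpp
#include <cstdint>

// Hypothetical helpers, named only for this illustration.

// Flip bit 31 with XOR, as the source code asks for.
constexpr std::uint32_t flip_top_bit_xor(std::uint32_t x) {
    return x ^ 0x80000000u;
}

// Flip bit 31 with ADD: unsigned arithmetic wraps modulo 2^32, so the
// carry out of bit 31 is simply dropped.
constexpr std::uint32_t flip_top_bit_add(std::uint32_t x) {
    return x + 0x80000000u;
}

// 0x41000000 is the bit pattern of 8.0f, 0xC1000000 that of -8.0f.
static_assert(flip_top_bit_xor(0x41000000u) == 0xC1000000u, "xor: 8.0f -> -8.0f");
static_assert(flip_top_bit_add(0x41000000u) == 0xC1000000u, "add: same result");
static_assert(flip_top_bit_xor(0xC1000000u) == flip_top_bit_add(0xC1000000u),
              "they also agree when bit 31 is already set");
```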

Replacing the explicit XOR with a negation (`tmp = -tmp`) generates much more
sensible assembly (.LC0 contains the xor constant):
# ---------------
mulss      xmm0, xmm1
xorps      xmm0, XMMWORD PTR .LC0[rip]
cvttss2si  eax, xmm0
ret
# ---------------

To be fair, in my example, negation is easily just the better method, but it
seems silly that GCC goes to such lengths in the first snippet as to not use
`xorps` (which as far as I can tell is just as fast as `add`). It looks like
maybe GCC is confused by the cast to int (and as such doesn't want to use the
xmm regs)?

Exact version is 11.1.0 under x86_64-linux-gnu, but I was able to reproduce
this as far back as 4.9.


* [Bug target/101311] GCC refuses to use SSE registers to carry out an explicit XOR on a float.
From: the4naves at gmail dot com @ 2021-07-03 21:08 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101311

--- Comment #1 from Josh Nave <the4naves at gmail dot com> ---
Additionally, other instructions could have been used (such as `pxor`) which
are less float-centric (and maybe faster?).
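
For reference, the SSE form can be requested by hand with intrinsics. This is only a sketch, assuming an x86 target with SSE and the `<xmmintrin.h>` header; `negate_sse` and `check_negate_sse` are made-up names, and this is not a proposed fix for the missed optimization.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics; x86 / x86_64 only

// Hypothetical helper: flip the sign bit of a float with a vector XOR.
inline float negate_sse(float x) {
    const __m128 sign_mask = _mm_set_ss(-0.0f);  // low lane holds 0x80000000
    const __m128 value = _mm_set_ss(x);          // x in the low lane, upper lanes zero
    return _mm_cvtss_f32(_mm_xor_ps(value, sign_mask));
}

// Small self-check of the helper above.
inline void check_negate_sse() {
    assert(negate_sse(8.0f) == -8.0f);
    assert(negate_sse(-8.0f) == 8.0f);
}
```

Whether the compiler keeps this as a single `xorps` (or picks `pxor`) depends on the optimization level and the surrounding code.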


* [Bug target/101311] GCC refuses to use SSE registers to carry out an explicit XOR on a float.
From: pinskia at gcc dot gnu.org @ 2021-07-03 21:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101311

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
           Severity|normal                      |enhancement
   Last reconfirmed|                            |2021-07-03
     Ever confirmed|0                           |1

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
So for aarch64 it is xor:
(insn 7 4 8 2 (set (reg:SF 101)
        (mult:SF (reg/v:SF 98 [ a ])
            (reg/v:SF 99 [ b ]))) "t66.c":2:19 966 {mulsf3}
     (expr_list:REG_DEAD (reg/v:SF 99 [ b ])
        (expr_list:REG_DEAD (reg/v:SF 98 [ a ])
            (nil))))
(insn 8 7 9 2 (set (reg:SI 102)
        (xor:SI (subreg:SI (reg:SF 101) 0)
            (const_int -2147483648 [0xffffffff80000000]))) "t66.c":3:35 490 {xorsi3}
     (expr_list:REG_DEAD (reg:SF 101)
        (nil)))

But on x86_64 it is plus:
(insn 8 7 9 2 (parallel [
            (set (reg:SI 92)
                (plus:SI (subreg:SI (reg:SF 91) 0)
                    (const_int -2147483648 [0xffffffff80000000])))
            (clobber (reg:CC 17 flags))
        ]) "t87.c":3:35 207 {*addsi_1}
     (expr_list:REG_DEAD (reg:SF 91)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

It is xor until fwprop1 on x86_64.

Where it changes:
(insn 8 7 9 2 (parallel [
            (set (reg:SI 92)
                (xor:SI (subreg:SI (reg:SF 91) 0)
                    (const_int -2147483648 [0xffffffff80000000])))
            (clobber (reg:CC 17 flags))
        ]) "t87.c":3:35 529 {*xorsi_1}
     (nil))

into:
(insn 8 7 9 2 (parallel [
            (set (reg:SI 92)
                (plus:SI (subreg:SI (reg:SF 91) 0)
                    (const_int -2147483648 [0xffffffff80000000])))
            (clobber (reg:CC 17 flags))
        ]) "t87.c":3:35 207 {*addsi_1}
     (expr_list:REG_DEAD (reg:SF 91)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))

But even if I change 0x80000000 to 0x80000001 (to force it to stay XOR), I
still don't get the SSE instruction.
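
Presumably 0x80000001 keeps the operation as an XOR because XOR and ADD stop being interchangeable once a bit below 31 is in the mask: adding 1 can carry into higher bits, while XOR never does. A small sketch of the difference, with helper names invented for illustration:

```cpp
#include <cstdint>

// Hypothetical helpers, named only for this illustration.
constexpr std::uint32_t apply_xor(std::uint32_t x, std::uint32_t mask) {
    return x ^ mask;
}
constexpr std::uint32_t apply_add(std::uint32_t x, std::uint32_t mask) {
    return x + mask;  // wraps modulo 2^32
}

// With bit 0 of x clear, the two still agree.
static_assert(apply_xor(0x41000000u, 0x80000001u) == apply_add(0x41000000u, 0x80000001u),
              "no carry, same result");

// With bit 0 of x set, XOR clears it but ADD carries into bit 1.
static_assert(apply_xor(0x41000001u, 0x80000001u) == 0xC1000000u, "xor clears bit 0");
static_assert(apply_add(0x41000001u, 0x80000001u) == 0xC1000002u, "add carries into bit 1");
```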

Note aarch64 gets it right though:
        .cfi_startproc
        fmul    s0, s0, s1
        movi    v1.2s, 0x80, lsl 24
        eor     v0.8b, v0.8b, v1.8b
        fcvtzs  w0, s0
        ret


* [Bug target/101311] GCC refuses to use SSE registers to carry out an explicit XOR on a float.
From: rguenth at gcc dot gnu.org @ 2021-07-05  6:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101311

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
float func(float a)
{
  union { float f; unsigned u; } u;
  u.f = a;
  u.u ^= 0x80000000;
  return u.f;
}

is the example without the TBAA issue, reduced to the point about refusing to
negate via XOR.

        movd    %xmm0, %eax
        addl    $-2147483648, %eax
        movd    %eax, %xmm0
        ret
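
For C++ code, another way to write the same bit flip without the aliasing question is `std::memcpy`. The sketch below is not taken from the bug report: `flip_sign_memcpy` and `check_flip_sign` are hypothetical names, a 32-bit IEEE `float` is assumed, and whether GCC generates the same GPR round trip for it has not been checked here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: flip the sign bit of a float by copying its bytes
// into an integer, XORing, and copying them back.
inline float flip_sign_memcpy(float a) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &a, sizeof bits);  // read the object representation
    bits ^= 0x80000000u;                  // toggle bit 31 (the IEEE sign bit)
    std::memcpy(&a, &bits, sizeof a);     // write it back
    return a;
}

// Small self-check of the helper above.
inline void check_flip_sign() {
    assert(flip_sign_memcpy(8.0f) == -8.0f);
    assert(flip_sign_memcpy(-8.0f) == 8.0f);
}
```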

The core issue (besides transforming XOR to ADD) is that *xorsi_1 does not
have a xmm alternative:

(insn 7 6 11 2 (parallel [
            (set (subreg:SI (reg:SF 84 [ <retval> ]) 0) 
                (xor:SI (subreg:SI (reg:SF 88) 0)
                    (const_int -2147483647 [0xffffffff80000001])))
            (clobber (reg:CC 17 flags))
        ]) "t.c":6:12 529 {*xorsi_1}

and thus we are forced to allocate a GPR.  The XOR to ADD is done via
the *lea<mode>_general_4 splitters I think.  Not sure if adding xmm
alternatives really makes sense but STV doesn't consider the above since
there are subregs involved already and it checks for REG_P instead of
REG_OR_SUBREG_P.

It might be an idea to specifically split the subreg case into yet another
insn variant ...


* [Bug target/101311] GCC refuses to use SSE registers to carry out an explicit XOR on a float.
From: pinskia at gcc dot gnu.org @ 2021-11-28  6:49 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101311

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|NEW                         |RESOLVED

--- Comment #4 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Dup of bug 98962.

*** This bug has been marked as a duplicate of bug 98962 ***
