public inbox for gcc-bugs@sourceware.org
* [Bug c/109463] New: suboptimal sequence for converting 64-bit unsigned int to float
@ 2023-04-10 10:45 elronnd at elronnd dot net
  2023-04-10 10:54 ` [Bug target/109463] " pinskia at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: elronnd at elronnd dot net @ 2023-04-10 10:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109463

            Bug ID: 109463
           Summary: suboptimal sequence for converting 64-bit unsigned int
                    to float
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: elronnd at elronnd dot net
  Target Milestone: ---

double f(uint64_t x) { return x; } gives:

test   rdi,rdi
js     10 <f+0x10>
pxor   xmm0,xmm0
cvtsi2sd xmm0,rdi
ret
nop
10:
mov    rax,rdi
and    edi,0x1
pxor   xmm0,xmm0
shr    rax,1
or     rax,rdi
cvtsi2sd xmm0,rax
addsd  xmm0,xmm0
ret

In particular, the sequence:

mov    rax,rdi
and    edi,0x1
shr    rax,1
or     rax,rdi
cvtsi2sd xmm0,rax

can be replaced with:

movzx  eax,dil
shr    rdi,1
or     rdi,rax
cvtsi2sd xmm0,rdi

Since the low 8 bits of rdi all lie below the round bit, they only ever
contribute to the sticky bit, so ORing them in at any position suffices to
round correctly.

Alternatively, in order to avoid clobbering rdi, use the following sequence:

mov    rax,rdi
shr    rax,1
or     al,dil
cvtsi2sd xmm0,rax

(The penalty for partial register access appears to be very small or
nonexistent on recent microarchitectures.)
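
The rounding argument above can be sketched in C. This is only an illustration, not GCC's implementation: the function names (cvt_gcc, cvt_proposed, agrees, check_examples) are made up for this sketch, and it assumes the default round-to-nearest mode. At the C level both proposed sequences (movzx/shr/or and mov/shr/or al,dil) compute the same value, (x >> 1) | (x & 0xff).

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the current cold path: halve x and fold the lost bit 0
   back in as a sticky bit, convert as signed, then double. */
static double cvt_gcc(uint64_t x)
{
    if ((int64_t)x >= 0)
        return (double)(int64_t)x;          /* top bit clear: plain cvtsi2sd */
    uint64_t half = (x >> 1) | (x & 1);     /* mov / and 1 / shr / or */
    return (double)(int64_t)half * 2.0;     /* cvtsi2sd, then addsd xmm0,xmm0 */
}

/* Sketch of the proposed cold path: OR in the whole low byte.
   Bits 0..7 of the halved value are all below its round bit (bit 9),
   so they can only affect the sticky information. */
static double cvt_proposed(uint64_t x)
{
    if ((int64_t)x >= 0)
        return (double)(int64_t)x;
    uint64_t half = (x >> 1) | (x & 0xff);  /* movzx eax,dil / shr / or */
    return (double)(int64_t)half * 2.0;
}

/* Returns 1 when both variants match the compiler's own conversion. */
static int agrees(uint64_t x)
{
    double ref = (double)x;
    return cvt_gcc(x) == ref && cvt_proposed(x) == ref;
}

/* A few boundary cases: exact value, exact tie, tie broken by a low bit. */
static void check_examples(void)
{
    assert(agrees(0x8000000000000000ULL));
    assert(agrees(0x8000000000000400ULL));
    assert(agrees(0x8000000000000401ULL));
    assert(agrees(0xffffffffffffffffULL));
}
```

Calling check_examples() exercises the cases where the sticky bit decides the rounding direction; an exhaustive check is not feasible, so this is a spot check rather than a proof.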


* [Bug target/109463] suboptimal sequence for converting 64-bit unsigned int to float
@ 2023-04-10 10:54 ` pinskia at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2023-04-10 10:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109463

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
           Keywords|                            |missed-optimization

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
clang/LLVM produces:

        movq    %rdi, %xmm1
        punpckldq       .LCPI1_0(%rip), %xmm1   # xmm1 = xmm1[0],mem[0],xmm1[1],mem[1]
        subpd   .LCPI1_1(%rip), %xmm1
        movapd  %xmm1, %xmm0
        unpckhpd        %xmm1, %xmm0            # xmm0 = xmm0[1],xmm1[1]
        addsd   %xmm1, %xmm0
        retq

with .LCPI1_1 being:
.LCPI1_1:
        .quad   0x4330000000000000              # double 4503599627370496
        .quad   0x4530000000000000              # double 1.9342813113834067E+25


Note that clang produces this sequence even if you tell it the top bit is not set, via:
double f(unsigned long x) { if (x >>63) __builtin_unreachable(); return x; }
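
For readers unfamiliar with the trick, here is a rough C rendering of what the clang sequence appears to compute. It is a reading of the assembly above, not clang's source: the names bits_to_double, cvt_branchless and check_branchless are invented for this sketch, and the contents of .LCPI1_0 are not shown in this thread, so the high words 0x43300000 and 0x45300000 used below are an assumption inferred from the .LCPI1_1 constants. It also assumes plain double evaluation (SSE2, no x87 excess precision) and round-to-nearest.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as an IEEE-754 double. */
static double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Branchless conversion: plant each 32-bit half of x in the mantissa of a
   double with a fixed exponent, subtract the exponent's own value, and add. */
static double cvt_branchless(uint64_t x)
{
    /* 0x4330000000000000 is 2^52, so lo == 2^52 + (low 32 bits of x). */
    double lo = bits_to_double(0x4330000000000000ULL | (x & 0xffffffffULL));
    /* 0x4530000000000000 is 2^84, so hi == 2^84 + (high 32 bits of x) * 2^32. */
    double hi = bits_to_double(0x4530000000000000ULL | (x >> 32));
    /* Both subtractions are exact; the single final add rounds once. */
    return (hi - 0x1p84) + (lo - 0x1p52);
}

/* Spot check against the compiler's own conversion. */
static void check_branchless(void)
{
    assert(cvt_branchless(0) == 0.0);
    assert(cvt_branchless(0x8000000000000401ULL) == (double)0x8000000000000401ULL);
    assert(cvt_branchless(0xffffffffffffffffULL) == 0x1p64);
}
```

Because only the last addition can round, the result should match a correctly rounded conversion for every input, with or without the top bit set, which would explain why clang does not need a branch here.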


* [Bug target/109463] suboptimal sequence for converting 64-bit unsigned int to float
@ 2023-04-10 10:56 ` pinskia at gcc dot gnu.org
From: pinskia at gcc dot gnu.org @ 2023-04-10 10:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109463

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
It might be the case that having the top bit set in a 64-bit unsigned integer
is not common enough to be worth optimizing for ...


* [Bug target/109463] suboptimal sequence for converting 64-bit unsigned int to float
@ 2023-04-10 10:57 ` elronnd at elronnd dot net
From: elronnd at elronnd dot net @ 2023-04-10 10:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109463

--- Comment #3 from elronnd at elronnd dot net ---
Yes, I think the gcc approach of branching is definitely better.  But it's
still a good idea to optimise for size in the cold path.
