public inbox for gcc-bugs@sourceware.org
* [Bug target/105468] New: Suboptimal code generation for access of function parameters and return values of type __float128 on x86-64 Windows target.
@ 2022-05-03 19:27 already5chosen at yahoo dot com
  2022-05-03 19:28 ` [Bug target/105468] " already5chosen at yahoo dot com
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: already5chosen at yahoo dot com @ 2022-05-03 19:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105468

            Bug ID: 105468
           Summary: Suboptimal code generation for access of function
                    parameters and return values of type __float128 on
                    x86-64 Windows target.
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: already5chosen at yahoo dot com
  Target Milestone: ---

Created attachment 52921
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52921&action=edit
example routine

From the point of view of the x86-64 Windows ABI, __float128 is just another
structure of size 16, 100% identical to the rest of them.
Like the rest of them, when it is passed by value into a subroutine, it has to
be stored by the caller in a temporary location (typically on the caller's
stack), and a pointer to the temporary is passed either in a GPR or on the
stack.
The same goes for the return value: the caller allocates temporary storage on
its stack and passes a pointer to it to the callee in register RCX. The callee
then puts the return value where it was told to.
There is absolutely no "floating-pointness" or "SIMDness" about it.
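
To make the convention concrete, here is a rough C sketch of what such a
by-value call amounts to once it is lowered. The type name pair16 and both
function names are invented for illustration; the second function only spells
out by hand the hidden pointers that, per the description above, a Win64
caller supplies.

```c
#include <stdint.h>

/* Hypothetical 16-byte aggregate; under Win64 rules __float128 is
   passed the same way as this struct. */
struct pair16 { uint64_t lo, hi; };

/* What the programmer writes: by-value parameter, by-value return. */
struct pair16 swap_words(struct pair16 v)
{
    struct pair16 r = { v.hi, v.lo };
    return r;
}

/* Hand-written equivalent of the lowered call: 'ret' plays the role
   of the hidden return slot (pointer in RCX), 'arg' the role of the
   caller's temporary copy of the argument. */
struct pair16 *swap_words_lowered(struct pair16 *ret,
                                  const struct pair16 *arg)
{
    ret->lo = arg->hi;
    ret->hi = arg->lo;
    return ret;   /* the callee hands the slot address back in RAX */
}
```

Both functions compute the same thing; the second merely makes the memory
traffic explicit, and nothing in it involves an XMM register.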

However, gcc often (although not always) treats __float128 as if it were
somehow related to the floating-point side of the machine. I'd guess this is
because System V is the primary target, and according to the System V x86-64
ABI __float128 values are passed in and out in XMM registers.

In practice it leads to ugly code generation, less so in user code that uses
__float128 for arithmetic, more so in library-type code that attempts to
process __float128 objects in accordance with their binary layout.

Example 1. Access to individual words of __float128 function parameter
  vmovdqu (%rdx),%xmm2
  vmovdqu %xmm2,0x20(%rsp)
  mov     0x28(%rsp),%rdx
  mov     0x20(%rsp),%rcx
instead of simple:
  mov     %rcx,%r12
  mov     0x8(%rdx),%rcx
  mov     (%rdx),%rdx

Example 2. Function returns __float128 value composed of a pair of 64-bit words
  mov     %rax,0x20(%rsp)
  mov     %rdi,0x28(%rsp)
  vmovdqu 0x20(%rsp),%xmm3
  mov     %r12,%rax
  vmovdqu %xmm3,(%r12)
instead of simple:
  mov     %rax,(%r12)
  mov     %r12,%rax
  mov     %rdi,0x8(%r12)
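
The reverse direction, in source form, might look like the sketch below. The
function name is hypothetical and the layout again assumes little-endian
x86-64; it is meant only to show the kind of code that produces a return
sequence like the one above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a __float128 from two 64-bit halves.
   Assumes little-endian x86-64: 'lo' lands at offset 0, 'hi' at 8. */
__float128 f128_from_words(uint64_t hi, uint64_t lo)
{
    uint64_t w[2] = { lo, hi };
    __float128 r;
    memcpy(&r, w, sizeof r);
    return r;
}
```

With the return slot supplied by the caller, the two words could be stored
straight into it, with no detour through the local stack frame and an XMM
register.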


If it were just ugly I wouldn't complain. Unfortunately, sometimes it's also
quite slow, and some of the best modern CPUs are among the worst affected.
As expected, the exact impact varies, from measurable (2-4 clocks) to quite
high (40 clocks).
The highest impact was measured on an AMD Zen3 CPU, but in other situations the
same CPU was among the least affected.
Intel CPUs (Ivy Bridge, Haswell, Skylake) showed an impact in the range from 2
to 21 clocks, with both the lowest and the highest values seen on old Ivy
Bridge cores, while on the newer Skylake the impact was relatively consistent
(6-10 clocks). I haven't yet measured on the newest Intel CPUs (Ice/Tiger Lake
and Alder Lake).

Below, I attached the example code (which is not a mere dummy example, but
actually a quite good implementation of sqrtq()) compiled in two variants:
normally, and by tricking the compiler into thinking that __float128 is "just
another structure". Which, as said above, it is, at least under Win64, but
the compiler appears to think otherwise. Also provided: 3 test benches that
measure the difference in speed between the "normal" and the "tricky" variants
in various surroundings.
Note that the trick is valid only on Windows; don't try it on Linux.
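
The attachment is not reproduced here, so the following is only a guess at the
general shape such a trick could take, using a toy fabs-like routine rather
than sqrtq(). The type f128_bits and the function name are hypothetical, and
declaring one symbol with two different signatures is formally outside the C
standard; it relies purely on the Win64 lowering described earlier.

```c
#include <stdint.h>

/* Hypothetical stand-in for __float128: two 64-bit words, low first. */
typedef struct { uint64_t lo, hi; } f128_bits;

/* Toy routine (not the attached sqrtq): fabs via clearing the sign bit.
   Written against the struct, it only ever touches integer registers. */
f128_bits my_fabsq_bits(f128_bits x)
{
    x.hi &= ~(UINT64_C(1) << 63);
    return x;
}

/* On Win64 only, a separate header seen by callers could declare the
   same symbol as
       __float128 my_fabsq_bits(__float128 x);
   because both signatures are lowered to identical hidden-pointer
   calls there.  Under System V the two signatures use different
   registers, so the same declaration would break. */
```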

I hope that this particular (i.e. Windows) variant of the problem can be fixed
with relatively little effort.

On the other hand, the System V x86-64 target is a different matter. There the
impact is smaller on average, but in the worst cases it is about the same as
the worst case on Windows, and since the problem there is fundamental (a stupid
ABI) I don't believe that there could be an easy fix.


* [Bug target/105468] Suboptimal code generation for access of function parameters and return values of type __float128 on x86-64 Windows target.
From: already5chosen at yahoo dot com @ 2022-05-03 19:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105468

--- Comment #1 from Michael_S <already5chosen at yahoo dot com> ---
Created attachment 52922
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52922&action=edit
test bench that demonstrates maximal impact on Zen3


* [Bug target/105468] Suboptimal code generation for access of function parameters and return values of type __float128 on x86-64 Windows target.
From: already5chosen at yahoo dot com @ 2022-05-03 19:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105468

--- Comment #2 from Michael_S <already5chosen at yahoo dot com> ---
Created attachment 52923
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52923&action=edit
test bench that shows lower impact on Zen3, but higher impact on some Intel
CPUs


* [Bug target/105468] Suboptimal code generation for access of function parameters and return values of type __float128 on x86-64 Windows target.
From: already5chosen at yahoo dot com @ 2022-05-03 19:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105468

--- Comment #3 from Michael_S <already5chosen at yahoo dot com> ---
Created attachment 52924
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52924&action=edit
Another test bench that shows lower impact on Zen3, but higher impact on some
Intel CPUs


* [Bug target/105468] Suboptimal code generation for access of function parameters and return values of type __float128 on x86-64 Windows target.
From: already5chosen at yahoo dot com @ 2022-05-03 19:31 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105468

--- Comment #4 from Michael_S <already5chosen at yahoo dot com> ---
Created attachment 52925
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52925&action=edit
build script

