public inbox for gcc-bugs@sourceware.org
* [Bug target/110013] New: [i386] vector_size(8) on 32-bit ABI
@ 2023-05-28  4:29 husseydevin at gmail dot com
  2023-05-28  4:44 ` [Bug target/110013] [i386] vector_size(8) on 32-bit ABI emits broken MMX husseydevin at gmail dot com
  2023-05-28  5:21 ` husseydevin at gmail dot com
  0 siblings, 2 replies; 3+ messages in thread
From: husseydevin at gmail dot com @ 2023-05-28  4:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110013

            Bug ID: 110013
           Summary: [i386] vector_size(8) on 32-bit ABI
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: husseydevin at gmail dot com
  Target Milestone: ---

Closely related to bug 86541, which was fixed on x64 only.

On 32-bit targets, GCC passes vector_size(8) vectors to external functions in
MMX registers, similar to how it passes 16-byte vectors in SSE registers.

This appears to be the only time that GCC will ever naturally generate an MMX
instruction.

This is good only if you are using MMX intrinsics and are manually handling
_mm_empty().

Otherwise, if, say, you are porting NEON code (where I found this issue) using
the vector_size extension, this can cause some sneaky issues if your function
fails to inline:
1. Things will likely break because GCC doesn't handle the interaction between
   MMX and x87 properly.
   - Example of broken code (works with -mno-mmx):
     https://godbolt.org/z/xafWPohKb
2. You will pay a nasty performance toll, more than just a cdecl call, as GCC
   doesn't actually know what to do with an MMX register and just spills it to
   memory.
   - This is especially visible when v2sf is used and the floats are placed
     into MMX registers.

There are two options. The first is to use the weird ABI that Clang seems to
use:

| Type             | SIMD | Params | Return  |
| float            | base | stack  | ST0:ST1 |
| float            | SSE  | XMM0-2 | XMM0    |
| double           | all  | stack  | ST0     |
| long long/__m64  | all  | stack  | EAX:EDX |
| int, short, char | base | stack  | stack   |
| int, short, char | SSE2 | stack  | XMM0    |

However, since the current ABIs aren't 100% compatible anyway, I think that a
much simpler solution is to just convert to SSE like x64 does, falling back to
the stack if SSE is not available.

Changing the ABI to this also allows us to port MMX with SSE (bug 86541) to
32-bit mode. If you REALLY need MMX intrinsics, you can't inline, and you don't
have SSE2, you can cope with a stack spill.

* [Bug target/110013] [i386] vector_size(8) on 32-bit ABI emits broken MMX
  2023-05-28  4:29 [Bug target/110013] New: [i386] vector_size(8) on 32-bit ABI husseydevin at gmail dot com
@ 2023-05-28  4:44 ` husseydevin at gmail dot com
  2023-05-28  5:21 ` husseydevin at gmail dot com
  1 sibling, 0 replies; 3+ messages in thread
From: husseydevin at gmail dot com @ 2023-05-28  4:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110013

--- Comment #1 from Devin Hussey <husseydevin at gmail dot com> ---
As a side note, the official psABI does say that function call parameters use
MM0-MM2. If Clang follows its own rules instead, then the supposed stability of
the ABI is meaningless.

* [Bug target/110013] [i386] vector_size(8) on 32-bit ABI emits broken MMX
  2023-05-28  4:29 [Bug target/110013] New: [i386] vector_size(8) on 32-bit ABI husseydevin at gmail dot com
  2023-05-28  4:44 ` [Bug target/110013] [i386] vector_size(8) on 32-bit ABI emits broken MMX husseydevin at gmail dot com
@ 2023-05-28  5:21 ` husseydevin at gmail dot com
  1 sibling, 0 replies; 3+ messages in thread
From: husseydevin at gmail dot com @ 2023-05-28  5:21 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110013

--- Comment #2 from Devin Hussey <husseydevin at gmail dot com> ---
Scratch that. There is a somewhat easy way to fix this while following the
psABI AND using MMX with SSE.

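When calling a function, the caller can use the following sequence:

func:
    movdq2q  mm0, xmm0          ; first argument: XMM -> MM0, as the psABI expects
    movq     mm1, [esp + n]     ; second argument: load from the stack into MM1
    call     mmx_func
    movq2dq  xmm0, mm0          ; return value: MM0 -> XMM0
    emms                        ; clear MMX state so x87 is usable again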

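Then, this prologue and epilogue in the callee:

mmx_func:
    movq2dq   xmm0, mm0         ; incoming arguments: MM0/MM1 -> XMM0/XMM1
    movq2dq   xmm1, mm1
    emms                        ; leave MMX state; the body works in SSE only
    ...
    movdq2q   mm0, xmm0         ; return value: XMM0 -> MM0, as the psABI expects
    ret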
