From: "husseydevin at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/110013] New: [i386] vector_size(8) on 32-bit ABI
Date: Sun, 28 May 2023 04:29:08 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110013

            Bug ID: 110013
           Summary: [i386] vector_size(8) on 32-bit ABI
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: husseydevin at gmail dot com
  Target Milestone: ---

Closely related to bug 86541, which was fixed on x64 only.

On 32-bit, GCC passes any vector_size(8) vector to external functions in MMX
registers, similar to how it passes 16-byte vectors in SSE registers.

This appears to be the only time that GCC will ever naturally generate an MMX
instruction.

This is good only if you are using MMX intrinsics and are manually handling
_mm_empty().

Otherwise, if, say, you are porting over NEON code (where I found this issue)
using the vector_size intrinsics, this can cause some sneaky issues if your
function fails to inline:

1. Things will likely break because GCC doesn't handle MMX and x87 properly.
   - Example of broken code (works with -mno-mmx):
     https://godbolt.org/z/xafWPohKb
2. You will pay a nasty performance toll, more than just a cdecl call, as GCC
   doesn't actually know what to do with an MMX register and just spills it
   into memory.
   - This can be seen especially when v2sf is used and it places the floats
     into MMX registers.

There are two options. The first is to use the weird ABI that Clang seems to
use:

| Type             | SIMD | Params | Return  |
| float            | base | stack  | ST0:ST1 |
| float            | SSE  | XMM0-2 | XMM0    |
| double           | all  | stack  | ST0     |
| long long/__m64  | all  | stack  | EAX:EDX |
| int, short, char | base | stack  | stack   |
| int, short, char | SSE2 | stack  | XMM0    |

However, since the current ABIs aren't 100% compatible anyways, I think that a
much simpler solution is to just convert to SSE like x64 does, falling back to
the stack if SSE is not available.

Changing the ABI to this also allows us to port MMX with SSE (bug 86541) to
32-bit mode. If you REALLY need MMX intrinsics, you can't inline, and you don't
have SSE2, you can cope with a stack spill.
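
For reference, here is a minimal sketch of the kind of code that hits this
path. It is not the test case behind the godbolt link above; the type names
(v2si, v2sf) and function names are made up for illustration. It only shows the
pattern described: an 8-byte vector_size type crossing a non-inlined call
boundary, followed by ordinary scalar floating-point work.

```c
#include <stdint.h>

/* 8-byte GNU C vector types, roughly what a port of NEON's int32x2_t and
   float32x2_t might use. The names v2si and v2sf are illustrative only. */
typedef int32_t v2si __attribute__((vector_size(8)));
typedef float   v2sf __attribute__((vector_size(8)));

/* noinline forces a real call, so the arguments and the return value have
   to follow the target's calling convention. */
__attribute__((noinline))
v2si add_v2si(v2si a, v2si b)
{
    return a + b;
}

__attribute__((noinline))
v2sf scale_v2sf(v2sf v, float k)
{
    v2sf kk = { k, k };
    return v * kk;
}

/* Scalar floating-point work right after a vector call. Per the report, on
   a 32-bit x86 build that uses x87 math this is the kind of place where
   leftover MMX state could cause trouble. */
double sum_scaled(float x, float y, float k)
{
    v2sf v = { x, y };
    v2sf r = scale_v2sf(v, k);
    return (double)r[0] + (double)r[1];
}
```

Comparing the assembly for add_v2si and scale_v2sf at -m32 -O2 with and without
-mno-mmx should show whether the arguments travel in MMX registers. On x86-64
the same source is passed in SSE registers and behaves as expected.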