From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
 id 41324385742C; Wed, 27 Apr 2022 12:37:17 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 41324385742C
From: "peter at cordes dot ca" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/65146] alignment of _Atomic structure member is not
 correct
Date: Wed, 27 Apr 2022 12:37:16 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 4.9.2
X-Bugzilla-Keywords: ABI
X-Bugzilla-Severity: normal
X-Bugzilla-Who: peter at cordes dot ca
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P1
X-Bugzilla-Assigned-To: jakub at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: 
Message-ID: <bug-65146-4-AqHPyeShy1@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-65146-4@http.gcc.gnu.org/bugzilla/>
References: <bug-65146-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list <gcc-bugs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-bugs/>
List-Post: <mailto:gcc-bugs@gcc.gnu.org>
List-Help: <mailto:gcc-bugs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-bugs>,
 <mailto:gcc-bugs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Wed, 27 Apr 2022 12:37:17 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65146
--- Comment #25 from Peter Cordes <peter at cordes dot ca> ---
(In reply to CVS Commits from comment #24)
> The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:
>
> https://gcc.gnu.org/g:04df5e7de2f3dd652a9cddc1c9adfbdf45947ae6
>
> commit r11-2909-g04df5e7de2f3dd652a9cddc1c9adfbdf45947ae6
> Author: Jakub Jelinek <jakub@redhat.com>
> Date:   Thu Aug 27 18:44:40 2020 +0200
>
>     ia32: Fix alignment of _Atomic fields [PR65146]
>
>     For _Atomic fields, lowering the alignment of long long or double etc.
>     fields on ia32 is undesirable, because then one really can't perform
>     atomic operations on those using cmpxchg8b.


Just for the record, the description of this bugfix incorrectly names
cmpxchg8b as the problem.  lock cmpxchg8b is *always* atomic, even if that
means the CPU has to take a bus lock (disastrously expensive, affecting all
cores system-wide) instead of just delaying the MESI response for one line
exclusively owned in this core's private cache (aka a cache lock).

The correctness problem is __atomic_load_n / __atomic_store_n compiling to
actual 8-byte pure loads / pure stores using SSE2 movq, SSE1 movlps, or x87
fild/fistp (bouncing through the stack), such as

  movq  %xmm0, (%eax)

That's where correctness depends on Intel and AMD's atomicity guarantees which
are conditional on alignment.
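
A minimal sketch of what that means at the source level, assuming a C11
compiler with <stdatomic.h>.  The struct and function names here are
hypothetical, not taken from GCC or its testsuite.  The first function
reports whether the _Atomic member landed on an 8-byte boundary inside the
struct, which is what an 8-byte pure load or store needs to be covered by the
hardware guarantee; the second does a relaxed store followed by a relaxed
load, the operations that may compile to something like movq on ia32.

```c
#include <assert.h>     /* static_assert (C11) */
#include <stdatomic.h>
#include <stddef.h>     /* offsetof */

/* Hypothetical layout: a 4-byte member followed by an 8-byte atomic.
   With 4-byte alignment for the atomic, `value` would sit at offset 4. */
struct counter_rec {
    int tag;
    _Atomic long long value;
};

static_assert(sizeof(long long) == 8, "sketch assumes 8-byte long long");

/* Returns 1 if `value` is 8-byte aligned within the struct, else 0. */
int value_is_naturally_aligned(void)
{
    return offsetof(struct counter_rec, value) % sizeof(long long) == 0;
}

/* Relaxed store then relaxed load of the atomic member; returns what
   was read back.  No read-modify-write is involved. */
long long store_then_load(struct counter_rec *r, long long v)
{
    atomic_store_explicit(&r->value, v, memory_order_relaxed);
    return atomic_load_explicit(&r->value, memory_order_relaxed);
}
```

The offset check only covers the member's position inside the struct; the
struct object itself also has to be placed at a suitably aligned address for
the member to end up naturally aligned in memory.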

(And if AVX is supported, same deal for 16-byte load/store.  Although we can
and should use movaps for that, which bakes alignment checking into the
instruction.  Intel did recently document that CPUs with AVX guarantee
atomicity of 16-byte aligned loads/stores, retroactive to all CPUs with AVX.
It's about time, but yay.)

>     Not sure about iamcu_alignment change, I know next to nothing about
>     IA MCU, but unless it doesn't have cmpxchg8b instruction, it would
>     surprise me if we don't want to do it as well.


I had to google iamcu.  Apparently it's Pentium-like, but only has soft-FP (so
I assume no MMX or SSE as well as no x87).

If that leaves it no way to do 8-byte load/store except (lock) cmpxchg8b, that
may mean there's no need for alignment, unless cache-line-split lock is still a
performance issue.  If it's guaranteed unicore as well, we can even omit the
lock prefix and cmpxchg8b will still be an atomic RMW (or load or store) wrt.
interrupts.  (And being unicore would likely mean much less system-wide
overhead for a split lock.)
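
For anyone unfamiliar with using a compare-exchange as a load or a store,
here is a rough sketch in portable C11.  The function names are hypothetical.
On a 32-bit x86 target without wider load/store instructions I'd expect the
8-byte compare-exchange to become lock cmpxchg8b, but that is an assumption
about code generation, not something this snippet checks; on x86-64 it will be
a plain 8-byte lock cmpxchg instead.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Load via compare-exchange: try to replace 0 with 0.
   If *p is 0, the exchange succeeds and rewrites the same value.
   Otherwise it fails and copies the current value into `expected`.
   Either way `expected` ends up holding the value that was in *p. */
uint64_t load_via_cas(_Atomic uint64_t *p)
{
    uint64_t expected = 0;
    atomic_compare_exchange_strong(p, &expected, 0);
    return expected;
}

/* Store via compare-exchange: retry until the exchange succeeds.
   Each failure refreshes `expected` with the current contents of *p,
   so the next attempt compares against an up-to-date value. */
void store_via_cas(_Atomic uint64_t *p, uint64_t desired)
{
    uint64_t expected = 0;  /* initial guess only */
    while (!atomic_compare_exchange_strong(p, &expected, desired)) {
        /* retry with the refreshed `expected` */
    }
}
```

Note that the "load" is really a read-modify-write, so it needs the location
to be writable and takes the cache line exclusively, unlike a pure load.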