Hi All,

The AArch64 implementation of 128-bit atomics is broken.

For 128-bit atomics we rely on the barriers provided by pthread locks to
guard the address behind the pointer and get the correct memory ordering.
However, for 128-bit atomics the address under the lock is different from
the original pointer, which means one of the values used by the atomic
operation is not properly protected.  We therefore fail when the user has
requested sequential consistency, as there is no barrier to enforce that
requirement.  As a workaround, users have resorted to wrapping their uses
of these atomics in #ifdef GCC/#endif guards.

This patch corrects the issue by issuing a full barrier, but only when
__ATOMIC_SEQ_CST was requested.  To remedy the resulting performance hit I
think we should revisit using an approach similar to outline atomics
(-moutline-atomics) for the 128-bit atomics.

Note that I believe I need the empty aarch64-config.h file due to the
include_next chain, but I am not entirely sure.  I have hand-verified that
the barriers are inserted for __ATOMIC_SEQ_CST.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master? And for backporting to GCC 12, 11 and 10?

Thanks,
Tamar

libatomic/ChangeLog:

	PR target/102218
	* config/aarch64/aarch64-config.h: New file.
	* config/aarch64/host-config.h: New file.
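For context, here is a minimal example of the kind of code affected.  This
is a sketch, not a test from the patch: the names are illustrative, and it
assumes a build where 16-byte atomics are routed to libatomic's
lock-protected path (i.e. no inlined hardware 16-byte atomics):

__int128 shared;

__int128
load_seq_cst (void)
{
  /* GCC lowers this to a libatomic call (__atomic_load_16).  Before
     this patch the lock-protected implementation provided only
     acquire/release ordering, so the seq_cst request was not honoured
     with a full barrier.  */
  return __atomic_load_n (&shared, __ATOMIC_SEQ_CST);
}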
--- inline copy of patch --
diff --git a/libatomic/config/aarch64/aarch64-config.h b/libatomic/config/aarch64/aarch64-config.h
new file mode 100644
index 0000000000000000000000000000000000000000..d3474fa8ff80cb0c3ddbf8c48acd931d2339d33d
--- /dev/null
+++ b/libatomic/config/aarch64/aarch64-config.h
@@ -0,0 +1,23 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is part of the GNU Atomic Library (libatomic).
+
+   Libatomic is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
diff --git a/libatomic/config/aarch64/host-config.h b/libatomic/config/aarch64/host-config.h
new file mode 100644
index 0000000000000000000000000000000000000000..f445a47d25ef5cc51cd2167069500245d07bf1bc
--- /dev/null
+++ b/libatomic/config/aarch64/host-config.h
@@ -0,0 +1,46 @@
+/* Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is part of the GNU Atomic Library (libatomic).
+
+   Libatomic is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Avoiding the DMB (or kernel helper) can be a good thing.  */
+#define WANT_SPECIALCASE_RELAXED
+
+/* Glibc, at least, uses acq_rel in its pthread mutex
+   implementation.  If the user is asking for seq_cst,
+   this is insufficient.  */
+
+static inline void __attribute__((always_inline, artificial))
+pre_seq_barrier(int model)
+{
+  if (model == __ATOMIC_SEQ_CST)
+    __atomic_thread_fence (__ATOMIC_SEQ_CST);
+}
+
+static inline void __attribute__((always_inline, artificial))
+post_seq_barrier(int model)
+{
+  pre_seq_barrier(model);
+}
+
+#define pre_post_seq_barrier 1
+
+#include_next <host-config.h>
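For completeness, the pattern these hooks enforce is a full fence on either
side of the locked region, issued only when seq_cst was requested.  Below
is a self-contained sketch of that pattern, not part of the patch: a plain
pthread mutex stands in for libatomic's internal locks, and every name is
illustrative:

#include <pthread.h>

static pthread_mutex_t guard = PTHREAD_MUTEX_INITIALIZER;

static inline void
seq_fence_if_needed (int model)
{
  /* Mirrors pre_seq_barrier/post_seq_barrier: the mutex provides at
     most acquire/release ordering, so a seq_cst request needs an
     explicit full fence (a DMB ISH on AArch64) around the locked
     region.  */
  if (model == __ATOMIC_SEQ_CST)
    __atomic_thread_fence (__ATOMIC_SEQ_CST);
}

void
store_16_sketch (__int128 *ptr, __int128 newval, int model)
{
  seq_fence_if_needed (model);   /* pre_seq_barrier equivalent.  */
  pthread_mutex_lock (&guard);
  *ptr = newval;                 /* The 16-byte access under the lock.  */
  pthread_mutex_unlock (&guard);
  seq_fence_if_needed (model);   /* post_seq_barrier equivalent.  */
}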