* [PATCH] libgcc: Add a weak stub for __sync_synchronize @ 2020-11-03 15:08 Bernd Edlinger 2020-11-17 5:43 ` [PING] " Bernd Edlinger 2020-11-17 12:44 ` Richard Earnshaw (lists) 0 siblings, 2 replies; 17+ messages in thread From: Bernd Edlinger @ 2020-11-03 15:08 UTC (permalink / raw) To: gcc-patches, Ramana Radhakrishnan, Richard Earnshaw [-- Attachment #1: Type: text/plain, Size: 545 bytes --] Hi, this fixes a problem with a missing symbol __sync_synchronize which happens when newlib is used together with libstdc++ for the non-threaded simulator target arm-none-eabi. There are several questions on stackoverflow about this issue. I would like to add a weak symbol for this target, since this is only a default implementation and not meant to override a possibly more sophisticated synchronization function from the c-runtime. Regression tested successfully on arm-none-eabi with newlib-3.3.0. Is it OK for trunk? Thanks Bernd. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Add-a-weak-stub-for-__sync_synchronize.patch --] [-- Type: text/x-patch; name="0001-Add-a-weak-stub-for-__sync_synchronize.patch", Size: 3334 bytes --] From f8a3df552f4b98308096659c66b4c8ea68580f25 Mon Sep 17 00:00:00 2001 From: Bernd Edlinger <bernd.edlinger@hotmail.de> Date: Mon, 2 Nov 2020 11:43:44 +0100 Subject: [PATCH] libgcc: Add a weak stub for __sync_synchronize This patch adds a default implementation for __sync_synchronize, which prevents many unresolved symbol errors on arm-none-eabi. This happens often in C++ programs even without any threading. libgcc: 2020-11-02 Bernd Edlinger <bernd.edlinger@hotmail.de> * config.host: Use t-eabi for arm-none-eabi. * config/arm/t-eabi: New. * config/arm/eabi-sync.c: New. --- libgcc/config.host | 2 +- libgcc/config/arm/eabi-sync.c | 39 +++++++++++++++++++++++++++++++++++++++ libgcc/config/arm/t-eabi | 1 + 3 files changed, 41 insertions(+), 1 deletion(-) create mode 100644 libgcc/config/arm/eabi-sync.c create mode 100644 libgcc/config/arm/t-eabi diff --git a/libgcc/config.host b/libgcc/config.host index fd8e55e..e25abf4 100644 --- a/libgcc/config.host +++ b/libgcc/config.host @@ -495,7 +495,7 @@ arm*-*-eabi* | arm*-*-symbianelf* | arm*-*-rtems*) tm_file="$tm_file arm/bpabi-lib.h" case ${host} in arm*-*-eabi* | arm*-*-rtems*) - tmake_file="${tmake_file} arm/t-bpabi t-crtfm" + tmake_file="${tmake_file} arm/t-bpabi t-crtfm arm/t-eabi" extra_parts="crtbegin.o crtend.o crti.o crtn.o" ;; arm*-*-symbianelf*) diff --git a/libgcc/config/arm/eabi-sync.c b/libgcc/config/arm/eabi-sync.c new file mode 100644 index 0000000..bffdd4a --- /dev/null +++ b/libgcc/config/arm/eabi-sync.c @@ -0,0 +1,39 @@ +/* Copyright (C) 2020 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +<http://www.gnu.org/licenses/>. */ + +void __attribute__ ((weak)) +__sync_synchronize (void) +{ +#if defined (__ARM_ARCH_6__) || defined (__ARM_ARCH_6J__) \ + || defined (__ARM_ARCH_6K__) || defined (__ARM_ARCH_6T2__) \ + || defined (__ARM_ARCH_6Z__) || defined (__ARM_ARCH_6ZK__) \ + || defined (__ARM_ARCH_7__) || defined (__ARM_ARCH_7A__) +#if defined (__ARM_ARCH_7__) || defined (__ARM_ARCH_7A__) + __asm __volatile ("dmb" : : : "memory"); +#else + __asm __volatile ("mcr p15, 0, r0, c7, c10, 5" : : : "memory"); +#endif +#else + __asm __volatile ("nop" : : : "memory"); +#endif +} diff --git a/libgcc/config/arm/t-eabi b/libgcc/config/arm/t-eabi new file mode 100644 index 0000000..556032f --- /dev/null +++ b/libgcc/config/arm/t-eabi @@ -0,0 +1 @@ +LIB2ADD_ST += $(srcdir)/config/arm/eabi-sync.c -- 1.9.1 ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PING] [PATCH] libgcc: Add a weak stub for __sync_synchronize 2020-11-03 15:08 [PATCH] libgcc: Add a weak stub for __sync_synchronize Bernd Edlinger @ 2020-11-17 5:43 ` Bernd Edlinger 2020-11-17 12:44 ` Richard Earnshaw (lists) 1 sibling, 0 replies; 17+ messages in thread From: Bernd Edlinger @ 2020-11-17 5:43 UTC (permalink / raw) To: gcc-patches, Ramana Radhakrishnan, Richard Earnshaw Ping... I'd like to ping for this patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557886.html Thanks Bernd. On 11/3/20 4:08 PM, Bernd Edlinger wrote: > Hi, > > this fixes a problem with a missing symbol __sync_synchronize > which happens when newlib is used together with libstdc++ for > the non-threaded simulator target arm-none-eabi. > > There are several questions on stackoverflow about this issue. > > I would like to add a weak symbol for this target, since this > is only a default implementation and not meant to override a > possibly more sophisticated synchronization function from the > c-runtime. > > > Regression tested successfully on arm-none-eabi with newlib-3.3.0. > > Is it OK for trunk? > > > Thanks > Bernd. > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] libgcc: Add a weak stub for __sync_synchronize 2020-11-03 15:08 [PATCH] libgcc: Add a weak stub for __sync_synchronize Bernd Edlinger 2020-11-17 5:43 ` [PING] " Bernd Edlinger @ 2020-11-17 12:44 ` Richard Earnshaw (lists) 2020-11-17 15:18 ` Bernd Edlinger 1 sibling, 1 reply; 17+ messages in thread From: Richard Earnshaw (lists) @ 2020-11-17 12:44 UTC (permalink / raw) To: Bernd Edlinger, gcc-patches, Ramana Radhakrishnan On 03/11/2020 15:08, Bernd Edlinger wrote: > Hi, > > this fixes a problem with a missing symbol __sync_synchronize > which happens when newlib is used together with libstdc++ for > the non-threaded simulator target arm-none-eabi. > > There are several questions on stackoverflow about this issue. > > I would like to add a weak symbol for this target, since this > is only a default implementation and not meant to override a > possibly more sophisticated synchronization function from the > c-runtime. > > > Regression tested successfully on arm-none-eabi with newlib-3.3.0. > > Is it OK for trunk? > > > Thanks > Bernd. > I seem to recall that this was a deliberate decision - you can't guess this correctly, at least when trying to build portable code - you just have to know which runtime you will be using. I think Ramana had some changes in the works at one point to address (some) of this, but I'm not sure what happened to them. Ramana? +#if defined (__ARM_ARCH_6__) || defined (__ARM_ARCH_6J__) \ + || defined (__ARM_ARCH_6K__) || defined (__ARM_ARCH_6T2__) \ + || defined (__ARM_ARCH_6Z__) || defined (__ARM_ARCH_6ZK__) \ + || defined (__ARM_ARCH_7__) || defined (__ARM_ARCH_7A__) +#if defined (__ARM_ARCH_7__) || defined (__ARM_ARCH_7A__) Ug, no! Use the ACLE macros to avoid this sort of mess. R. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] libgcc: Add a weak stub for __sync_synchronize 2020-11-17 12:44 ` Richard Earnshaw (lists) @ 2020-11-17 15:18 ` Bernd Edlinger 2020-11-17 15:41 ` Richard Earnshaw (lists) 0 siblings, 1 reply; 17+ messages in thread From: Bernd Edlinger @ 2020-11-17 15:18 UTC (permalink / raw) To: Richard Earnshaw (lists), gcc-patches, Ramana Radhakrishnan [-- Attachment #1: Type: text/plain, Size: 2842 bytes --] On 11/17/20 1:44 PM, Richard Earnshaw (lists) wrote: > On 03/11/2020 15:08, Bernd Edlinger wrote: >> Hi, >> >> this fixes a problem with a missing symbol __sync_synchronize >> which happens when newlib is used together with libstdc++ for >> the non-threaded simulator target arm-none-eabi. >> >> There are several questions on stackoverflow about this issue. >> >> I would like to add a weak symbol for this target, since this >> is only a default implementation and not meant to override a >> possibly more sophisticated synchronization function from the >> c-runtime. >> >> >> Regression tested successfully on arm-none-eabi with newlib-3.3.0. >> >> Is it OK for trunk? >> >> >> Thanks >> Bernd. >> > > I seem to recall that this was a deliberate decision - you can't guess > this correctly, at least when trying to build portable code - you just > have to know which runtime you will be using. > Therefore I suggest to use the weak attribute. It is on purpose not implementing all of the atomics. The use case, is a C++ program which initializes a local static variable. $ cat test.cc #include <string> main(int argc, char **argv) { static std::string x = "test"; return 0; } compiles to this: sub sp, sp, #20 str r0, [fp, #-24] str r1, [fp, #-28] ldr r3, .L14 ldrb r4, [r3] bl __sync_synchronize and r3, r4, #255 and r3, r3, #1 cmp r3, #0 moveq r3, #1 movne r3, #0 and r3, r3, #255 cmp r3, #0 beq .L8 ldr r0, .L14 bl __cxa_guard_acquire mov r3, r0 so __sync_synchronize is not defined in newlib since the target (arm-sim) is known to be not multi-threaded, but __cxa_guard_acquire is also not a thread safe function, because __GTHREADS is not defined by libgcc, since it is known at configure time, that the target does not support threads. So libstdc++ does not try to use a mutex or any atomics either, because it is not compiled with __GTHREADS. I can further narrow down the patch by only defining this function when __GTHREADS is not defined, to make it more clear. > I think Ramana had some changes in the works at one point to address > (some) of this, but I'm not sure what happened to them. Ramana? > > > +#if defined (__ARM_ARCH_6__) || defined (__ARM_ARCH_6J__) \ > + || defined (__ARM_ARCH_6K__) || defined (__ARM_ARCH_6T2__) \ > + || defined (__ARM_ARCH_6Z__) || defined (__ARM_ARCH_6ZK__) \ > + || defined (__ARM_ARCH_7__) || defined (__ARM_ARCH_7A__) > +#if defined (__ARM_ARCH_7__) || defined (__ARM_ARCH_7A__) > > Ug, no! Use the ACLE macros to avoid this sort of mess. > Ah, thanks, copy-paste from freebsd-atomic.c :) I've attached the updated patch. Is it OK? Thanks Bernd. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-libgcc-Add-a-weak-stub-for-__sync_synchronize.patch --] [-- Type: text/x-patch; name="0001-libgcc-Add-a-weak-stub-for-__sync_synchronize.patch", Size: 3087 bytes --] From ca44e1fcb4b991306cbcde6293d20c77ce74ad68 Mon Sep 17 00:00:00 2001 From: Bernd Edlinger <bernd.edlinger@hotmail.de> Date: Mon, 2 Nov 2020 11:43:44 +0100 Subject: [PATCH] libgcc: Add a weak stub for __sync_synchronize This patch adds a default implementation for __sync_synchronize, which prevents many unresolved symbol errors on arm-none-eabi. This happens often in C++ programs even without any threading. libgcc: 2020-11-17 Bernd Edlinger <bernd.edlinger@hotmail.de> * config.host: Use t-eabi for arm-none-eabi. * config/arm/t-eabi: New. * config/arm/eabi-sync.c: New. --- libgcc/config.host | 2 +- libgcc/config/arm/eabi-sync.c | 38 ++++++++++++++++++++++++++++++++++++++ libgcc/config/arm/t-eabi | 1 + 3 files changed, 40 insertions(+), 1 deletion(-) create mode 100644 libgcc/config/arm/eabi-sync.c create mode 100644 libgcc/config/arm/t-eabi diff --git a/libgcc/config.host b/libgcc/config.host index 66af834..eaf74f1 100644 --- a/libgcc/config.host +++ b/libgcc/config.host @@ -496,7 +496,7 @@ arm*-*-eabi* | arm*-*-symbianelf* | arm*-*-rtems*) tm_file="$tm_file arm/bpabi-lib.h" case ${host} in arm*-*-eabi* | arm*-*-rtems*) - tmake_file="${tmake_file} arm/t-bpabi t-crtfm" + tmake_file="${tmake_file} arm/t-bpabi t-crtfm arm/t-eabi" extra_parts="crtbegin.o crtend.o crti.o crtn.o" ;; arm*-*-symbianelf*) diff --git a/libgcc/config/arm/eabi-sync.c b/libgcc/config/arm/eabi-sync.c new file mode 100644 index 0000000..c37bacf --- /dev/null +++ b/libgcc/config/arm/eabi-sync.c @@ -0,0 +1,38 @@ +/* Copyright (C) 2020 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify it under +the terms of the GNU General Public License as published by the Free +Software Foundation; either version 3, or (at your option) any later +version. + +GCC is distributed in the hope that it will be useful, but WITHOUT ANY +WARRANTY; without even the implied warranty of MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License +for more details. + +Under Section 7 of GPL version 3, you are granted additional +permissions described in the GCC Runtime Library Exception, version +3.1, as published by the Free Software Foundation. + +You should have received a copy of the GNU General Public License and +a copy of the GCC Runtime Library Exception along with this program; +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see +<http://www.gnu.org/licenses/>. */ + +#include "gthr.h" + +#ifndef __GTHREADS +void __attribute__ ((weak)) +__sync_synchronize (void) +{ +#if __ARM_ARCH >= 7 + __asm __volatile ("dmb" : : : "memory"); +#elif ARM_ARCH == 6 + __asm __volatile ("mcr p15, 0, r0, c7, c10, 5" : : : "memory"); +#else + __asm __volatile ("nop" : : : "memory"); +#endif +} +#endif diff --git a/libgcc/config/arm/t-eabi b/libgcc/config/arm/t-eabi new file mode 100644 index 0000000..556032f --- /dev/null +++ b/libgcc/config/arm/t-eabi @@ -0,0 +1 @@ +LIB2ADD_ST += $(srcdir)/config/arm/eabi-sync.c -- 1.9.1 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] libgcc: Add a weak stub for __sync_synchronize 2020-11-17 15:18 ` Bernd Edlinger @ 2020-11-17 15:41 ` Richard Earnshaw (lists) 2020-11-17 15:51 ` Christophe Lyon ` (2 more replies) 0 siblings, 3 replies; 17+ messages in thread From: Richard Earnshaw (lists) @ 2020-11-17 15:41 UTC (permalink / raw) To: Bernd Edlinger, gcc-patches, Ramana Radhakrishnan On 17/11/2020 15:18, Bernd Edlinger wrote: > On 11/17/20 1:44 PM, Richard Earnshaw (lists) wrote: >> On 03/11/2020 15:08, Bernd Edlinger wrote: >>> Hi, >>> >>> this fixes a problem with a missing symbol __sync_synchronize >>> which happens when newlib is used together with libstdc++ for >>> the non-threaded simulator target arm-none-eabi. >>> >>> There are several questions on stackoverflow about this issue. >>> >>> I would like to add a weak symbol for this target, since this >>> is only a default implementation and not meant to override a >>> possibly more sophisticated synchronization function from the >>> c-runtime. >>> >>> >>> Regression tested successfully on arm-none-eabi with newlib-3.3.0. >>> >>> Is it OK for trunk? >>> >>> >>> Thanks >>> Bernd. >>> >> >> I seem to recall that this was a deliberate decision - you can't guess >> this correctly, at least when trying to build portable code - you just >> have to know which runtime you will be using. >> > > Therefore I suggest to use the weak attribute. It is on purpose not > implementing all of the atomics. > > The use case, is a C++ program which initializes a local static variable. > > $ cat test.cc > #include <string> > main(int argc, char **argv) > { > static std::string x = "test"; > return 0; > } > > compiles to this: > sub sp, sp, #20 > str r0, [fp, #-24] > str r1, [fp, #-28] > ldr r3, .L14 > ldrb r4, [r3] > bl __sync_synchronize > and r3, r4, #255 > and r3, r3, #1 > cmp r3, #0 > moveq r3, #1 > movne r3, #0 > and r3, r3, #255 > cmp r3, #0 > beq .L8 > ldr r0, .L14 > bl __cxa_guard_acquire > mov r3, r0 > > so __sync_synchronize is not defined in newlib since the target (arm-sim) > is known to be not multi-threaded, > but __cxa_guard_acquire is also not a thread safe function, > because __GTHREADS is not defined by libgcc, since it is known > at configure time, that the target does not support threads. > So libstdc++ does not try to use a mutex or any atomics either, > because it is not compiled with __GTHREADS. > > I can further narrow down the patch by only defining this function when > __GTHREADS is not defined, to make it more clear. > > >> I think Ramana had some changes in the works at one point to address >> (some) of this, but I'm not sure what happened to them. Ramana? >> >> >> +#if defined (__ARM_ARCH_6__) || defined (__ARM_ARCH_6J__) \ >> + || defined (__ARM_ARCH_6K__) || defined (__ARM_ARCH_6T2__) \ >> + || defined (__ARM_ARCH_6Z__) || defined (__ARM_ARCH_6ZK__) \ >> + || defined (__ARM_ARCH_7__) || defined (__ARM_ARCH_7A__) >> +#if defined (__ARM_ARCH_7__) || defined (__ARM_ARCH_7A__) >> >> Ug, no! Use the ACLE macros to avoid this sort of mess. >> > > Ah, thanks, copy-paste from freebsd-atomic.c :) > > > I've attached the updated patch. > Is it OK? > > > Thanks > Bernd. > libgcc is *still* the wrong place for this. It belongs in the system library (eg newlib, or glibc, or whatever), which knows about the system it's running on. (Sorry, I should have said this before, but I've context-switched this out since it's been a long time since it came up). This hack will just lead to silent code failure of the worst kind (non-reproducable, racy) at runtime. R. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] libgcc: Add a weak stub for __sync_synchronize 2020-11-17 15:41 ` Richard Earnshaw (lists) @ 2020-11-17 15:51 ` Christophe Lyon 2020-11-17 17:17 ` Bernd Edlinger 2020-11-22 8:05 ` [PATCH] Avoid atomic for guard acquire when that is expensive Bernd Edlinger 2 siblings, 0 replies; 17+ messages in thread From: Christophe Lyon @ 2020-11-17 15:51 UTC (permalink / raw) To: Richard Earnshaw (lists) Cc: Bernd Edlinger, gcc-patches, Ramana Radhakrishnan On Tue, 17 Nov 2020 at 16:41, Richard Earnshaw (lists) via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > On 17/11/2020 15:18, Bernd Edlinger wrote: > > On 11/17/20 1:44 PM, Richard Earnshaw (lists) wrote: > >> On 03/11/2020 15:08, Bernd Edlinger wrote: > >>> Hi, > >>> > >>> this fixes a problem with a missing symbol __sync_synchronize > >>> which happens when newlib is used together with libstdc++ for > >>> the non-threaded simulator target arm-none-eabi. > >>> > >>> There are several questions on stackoverflow about this issue. > >>> > >>> I would like to add a weak symbol for this target, since this > >>> is only a default implementation and not meant to override a > >>> possibly more sophisticated synchronization function from the > >>> c-runtime. > >>> > >>> > >>> Regression tested successfully on arm-none-eabi with newlib-3.3.0. > >>> > >>> Is it OK for trunk? > >>> > >>> > >>> Thanks > >>> Bernd. > >>> > >> > >> I seem to recall that this was a deliberate decision - you can't guess > >> this correctly, at least when trying to build portable code - you just > >> have to know which runtime you will be using. > >> > > > > Therefore I suggest to use the weak attribute. It is on purpose not > > implementing all of the atomics. > > > > The use case, is a C++ program which initializes a local static variable. > > > > $ cat test.cc > > #include <string> > > main(int argc, char **argv) > > { > > static std::string x = "test"; > > return 0; > > } > > > > compiles to this: > > sub sp, sp, #20 > > str r0, [fp, #-24] > > str r1, [fp, #-28] > > ldr r3, .L14 > > ldrb r4, [r3] > > bl __sync_synchronize > > and r3, r4, #255 > > and r3, r3, #1 > > cmp r3, #0 > > moveq r3, #1 > > movne r3, #0 > > and r3, r3, #255 > > cmp r3, #0 > > beq .L8 > > ldr r0, .L14 > > bl __cxa_guard_acquire > > mov r3, r0 > > > > so __sync_synchronize is not defined in newlib since the target (arm-sim) > > is known to be not multi-threaded, > > but __cxa_guard_acquire is also not a thread safe function, > > because __GTHREADS is not defined by libgcc, since it is known > > at configure time, that the target does not support threads. > > So libstdc++ does not try to use a mutex or any atomics either, > > because it is not compiled with __GTHREADS. > > > > I can further narrow down the patch by only defining this function when > > __GTHREADS is not defined, to make it more clear. > > > > > >> I think Ramana had some changes in the works at one point to address > >> (some) of this, but I'm not sure what happened to them. Ramana? > >> > >> > >> +#if defined (__ARM_ARCH_6__) || defined (__ARM_ARCH_6J__) \ > >> + || defined (__ARM_ARCH_6K__) || defined (__ARM_ARCH_6T2__) \ > >> + || defined (__ARM_ARCH_6Z__) || defined (__ARM_ARCH_6ZK__) \ > >> + || defined (__ARM_ARCH_7__) || defined (__ARM_ARCH_7A__) > >> +#if defined (__ARM_ARCH_7__) || defined (__ARM_ARCH_7A__) > >> > >> Ug, no! Use the ACLE macros to avoid this sort of mess. > >> > > > > Ah, thanks, copy-paste from freebsd-atomic.c :) > > > > > > I've attached the updated patch. > > Is it OK? > > > > > > Thanks > > Bernd. > > > > libgcc is *still* the wrong place for this. It belongs in the system > library (eg newlib, or glibc, or whatever), which knows about the system > it's running on. (Sorry, I should have said this before, but I've > context-switched this out since it's been a long time since it came up). > > This hack will just lead to silent code failure of the worst kind > (non-reproducable, racy) at runtime. > I haven't fully re-read the thread, but I think this is related to an old discussion, not very well archived in: https://gcc.gnu.org/pipermail/gcc-patches/2016-November/462299.html There's a pointer to a newlib patch from Ramana. > R. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] libgcc: Add a weak stub for __sync_synchronize 2020-11-17 15:41 ` Richard Earnshaw (lists) 2020-11-17 15:51 ` Christophe Lyon @ 2020-11-17 17:17 ` Bernd Edlinger 2020-11-22 8:05 ` [PATCH] Avoid atomic for guard acquire when that is expensive Bernd Edlinger 2 siblings, 0 replies; 17+ messages in thread From: Bernd Edlinger @ 2020-11-17 17:17 UTC (permalink / raw) To: Richard Earnshaw (lists), gcc-patches, Ramana Radhakrishnan On 11/17/20 4:41 PM, Richard Earnshaw (lists) wrote: > > libgcc is *still* the wrong place for this. It belongs in the system > library (eg newlib, or glibc, or whatever), which knows about the system > it's running on. (Sorry, I should have said this before, but I've > context-switched this out since it's been a long time since it came up). > No problem. I just saw it from the other end. It is odd that this problem does not go away even if gcc is configured with --disable-threads, which should be the default for arm-none-eabi anyway. If we assume a threaded environment then it is still libgcc which does not define __GTHREADS in libgcc/gthr.h, and libstdc++'s __cxa_guard_acquire is not making use of functions like __gthread_mutex_lock. But that appears to be this way by design. Of course the race is not fixed if you ask newlib to implement just this __sync_synchronize function. > This hack will just lead to silent code failure of the worst kind > (non-reproducable, racy) at runtime. > So in a arm-none-eabi system with armv6 or higher where the intrinsic __sync_synchronize is not a library call but an instruction we have exactly this worst kind scenario, already. It is however possible that the default of -fthreadsafe_statics is inappropriate for --disable-threads ? Bernd. > R. > ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH] Avoid atomic for guard acquire when that is expensive 2020-11-17 15:41 ` Richard Earnshaw (lists) 2020-11-17 15:51 ` Christophe Lyon 2020-11-17 17:17 ` Bernd Edlinger @ 2020-11-22 8:05 ` Bernd Edlinger 2020-11-24 22:10 ` Jason Merrill 2020-11-30 20:08 ` [PING] " Bernd Edlinger 2 siblings, 2 replies; 17+ messages in thread From: Bernd Edlinger @ 2020-11-22 8:05 UTC (permalink / raw) To: Richard Earnshaw (lists), gcc-patches, Ramana Radhakrishnan, Jason Merrill, Nathan Sidwell, Christophe Lyon [-- Attachment #1: Type: text/plain, Size: 755 bytes --] Hi, this avoids the need to use -fno-threadsafe-statics on arm-none-eabi or working around that problem by supplying a dummy __sync_synchronize function which might just lead to silent code failure of the worst kind (non-reproducable, racy) at runtime, as was pointed out on previous discussions here. When the atomic access involves a call to __sync_synchronize it is better to call __cxa_guard_acquire unconditionally, since it handles the atomics too, or is a non-threaded implementation when there is no gthread support for this target. This fixes also a bug for the ARM EABI big-endian target, that is, previously the wrong bit was checked. Regression tested successfully on arm-none-eabi with newlib-3.3.0. Is it OK for trunk? Thanks Bernd. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Avoid-atomic-for-guard-acquire-when-that-is-expensiv.patch --] [-- Type: text/x-patch; name="0001-Avoid-atomic-for-guard-acquire-when-that-is-expensiv.patch", Size: 5984 bytes --] From 9fd070407408cf10789f5e9eb5ddda2fb3798e6f Mon Sep 17 00:00:00 2001 From: Bernd Edlinger <bernd.edlinger@hotmail.de> Date: Sun, 22 Nov 2020 08:11:14 +0100 Subject: [PATCH] Avoid atomic for guard acquire when that is expensive When the atomic access involves a call to __sync_synchronize it is better to call __cxa_guard_acquire unconditionally, since it handles the atomics too, or is a non-threaded implementation when there is no gthread support for this target. This fixes also a bug for the ARM EABI big-endian target, that is, previously the wrong bit was checked. gcc: 2020-11-22 Bernd Edlinger <bernd.edlinger@hotmail.de> * target.def (guard_atomic_expensive): New hook. * doc/tm.texi: Document TARGET_CXX_GUARD_ATOMIC_EXPENSIVE. * doc/tm.texi.in: Likewise. * config/arm/arm.c (arm_cxx_guard_atomic_expensive): New callback. gcc/cp: 2020-11-22 Bernd Edlinger <bernd.edlinger@hotmail.de> * decl2.c: (build_atomic_load_byte): Rename to... (build_atomic_load_type): ... and add new parameter type. (get_guard_cond): Skip the atomic here if that is expensive. Use the correct type for the atomic load on certain targets. --- gcc/config/arm/arm.c | 13 +++++++++++++ gcc/cp/decl2.c | 12 ++++++++---- gcc/doc/tm.texi | 7 +++++++ gcc/doc/tm.texi.in | 2 ++ gcc/target.def | 10 ++++++++++ 5 files changed, 40 insertions(+), 4 deletions(-) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 04190b1..04ca1fe 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -235,6 +235,7 @@ static rtx arm_dwarf_register_span (rtx); static tree arm_cxx_guard_type (void); static bool arm_cxx_guard_mask_bit (void); +static bool arm_cxx_guard_atomic_expensive (void); static tree arm_get_cookie_size (tree); static bool arm_cookie_has_size (void); static bool arm_cxx_cdtor_returns_this (void); @@ -593,6 +594,9 @@ static const struct attribute_spec arm_attribute_table[] = #undef TARGET_CXX_GUARD_MASK_BIT #define TARGET_CXX_GUARD_MASK_BIT arm_cxx_guard_mask_bit +#undef TARGET_CXX_GUARD_ATOMIC_EXPENSIVE +#define TARGET_CXX_GUARD_ATOMIC_EXPENSIVE arm_cxx_guard_atomic_expensive + #undef TARGET_CXX_GET_COOKIE_SIZE #define TARGET_CXX_GET_COOKIE_SIZE arm_get_cookie_size @@ -28882,6 +28886,15 @@ arm_cxx_guard_mask_bit (void) } +/* Return true if atomics involve a call to __sync_synchronize. */ + +static bool +arm_cxx_guard_atomic_expensive (void) +{ + return TARGET_AAPCS_BASED && !TARGET_HAVE_DMB && !TARGET_HAVE_DMB_MCR; +} + + /* The EABI specifies that all array cookies are 8 bytes long. */ static tree diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c index 1bc7b7e..e2b29a6 100644 --- a/gcc/cp/decl2.c +++ b/gcc/cp/decl2.c @@ -3300,15 +3300,15 @@ get_guard (tree decl) /* Return an atomic load of src with the appropriate memory model. */ static tree -build_atomic_load_byte (tree src, HOST_WIDE_INT model) +build_atomic_load_type (tree src, HOST_WIDE_INT model, tree type) { - tree ptr_type = build_pointer_type (char_type_node); + tree ptr_type = build_pointer_type (type); tree mem_model = build_int_cst (integer_type_node, model); tree t, addr, val; unsigned int size; int fncode; - size = tree_to_uhwi (TYPE_SIZE_UNIT (char_type_node)); + size = tree_to_uhwi (TYPE_SIZE_UNIT (type)); fncode = BUILT_IN_ATOMIC_LOAD_N + exact_log2 (size) + 1; t = builtin_decl_implicit ((enum built_in_function) fncode); @@ -3350,8 +3350,12 @@ get_guard_cond (tree guard, bool thread_safe) if (!thread_safe) guard = get_guard_bits (guard); + else if (targetm.cxx.guard_atomic_expensive ()) + guard = integer_zero_node; else - guard = build_atomic_load_byte (guard, MEMMODEL_ACQUIRE); + guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, + targetm.cxx.guard_mask_bit () + ? TREE_TYPE (guard) : char_type_node); /* Mask off all but the low bit. */ if (targetm.cxx.guard_mask_bit ()) diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 2b88f78..92215cf 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -10728,6 +10728,13 @@ This hook determines how guard variables are used. It should return @code{true} indicates that only the least significant bit should be used. @end deftypefn +@deftypefn {Target Hook} bool TARGET_CXX_GUARD_ATOMIC_EXPENSIVE (void) +This hook determines if the guard atomic is expensive. It should return +@code{false} (the default) if the atomic is inexpensive. A return value of +@code{true} indicates that the atomic is expensive i.e., involves a call to +__sync_synchronize. In this case let __cxa_guard_acquire handle the atomics. +@end deftypefn + @deftypefn {Target Hook} tree TARGET_CXX_GET_COOKIE_SIZE (tree @var{type}) This hook returns the size of the cookie to use when allocating an array whose elements have the indicated @var{type}. Assumes that it is already diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 897f289..ce1d837 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -7321,6 +7321,8 @@ floating-point support; they are not included in this mechanism. @hook TARGET_CXX_GUARD_MASK_BIT +@hook TARGET_CXX_GUARD_ATOMIC_EXPENSIVE + @hook TARGET_CXX_GET_COOKIE_SIZE @hook TARGET_CXX_COOKIE_HAS_SIZE diff --git a/gcc/target.def b/gcc/target.def index 810d554..0c02d5c 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -6160,6 +6160,16 @@ DEFHOOK bool, (void), hook_bool_void_false) +/* Return true if the guard atomic is expensive. */ +DEFHOOK +(guard_atomic_expensive, + "This hook determines if the guard atomic is expensive. It should return\n\ +@code{false} (the default) if the atomic is inexpensive. A return value of\n\ +@code{true} indicates that the atomic is expensive i.e., involves a call to\n\ +__sync_synchronize. In this case let __cxa_guard_acquire handle the atomics.", + bool, (void), + hook_bool_void_false) + /* Returns the size of the array cookie for an array of type. */ DEFHOOK (get_cookie_size, -- 1.9.1 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Avoid atomic for guard acquire when that is expensive 2020-11-22 8:05 ` [PATCH] Avoid atomic for guard acquire when that is expensive Bernd Edlinger @ 2020-11-24 22:10 ` Jason Merrill 2020-12-01 18:28 ` Bernd Edlinger 2020-11-30 20:08 ` [PING] " Bernd Edlinger 1 sibling, 1 reply; 17+ messages in thread From: Jason Merrill @ 2020-11-24 22:10 UTC (permalink / raw) To: Bernd Edlinger, Richard Earnshaw (lists), gcc-patches, Ramana Radhakrishnan, Nathan Sidwell, Christophe Lyon On 11/22/20 3:05 AM, Bernd Edlinger wrote: > Hi, > > this avoids the need to use -fno-threadsafe-statics on > arm-none-eabi or working around that problem by supplying > a dummy __sync_synchronize function which might > just lead to silent code failure of the worst kind > (non-reproducable, racy) at runtime, as was pointed out > on previous discussions here. > > When the atomic access involves a call to __sync_synchronize > it is better to call __cxa_guard_acquire unconditionally, > since it handles the atomics too, or is a non-threaded > implementation when there is no gthread support for this target. > > This fixes also a bug for the ARM EABI big-endian target, > that is, previously the wrong bit was checked. Instead of a new target macro, can't you follow fold_builtin_atomic_always_lock_free/can_atomic_load_p? Jason ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Avoid atomic for guard acquire when that is expensive 2020-11-24 22:10 ` Jason Merrill @ 2020-12-01 18:28 ` Bernd Edlinger 2020-12-02 18:57 ` Jason Merrill 0 siblings, 1 reply; 17+ messages in thread From: Bernd Edlinger @ 2020-12-01 18:28 UTC (permalink / raw) To: Jason Merrill, Richard Earnshaw (lists), gcc-patches, Ramana Radhakrishnan, Nathan Sidwell, Christophe Lyon [-- Attachment #1: Type: text/plain, Size: 974 bytes --] On 11/24/20 11:10 PM, Jason Merrill wrote: > On 11/22/20 3:05 AM, Bernd Edlinger wrote: >> Hi, >> >> this avoids the need to use -fno-threadsafe-statics on >> arm-none-eabi or working around that problem by supplying >> a dummy __sync_synchronize function which might >> just lead to silent code failure of the worst kind >> (non-reproducable, racy) at runtime, as was pointed out >> on previous discussions here. >> >> When the atomic access involves a call to __sync_synchronize >> it is better to call __cxa_guard_acquire unconditionally, >> since it handles the atomics too, or is a non-threaded >> implementation when there is no gthread support for this target. >> >> This fixes also a bug for the ARM EABI big-endian target, >> that is, previously the wrong bit was checked. > > Instead of a new target macro, can't you follow fold_builtin_atomic_always_lock_free/can_atomic_load_p? > Yes, thanks, that should work too. Would you like this better? Thanks Bernd. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Avoid-atomic-for-guard-acquire-when-that-is-expensiv.patch --] [-- Type: text/x-patch; name="0001-Avoid-atomic-for-guard-acquire-when-that-is-expensiv.patch", Size: 3074 bytes --] From 863b023786f7e8321979bcdb33c027d37d5e155b Mon Sep 17 00:00:00 2001 From: Bernd Edlinger <bernd.edlinger@hotmail.de> Date: Tue, 1 Dec 2020 18:54:48 +0100 Subject: [PATCH] Avoid atomic for guard acquire when that is expensive When the atomic access involves a call to __sync_synchronize it is better to call __cxa_guard_acquire unconditionally, since it handles the atomics too, or is a non-threaded implementation when there is no gthread support for this target. This fixes also a bug for the ARM EABI big-endian target, that is, previously the wrong bit was checked. 2020-11-22 Bernd Edlinger <bernd.edlinger@hotmail.de> * decl2.c: (is_atomic_expensive_p): New helper function. (build_atomic_load_byte): Rename to... (build_atomic_load_type): ... and add new parameter type. (get_guard_cond): Skip the atomic here if that is expensive. Use the correct type for the atomic load on certain targets. --- gcc/cp/decl2.c | 33 +++++++++++++++++++++++++++++---- 1 file changed, 29 insertions(+), 4 deletions(-) diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c index 1bc7b7e..5001e8f 100644 --- a/gcc/cp/decl2.c +++ b/gcc/cp/decl2.c @@ -48,6 +48,7 @@ along with GCC; see the file COPYING3. If not see #include "intl.h" #include "c-family/c-ada-spec.h" #include "asan.h" +#include "optabs-query.h" /* Id for dumping the raw trees. */ int raw_dump_id; @@ -3297,18 +3298,34 @@ get_guard (tree decl) return guard; } +/* Returns true if accessing the GUARD atomic is expensive, + i.e. involves a call to __sync_synchronize or similar. + In this case let __cxa_guard_acquire handle the atomics. */ + +static bool +is_atomic_expensive_p (machine_mode mode) +{ + if (!flag_inline_atomics) + return false; + + if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode)) + return false; + + return true; +} + /* Return an atomic load of src with the appropriate memory model. */ static tree -build_atomic_load_byte (tree src, HOST_WIDE_INT model) +build_atomic_load_type (tree src, HOST_WIDE_INT model, tree type) { - tree ptr_type = build_pointer_type (char_type_node); + tree ptr_type = build_pointer_type (type); tree mem_model = build_int_cst (integer_type_node, model); tree t, addr, val; unsigned int size; int fncode; - size = tree_to_uhwi (TYPE_SIZE_UNIT (char_type_node)); + size = tree_to_uhwi (TYPE_SIZE_UNIT (type)); fncode = BUILT_IN_ATOMIC_LOAD_N + exact_log2 (size) + 1; t = builtin_decl_implicit ((enum built_in_function) fncode); @@ -3351,7 +3368,15 @@ get_guard_cond (tree guard, bool thread_safe) if (!thread_safe) guard = get_guard_bits (guard); else - guard = build_atomic_load_byte (guard, MEMMODEL_ACQUIRE); + { + tree type = targetm.cxx.guard_mask_bit () + ? TREE_TYPE (guard) : char_type_node; + + if (is_atomic_expensive_p (TYPE_MODE (type))) + guard = integer_zero_node; + else + guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type); + } /* Mask off all but the low bit. */ if (targetm.cxx.guard_mask_bit ()) -- 1.9.1 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Avoid atomic for guard acquire when that is expensive 2020-12-01 18:28 ` Bernd Edlinger @ 2020-12-02 18:57 ` Jason Merrill 2020-12-05 12:37 ` Bernd Edlinger 0 siblings, 1 reply; 17+ messages in thread From: Jason Merrill @ 2020-12-02 18:57 UTC (permalink / raw) To: Bernd Edlinger, Richard Earnshaw (lists), gcc-patches, Ramana Radhakrishnan, Nathan Sidwell, Christophe Lyon On 12/1/20 1:28 PM, Bernd Edlinger wrote: > On 11/24/20 11:10 PM, Jason Merrill wrote: >> On 11/22/20 3:05 AM, Bernd Edlinger wrote: >>> Hi, >>> >>> this avoids the need to use -fno-threadsafe-statics on >>> arm-none-eabi or working around that problem by supplying >>> a dummy __sync_synchronize function which might >>> just lead to silent code failure of the worst kind >>> (non-reproducable, racy) at runtime, as was pointed out >>> on previous discussions here. >>> >>> When the atomic access involves a call to __sync_synchronize >>> it is better to call __cxa_guard_acquire unconditionally, >>> since it handles the atomics too, or is a non-threaded >>> implementation when there is no gthread support for this target. >>> >>> This fixes also a bug for the ARM EABI big-endian target, >>> that is, previously the wrong bit was checked. >> >> Instead of a new target macro, can't you follow fold_builtin_atomic_always_lock_free/can_atomic_load_p? > > Yes, thanks, that should work too. > Would you like this better? > +is_atomic_expensive_p (machine_mode mode) > +{ > + if (!flag_inline_atomics) > + return false; Why not true? > + if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode)) > + return false; This also seems backwards; I'd think we want to return false if either of those tests are true. Or maybe just can_atomic_load_p, and not bother about compare-and-swap. > + tree type = targetm.cxx.guard_mask_bit () > + ? TREE_TYPE (guard) : char_type_node; > + > + if (is_atomic_expensive_p (TYPE_MODE (type))) > + guard = integer_zero_node; > + else > + guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type); It should still work to load a single byte, it just needs to be the least-significant byte. And this isn't an EABI issue; it looks like the non-EABI code is also broken for big-endian targets, both the atomic load and the normal load in get_guard_bits. Jason ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Avoid atomic for guard acquire when that is expensive 2020-12-02 18:57 ` Jason Merrill @ 2020-12-05 12:37 ` Bernd Edlinger 2020-12-07 15:04 ` Jason Merrill 0 siblings, 1 reply; 17+ messages in thread From: Bernd Edlinger @ 2020-12-05 12:37 UTC (permalink / raw) To: Jason Merrill, Richard Earnshaw (lists), gcc-patches, Ramana Radhakrishnan, Nathan Sidwell, Christophe Lyon [-- Attachment #1: Type: text/plain, Size: 3096 bytes --] On 12/2/20 7:57 PM, Jason Merrill wrote: > On 12/1/20 1:28 PM, Bernd Edlinger wrote: >> On 11/24/20 11:10 PM, Jason Merrill wrote: >>> On 11/22/20 3:05 AM, Bernd Edlinger wrote: >>>> Hi, >>>> >>>> this avoids the need to use -fno-threadsafe-statics on >>>> arm-none-eabi or working around that problem by supplying >>>> a dummy __sync_synchronize function which might >>>> just lead to silent code failure of the worst kind >>>> (non-reproducable, racy) at runtime, as was pointed out >>>> on previous discussions here. >>>> >>>> When the atomic access involves a call to __sync_synchronize >>>> it is better to call __cxa_guard_acquire unconditionally, >>>> since it handles the atomics too, or is a non-threaded >>>> implementation when there is no gthread support for this target. >>>> >>>> This fixes also a bug for the ARM EABI big-endian target, >>>> that is, previously the wrong bit was checked. >>> >>> Instead of a new target macro, can't you follow fold_builtin_atomic_always_lock_free/can_atomic_load_p? >> >> Yes, thanks, that should work too. >> Would you like this better? > >> +is_atomic_expensive_p (machine_mode mode) >> +{ >> + if (!flag_inline_atomics) >> + return false; > > Why not true? > Ooops... Yes, I ought to return true here. I must have made a mistake when I tested the last version of this patch, sorry for the confusion. >> + if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode)) >> + return false; > > This also seems backwards; I'd think we want to return false if either of those tests are true. Or maybe just can_atomic_load_p, and not bother about compare-and-swap. > Yes, you are right. Unfortuately can_atomic_load_p is too weak, since it does not cover the memory barrier. And can_compare_and_swap_p (..., false) is actually a bit too strong, but if it returns true, we should be able to use any atomic without need for a library call. >> + tree type = targetm.cxx.guard_mask_bit () >> + ? TREE_TYPE (guard) : char_type_node; >> + >> + if (is_atomic_expensive_p (TYPE_MODE (type))) >> + guard = integer_zero_node; >> + else >> + guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type); > > It should still work to load a single byte, it just needs to be the least-significant byte. And this isn't an EABI issue; it looks like the non-EABI code is also broken for big-endian targets, both the atomic load and the normal load in get_guard_bits. > I think the non-EABI code is always using bit 0 in the first byte, by using the endian-neutral #define _GLIBCXX_GUARD_BIT __guard_test_bit (0, 1). Only ARM EABI uses bit 0 in byte 3 if big-endian and bit 0 in byte 0 otherwise. For all other targets when _GLIBCXX_USE_FUTEX is defined, __cxa_guard_XXX accesses the value as int* while the memory is a 64-bit long, so I could imagine that is an aliasing violation. But nothing that needs to be fixed immediately. Attached is the corrected patch. Tested again on arm-none-eabi with arm-sim. Is it OK for trunk? Thanks Bernd. > Jason > [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Avoid-atomic-for-guard-acquire-when-that-is-expensiv.patch --] [-- Type: text/x-patch; name="0001-Avoid-atomic-for-guard-acquire-when-that-is-expensiv.patch", Size: 3073 bytes --] From c1a6dcaa906113cba0dc88e36460a65aba35ec38 Mon Sep 17 00:00:00 2001 From: Bernd Edlinger <bernd.edlinger@hotmail.de> Date: Tue, 1 Dec 2020 18:54:48 +0100 Subject: [PATCH] Avoid atomic for guard acquire when that is expensive When the atomic access involves a call to __sync_synchronize it is better to call __cxa_guard_acquire unconditionally, since it handles the atomics too, or is a non-threaded implementation when there is no gthread support for this target. This fixes also a bug for the ARM EABI big-endian target, that is, previously the wrong bit was checked. 2020-11-22 Bernd Edlinger <bernd.edlinger@hotmail.de> * decl2.c: (is_atomic_expensive_p): New helper function. (build_atomic_load_byte): Rename to... (build_atomic_load_type): ... and add new parameter type. (get_guard_cond): Skip the atomic here if that is expensive. Use the correct type for the atomic load on certain targets. --- gcc/cp/decl2.c | 33 +++++++++++++++++++++++++++++---- 1 file changed, 29 insertions(+), 4 deletions(-) diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c index 1bc7b7e..c7ddcc9 100644 --- a/gcc/cp/decl2.c +++ b/gcc/cp/decl2.c @@ -48,6 +48,7 @@ along with GCC; see the file COPYING3. If not see #include "intl.h" #include "c-family/c-ada-spec.h" #include "asan.h" +#include "optabs-query.h" /* Id for dumping the raw trees. */ int raw_dump_id; @@ -3297,18 +3298,34 @@ get_guard (tree decl) return guard; } +/* Returns true if accessing the GUARD atomic is expensive, + i.e. involves a call to __sync_synchronize or similar. + In this case let __cxa_guard_acquire handle the atomics. */ + +static bool +is_atomic_expensive_p (machine_mode mode) +{ + if (!flag_inline_atomics) + return true; + + if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode)) + return true; + + return false; +} + /* Return an atomic load of src with the appropriate memory model. */ static tree -build_atomic_load_byte (tree src, HOST_WIDE_INT model) +build_atomic_load_type (tree src, HOST_WIDE_INT model, tree type) { - tree ptr_type = build_pointer_type (char_type_node); + tree ptr_type = build_pointer_type (type); tree mem_model = build_int_cst (integer_type_node, model); tree t, addr, val; unsigned int size; int fncode; - size = tree_to_uhwi (TYPE_SIZE_UNIT (char_type_node)); + size = tree_to_uhwi (TYPE_SIZE_UNIT (type)); fncode = BUILT_IN_ATOMIC_LOAD_N + exact_log2 (size) + 1; t = builtin_decl_implicit ((enum built_in_function) fncode); @@ -3351,7 +3368,15 @@ get_guard_cond (tree guard, bool thread_safe) if (!thread_safe) guard = get_guard_bits (guard); else - guard = build_atomic_load_byte (guard, MEMMODEL_ACQUIRE); + { + tree type = targetm.cxx.guard_mask_bit () + ? TREE_TYPE (guard) : char_type_node; + + if (is_atomic_expensive_p (TYPE_MODE (type))) + guard = integer_zero_node; + else + guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type); + } /* Mask off all but the low bit. */ if (targetm.cxx.guard_mask_bit ()) -- 1.9.1 ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Avoid atomic for guard acquire when that is expensive 2020-12-05 12:37 ` Bernd Edlinger @ 2020-12-07 15:04 ` Jason Merrill 2020-12-07 16:17 ` Bernd Edlinger 0 siblings, 1 reply; 17+ messages in thread From: Jason Merrill @ 2020-12-07 15:04 UTC (permalink / raw) To: Bernd Edlinger, Richard Earnshaw (lists), gcc-patches, Ramana Radhakrishnan, Nathan Sidwell, Christophe Lyon On 12/5/20 7:37 AM, Bernd Edlinger wrote: > On 12/2/20 7:57 PM, Jason Merrill wrote: >> On 12/1/20 1:28 PM, Bernd Edlinger wrote: >>> On 11/24/20 11:10 PM, Jason Merrill wrote: >>>> On 11/22/20 3:05 AM, Bernd Edlinger wrote: >>>>> Hi, >>>>> >>>>> this avoids the need to use -fno-threadsafe-statics on >>>>> arm-none-eabi or working around that problem by supplying >>>>> a dummy __sync_synchronize function which might >>>>> just lead to silent code failure of the worst kind >>>>> (non-reproducable, racy) at runtime, as was pointed out >>>>> on previous discussions here. >>>>> >>>>> When the atomic access involves a call to __sync_synchronize >>>>> it is better to call __cxa_guard_acquire unconditionally, >>>>> since it handles the atomics too, or is a non-threaded >>>>> implementation when there is no gthread support for this target. >>>>> >>>>> This fixes also a bug for the ARM EABI big-endian target, >>>>> that is, previously the wrong bit was checked. >>>> >>>> Instead of a new target macro, can't you follow fold_builtin_atomic_always_lock_free/can_atomic_load_p? >>> >>> Yes, thanks, that should work too. >>> Would you like this better? >> >>> +is_atomic_expensive_p (machine_mode mode) >>> +{ >>> + if (!flag_inline_atomics) >>> + return false; >> >> Why not true? > > Ooops... > Yes, I ought to return true here. > I must have made a mistake when I tested the last version of this patch, > sorry for the confusion. > >>> + if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode)) >>> + return false; >> >> This also seems backwards; I'd think we want to return false if either of those tests are true. Or maybe just can_atomic_load_p, and not bother about compare-and-swap. > > Yes, you are right. > Unfortuately can_atomic_load_p is too weak, since it does not cover > the memory barrier. > > And can_compare_and_swap_p (..., false) is actually a bit too strong, > but if it returns true, we should be able to use any atomic without > need for a library call. >>> + tree type = targetm.cxx.guard_mask_bit () >>> + ? TREE_TYPE (guard) : char_type_node; >>> + >>> + if (is_atomic_expensive_p (TYPE_MODE (type))) >>> + guard = integer_zero_node; >>> + else >>> + guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type); >> >> It should still work to load a single byte, it just needs to be the least-significant byte. I still don't think you need to load the whole word to check the guard bit. > And this isn't an EABI issue; it looks like the non-EABI code is also broken for big-endian targets, both the atomic load and the normal load in get_guard_bits. >> > > I think the non-EABI code is always using bit 0 in the first byte, > by using the endian-neutral #define _GLIBCXX_GUARD_BIT __guard_test_bit (0, 1). Except that set_guard sets the least-significant bit on all targets. > Only ARM EABI uses bit 0 in byte 3 if big-endian and bit 0 in byte 0 otherwise. > > For all other targets when _GLIBCXX_USE_FUTEX is defined, > __cxa_guard_XXX accesses the value as int* while the memory > is a 64-bit long, so I could imagine that is an aliasing violation. > > > But nothing that needs to be fixed immediately. Agreed. > Attached is the corrected patch. > > Tested again on arm-none-eabi with arm-sim. > Is it OK for trunk? > > Thanks > Bernd. > >> Jason >> ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Avoid atomic for guard acquire when that is expensive 2020-12-07 15:04 ` Jason Merrill @ 2020-12-07 16:17 ` Bernd Edlinger 2020-12-08 19:50 ` Jason Merrill 0 siblings, 1 reply; 17+ messages in thread From: Bernd Edlinger @ 2020-12-07 16:17 UTC (permalink / raw) To: Jason Merrill, Richard Earnshaw (lists), gcc-patches, Ramana Radhakrishnan, Nathan Sidwell, Christophe Lyon On 12/7/20 4:04 PM, Jason Merrill wrote: > On 12/5/20 7:37 AM, Bernd Edlinger wrote: >> On 12/2/20 7:57 PM, Jason Merrill wrote: >>> On 12/1/20 1:28 PM, Bernd Edlinger wrote:>>>> + tree type = targetm.cxx.guard_mask_bit () >>>> + ? TREE_TYPE (guard) : char_type_node; >>>> + >>>> + if (is_atomic_expensive_p (TYPE_MODE (type))) >>>> + guard = integer_zero_node; >>>> + else >>>> + guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type); >>> >>> It should still work to load a single byte, it just needs to be the least-significant byte. > > I still don't think you need to load the whole word to check the guard bit. > Of course that is also possible. But I would not expect an address offset and a byte access to be cheaper than a word access. I just saw that get_guard_bits does the same thing: access the first byte if !targetm.cxx.guard_mask_bit () and access the whole word otherwise, which is only there for ARM EABI. >> And this isn't an EABI issue; it looks like the non-EABI code is also broken for big-endian targets, both the atomic load and the normal load in get_guard_bits. >>> >> >> I think the non-EABI code is always using bit 0 in the first byte, >> by using the endian-neutral #define _GLIBCXX_GUARD_BIT __guard_test_bit (0, 1). > > Except that set_guard sets the least-significant bit on all targets. > Hmm, as I said, get_guard_bits gets the guard as a word if !targetm.cxx.guard_mask_bit (), and as the first byte otherwise. Of course it could get the third byte, if !targetm.cxx.guard_mask_bit () && BYTES_BIG_ENDIAN, but it would be more complicated this way, wouldn't it? Bernd. >> Only ARM EABI uses bit 0 in byte 3 if big-endian and bit 0 in byte 0 otherwise. >> >> For all other targets when _GLIBCXX_USE_FUTEX is defined, >> __cxa_guard_XXX accesses the value as int* while the memory >> is a 64-bit long, so I could imagine that is an aliasing violation. >> >> >> But nothing that needs to be fixed immediately. > > Agreed. > >> Attached is the corrected patch. >> >> Tested again on arm-none-eabi with arm-sim. >> Is it OK for trunk? >> >> Thanks >> Bernd. >> >>> Jason >>> > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH] Avoid atomic for guard acquire when that is expensive 2020-12-07 16:17 ` Bernd Edlinger @ 2020-12-08 19:50 ` Jason Merrill 0 siblings, 0 replies; 17+ messages in thread From: Jason Merrill @ 2020-12-08 19:50 UTC (permalink / raw) To: Bernd Edlinger, Richard Earnshaw (lists), gcc-patches, Ramana Radhakrishnan, Nathan Sidwell, Christophe Lyon On 12/7/20 11:17 AM, Bernd Edlinger wrote: > On 12/7/20 4:04 PM, Jason Merrill wrote: >> On 12/5/20 7:37 AM, Bernd Edlinger wrote: >>> On 12/2/20 7:57 PM, Jason Merrill wrote: >>>> On 12/1/20 1:28 PM, Bernd Edlinger wrote:>>>> + tree type = targetm.cxx.guard_mask_bit () >>>>> + ? TREE_TYPE (guard) : char_type_node; >>>>> + >>>>> + if (is_atomic_expensive_p (TYPE_MODE (type))) >>>>> + guard = integer_zero_node; >>>>> + else >>>>> + guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type); >>>> >>>> It should still work to load a single byte, it just needs to be the least-significant byte. >> >> I still don't think you need to load the whole word to check the guard bit. > > Of course that is also possible. But I would not expect an > address offset and a byte access to be cheaper than a word access. Fair point. > I just saw that get_guard_bits does the same thing: > access the first byte if !targetm.cxx.guard_mask_bit () > and access the whole word otherwise, which is only > there for ARM EABI. >>> And this isn't an EABI issue; it looks like the non-EABI code is also broken for big-endian targets, both the atomic load and the normal load in get_guard_bits. >>> >>> I think the non-EABI code is always using bit 0 in the first byte, >>> by using the endian-neutral #define _GLIBCXX_GUARD_BIT __guard_test_bit (0, 1). >> >> Except that set_guard sets the least-significant bit on all targets. > > Hmm, as I said, get_guard_bits gets the guard as a word if !targetm.cxx.guard_mask_bit (), > and as the first byte otherwise. Of course it could get the third byte, > if !targetm.cxx.guard_mask_bit () && BYTES_BIG_ENDIAN, but it would be more complicated > this way, wouldn't it? Ah, yes, I was overlooking that set_guard uses get_guard_bits. The patch is OK. >>> Only ARM EABI uses bit 0 in byte 3 if big-endian and bit 0 in byte 0 otherwise. >>> >>> For all other targets when _GLIBCXX_USE_FUTEX is defined, >>> __cxa_guard_XXX accesses the value as int* while the memory >>> is a 64-bit long, so I could imagine that is an aliasing violation. >>> >>> >>> But nothing that needs to be fixed immediately. >> >> Agreed. >> >>> Attached is the corrected patch. >>> >>> Tested again on arm-none-eabi with arm-sim. >>> Is it OK for trunk? >>> >>> Thanks >>> Bernd. >>> >>>> Jason >>>> >> > ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PING] [PATCH] Avoid atomic for guard acquire when that is expensive 2020-11-22 8:05 ` [PATCH] Avoid atomic for guard acquire when that is expensive Bernd Edlinger 2020-11-24 22:10 ` Jason Merrill @ 2020-11-30 20:08 ` Bernd Edlinger 2020-11-30 20:54 ` Jason Merrill 1 sibling, 1 reply; 17+ messages in thread From: Bernd Edlinger @ 2020-11-30 20:08 UTC (permalink / raw) To: Richard Earnshaw (lists), gcc-patches, Ramana Radhakrishnan, Jason Merrill, Nathan Sidwell, Christophe Lyon Hi, I'd like to ping for this patch: https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559882.html Thanks Bernd. On 11/22/20 9:05 AM, Bernd Edlinger wrote: > Hi, > > this avoids the need to use -fno-threadsafe-statics on > arm-none-eabi or working around that problem by supplying > a dummy __sync_synchronize function which might > just lead to silent code failure of the worst kind > (non-reproducable, racy) at runtime, as was pointed out > on previous discussions here. > > When the atomic access involves a call to __sync_synchronize > it is better to call __cxa_guard_acquire unconditionally, > since it handles the atomics too, or is a non-threaded > implementation when there is no gthread support for this target. > > This fixes also a bug for the ARM EABI big-endian target, > that is, previously the wrong bit was checked. > > > Regression tested successfully on arm-none-eabi with newlib-3.3.0. > > Is it OK for trunk? > > > Thanks > Bernd. > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PING] [PATCH] Avoid atomic for guard acquire when that is expensive 2020-11-30 20:08 ` [PING] " Bernd Edlinger @ 2020-11-30 20:54 ` Jason Merrill 0 siblings, 0 replies; 17+ messages in thread From: Jason Merrill @ 2020-11-30 20:54 UTC (permalink / raw) To: Bernd Edlinger, Richard Earnshaw (lists), gcc-patches, Ramana Radhakrishnan, Nathan Sidwell, Christophe Lyon On 11/30/20 3:08 PM, Bernd Edlinger wrote: > Hi, > > I'd like to ping for this patch: I reviewed it on the 24th: https://gcc.gnu.org/pipermail/gcc-patches/2020-November/560118.html > > https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559882.html > > Thanks > Bernd. > > On 11/22/20 9:05 AM, Bernd Edlinger wrote: >> Hi, >> >> this avoids the need to use -fno-threadsafe-statics on >> arm-none-eabi or working around that problem by supplying >> a dummy __sync_synchronize function which might >> just lead to silent code failure of the worst kind >> (non-reproducable, racy) at runtime, as was pointed out >> on previous discussions here. >> >> When the atomic access involves a call to __sync_synchronize >> it is better to call __cxa_guard_acquire unconditionally, >> since it handles the atomics too, or is a non-threaded >> implementation when there is no gthread support for this target. >> >> This fixes also a bug for the ARM EABI big-endian target, >> that is, previously the wrong bit was checked. >> >> >> Regression tested successfully on arm-none-eabi with newlib-3.3.0. >> >> Is it OK for trunk? >> >> >> Thanks >> Bernd. >> > ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2020-12-08 19:50 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-11-03 15:08 [PATCH] libgcc: Add a weak stub for __sync_synchronize Bernd Edlinger 2020-11-17 5:43 ` [PING] " Bernd Edlinger 2020-11-17 12:44 ` Richard Earnshaw (lists) 2020-11-17 15:18 ` Bernd Edlinger 2020-11-17 15:41 ` Richard Earnshaw (lists) 2020-11-17 15:51 ` Christophe Lyon 2020-11-17 17:17 ` Bernd Edlinger 2020-11-22 8:05 ` [PATCH] Avoid atomic for guard acquire when that is expensive Bernd Edlinger 2020-11-24 22:10 ` Jason Merrill 2020-12-01 18:28 ` Bernd Edlinger 2020-12-02 18:57 ` Jason Merrill 2020-12-05 12:37 ` Bernd Edlinger 2020-12-07 15:04 ` Jason Merrill 2020-12-07 16:17 ` Bernd Edlinger 2020-12-08 19:50 ` Jason Merrill 2020-11-30 20:08 ` [PING] " Bernd Edlinger 2020-11-30 20:54 ` Jason Merrill
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).