public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] libgcc: Add a weak stub for __sync_synchronize
@ 2020-11-03 15:08 Bernd Edlinger
  2020-11-17  5:43 ` [PING] " Bernd Edlinger
  2020-11-17 12:44 ` Richard Earnshaw (lists)
  0 siblings, 2 replies; 17+ messages in thread
From: Bernd Edlinger @ 2020-11-03 15:08 UTC (permalink / raw)
  To: gcc-patches, Ramana Radhakrishnan, Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 545 bytes --]

Hi,

this fixes a problem with a missing symbol __sync_synchronize
which happens when newlib is used together with libstdc++ for
the non-threaded simulator target arm-none-eabi.

There are several questions on stackoverflow about this issue.

I would like to add a weak symbol for this target, since this
is only a default implementation and not meant to override a
possibly more sophisticated synchronization function from the
c-runtime.


Regression tested successfully on arm-none-eabi with newlib-3.3.0.

Is it OK for trunk?


Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Add-a-weak-stub-for-__sync_synchronize.patch --]
[-- Type: text/x-patch; name="0001-Add-a-weak-stub-for-__sync_synchronize.patch", Size: 3334 bytes --]

From f8a3df552f4b98308096659c66b4c8ea68580f25 Mon Sep 17 00:00:00 2001
From: Bernd Edlinger <bernd.edlinger@hotmail.de>
Date: Mon, 2 Nov 2020 11:43:44 +0100
Subject: [PATCH] libgcc: Add a weak stub for __sync_synchronize

This patch adds a default implementation for __sync_synchronize,
which prevents many unresolved symbol errors on arm-none-eabi.
This happens often in C++ programs even without any threading.

libgcc:
2020-11-02  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* config.host: Use t-eabi for arm-none-eabi.
	* config/arm/t-eabi: New.
	* config/arm/eabi-sync.c: New.
---
 libgcc/config.host            |  2 +-
 libgcc/config/arm/eabi-sync.c | 39 +++++++++++++++++++++++++++++++++++++++
 libgcc/config/arm/t-eabi      |  1 +
 3 files changed, 41 insertions(+), 1 deletion(-)
 create mode 100644 libgcc/config/arm/eabi-sync.c
 create mode 100644 libgcc/config/arm/t-eabi

diff --git a/libgcc/config.host b/libgcc/config.host
index fd8e55e..e25abf4 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -495,7 +495,7 @@ arm*-*-eabi* | arm*-*-symbianelf* | arm*-*-rtems*)
 	tm_file="$tm_file arm/bpabi-lib.h"
 	case ${host} in
 	arm*-*-eabi* | arm*-*-rtems*)
-	  tmake_file="${tmake_file} arm/t-bpabi t-crtfm"
+	  tmake_file="${tmake_file} arm/t-bpabi t-crtfm arm/t-eabi"
 	  extra_parts="crtbegin.o crtend.o crti.o crtn.o"
 	  ;;
 	arm*-*-symbianelf*)
diff --git a/libgcc/config/arm/eabi-sync.c b/libgcc/config/arm/eabi-sync.c
new file mode 100644
index 0000000..bffdd4a
--- /dev/null
+++ b/libgcc/config/arm/eabi-sync.c
@@ -0,0 +1,39 @@
+/* Copyright (C) 2020 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+void __attribute__ ((weak))
+__sync_synchronize (void)
+{
+#if defined (__ARM_ARCH_6__) || defined (__ARM_ARCH_6J__)       \
+    || defined (__ARM_ARCH_6K__) || defined (__ARM_ARCH_6T2__)  \
+    || defined (__ARM_ARCH_6Z__) || defined (__ARM_ARCH_6ZK__)  \
+    || defined (__ARM_ARCH_7__) || defined (__ARM_ARCH_7A__)
+#if defined (__ARM_ARCH_7__) || defined (__ARM_ARCH_7A__)
+    __asm __volatile ("dmb" : : : "memory");
+#else
+    __asm __volatile ("mcr p15, 0, r0, c7, c10, 5" : : : "memory");
+#endif
+#else
+    __asm __volatile ("nop" : : : "memory");
+#endif
+}
diff --git a/libgcc/config/arm/t-eabi b/libgcc/config/arm/t-eabi
new file mode 100644
index 0000000..556032f
--- /dev/null
+++ b/libgcc/config/arm/t-eabi
@@ -0,0 +1 @@
+LIB2ADD_ST += $(srcdir)/config/arm/eabi-sync.c
-- 
1.9.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PING] [PATCH] libgcc: Add a weak stub for __sync_synchronize
  2020-11-03 15:08 [PATCH] libgcc: Add a weak stub for __sync_synchronize Bernd Edlinger
@ 2020-11-17  5:43 ` Bernd Edlinger
  2020-11-17 12:44 ` Richard Earnshaw (lists)
  1 sibling, 0 replies; 17+ messages in thread
From: Bernd Edlinger @ 2020-11-17  5:43 UTC (permalink / raw)
  To: gcc-patches, Ramana Radhakrishnan, Richard Earnshaw

Ping...

I'd like to ping for this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2020-November/557886.html

Thanks
Bernd.

On 11/3/20 4:08 PM, Bernd Edlinger wrote:
> Hi,
> 
> this fixes a problem with a missing symbol __sync_synchronize
> which happens when newlib is used together with libstdc++ for
> the non-threaded simulator target arm-none-eabi.
> 
> There are several questions on stackoverflow about this issue.
> 
> I would like to add a weak symbol for this target, since this
> is only a default implementation and not meant to override a
> possibly more sophisticated synchronization function from the
> c-runtime.
> 
> 
> Regression tested successfully on arm-none-eabi with newlib-3.3.0.
> 
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] libgcc: Add a weak stub for __sync_synchronize
  2020-11-03 15:08 [PATCH] libgcc: Add a weak stub for __sync_synchronize Bernd Edlinger
  2020-11-17  5:43 ` [PING] " Bernd Edlinger
@ 2020-11-17 12:44 ` Richard Earnshaw (lists)
  2020-11-17 15:18   ` Bernd Edlinger
  1 sibling, 1 reply; 17+ messages in thread
From: Richard Earnshaw (lists) @ 2020-11-17 12:44 UTC (permalink / raw)
  To: Bernd Edlinger, gcc-patches, Ramana Radhakrishnan

On 03/11/2020 15:08, Bernd Edlinger wrote:
> Hi,
> 
> this fixes a problem with a missing symbol __sync_synchronize
> which happens when newlib is used together with libstdc++ for
> the non-threaded simulator target arm-none-eabi.
> 
> There are several questions on stackoverflow about this issue.
> 
> I would like to add a weak symbol for this target, since this
> is only a default implementation and not meant to override a
> possibly more sophisticated synchronization function from the
> c-runtime.
> 
> 
> Regression tested successfully on arm-none-eabi with newlib-3.3.0.
> 
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.
> 

I seem to recall that this was a deliberate decision - you can't guess
this correctly, at least when trying to build portable code - you just
have to know which runtime you will be using.

I think Ramana had some changes in the works at one point to address
(some) of this, but I'm not sure what happened to them.  Ramana?


+#if defined (__ARM_ARCH_6__) || defined (__ARM_ARCH_6J__)       \
+    || defined (__ARM_ARCH_6K__) || defined (__ARM_ARCH_6T2__)  \
+    || defined (__ARM_ARCH_6Z__) || defined (__ARM_ARCH_6ZK__)  \
+    || defined (__ARM_ARCH_7__) || defined (__ARM_ARCH_7A__)
+#if defined (__ARM_ARCH_7__) || defined (__ARM_ARCH_7A__)

Ug, no!  Use the ACLE macros to avoid this sort of mess.

R.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] libgcc: Add a weak stub for __sync_synchronize
  2020-11-17 12:44 ` Richard Earnshaw (lists)
@ 2020-11-17 15:18   ` Bernd Edlinger
  2020-11-17 15:41     ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 17+ messages in thread
From: Bernd Edlinger @ 2020-11-17 15:18 UTC (permalink / raw)
  To: Richard Earnshaw (lists), gcc-patches, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 2842 bytes --]

On 11/17/20 1:44 PM, Richard Earnshaw (lists) wrote:
> On 03/11/2020 15:08, Bernd Edlinger wrote:
>> Hi,
>>
>> this fixes a problem with a missing symbol __sync_synchronize
>> which happens when newlib is used together with libstdc++ for
>> the non-threaded simulator target arm-none-eabi.
>>
>> There are several questions on stackoverflow about this issue.
>>
>> I would like to add a weak symbol for this target, since this
>> is only a default implementation and not meant to override a
>> possibly more sophisticated synchronization function from the
>> c-runtime.
>>
>>
>> Regression tested successfully on arm-none-eabi with newlib-3.3.0.
>>
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
>>
> 
> I seem to recall that this was a deliberate decision - you can't guess
> this correctly, at least when trying to build portable code - you just
> have to know which runtime you will be using.
> 

Therefore I suggest to use the weak attribute.  It is on purpose not
implementing all of the atomics.

The use case, is a C++ program which initializes a local static variable.

$ cat test.cc
#include <string>
main(int argc, char **argv)
{
  static std::string x = "test";
  return 0;
}

compiles to this:
        sub     sp, sp, #20
        str     r0, [fp, #-24]
        str     r1, [fp, #-28]
        ldr     r3, .L14
        ldrb    r4, [r3]
        bl      __sync_synchronize
        and     r3, r4, #255
        and     r3, r3, #1
        cmp     r3, #0
        moveq   r3, #1
        movne   r3, #0
        and     r3, r3, #255
        cmp     r3, #0
        beq     .L8
        ldr     r0, .L14
        bl      __cxa_guard_acquire
        mov     r3, r0

so __sync_synchronize is not defined in newlib since the target (arm-sim)
is known to be not multi-threaded,
but __cxa_guard_acquire is also not a thread safe function,
because __GTHREADS is not defined by libgcc, since it is known
at configure time, that the target does not support threads.
So libstdc++ does not try to use a mutex or any atomics either,
because it is not compiled with __GTHREADS.

I can further narrow down the patch by only defining this function when
__GTHREADS is not defined, to make it more clear.


> I think Ramana had some changes in the works at one point to address
> (some) of this, but I'm not sure what happened to them.  Ramana?
> 
> 
> +#if defined (__ARM_ARCH_6__) || defined (__ARM_ARCH_6J__)       \
> +    || defined (__ARM_ARCH_6K__) || defined (__ARM_ARCH_6T2__)  \
> +    || defined (__ARM_ARCH_6Z__) || defined (__ARM_ARCH_6ZK__)  \
> +    || defined (__ARM_ARCH_7__) || defined (__ARM_ARCH_7A__)
> +#if defined (__ARM_ARCH_7__) || defined (__ARM_ARCH_7A__)
> 
> Ug, no!  Use the ACLE macros to avoid this sort of mess.
> 

Ah, thanks, copy-paste from freebsd-atomic.c :)


I've attached the updated patch.
Is it OK?


Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-libgcc-Add-a-weak-stub-for-__sync_synchronize.patch --]
[-- Type: text/x-patch; name="0001-libgcc-Add-a-weak-stub-for-__sync_synchronize.patch", Size: 3087 bytes --]

From ca44e1fcb4b991306cbcde6293d20c77ce74ad68 Mon Sep 17 00:00:00 2001
From: Bernd Edlinger <bernd.edlinger@hotmail.de>
Date: Mon, 2 Nov 2020 11:43:44 +0100
Subject: [PATCH] libgcc: Add a weak stub for __sync_synchronize

This patch adds a default implementation for __sync_synchronize,
which prevents many unresolved symbol errors on arm-none-eabi.
This happens often in C++ programs even without any threading.

libgcc:
2020-11-17  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* config.host: Use t-eabi for arm-none-eabi.
	* config/arm/t-eabi: New.
	* config/arm/eabi-sync.c: New.
---
 libgcc/config.host            |  2 +-
 libgcc/config/arm/eabi-sync.c | 38 ++++++++++++++++++++++++++++++++++++++
 libgcc/config/arm/t-eabi      |  1 +
 3 files changed, 40 insertions(+), 1 deletion(-)
 create mode 100644 libgcc/config/arm/eabi-sync.c
 create mode 100644 libgcc/config/arm/t-eabi

diff --git a/libgcc/config.host b/libgcc/config.host
index 66af834..eaf74f1 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -496,7 +496,7 @@ arm*-*-eabi* | arm*-*-symbianelf* | arm*-*-rtems*)
 	tm_file="$tm_file arm/bpabi-lib.h"
 	case ${host} in
 	arm*-*-eabi* | arm*-*-rtems*)
-	  tmake_file="${tmake_file} arm/t-bpabi t-crtfm"
+	  tmake_file="${tmake_file} arm/t-bpabi t-crtfm arm/t-eabi"
 	  extra_parts="crtbegin.o crtend.o crti.o crtn.o"
 	  ;;
 	arm*-*-symbianelf*)
diff --git a/libgcc/config/arm/eabi-sync.c b/libgcc/config/arm/eabi-sync.c
new file mode 100644
index 0000000..c37bacf
--- /dev/null
+++ b/libgcc/config/arm/eabi-sync.c
@@ -0,0 +1,38 @@
+/* Copyright (C) 2020 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#include "gthr.h"
+
+#ifndef __GTHREADS
+void __attribute__ ((weak))
+__sync_synchronize (void)
+{
+#if __ARM_ARCH >= 7
+    __asm __volatile ("dmb" : : : "memory");
+#elif ARM_ARCH == 6
+    __asm __volatile ("mcr p15, 0, r0, c7, c10, 5" : : : "memory");
+#else
+    __asm __volatile ("nop" : : : "memory");
+#endif
+}
+#endif
diff --git a/libgcc/config/arm/t-eabi b/libgcc/config/arm/t-eabi
new file mode 100644
index 0000000..556032f
--- /dev/null
+++ b/libgcc/config/arm/t-eabi
@@ -0,0 +1 @@
+LIB2ADD_ST += $(srcdir)/config/arm/eabi-sync.c
-- 
1.9.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] libgcc: Add a weak stub for __sync_synchronize
  2020-11-17 15:18   ` Bernd Edlinger
@ 2020-11-17 15:41     ` Richard Earnshaw (lists)
  2020-11-17 15:51       ` Christophe Lyon
                         ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Richard Earnshaw (lists) @ 2020-11-17 15:41 UTC (permalink / raw)
  To: Bernd Edlinger, gcc-patches, Ramana Radhakrishnan

On 17/11/2020 15:18, Bernd Edlinger wrote:
> On 11/17/20 1:44 PM, Richard Earnshaw (lists) wrote:
>> On 03/11/2020 15:08, Bernd Edlinger wrote:
>>> Hi,
>>>
>>> this fixes a problem with a missing symbol __sync_synchronize
>>> which happens when newlib is used together with libstdc++ for
>>> the non-threaded simulator target arm-none-eabi.
>>>
>>> There are several questions on stackoverflow about this issue.
>>>
>>> I would like to add a weak symbol for this target, since this
>>> is only a default implementation and not meant to override a
>>> possibly more sophisticated synchronization function from the
>>> c-runtime.
>>>
>>>
>>> Regression tested successfully on arm-none-eabi with newlib-3.3.0.
>>>
>>> Is it OK for trunk?
>>>
>>>
>>> Thanks
>>> Bernd.
>>>
>>
>> I seem to recall that this was a deliberate decision - you can't guess
>> this correctly, at least when trying to build portable code - you just
>> have to know which runtime you will be using.
>>
> 
> Therefore I suggest to use the weak attribute.  It is on purpose not
> implementing all of the atomics.
> 
> The use case, is a C++ program which initializes a local static variable.
> 
> $ cat test.cc
> #include <string>
> main(int argc, char **argv)
> {
>   static std::string x = "test";
>   return 0;
> }
> 
> compiles to this:
>         sub     sp, sp, #20
>         str     r0, [fp, #-24]
>         str     r1, [fp, #-28]
>         ldr     r3, .L14
>         ldrb    r4, [r3]
>         bl      __sync_synchronize
>         and     r3, r4, #255
>         and     r3, r3, #1
>         cmp     r3, #0
>         moveq   r3, #1
>         movne   r3, #0
>         and     r3, r3, #255
>         cmp     r3, #0
>         beq     .L8
>         ldr     r0, .L14
>         bl      __cxa_guard_acquire
>         mov     r3, r0
> 
> so __sync_synchronize is not defined in newlib since the target (arm-sim)
> is known to be not multi-threaded,
> but __cxa_guard_acquire is also not a thread safe function,
> because __GTHREADS is not defined by libgcc, since it is known
> at configure time, that the target does not support threads.
> So libstdc++ does not try to use a mutex or any atomics either,
> because it is not compiled with __GTHREADS.
> 
> I can further narrow down the patch by only defining this function when
> __GTHREADS is not defined, to make it more clear.
> 
> 
>> I think Ramana had some changes in the works at one point to address
>> (some) of this, but I'm not sure what happened to them.  Ramana?
>>
>>
>> +#if defined (__ARM_ARCH_6__) || defined (__ARM_ARCH_6J__)       \
>> +    || defined (__ARM_ARCH_6K__) || defined (__ARM_ARCH_6T2__)  \
>> +    || defined (__ARM_ARCH_6Z__) || defined (__ARM_ARCH_6ZK__)  \
>> +    || defined (__ARM_ARCH_7__) || defined (__ARM_ARCH_7A__)
>> +#if defined (__ARM_ARCH_7__) || defined (__ARM_ARCH_7A__)
>>
>> Ug, no!  Use the ACLE macros to avoid this sort of mess.
>>
> 
> Ah, thanks, copy-paste from freebsd-atomic.c :)
> 
> 
> I've attached the updated patch.
> Is it OK?
> 
> 
> Thanks
> Bernd.
> 

libgcc is *still* the wrong place for this.  It belongs in the system
library (eg newlib, or glibc, or whatever), which knows about the system
it's running on.  (Sorry, I should have said this before, but I've
context-switched this out since it's been a long time since it came up).

This hack will just lead to silent code failure of the worst kind
(non-reproducable, racy) at runtime.

R.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] libgcc: Add a weak stub for __sync_synchronize
  2020-11-17 15:41     ` Richard Earnshaw (lists)
@ 2020-11-17 15:51       ` Christophe Lyon
  2020-11-17 17:17       ` Bernd Edlinger
  2020-11-22  8:05       ` [PATCH] Avoid atomic for guard acquire when that is expensive Bernd Edlinger
  2 siblings, 0 replies; 17+ messages in thread
From: Christophe Lyon @ 2020-11-17 15:51 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Bernd Edlinger, gcc-patches, Ramana Radhakrishnan

On Tue, 17 Nov 2020 at 16:41, Richard Earnshaw (lists) via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> On 17/11/2020 15:18, Bernd Edlinger wrote:
> > On 11/17/20 1:44 PM, Richard Earnshaw (lists) wrote:
> >> On 03/11/2020 15:08, Bernd Edlinger wrote:
> >>> Hi,
> >>>
> >>> this fixes a problem with a missing symbol __sync_synchronize
> >>> which happens when newlib is used together with libstdc++ for
> >>> the non-threaded simulator target arm-none-eabi.
> >>>
> >>> There are several questions on stackoverflow about this issue.
> >>>
> >>> I would like to add a weak symbol for this target, since this
> >>> is only a default implementation and not meant to override a
> >>> possibly more sophisticated synchronization function from the
> >>> c-runtime.
> >>>
> >>>
> >>> Regression tested successfully on arm-none-eabi with newlib-3.3.0.
> >>>
> >>> Is it OK for trunk?
> >>>
> >>>
> >>> Thanks
> >>> Bernd.
> >>>
> >>
> >> I seem to recall that this was a deliberate decision - you can't guess
> >> this correctly, at least when trying to build portable code - you just
> >> have to know which runtime you will be using.
> >>
> >
> > Therefore I suggest to use the weak attribute.  It is on purpose not
> > implementing all of the atomics.
> >
> > The use case, is a C++ program which initializes a local static variable.
> >
> > $ cat test.cc
> > #include <string>
> > main(int argc, char **argv)
> > {
> >   static std::string x = "test";
> >   return 0;
> > }
> >
> > compiles to this:
> >         sub     sp, sp, #20
> >         str     r0, [fp, #-24]
> >         str     r1, [fp, #-28]
> >         ldr     r3, .L14
> >         ldrb    r4, [r3]
> >         bl      __sync_synchronize
> >         and     r3, r4, #255
> >         and     r3, r3, #1
> >         cmp     r3, #0
> >         moveq   r3, #1
> >         movne   r3, #0
> >         and     r3, r3, #255
> >         cmp     r3, #0
> >         beq     .L8
> >         ldr     r0, .L14
> >         bl      __cxa_guard_acquire
> >         mov     r3, r0
> >
> > so __sync_synchronize is not defined in newlib since the target (arm-sim)
> > is known to be not multi-threaded,
> > but __cxa_guard_acquire is also not a thread safe function,
> > because __GTHREADS is not defined by libgcc, since it is known
> > at configure time, that the target does not support threads.
> > So libstdc++ does not try to use a mutex or any atomics either,
> > because it is not compiled with __GTHREADS.
> >
> > I can further narrow down the patch by only defining this function when
> > __GTHREADS is not defined, to make it more clear.
> >
> >
> >> I think Ramana had some changes in the works at one point to address
> >> (some) of this, but I'm not sure what happened to them.  Ramana?
> >>
> >>
> >> +#if defined (__ARM_ARCH_6__) || defined (__ARM_ARCH_6J__)       \
> >> +    || defined (__ARM_ARCH_6K__) || defined (__ARM_ARCH_6T2__)  \
> >> +    || defined (__ARM_ARCH_6Z__) || defined (__ARM_ARCH_6ZK__)  \
> >> +    || defined (__ARM_ARCH_7__) || defined (__ARM_ARCH_7A__)
> >> +#if defined (__ARM_ARCH_7__) || defined (__ARM_ARCH_7A__)
> >>
> >> Ug, no!  Use the ACLE macros to avoid this sort of mess.
> >>
> >
> > Ah, thanks, copy-paste from freebsd-atomic.c :)
> >
> >
> > I've attached the updated patch.
> > Is it OK?
> >
> >
> > Thanks
> > Bernd.
> >
>
> libgcc is *still* the wrong place for this.  It belongs in the system
> library (eg newlib, or glibc, or whatever), which knows about the system
> it's running on.  (Sorry, I should have said this before, but I've
> context-switched this out since it's been a long time since it came up).
>
> This hack will just lead to silent code failure of the worst kind
> (non-reproducable, racy) at runtime.
>

I haven't fully re-read the thread, but I think this is related to an
old discussion,
not very well archived in:
https://gcc.gnu.org/pipermail/gcc-patches/2016-November/462299.html

There's a pointer to a newlib patch from Ramana.

> R.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] libgcc: Add a weak stub for __sync_synchronize
  2020-11-17 15:41     ` Richard Earnshaw (lists)
  2020-11-17 15:51       ` Christophe Lyon
@ 2020-11-17 17:17       ` Bernd Edlinger
  2020-11-22  8:05       ` [PATCH] Avoid atomic for guard acquire when that is expensive Bernd Edlinger
  2 siblings, 0 replies; 17+ messages in thread
From: Bernd Edlinger @ 2020-11-17 17:17 UTC (permalink / raw)
  To: Richard Earnshaw (lists), gcc-patches, Ramana Radhakrishnan

On 11/17/20 4:41 PM, Richard Earnshaw (lists) wrote:
> 
> libgcc is *still* the wrong place for this.  It belongs in the system
> library (eg newlib, or glibc, or whatever), which knows about the system
> it's running on.  (Sorry, I should have said this before, but I've
> context-switched this out since it's been a long time since it came up).
> 

No problem.  I just saw it from the other end.

It is odd that this problem does not go away even if gcc is configured
with --disable-threads, which should be the default for arm-none-eabi
anyway.

If we assume a threaded environment then it is still libgcc
which does not define __GTHREADS in libgcc/gthr.h, and libstdc++'s
__cxa_guard_acquire is not making use of functions like __gthread_mutex_lock.
But that appears to be this way by design.

Of course the race is not fixed if you ask newlib to implement just this
__sync_synchronize function.

> This hack will just lead to silent code failure of the worst kind
> (non-reproducable, racy) at runtime.
> 

So in a arm-none-eabi system with armv6 or higher where the intrinsic
__sync_synchronize is not a library call but an instruction
we have exactly this worst kind scenario, already.

It is however possible that the default of -fthreadsafe_statics
is inappropriate for --disable-threads ?


Bernd.


> R.
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] Avoid atomic for guard acquire when that is expensive
  2020-11-17 15:41     ` Richard Earnshaw (lists)
  2020-11-17 15:51       ` Christophe Lyon
  2020-11-17 17:17       ` Bernd Edlinger
@ 2020-11-22  8:05       ` Bernd Edlinger
  2020-11-24 22:10         ` Jason Merrill
  2020-11-30 20:08         ` [PING] " Bernd Edlinger
  2 siblings, 2 replies; 17+ messages in thread
From: Bernd Edlinger @ 2020-11-22  8:05 UTC (permalink / raw)
  To: Richard Earnshaw (lists),
	gcc-patches, Ramana Radhakrishnan, Jason Merrill, Nathan Sidwell,
	Christophe Lyon

[-- Attachment #1: Type: text/plain, Size: 755 bytes --]

Hi,

this avoids the need to use -fno-threadsafe-statics on
arm-none-eabi or working around that problem by supplying
a dummy __sync_synchronize function which might
just lead to silent code failure of the worst kind
(non-reproducable, racy) at runtime, as was pointed out
on previous discussions here.

When the atomic access involves a call to __sync_synchronize
it is better to call __cxa_guard_acquire unconditionally,
since it handles the atomics too, or is a non-threaded
implementation when there is no gthread support for this target.

This fixes also a bug for the ARM EABI big-endian target,
that is, previously the wrong bit was checked.


Regression tested successfully on arm-none-eabi with newlib-3.3.0.

Is it OK for trunk?


Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Avoid-atomic-for-guard-acquire-when-that-is-expensiv.patch --]
[-- Type: text/x-patch; name="0001-Avoid-atomic-for-guard-acquire-when-that-is-expensiv.patch", Size: 5984 bytes --]

From 9fd070407408cf10789f5e9eb5ddda2fb3798e6f Mon Sep 17 00:00:00 2001
From: Bernd Edlinger <bernd.edlinger@hotmail.de>
Date: Sun, 22 Nov 2020 08:11:14 +0100
Subject: [PATCH] Avoid atomic for guard acquire when that is expensive

When the atomic access involves a call to __sync_synchronize
it is better to call __cxa_guard_acquire unconditionally,
since it handles the atomics too, or is a non-threaded
implementation when there is no gthread support for this target.

This fixes also a bug for the ARM EABI big-endian target,
that is, previously the wrong bit was checked.

gcc:
2020-11-22  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* target.def (guard_atomic_expensive): New hook.
	* doc/tm.texi: Document TARGET_CXX_GUARD_ATOMIC_EXPENSIVE.
	* doc/tm.texi.in: Likewise.
	* config/arm/arm.c (arm_cxx_guard_atomic_expensive): New callback.

gcc/cp:
2020-11-22  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* decl2.c: (build_atomic_load_byte): Rename to...
	(build_atomic_load_type): ... and add new parameter type.
	(get_guard_cond): Skip the atomic here if that is expensive.
	Use the correct type for the atomic load on certain targets.
---
 gcc/config/arm/arm.c | 13 +++++++++++++
 gcc/cp/decl2.c       | 12 ++++++++----
 gcc/doc/tm.texi      |  7 +++++++
 gcc/doc/tm.texi.in   |  2 ++
 gcc/target.def       | 10 ++++++++++
 5 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 04190b1..04ca1fe 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -235,6 +235,7 @@ static rtx arm_dwarf_register_span (rtx);
 
 static tree arm_cxx_guard_type (void);
 static bool arm_cxx_guard_mask_bit (void);
+static bool arm_cxx_guard_atomic_expensive (void);
 static tree arm_get_cookie_size (tree);
 static bool arm_cookie_has_size (void);
 static bool arm_cxx_cdtor_returns_this (void);
@@ -593,6 +594,9 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_CXX_GUARD_MASK_BIT
 #define TARGET_CXX_GUARD_MASK_BIT arm_cxx_guard_mask_bit
 
+#undef TARGET_CXX_GUARD_ATOMIC_EXPENSIVE
+#define TARGET_CXX_GUARD_ATOMIC_EXPENSIVE arm_cxx_guard_atomic_expensive
+
 #undef TARGET_CXX_GET_COOKIE_SIZE
 #define TARGET_CXX_GET_COOKIE_SIZE arm_get_cookie_size
 
@@ -28882,6 +28886,15 @@ arm_cxx_guard_mask_bit (void)
 }
 
 
+/* Return true if atomics involve a call to __sync_synchronize.  */
+
+static bool
+arm_cxx_guard_atomic_expensive (void)
+{
+  return TARGET_AAPCS_BASED && !TARGET_HAVE_DMB && !TARGET_HAVE_DMB_MCR;
+}
+
+
 /* The EABI specifies that all array cookies are 8 bytes long.  */
 
 static tree
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 1bc7b7e..e2b29a6 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -3300,15 +3300,15 @@ get_guard (tree decl)
 /* Return an atomic load of src with the appropriate memory model.  */
 
 static tree
-build_atomic_load_byte (tree src, HOST_WIDE_INT model)
+build_atomic_load_type (tree src, HOST_WIDE_INT model, tree type)
 {
-  tree ptr_type = build_pointer_type (char_type_node);
+  tree ptr_type = build_pointer_type (type);
   tree mem_model = build_int_cst (integer_type_node, model);
   tree t, addr, val;
   unsigned int size;
   int fncode;
 
-  size = tree_to_uhwi (TYPE_SIZE_UNIT (char_type_node));
+  size = tree_to_uhwi (TYPE_SIZE_UNIT (type));
 
   fncode = BUILT_IN_ATOMIC_LOAD_N + exact_log2 (size) + 1;
   t = builtin_decl_implicit ((enum built_in_function) fncode);
@@ -3350,8 +3350,12 @@ get_guard_cond (tree guard, bool thread_safe)
 
   if (!thread_safe)
     guard = get_guard_bits (guard);
+  else if (targetm.cxx.guard_atomic_expensive ())
+    guard = integer_zero_node;
   else
-    guard = build_atomic_load_byte (guard, MEMMODEL_ACQUIRE);
+    guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE,
+				    targetm.cxx.guard_mask_bit ()
+				    ? TREE_TYPE (guard) : char_type_node);
 
   /* Mask off all but the low bit.  */
   if (targetm.cxx.guard_mask_bit ())
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 2b88f78..92215cf 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -10728,6 +10728,13 @@ This hook determines how guard variables are used.  It should return
 @code{true} indicates that only the least significant bit should be used.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_CXX_GUARD_ATOMIC_EXPENSIVE (void)
+This hook determines if the guard atomic is expensive.  It should return
+@code{false} (the default) if the atomic is inexpensive.  A return value of
+@code{true} indicates that the atomic is expensive i.e., involves a call to
+__sync_synchronize.  In this case let __cxa_guard_acquire handle the atomics.
+@end deftypefn
+
 @deftypefn {Target Hook} tree TARGET_CXX_GET_COOKIE_SIZE (tree @var{type})
 This hook returns the size of the cookie to use when allocating an array
 whose elements have the indicated @var{type}.  Assumes that it is already
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 897f289..ce1d837 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7321,6 +7321,8 @@ floating-point support; they are not included in this mechanism.
 
 @hook TARGET_CXX_GUARD_MASK_BIT
 
+@hook TARGET_CXX_GUARD_ATOMIC_EXPENSIVE
+
 @hook TARGET_CXX_GET_COOKIE_SIZE
 
 @hook TARGET_CXX_COOKIE_HAS_SIZE
diff --git a/gcc/target.def b/gcc/target.def
index 810d554..0c02d5c 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -6160,6 +6160,16 @@ DEFHOOK
  bool, (void),
  hook_bool_void_false)
 
+/* Return true if the guard atomic is expensive.  */
+DEFHOOK
+(guard_atomic_expensive,
+ "This hook determines if the guard atomic is expensive.  It should return\n\
+@code{false} (the default) if the atomic is inexpensive.  A return value of\n\
+@code{true} indicates that the atomic is expensive i.e., involves a call to\n\
+__sync_synchronize.  In this case let __cxa_guard_acquire handle the atomics.",
+ bool, (void),
+ hook_bool_void_false)
+
 /* Returns the size of the array cookie for an array of type.  */
 DEFHOOK
 (get_cookie_size,
-- 
1.9.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Avoid atomic for guard acquire when that is expensive
  2020-11-22  8:05       ` [PATCH] Avoid atomic for guard acquire when that is expensive Bernd Edlinger
@ 2020-11-24 22:10         ` Jason Merrill
  2020-12-01 18:28           ` Bernd Edlinger
  2020-11-30 20:08         ` [PING] " Bernd Edlinger
  1 sibling, 1 reply; 17+ messages in thread
From: Jason Merrill @ 2020-11-24 22:10 UTC (permalink / raw)
  To: Bernd Edlinger, Richard Earnshaw (lists),
	gcc-patches, Ramana Radhakrishnan, Nathan Sidwell,
	Christophe Lyon

On 11/22/20 3:05 AM, Bernd Edlinger wrote:
> Hi,
> 
> this avoids the need to use -fno-threadsafe-statics on
> arm-none-eabi or working around that problem by supplying
> a dummy __sync_synchronize function which might
> just lead to silent code failure of the worst kind
> (non-reproducable, racy) at runtime, as was pointed out
> on previous discussions here.
> 
> When the atomic access involves a call to __sync_synchronize
> it is better to call __cxa_guard_acquire unconditionally,
> since it handles the atomics too, or is a non-threaded
> implementation when there is no gthread support for this target.
> 
> This fixes also a bug for the ARM EABI big-endian target,
> that is, previously the wrong bit was checked.

Instead of a new target macro, can't you follow 
fold_builtin_atomic_always_lock_free/can_atomic_load_p?

Jason


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PING] [PATCH] Avoid atomic for guard acquire when that is expensive
  2020-11-22  8:05       ` [PATCH] Avoid atomic for guard acquire when that is expensive Bernd Edlinger
  2020-11-24 22:10         ` Jason Merrill
@ 2020-11-30 20:08         ` Bernd Edlinger
  2020-11-30 20:54           ` Jason Merrill
  1 sibling, 1 reply; 17+ messages in thread
From: Bernd Edlinger @ 2020-11-30 20:08 UTC (permalink / raw)
  To: Richard Earnshaw (lists),
	gcc-patches, Ramana Radhakrishnan, Jason Merrill, Nathan Sidwell,
	Christophe Lyon

Hi,

I'd like to ping for this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559882.html

Thanks
Bernd.

On 11/22/20 9:05 AM, Bernd Edlinger wrote:
> Hi,
> 
> this avoids the need to use -fno-threadsafe-statics on
> arm-none-eabi or working around that problem by supplying
> a dummy __sync_synchronize function which might
> just lead to silent code failure of the worst kind
> (non-reproducable, racy) at runtime, as was pointed out
> on previous discussions here.
> 
> When the atomic access involves a call to __sync_synchronize
> it is better to call __cxa_guard_acquire unconditionally,
> since it handles the atomics too, or is a non-threaded
> implementation when there is no gthread support for this target.
> 
> This fixes also a bug for the ARM EABI big-endian target,
> that is, previously the wrong bit was checked.
> 
> 
> Regression tested successfully on arm-none-eabi with newlib-3.3.0.
> 
> Is it OK for trunk?
> 
> 
> Thanks
> Bernd.
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PING] [PATCH] Avoid atomic for guard acquire when that is expensive
  2020-11-30 20:08         ` [PING] " Bernd Edlinger
@ 2020-11-30 20:54           ` Jason Merrill
  0 siblings, 0 replies; 17+ messages in thread
From: Jason Merrill @ 2020-11-30 20:54 UTC (permalink / raw)
  To: Bernd Edlinger, Richard Earnshaw (lists),
	gcc-patches, Ramana Radhakrishnan, Nathan Sidwell,
	Christophe Lyon

On 11/30/20 3:08 PM, Bernd Edlinger wrote:
> Hi,
> 
> I'd like to ping for this patch:

I reviewed it on the 24th:

https://gcc.gnu.org/pipermail/gcc-patches/2020-November/560118.html

> 
> https://gcc.gnu.org/pipermail/gcc-patches/2020-November/559882.html
> 
> Thanks
> Bernd.
> 
> On 11/22/20 9:05 AM, Bernd Edlinger wrote:
>> Hi,
>>
>> this avoids the need to use -fno-threadsafe-statics on
>> arm-none-eabi or working around that problem by supplying
>> a dummy __sync_synchronize function which might
>> just lead to silent code failure of the worst kind
>> (non-reproducable, racy) at runtime, as was pointed out
>> on previous discussions here.
>>
>> When the atomic access involves a call to __sync_synchronize
>> it is better to call __cxa_guard_acquire unconditionally,
>> since it handles the atomics too, or is a non-threaded
>> implementation when there is no gthread support for this target.
>>
>> This fixes also a bug for the ARM EABI big-endian target,
>> that is, previously the wrong bit was checked.
>>
>>
>> Regression tested successfully on arm-none-eabi with newlib-3.3.0.
>>
>> Is it OK for trunk?
>>
>>
>> Thanks
>> Bernd.
>>
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Avoid atomic for guard acquire when that is expensive
  2020-11-24 22:10         ` Jason Merrill
@ 2020-12-01 18:28           ` Bernd Edlinger
  2020-12-02 18:57             ` Jason Merrill
  0 siblings, 1 reply; 17+ messages in thread
From: Bernd Edlinger @ 2020-12-01 18:28 UTC (permalink / raw)
  To: Jason Merrill, Richard Earnshaw (lists),
	gcc-patches, Ramana Radhakrishnan, Nathan Sidwell,
	Christophe Lyon

[-- Attachment #1: Type: text/plain, Size: 974 bytes --]

On 11/24/20 11:10 PM, Jason Merrill wrote:
> On 11/22/20 3:05 AM, Bernd Edlinger wrote:
>> Hi,
>>
>> this avoids the need to use -fno-threadsafe-statics on
>> arm-none-eabi or working around that problem by supplying
>> a dummy __sync_synchronize function which might
>> just lead to silent code failure of the worst kind
>> (non-reproducable, racy) at runtime, as was pointed out
>> on previous discussions here.
>>
>> When the atomic access involves a call to __sync_synchronize
>> it is better to call __cxa_guard_acquire unconditionally,
>> since it handles the atomics too, or is a non-threaded
>> implementation when there is no gthread support for this target.
>>
>> This fixes also a bug for the ARM EABI big-endian target,
>> that is, previously the wrong bit was checked.
> 
> Instead of a new target macro, can't you follow fold_builtin_atomic_always_lock_free/can_atomic_load_p?
> 

Yes, thanks, that should work too.
Would you like this better?


Thanks
Bernd.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Avoid-atomic-for-guard-acquire-when-that-is-expensiv.patch --]
[-- Type: text/x-patch; name="0001-Avoid-atomic-for-guard-acquire-when-that-is-expensiv.patch", Size: 3074 bytes --]

From 863b023786f7e8321979bcdb33c027d37d5e155b Mon Sep 17 00:00:00 2001
From: Bernd Edlinger <bernd.edlinger@hotmail.de>
Date: Tue, 1 Dec 2020 18:54:48 +0100
Subject: [PATCH] Avoid atomic for guard acquire when that is expensive

When the atomic access involves a call to __sync_synchronize
it is better to call __cxa_guard_acquire unconditionally,
since it handles the atomics too, or is a non-threaded
implementation when there is no gthread support for this target.

This fixes also a bug for the ARM EABI big-endian target,
that is, previously the wrong bit was checked.

2020-11-22  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* decl2.c: (is_atomic_expensive_p): New helper function.
	(build_atomic_load_byte): Rename to...
	(build_atomic_load_type): ... and add new parameter type.
	(get_guard_cond): Skip the atomic here if that is expensive.
	Use the correct type for the atomic load on certain targets.
---
 gcc/cp/decl2.c | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 1bc7b7e..5001e8f 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "c-family/c-ada-spec.h"
 #include "asan.h"
+#include "optabs-query.h"
 
 /* Id for dumping the raw trees.  */
 int raw_dump_id;
@@ -3297,18 +3298,34 @@ get_guard (tree decl)
   return guard;
 }
 
+/* Returns true if accessing the GUARD atomic is expensive,
+   i.e. involves a call to __sync_synchronize or similar.
+   In this case let __cxa_guard_acquire handle the atomics.  */
+
+static bool
+is_atomic_expensive_p (machine_mode mode)
+{
+  if (!flag_inline_atomics)
+    return false;
+
+  if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode))
+    return false;
+
+  return true;
+}
+
 /* Return an atomic load of src with the appropriate memory model.  */
 
 static tree
-build_atomic_load_byte (tree src, HOST_WIDE_INT model)
+build_atomic_load_type (tree src, HOST_WIDE_INT model, tree type)
 {
-  tree ptr_type = build_pointer_type (char_type_node);
+  tree ptr_type = build_pointer_type (type);
   tree mem_model = build_int_cst (integer_type_node, model);
   tree t, addr, val;
   unsigned int size;
   int fncode;
 
-  size = tree_to_uhwi (TYPE_SIZE_UNIT (char_type_node));
+  size = tree_to_uhwi (TYPE_SIZE_UNIT (type));
 
   fncode = BUILT_IN_ATOMIC_LOAD_N + exact_log2 (size) + 1;
   t = builtin_decl_implicit ((enum built_in_function) fncode);
@@ -3351,7 +3368,15 @@ get_guard_cond (tree guard, bool thread_safe)
   if (!thread_safe)
     guard = get_guard_bits (guard);
   else
-    guard = build_atomic_load_byte (guard, MEMMODEL_ACQUIRE);
+    {
+      tree type = targetm.cxx.guard_mask_bit ()
+		  ? TREE_TYPE (guard) : char_type_node;
+
+      if (is_atomic_expensive_p (TYPE_MODE (type)))
+	guard = integer_zero_node;
+      else
+	guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);
+    }
 
   /* Mask off all but the low bit.  */
   if (targetm.cxx.guard_mask_bit ())
-- 
1.9.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Avoid atomic for guard acquire when that is expensive
  2020-12-01 18:28           ` Bernd Edlinger
@ 2020-12-02 18:57             ` Jason Merrill
  2020-12-05 12:37               ` Bernd Edlinger
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Merrill @ 2020-12-02 18:57 UTC (permalink / raw)
  To: Bernd Edlinger, Richard Earnshaw (lists),
	gcc-patches, Ramana Radhakrishnan, Nathan Sidwell,
	Christophe Lyon

On 12/1/20 1:28 PM, Bernd Edlinger wrote:
> On 11/24/20 11:10 PM, Jason Merrill wrote:
>> On 11/22/20 3:05 AM, Bernd Edlinger wrote:
>>> Hi,
>>>
>>> this avoids the need to use -fno-threadsafe-statics on
>>> arm-none-eabi or working around that problem by supplying
>>> a dummy __sync_synchronize function which might
>>> just lead to silent code failure of the worst kind
>>> (non-reproducable, racy) at runtime, as was pointed out
>>> on previous discussions here.
>>>
>>> When the atomic access involves a call to __sync_synchronize
>>> it is better to call __cxa_guard_acquire unconditionally,
>>> since it handles the atomics too, or is a non-threaded
>>> implementation when there is no gthread support for this target.
>>>
>>> This fixes also a bug for the ARM EABI big-endian target,
>>> that is, previously the wrong bit was checked.
>>
>> Instead of a new target macro, can't you follow fold_builtin_atomic_always_lock_free/can_atomic_load_p?
> 
> Yes, thanks, that should work too.
> Would you like this better?

> +is_atomic_expensive_p (machine_mode mode)
> +{
> +  if (!flag_inline_atomics)
> +    return false;

Why not true?

> +  if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode))
> +    return false;

This also seems backwards; I'd think we want to return false if either 
of those tests are true.  Or maybe just can_atomic_load_p, and not 
bother about compare-and-swap.

> +      tree type = targetm.cxx.guard_mask_bit ()
> +		  ? TREE_TYPE (guard) : char_type_node;
> +
> +      if (is_atomic_expensive_p (TYPE_MODE (type)))
> +	guard = integer_zero_node;
> +      else
> +	guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);

It should still work to load a single byte, it just needs to be the 
least-significant byte.  And this isn't an EABI issue; it looks like the 
non-EABI code is also broken for big-endian targets, both the atomic 
load and the normal load in get_guard_bits.

Jason


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Avoid atomic for guard acquire when that is expensive
  2020-12-02 18:57             ` Jason Merrill
@ 2020-12-05 12:37               ` Bernd Edlinger
  2020-12-07 15:04                 ` Jason Merrill
  0 siblings, 1 reply; 17+ messages in thread
From: Bernd Edlinger @ 2020-12-05 12:37 UTC (permalink / raw)
  To: Jason Merrill, Richard Earnshaw (lists),
	gcc-patches, Ramana Radhakrishnan, Nathan Sidwell,
	Christophe Lyon

[-- Attachment #1: Type: text/plain, Size: 3096 bytes --]

On 12/2/20 7:57 PM, Jason Merrill wrote:
> On 12/1/20 1:28 PM, Bernd Edlinger wrote:
>> On 11/24/20 11:10 PM, Jason Merrill wrote:
>>> On 11/22/20 3:05 AM, Bernd Edlinger wrote:
>>>> Hi,
>>>>
>>>> this avoids the need to use -fno-threadsafe-statics on
>>>> arm-none-eabi or working around that problem by supplying
>>>> a dummy __sync_synchronize function which might
>>>> just lead to silent code failure of the worst kind
>>>> (non-reproducable, racy) at runtime, as was pointed out
>>>> on previous discussions here.
>>>>
>>>> When the atomic access involves a call to __sync_synchronize
>>>> it is better to call __cxa_guard_acquire unconditionally,
>>>> since it handles the atomics too, or is a non-threaded
>>>> implementation when there is no gthread support for this target.
>>>>
>>>> This fixes also a bug for the ARM EABI big-endian target,
>>>> that is, previously the wrong bit was checked.
>>>
>>> Instead of a new target macro, can't you follow fold_builtin_atomic_always_lock_free/can_atomic_load_p?
>>
>> Yes, thanks, that should work too.
>> Would you like this better?
> 
>> +is_atomic_expensive_p (machine_mode mode)
>> +{
>> +  if (!flag_inline_atomics)
>> +    return false;
> 
> Why not true?
> 

Ooops...
Yes, I ought to return true here.
I must have made a mistake when I tested the last version of this patch,
sorry for the confusion.

>> +  if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode))
>> +    return false;
> 
> This also seems backwards; I'd think we want to return false if either of those tests are true.  Or maybe just can_atomic_load_p, and not bother about compare-and-swap.
> 


Yes, you are right.
Unfortuately can_atomic_load_p is too weak, since it does not cover
the memory barrier.

And can_compare_and_swap_p (..., false) is actually a bit too strong,
but if it returns true, we should be able to use any atomic without
need for a library call.

>> +      tree type = targetm.cxx.guard_mask_bit ()
>> +          ? TREE_TYPE (guard) : char_type_node;
>> +
>> +      if (is_atomic_expensive_p (TYPE_MODE (type)))
>> +    guard = integer_zero_node;
>> +      else
>> +    guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);
> 
> It should still work to load a single byte, it just needs to be the least-significant byte.  And this isn't an EABI issue; it looks like the non-EABI code is also broken for big-endian targets, both the atomic load and the normal load in get_guard_bits.
> 

I think the non-EABI code is always using bit 0 in the first byte,
by using the endian-neutral #define _GLIBCXX_GUARD_BIT __guard_test_bit (0, 1).

Only ARM EABI uses bit 0 in byte 3 if big-endian and bit 0 in byte 0 otherwise.

For all other targets when _GLIBCXX_USE_FUTEX is defined,
__cxa_guard_XXX accesses the value as int* while the memory
is a 64-bit long, so I could imagine that is an aliasing violation.


But nothing that needs to be fixed immediately.


Attached is the corrected patch.

Tested again on arm-none-eabi with arm-sim.
Is it OK for trunk?

Thanks
Bernd.

> Jason
> 

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Avoid-atomic-for-guard-acquire-when-that-is-expensiv.patch --]
[-- Type: text/x-patch; name="0001-Avoid-atomic-for-guard-acquire-when-that-is-expensiv.patch", Size: 3073 bytes --]

From c1a6dcaa906113cba0dc88e36460a65aba35ec38 Mon Sep 17 00:00:00 2001
From: Bernd Edlinger <bernd.edlinger@hotmail.de>
Date: Tue, 1 Dec 2020 18:54:48 +0100
Subject: [PATCH] Avoid atomic for guard acquire when that is expensive

When the atomic access involves a call to __sync_synchronize
it is better to call __cxa_guard_acquire unconditionally,
since it handles the atomics too, or is a non-threaded
implementation when there is no gthread support for this target.

This fixes also a bug for the ARM EABI big-endian target,
that is, previously the wrong bit was checked.

2020-11-22  Bernd Edlinger  <bernd.edlinger@hotmail.de>

	* decl2.c: (is_atomic_expensive_p): New helper function.
	(build_atomic_load_byte): Rename to...
	(build_atomic_load_type): ... and add new parameter type.
	(get_guard_cond): Skip the atomic here if that is expensive.
	Use the correct type for the atomic load on certain targets.
---
 gcc/cp/decl2.c | 33 +++++++++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)

diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index 1bc7b7e..c7ddcc9 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "intl.h"
 #include "c-family/c-ada-spec.h"
 #include "asan.h"
+#include "optabs-query.h"
 
 /* Id for dumping the raw trees.  */
 int raw_dump_id;
@@ -3297,18 +3298,34 @@ get_guard (tree decl)
   return guard;
 }
 
+/* Returns true if accessing the GUARD atomic is expensive,
+   i.e. involves a call to __sync_synchronize or similar.
+   In this case let __cxa_guard_acquire handle the atomics.  */
+
+static bool
+is_atomic_expensive_p (machine_mode mode)
+{
+  if (!flag_inline_atomics)
+    return true;
+
+  if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode))
+    return true;
+
+  return false;
+}
+
 /* Return an atomic load of src with the appropriate memory model.  */
 
 static tree
-build_atomic_load_byte (tree src, HOST_WIDE_INT model)
+build_atomic_load_type (tree src, HOST_WIDE_INT model, tree type)
 {
-  tree ptr_type = build_pointer_type (char_type_node);
+  tree ptr_type = build_pointer_type (type);
   tree mem_model = build_int_cst (integer_type_node, model);
   tree t, addr, val;
   unsigned int size;
   int fncode;
 
-  size = tree_to_uhwi (TYPE_SIZE_UNIT (char_type_node));
+  size = tree_to_uhwi (TYPE_SIZE_UNIT (type));
 
   fncode = BUILT_IN_ATOMIC_LOAD_N + exact_log2 (size) + 1;
   t = builtin_decl_implicit ((enum built_in_function) fncode);
@@ -3351,7 +3368,15 @@ get_guard_cond (tree guard, bool thread_safe)
   if (!thread_safe)
     guard = get_guard_bits (guard);
   else
-    guard = build_atomic_load_byte (guard, MEMMODEL_ACQUIRE);
+    {
+      tree type = targetm.cxx.guard_mask_bit ()
+		  ? TREE_TYPE (guard) : char_type_node;
+
+      if (is_atomic_expensive_p (TYPE_MODE (type)))
+	guard = integer_zero_node;
+      else
+	guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);
+    }
 
   /* Mask off all but the low bit.  */
   if (targetm.cxx.guard_mask_bit ())
-- 
1.9.1


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Avoid atomic for guard acquire when that is expensive
  2020-12-05 12:37               ` Bernd Edlinger
@ 2020-12-07 15:04                 ` Jason Merrill
  2020-12-07 16:17                   ` Bernd Edlinger
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Merrill @ 2020-12-07 15:04 UTC (permalink / raw)
  To: Bernd Edlinger, Richard Earnshaw (lists),
	gcc-patches, Ramana Radhakrishnan, Nathan Sidwell,
	Christophe Lyon

On 12/5/20 7:37 AM, Bernd Edlinger wrote:
> On 12/2/20 7:57 PM, Jason Merrill wrote:
>> On 12/1/20 1:28 PM, Bernd Edlinger wrote:
>>> On 11/24/20 11:10 PM, Jason Merrill wrote:
>>>> On 11/22/20 3:05 AM, Bernd Edlinger wrote:
>>>>> Hi,
>>>>>
>>>>> this avoids the need to use -fno-threadsafe-statics on
>>>>> arm-none-eabi or working around that problem by supplying
>>>>> a dummy __sync_synchronize function which might
>>>>> just lead to silent code failure of the worst kind
>>>>> (non-reproducable, racy) at runtime, as was pointed out
>>>>> on previous discussions here.
>>>>>
>>>>> When the atomic access involves a call to __sync_synchronize
>>>>> it is better to call __cxa_guard_acquire unconditionally,
>>>>> since it handles the atomics too, or is a non-threaded
>>>>> implementation when there is no gthread support for this target.
>>>>>
>>>>> This fixes also a bug for the ARM EABI big-endian target,
>>>>> that is, previously the wrong bit was checked.
>>>>
>>>> Instead of a new target macro, can't you follow fold_builtin_atomic_always_lock_free/can_atomic_load_p?
>>>
>>> Yes, thanks, that should work too.
>>> Would you like this better?
>>
>>> +is_atomic_expensive_p (machine_mode mode)
>>> +{
>>> +  if (!flag_inline_atomics)
>>> +    return false;
>>
>> Why not true?
> 
> Ooops...
> Yes, I ought to return true here.
> I must have made a mistake when I tested the last version of this patch,
> sorry for the confusion.
> 
>>> +  if (!can_compare_and_swap_p (mode, false) || !can_atomic_load_p (mode))
>>> +    return false;
>>
>> This also seems backwards; I'd think we want to return false if either of those tests are true.  Or maybe just can_atomic_load_p, and not bother about compare-and-swap.
> 
> Yes, you are right.
> Unfortuately can_atomic_load_p is too weak, since it does not cover
> the memory barrier.
> 
> And can_compare_and_swap_p (..., false) is actually a bit too strong,
> but if it returns true, we should be able to use any atomic without
> need for a library call.

>>> +      tree type = targetm.cxx.guard_mask_bit ()
>>> +          ? TREE_TYPE (guard) : char_type_node;
>>> +
>>> +      if (is_atomic_expensive_p (TYPE_MODE (type)))
>>> +    guard = integer_zero_node;
>>> +      else
>>> +    guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);
>>
>> It should still work to load a single byte, it just needs to be the least-significant byte.

I still don't think you need to load the whole word to check the guard bit.

> And this isn't an EABI issue; it looks like the non-EABI code is also broken for big-endian targets, both the atomic load and the normal load in get_guard_bits.
>>
> 
> I think the non-EABI code is always using bit 0 in the first byte,
> by using the endian-neutral #define _GLIBCXX_GUARD_BIT __guard_test_bit (0, 1).

Except that set_guard sets the least-significant bit on all targets.

> Only ARM EABI uses bit 0 in byte 3 if big-endian and bit 0 in byte 0 otherwise.
> 
> For all other targets when _GLIBCXX_USE_FUTEX is defined,
> __cxa_guard_XXX accesses the value as int* while the memory
> is a 64-bit long, so I could imagine that is an aliasing violation.
> 
> 
> But nothing that needs to be fixed immediately.

Agreed.

> Attached is the corrected patch.
> 
> Tested again on arm-none-eabi with arm-sim.
> Is it OK for trunk?
> 
> Thanks
> Bernd.
> 
>> Jason
>>


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Avoid atomic for guard acquire when that is expensive
  2020-12-07 15:04                 ` Jason Merrill
@ 2020-12-07 16:17                   ` Bernd Edlinger
  2020-12-08 19:50                     ` Jason Merrill
  0 siblings, 1 reply; 17+ messages in thread
From: Bernd Edlinger @ 2020-12-07 16:17 UTC (permalink / raw)
  To: Jason Merrill, Richard Earnshaw (lists),
	gcc-patches, Ramana Radhakrishnan, Nathan Sidwell,
	Christophe Lyon

On 12/7/20 4:04 PM, Jason Merrill wrote:
> On 12/5/20 7:37 AM, Bernd Edlinger wrote:
>> On 12/2/20 7:57 PM, Jason Merrill wrote:
>>> On 12/1/20 1:28 PM, Bernd Edlinger wrote:>>>> +      tree type = targetm.cxx.guard_mask_bit ()
>>>> +          ? TREE_TYPE (guard) : char_type_node;
>>>> +
>>>> +      if (is_atomic_expensive_p (TYPE_MODE (type)))
>>>> +    guard = integer_zero_node;
>>>> +      else
>>>> +    guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);
>>>
>>> It should still work to load a single byte, it just needs to be the least-significant byte.
> 
> I still don't think you need to load the whole word to check the guard bit.
> 

Of course that is also possible.  But I would not expect an
address offset and a byte access to be cheaper than a word access.

I just saw that get_guard_bits does the same thing:
access the first byte if !targetm.cxx.guard_mask_bit ()
and access the whole word otherwise, which is only
there for ARM EABI.

>> And this isn't an EABI issue; it looks like the non-EABI code is also broken for big-endian targets, both the atomic load and the normal load in get_guard_bits.
>>>
>>
>> I think the non-EABI code is always using bit 0 in the first byte,
>> by using the endian-neutral #define _GLIBCXX_GUARD_BIT __guard_test_bit (0, 1).
> 
> Except that set_guard sets the least-significant bit on all targets.
> 

Hmm, as I said, get_guard_bits gets the guard as a word if !targetm.cxx.guard_mask_bit (),
and as the first byte otherwise.  Of course it could get the third byte,
if !targetm.cxx.guard_mask_bit () && BYTES_BIG_ENDIAN, but it would be more complicated
this way, wouldn't it?


Bernd.

>> Only ARM EABI uses bit 0 in byte 3 if big-endian and bit 0 in byte 0 otherwise.
>>
>> For all other targets when _GLIBCXX_USE_FUTEX is defined,
>> __cxa_guard_XXX accesses the value as int* while the memory
>> is a 64-bit long, so I could imagine that is an aliasing violation.
>>
>>
>> But nothing that needs to be fixed immediately.
> 
> Agreed.
> 
>> Attached is the corrected patch.
>>
>> Tested again on arm-none-eabi with arm-sim.
>> Is it OK for trunk?
>>
>> Thanks
>> Bernd.
>>
>>> Jason
>>>
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Avoid atomic for guard acquire when that is expensive
  2020-12-07 16:17                   ` Bernd Edlinger
@ 2020-12-08 19:50                     ` Jason Merrill
  0 siblings, 0 replies; 17+ messages in thread
From: Jason Merrill @ 2020-12-08 19:50 UTC (permalink / raw)
  To: Bernd Edlinger, Richard Earnshaw (lists),
	gcc-patches, Ramana Radhakrishnan, Nathan Sidwell,
	Christophe Lyon

On 12/7/20 11:17 AM, Bernd Edlinger wrote:
> On 12/7/20 4:04 PM, Jason Merrill wrote:
>> On 12/5/20 7:37 AM, Bernd Edlinger wrote:
>>> On 12/2/20 7:57 PM, Jason Merrill wrote:
>>>> On 12/1/20 1:28 PM, Bernd Edlinger wrote:>>>> +      tree type = targetm.cxx.guard_mask_bit ()
>>>>> +          ? TREE_TYPE (guard) : char_type_node;
>>>>> +
>>>>> +      if (is_atomic_expensive_p (TYPE_MODE (type)))
>>>>> +    guard = integer_zero_node;
>>>>> +      else
>>>>> +    guard = build_atomic_load_type (guard, MEMMODEL_ACQUIRE, type);
>>>>
>>>> It should still work to load a single byte, it just needs to be the least-significant byte.
>>
>> I still don't think you need to load the whole word to check the guard bit.
> 
> Of course that is also possible.  But I would not expect an
> address offset and a byte access to be cheaper than a word access.

Fair point.

> I just saw that get_guard_bits does the same thing:
> access the first byte if !targetm.cxx.guard_mask_bit ()
> and access the whole word otherwise, which is only
> there for ARM EABI.

>>> And this isn't an EABI issue; it looks like the non-EABI code is also broken for big-endian targets, both the atomic load and the normal load in get_guard_bits.
>>>
>>> I think the non-EABI code is always using bit 0 in the first byte,
>>> by using the endian-neutral #define _GLIBCXX_GUARD_BIT __guard_test_bit (0, 1).
>>
>> Except that set_guard sets the least-significant bit on all targets.
> 
> Hmm, as I said, get_guard_bits gets the guard as a word if !targetm.cxx.guard_mask_bit (),
> and as the first byte otherwise.  Of course it could get the third byte,
> if !targetm.cxx.guard_mask_bit () && BYTES_BIG_ENDIAN, but it would be more complicated
> this way, wouldn't it?

Ah, yes, I was overlooking that set_guard uses get_guard_bits.

The patch is OK.

>>> Only ARM EABI uses bit 0 in byte 3 if big-endian and bit 0 in byte 0 otherwise.
>>>
>>> For all other targets when _GLIBCXX_USE_FUTEX is defined,
>>> __cxa_guard_XXX accesses the value as int* while the memory
>>> is a 64-bit long, so I could imagine that is an aliasing violation.
>>>
>>>
>>> But nothing that needs to be fixed immediately.
>>
>> Agreed.
>>
>>> Attached is the corrected patch.
>>>
>>> Tested again on arm-none-eabi with arm-sim.
>>> Is it OK for trunk?
>>>
>>> Thanks
>>> Bernd.
>>>
>>>> Jason
>>>>
>>
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-12-08 19:50 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-03 15:08 [PATCH] libgcc: Add a weak stub for __sync_synchronize Bernd Edlinger
2020-11-17  5:43 ` [PING] " Bernd Edlinger
2020-11-17 12:44 ` Richard Earnshaw (lists)
2020-11-17 15:18   ` Bernd Edlinger
2020-11-17 15:41     ` Richard Earnshaw (lists)
2020-11-17 15:51       ` Christophe Lyon
2020-11-17 17:17       ` Bernd Edlinger
2020-11-22  8:05       ` [PATCH] Avoid atomic for guard acquire when that is expensive Bernd Edlinger
2020-11-24 22:10         ` Jason Merrill
2020-12-01 18:28           ` Bernd Edlinger
2020-12-02 18:57             ` Jason Merrill
2020-12-05 12:37               ` Bernd Edlinger
2020-12-07 15:04                 ` Jason Merrill
2020-12-07 16:17                   ` Bernd Edlinger
2020-12-08 19:50                     ` Jason Merrill
2020-11-30 20:08         ` [PING] " Bernd Edlinger
2020-11-30 20:54           ` Jason Merrill

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).