public inbox for gcc-patches@gcc.gnu.org
* [RFC / CFT] PR c++/66192 -  Remove TARGET_RELAXED_ORDERING and use load acquires.
@ 2015-05-22 11:26 Ramana Radhakrishnan
  2015-05-22 11:37 ` Ramana Radhakrishnan
  0 siblings, 1 reply; 17+ messages in thread
From: Ramana Radhakrishnan @ 2015-05-22 11:26 UTC (permalink / raw)
  To: gcc-patches, jason
  Cc: David Edelsohn, wilson, Steve Ellcey, Richard Henderson

All,


	This patch removes the special casing for targets with relaxed
memory ordering and instead handles guard accesses with equivalent
atomic load-acquire operations: the guard variable is now loaded with
an atomic load that has ACQUIRE semantics. I'm not terribly familiar
with the C++ front end, so I'm not sure I've used the appropriate
interfaces for doing something like this.
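
To make the intent concrete, here is a rough source-level sketch of
the sequence the patch aims to generate (illustrative pseudo-C++ only:
the declarations are loosened for the sketch, the real guard type is
__cxxabiv1::__guard, and the front end builds all of this as trees in
expand_static_init):

extern "C" int  __cxa_guard_acquire (long long *);
extern "C" void __cxa_guard_release (long long *);

int* f(void) {
   static long long guard;      // stand-in for the guard variable
   static int* p;
   /* Fast path: acquire load of the guard's first byte; no runtime
      call when initialization has already happened.  */
   if (__atomic_load_n ((unsigned char *) &guard, __ATOMIC_ACQUIRE) == 0) {
     /* Slow path: __cxa_guard_acquire provides full synchronization.  */
     if (__cxa_guard_acquire (&guard)) {
       p = new int;              // run the initializer exactly once
       __cxa_guard_release (&guard);
     }
   }
   return p;
}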

This then means that on targets with weak memory models, the fast path 
is inlined and can directly use a load-acquire instruction where 
available (and yay! one more hook gone).

Currently bootstrapping and regression testing on AArch64 and ARM 
(prior to the commit that caused PR66241). If this goes in, I'm happy 
to withdraw the parts of the AArch64 / ARM trunk patches that define 
TARGET_RELAXED_ORDERING and propose those hunks only for the branches.

I'd also ask the other target maintainers CC'd to help by testing this 
on their platforms, as I do not have access to all of them.

To help folks see the difference, here is the change in output for the 
testcase below, comparing an AArch64 compiler built with 
TARGET_RELAXED_ORDERING set to true against one with this patch applied.


int* f(void) {
   static int* p = new int;
   return p;
}

-       adrp    x19, .LANCHOR0
-       add     x20, x19, :lo12:.LANCHOR0
-       mov     x0, x20
-       bl      __cxa_guard_acquire
-       cbnz    w0, .L2
-       ldr     x0, [x20, 8]
+       adrp    x20, .LANCHOR0
+       add     x19, x20, :lo12:.LANCHOR0
+       ldar    x0, [x19]
+       tbz     x0, 0, .L11
+.L9:
+       ldr     x0, [x19, 8]



regards
Ramana

2015-05-22  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

     PR c++/66192
     * config/alpha/alpha.c (TARGET_RELAXED_ORDERING): Delete.
     * config/ia64/ia64.c (TARGET_RELAXED_ORDERING): Likewise.
     * config/rs6000/rs6000.c (TARGET_RELAXED_ORDERING): Likewise.
     * config/sparc/linux.h (SPARC_RELAXED_ORDERING): Likewise.
     * config/sparc/linux64.h (SPARC_RELAXED_ORDERING): Likewise.
     * config/sparc/sparc.c (TARGET_RELAXED_ORDERING): Likewise.
     * config/sparc/sparc.h (SPARC_RELAXED_ORDERING): Likewise.
     * doc/tm.texi: Regenerate.
     * doc/tm.texi.in (TARGET_RELAXED_ORDERING): Delete.
     * target.def (TARGET_RELAXED_ORDERING): Delete.

gcc/cp/ChangeLog:

2015-05-22  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

     PR c++/66192
     * decl.c (expand_static_init): Remove special casing for
     targets with weak memory model.
     * decl2.c (build_atomic_load): New function.
     (get_guard_cond): Adjust for atomic_load.


* Re: [RFC / CFT] PR c++/66192 -  Remove TARGET_RELAXED_ORDERING and use load acquires.
  2015-05-22 11:26 [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires Ramana Radhakrishnan
@ 2015-05-22 11:37 ` Ramana Radhakrishnan
  2015-05-22 13:49   ` Jason Merrill
  0 siblings, 1 reply; 17+ messages in thread
From: Ramana Radhakrishnan @ 2015-05-22 11:37 UTC (permalink / raw)
  To: gcc-patches, jason
  Cc: David Edelsohn, wilson, Steve Ellcey, Richard Henderson

[-- Attachment #1: Type: text/plain, Size: 39 bytes --]

Bah! Now with the patch attached.

Ramana

[-- Attachment #2: relaxed-ordering-followup.txt --]
[-- Type: text/plain, Size: 10644 bytes --]

diff --git a/gcc/config/alpha/alpha.c b/gcc/config/alpha/alpha.c
index 1ba99d0..857c9ac 100644
--- a/gcc/config/alpha/alpha.c
+++ b/gcc/config/alpha/alpha.c
@@ -9987,12 +9987,6 @@ alpha_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
 #undef TARGET_EXPAND_BUILTIN_VA_START
 #define TARGET_EXPAND_BUILTIN_VA_START alpha_va_start
 
-/* The Alpha architecture does not require sequential consistency.  See
-   http://www.cs.umd.edu/~pugh/java/memoryModel/AlphaReordering.html
-   for an example of how it can be violated in practice.  */
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING true
-
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE alpha_option_override
 
diff --git a/gcc/config/ia64/ia64.c b/gcc/config/ia64/ia64.c
index c1e2ecd..45ad97a 100644
--- a/gcc/config/ia64/ia64.c
+++ b/gcc/config/ia64/ia64.c
@@ -630,11 +630,6 @@ static const struct attribute_spec ia64_attribute_table[] =
 #define TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P \
   ia64_libgcc_floating_mode_supported_p
 
-/* ia64 architecture manual 4.4.7: ... reads, writes, and flushes may occur
-   in an order different from the specified program order.  */
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING true
-
 #undef TARGET_LEGITIMATE_CONSTANT_P
 #define TARGET_LEGITIMATE_CONSTANT_P ia64_legitimate_constant_p
 #undef TARGET_LEGITIMATE_ADDRESS_P
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a590ef4..ce70ca0 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1620,17 +1620,6 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail
 #endif
 
-/* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors
-   The PowerPC architecture requires only weak consistency among
-   processors--that is, memory accesses between processors need not be
-   sequentially consistent and memory accesses among processors can occur
-   in any order. The ability to order memory accesses weakly provides
-   opportunities for more efficient use of the system bus. Unless a
-   dependency exists, the 604e allows read operations to precede store
-   operations.  */
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING true
-
 #ifdef HAVE_AS_TLS
 #undef TARGET_ASM_OUTPUT_DWARF_DTPREL
 #define TARGET_ASM_OUTPUT_DWARF_DTPREL rs6000_output_dwarf_dtprel
diff --git a/gcc/config/sparc/linux.h b/gcc/config/sparc/linux.h
index 56def4b..37f507d 100644
--- a/gcc/config/sparc/linux.h
+++ b/gcc/config/sparc/linux.h
@@ -139,12 +139,6 @@ do {									\
 /* Static stack checking is supported by means of probes.  */
 #define STACK_CHECK_STATIC_BUILTIN 1
 
-/* Linux currently uses RMO in uniprocessor mode, which is equivalent to
-   TMO, and TMO in multiprocessor mode.  But they reserve the right to
-   change their minds.  */
-#undef SPARC_RELAXED_ORDERING
-#define SPARC_RELAXED_ORDERING true
-
 #undef NEED_INDICATE_EXEC_STACK
 #define NEED_INDICATE_EXEC_STACK 1
 
diff --git a/gcc/config/sparc/linux64.h b/gcc/config/sparc/linux64.h
index fa805fd..8b4a435 100644
--- a/gcc/config/sparc/linux64.h
+++ b/gcc/config/sparc/linux64.h
@@ -253,12 +253,6 @@ do {									\
 /* Static stack checking is supported by means of probes.  */
 #define STACK_CHECK_STATIC_BUILTIN 1
 
-/* Linux currently uses RMO in uniprocessor mode, which is equivalent to
-   TMO, and TMO in multiprocessor mode.  But they reserve the right to
-   change their minds.  */
-#undef SPARC_RELAXED_ORDERING
-#define SPARC_RELAXED_ORDERING true
-
 #undef NEED_INDICATE_EXEC_STACK
 #define NEED_INDICATE_EXEC_STACK 1
 
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index a1562ad..094287f 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -808,9 +808,6 @@ char sparc_hard_reg_printed[8];
 #define TARGET_ATTRIBUTE_TABLE sparc_attribute_table
 #endif
 
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING SPARC_RELAXED_ORDERING
-
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE sparc_option_override
 
diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h
index ce1b68b..fd24431 100644
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@@ -106,17 +106,6 @@ extern enum cmodel sparc_cmodel;
 
 #define SPARC_DEFAULT_CMODEL CM_32
 
-/* The SPARC-V9 architecture defines a relaxed memory ordering model (RMO)
-   which requires the following macro to be true if enabled.  Prior to V9,
-   there are no instructions to even talk about memory synchronization.
-   Note that the UltraSPARC III processors don't implement RMO, unlike the
-   UltraSPARC II processors.  Niagara, Niagara-2, and Niagara-3 do not
-   implement RMO either.
-
-   Default to false; for example, Solaris never enables RMO, only ever uses
-   total memory ordering (TMO).  */
-#define SPARC_RELAXED_ORDERING false
-
 /* Do not use the .note.GNU-stack convention by default.  */
 #define NEED_INDICATE_EXEC_STACK 0
 
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 5396994..fad5997 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -7206,7 +7206,7 @@ expand_static_init (tree decl, tree init)
 	 looks like:
 
 	   static <type> guard;
-	   if (!guard.first_byte) {
+	   if (!__atomic_load (guard.first_byte, MEMMODEL_ACQUIRE)) {
 	     if (__cxa_guard_acquire (&guard)) {
 	       bool flag = false;
 	       try {
@@ -7236,16 +7236,10 @@ expand_static_init (tree decl, tree init)
       /* Create the guard variable.  */
       guard = get_guard (decl);
 
-      /* This optimization isn't safe on targets with relaxed memory
-	 consistency.  On such targets we force synchronization in
-	 __cxa_guard_acquire.  */
-      if (!targetm.relaxed_ordering || !thread_guard)
-	{
-	  /* Begin the conditional initialization.  */
-	  if_stmt = begin_if_stmt ();
-	  finish_if_stmt_cond (get_guard_cond (guard), if_stmt);
-	  then_clause = begin_compound_stmt (BCS_NO_SCOPE);
-	}
+      /* Begin the conditional initialization.  */
+      if_stmt = begin_if_stmt ();
+      finish_if_stmt_cond (get_guard_cond (guard), if_stmt);
+      then_clause = begin_compound_stmt (BCS_NO_SCOPE);
 
       if (thread_guard)
 	{
@@ -7314,12 +7308,9 @@ expand_static_init (tree decl, tree init)
 	  finish_if_stmt (inner_if_stmt);
 	}
 
-      if (!targetm.relaxed_ordering || !thread_guard)
-	{
-	  finish_compound_stmt (then_clause);
-	  finish_then_clause (if_stmt);
-	  finish_if_stmt (if_stmt);
-	}
+      finish_compound_stmt (then_clause);
+      finish_then_clause (if_stmt);
+      finish_if_stmt (if_stmt);
     }
   else if (DECL_THREAD_LOCAL_P (decl))
     tls_aggregates = tree_cons (init, decl, tls_aggregates);
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index f1b3d0c..4997fc6 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -3056,6 +3056,32 @@ get_guard_bits (tree guard)
   return guard;
 }
 
+static tree
+build_atomic_load (tree src, HOST_WIDE_INT model)
+{
+  tree ptr_type
+    = build_pointer_type
+      (build_qualified_type (void_type_node, TYPE_QUAL_VOLATILE));
+  tree mem_model = build_int_cst (integer_type_node, model);
+  tree orig_src = src;
+  tree t, addr, val;
+  unsigned int size;
+  int fncode;
+
+  size = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (src)));
+
+  fncode = BUILT_IN_ATOMIC_LOAD_N + exact_log2 (size) + 1;
+  t = builtin_decl_implicit ((enum built_in_function) fncode);
+
+  addr = build1 (ADDR_EXPR, ptr_type, src);
+  val = build_call_expr (t, 2, addr, mem_model);
+
+  /* First reinterpret the loaded bits in the original type of the load,
+     then convert to the expected result type.  */
+  t = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (src), val);
+  return convert (TREE_TYPE (orig_src), t);
+}
+
 /* Return an expression which determines whether or not the GUARD
    variable has already been initialized.  */
 
@@ -3064,6 +3090,9 @@ get_guard_cond (tree guard)
 {
   tree guard_value;
 
+  /* Load the guard value only through an atomic acquire load.  */
+  guard = build_atomic_load (guard, MEMMODEL_ACQUIRE);
+
   /* Check to see if the GUARD is zero.  */
   guard = get_guard_bits (guard);
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f2f3497..a16cd92 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11395,16 +11395,6 @@ routine for target specific customizations of the system printf
 and scanf formatter settings.
 @end defmac
 
-@deftypevr {Target Hook} bool TARGET_RELAXED_ORDERING
-If set to @code{true}, means that the target's memory model does not
-guarantee that loads which do not depend on one another will access
-main memory in the order of the instruction stream; if ordering is
-important, an explicit memory barrier must be used.  This is true of
-many recent processors which implement a policy of ``relaxed,''
-``weak,'' or ``release'' memory consistency, such as Alpha, PowerPC,
-and ia64.  The default is @code{false}.
-@end deftypevr
-
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN (const_tree @var{typelist}, const_tree @var{funcdecl}, const_tree @var{val})
 If defined, this macro returns the diagnostic message when it is
 illegal to pass argument @var{val} to function @var{funcdecl}
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 35b02b7..93fb41c 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8143,8 +8143,6 @@ routine for target specific customizations of the system printf
 and scanf formatter settings.
 @end defmac
 
-@hook TARGET_RELAXED_ORDERING
-
 @hook TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN
 
 @hook TARGET_INVALID_CONVERSION
diff --git a/gcc/target.def b/gcc/target.def
index f2cb81d..b606b81 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5785,19 +5785,6 @@ for the primary source file, immediately after printing\n\
 this to be done.  The default is false.",
  bool, false)
 
-/* True if the target is allowed to reorder memory accesses unless
-   synchronization is explicitly requested.  */
-DEFHOOKPOD
-(relaxed_ordering,
- "If set to @code{true}, means that the target's memory model does not\n\
-guarantee that loads which do not depend on one another will access\n\
-main memory in the order of the instruction stream; if ordering is\n\
-important, an explicit memory barrier must be used.  This is true of\n\
-many recent processors which implement a policy of ``relaxed,''\n\
-``weak,'' or ``release'' memory consistency, such as Alpha, PowerPC,\n\
-and ia64.  The default is @code{false}.",
- bool, false)
-
 /* Returns true if we should generate exception tables for use with the
    ARM EABI.  The effects the encoding of function exception specifications.  */
 DEFHOOKPOD


* Re: [RFC / CFT] PR c++/66192 -  Remove TARGET_RELAXED_ORDERING and use load acquires.
  2015-05-22 11:37 ` Ramana Radhakrishnan
@ 2015-05-22 13:49   ` Jason Merrill
  2015-05-22 14:15     ` David Edelsohn
  2015-05-22 14:28     ` Ramana Radhakrishnan
  0 siblings, 2 replies; 17+ messages in thread
From: Jason Merrill @ 2015-05-22 13:49 UTC (permalink / raw)
  To: Ramana Radhakrishnan, gcc-patches
  Cc: David Edelsohn, wilson, Steve Ellcey, Richard Henderson

On 05/22/2015 07:23 AM, Ramana Radhakrishnan wrote:
> +  /* Load the guard value only through an atomic acquire load.  */
> +  guard = build_atomic_load (guard, MEMMODEL_ACQUIRE);
> +
>     /* Check to see if the GUARD is zero.  */
>     guard = get_guard_bits (guard);

I wonder if these calls should be reversed, to express that we're only 
trying to atomically load a byte (on non-ARM targets)?

> +  tree orig_src = src;
> +  tree t, addr, val;
> +  unsigned int size;
> +  int fncode;
> +
> +  size = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (src)));
> +
> +  fncode = BUILT_IN_ATOMIC_LOAD_N + exact_log2 (size) + 1;
> +  t = builtin_decl_implicit ((enum built_in_function) fncode);
> +
> +  addr = build1 (ADDR_EXPR, ptr_type, src);
> +  val = build_call_expr (t, 2, addr, mem_model);
> +
> +  /* First reinterpret the loaded bits in the original type of the load,
> +     then convert to the expected result type.  */
> +  t = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (src), val);
> +  return convert (TREE_TYPE (orig_src), t);

I don't see anything that changes src here.

Jason


* Re: [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
  2015-05-22 13:49   ` Jason Merrill
@ 2015-05-22 14:15     ` David Edelsohn
  2015-05-22 14:42       ` Jason Merrill
  2015-05-22 14:28     ` Ramana Radhakrishnan
  1 sibling, 1 reply; 17+ messages in thread
From: David Edelsohn @ 2015-05-22 14:15 UTC (permalink / raw)
  To: Jason Merrill
  Cc: Ramana Radhakrishnan, gcc-patches, Jim Wilson, Steve Ellcey,
	Richard Henderson, Steve Munroe

On Fri, May 22, 2015 at 9:40 AM, Jason Merrill <jason@redhat.com> wrote:
> On 05/22/2015 07:23 AM, Ramana Radhakrishnan wrote:
>>
>> +  /* Load the guard value only through an atomic acquire load.  */
>> +  guard = build_atomic_load (guard, MEMMODEL_ACQUIRE);
>> +
>>     /* Check to see if the GUARD is zero.  */
>>     guard = get_guard_bits (guard);
>
>
> I wonder if these calls should be reversed, to express that we're only
> trying to atomically load a byte (on non-ARM targets)?

That expresses the semantics more directly, but will that lead to less
efficient code on some RISC architectures?

- David


* Re: [RFC / CFT] PR c++/66192 -  Remove TARGET_RELAXED_ORDERING and use load acquires.
  2015-05-22 13:49   ` Jason Merrill
  2015-05-22 14:15     ` David Edelsohn
@ 2015-05-22 14:28     ` Ramana Radhakrishnan
  1 sibling, 0 replies; 17+ messages in thread
From: Ramana Radhakrishnan @ 2015-05-22 14:28 UTC (permalink / raw)
  To: Jason Merrill, gcc-patches
  Cc: David Edelsohn, wilson, Steve Ellcey, Richard Henderson



On 22/05/15 14:40, Jason Merrill wrote:
> On 05/22/2015 07:23 AM, Ramana Radhakrishnan wrote:
>> +  /* Load the guard value only through an atomic acquire load.  */
>> +  guard = build_atomic_load (guard, MEMMODEL_ACQUIRE);
>> +
>>     /* Check to see if the GUARD is zero.  */
>>     guard = get_guard_bits (guard);
>
> I wonder if these calls should be reversed, to express that we're only
> trying to atomically load a byte (on non-ARM targets)?
>

I'm not sure about the impact on other non-ARM targets without some more 
investigation.

>> +  tree orig_src = src;
>> +  tree t, addr, val;
>> +  unsigned int size;
>> +  int fncode;
>> +
>> +  size = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (src)));
>> +
>> +  fncode = BUILT_IN_ATOMIC_LOAD_N + exact_log2 (size) + 1;
>> +  t = builtin_decl_implicit ((enum built_in_function) fncode);
>> +
>> +  addr = build1 (ADDR_EXPR, ptr_type, src);
>> +  val = build_call_expr (t, 2, addr, mem_model);
>> +
>> +  /* First reinterpret the loaded bits in the original type of the load,
>> +     then convert to the expected result type.  */
>> +  t = fold_build1 (VIEW_CONVERT_EXPR, TREE_TYPE (src), val);
>> +  return convert (TREE_TYPE (orig_src), t);
>
> I don't see anything that changes src here.

Sorry 'bout that. Will fix in the next spin depending on comments from 
others.

I take it that you are happy with the patch otherwise ...

regards
Ramana

>
> Jason
>


* Re: [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
  2015-05-22 14:15     ` David Edelsohn
@ 2015-05-22 14:42       ` Jason Merrill
  2015-05-22 15:42         ` Ramana Radhakrishnan
  0 siblings, 1 reply; 17+ messages in thread
From: Jason Merrill @ 2015-05-22 14:42 UTC (permalink / raw)
  To: David Edelsohn
  Cc: Ramana Radhakrishnan, gcc-patches, Jim Wilson, Steve Ellcey,
	Richard Henderson, Steve Munroe

On 05/22/2015 09:55 AM, David Edelsohn wrote:
> On Fri, May 22, 2015 at 9:40 AM, Jason Merrill <jason@redhat.com> wrote:
>> On 05/22/2015 07:23 AM, Ramana Radhakrishnan wrote:
>>>
>>> +  /* Load the guard value only through an atomic acquire load.  */
>>> +  guard = build_atomic_load (guard, MEMMODEL_ACQUIRE);
>>> +
>>>      /* Check to see if the GUARD is zero.  */
>>>      guard = get_guard_bits (guard);
>>
>>
>> I wonder if these calls should be reversed, to express that we're only
>> trying to atomically load a byte (on non-ARM targets)?
>
> That expresses the semantics more directly, but will that lead to less
> efficient code on some RISC architectures?

I'm not sure.  I would expect that the target would use a larger load 
and mask it if that is better for performance, but I don't know.

I do notice that get_guard_bits after build_atomic_load just won't work 
on non-ARM targets, as it ends up trying to take the address of a value.

Jason


* Re: [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
  2015-05-22 14:42       ` Jason Merrill
@ 2015-05-22 15:42         ` Ramana Radhakrishnan
  2015-05-22 17:48           ` David Edelsohn
  2015-05-22 18:19           ` Jason Merrill
  0 siblings, 2 replies; 17+ messages in thread
From: Ramana Radhakrishnan @ 2015-05-22 15:42 UTC (permalink / raw)
  To: Jason Merrill, David Edelsohn
  Cc: gcc-patches, Jim Wilson, Steve Ellcey, Richard Henderson, Steve Munroe



On 22/05/15 15:28, Jason Merrill wrote:
> On 05/22/2015 09:55 AM, David Edelsohn wrote:
>> On Fri, May 22, 2015 at 9:40 AM, Jason Merrill <jason@redhat.com> wrote:
>>> On 05/22/2015 07:23 AM, Ramana Radhakrishnan wrote:
>>>>
>>>> +  /* Load the guard value only through an atomic acquire load.  */
>>>> +  guard = build_atomic_load (guard, MEMMODEL_ACQUIRE);
>>>> +
>>>>      /* Check to see if the GUARD is zero.  */
>>>>      guard = get_guard_bits (guard);
>>>
>>>
>>> I wonder if these calls should be reversed, to express that we're only
>>> trying to atomically load a byte (on non-ARM targets)?
>>
>> That expresses the semantics more directly, but will that lead to less
>> efficient code on some RISC architectures?
>
> I'm not sure.  I would expect that the target would use a larger load
> and mask it if that is better for performance, but I don't know.
>
> I do notice that get_guard_bits after build_atomic_load just won't work
> on non-ARM targets, as it ends up trying to take the address of a value.
>
> Jason
>



So on powerpc, where targetm.cxx.guard_mask_bit is false, this is what I see.


{
   static int * p;

     static int * p;
   if (<<cleanup_point (unsigned char) *(char *) &(long long int) __atomic_load_8 (&_ZGVZ1fvE1p, 2) == 0>>)
     {
       if (<<cleanup_point __cxa_guard_acquire (&_ZGVZ1fvE1p) != 0>>)
         {
           <<cleanup_point <<< Unknown tree: expr_stmt
   TARGET_EXPR <D.2846, 0>;, p = (int *) operator new (4);;, D.2846 = 1;;, __cxa_guard_release (&_ZGVZ1fvE1p); >>>>>;
         }
     }
   return <retval> = p;
}

with the following output. David, is this what you would expect?


0:      addis 2,12,.TOC.-0b@ha
         addi 2,2,.TOC.-0b@l
         .localentry     _Z1fv,.-_Z1fv
         mflr %r0
         std %r30,-16(%r1)
         std %r31,-8(%r1)
         .cfi_register 65, 0
         .cfi_offset 30, -16
         .cfi_offset 31, -8
         addis %r30,%r2,_ZGVZ1fvE1p@toc@ha
         addi %r30,%r30,_ZGVZ1fvE1p@toc@l
         std %r0,16(%r1)
         stdu %r1,-64(%r1)
         .cfi_def_cfa_offset 64
         .cfi_offset 65, 16
         addis %r10,%r2,_ZGVZ1fvE1p@toc@ha               # gpr load fusion, type long
         ld %r10,_ZGVZ1fvE1p@toc@l(%r10)
         cmpw %cr7,%r10,%r10
         bne- %cr7,$+4
         isync
         rlwinm %r9,%r10,0,0xff
         std %r10,32(%r1)
         cmpwi %cr7,%r9,0
         beq %cr7,.L11
.L9:
         addi %r1,%r1,64
         .cfi_remember_state
         .cfi_def_cfa_offset 0
         addis %r3,%r2,_ZZ1fvE1p@toc@ha          # gpr load fusion, type long
         ld %r3,_ZZ1fvE1p@toc@l(%r3)
         ld %r0,16(%r1)
         ld %r30,-16(%r1)
         ld %r31,-8(%r1)
         mtlr %r0
         .cfi_restore 65
         .cfi_restore 31
         .cfi_restore 30
         blr
         .p2align 4,,15
.L11:
         .cfi_restore_state
         mr %r3,%r30




And on AArch64, where targetm.cxx.guard_mask_bit is true:


  static int * p;

     static int * p;
   if (<<cleanup_point ((long long int) __atomic_load_8 (&_ZGVZ1fvE1p, 2) & 1) == 0>>)
     {
       if (<<cleanup_point __cxa_guard_acquire (&_ZGVZ1fvE1p) != 0>>)
         {
           <<cleanup_point <<< Unknown tree: expr_stmt
   TARGET_EXPR <D.3168, 0>;, p = (int *) operator new (4);;, D.3168 = 1;;, __cxa_guard_release (&_ZGVZ1fvE1p); >>>>>;
         }
     }
   return <retval> = p;
}

Alternatively, I can change this to an atomic load of just the byte 
required, but I'm not sure whether that's supported on all targets.

I'm going to be running away shortly for the long weekend here, so will 
resume discussions/experiments on Tuesday.


regards
Ramana


* Re: [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
  2015-05-22 15:42         ` Ramana Radhakrishnan
@ 2015-05-22 17:48           ` David Edelsohn
  2015-05-22 18:19           ` Jason Merrill
  1 sibling, 0 replies; 17+ messages in thread
From: David Edelsohn @ 2015-05-22 17:48 UTC (permalink / raw)
  To: Ramana Radhakrishnan, Steve Munroe
  Cc: Jason Merrill, gcc-patches, Jim Wilson, Steve Ellcey, Richard Henderson

On Fri, May 22, 2015 at 11:23 AM, Ramana Radhakrishnan
<ramana.radhakrishnan@foss.arm.com> wrote:

> So on powerpc where targetm.guard_mask_bit is false - this is what I see.
>
>
> {
>   static int * p;
>
>     static int * p;
>   if (<<cleanup_point (unsigned char) *(char *) &(long long int)
> __atomic_load_8 (&_ZGVZ1fvE1p, 2) == 0>>)
>     {
>       if (<<cleanup_point __cxa_guard_acquire (&_ZGVZ1fvE1p) != 0>>)
>         {
>           <<cleanup_point <<< Unknown tree: expr_stmt
>   TARGET_EXPR <D.2846, 0>;, p = (int *) operator new (4);;, D.2846 = 1;;,
> __cxa_guard_release (&_ZGVZ1fvE1p); >>>>>;
>         }
>     }
>   return <retval> = p;
> }
>
> with the following output - David - is this what you would expect ?
>
>
> 0:      addis 2,12,.TOC.-0b@ha
>         addi 2,2,.TOC.-0b@l
>         .localentry     _Z1fv,.-_Z1fv
>         mflr %r0
>         std %r30,-16(%r1)
>         std %r31,-8(%r1)
>         .cfi_register 65, 0
>         .cfi_offset 30, -16
>         .cfi_offset 31, -8
>         addis %r30,%r2,_ZGVZ1fvE1p@toc@ha
>         addi %r30,%r30,_ZGVZ1fvE1p@toc@l
>         std %r0,16(%r1)
>         stdu %r1,-64(%r1)
>         .cfi_def_cfa_offset 64
>         .cfi_offset 65, 16
>         addis %r10,%r2,_ZGVZ1fvE1p@toc@ha               # gpr load fusion,
> type long
>         ld %r10,_ZGVZ1fvE1p@toc@l(%r10)
>         cmpw %cr7,%r10,%r10
>         bne- %cr7,$+4
>         isync
>         rlwinm %r9,%r10,0,0xff
>         std %r10,32(%r1)
>         cmpwi %cr7,%r9,0
>         beq %cr7,.L11
> .L9:
>         addi %r1,%r1,64
>         .cfi_remember_state
>         .cfi_def_cfa_offset 0
>         addis %r3,%r2,_ZZ1fvE1p@toc@ha          # gpr load fusion, type long
>         ld %r3,_ZZ1fvE1p@toc@l(%r3)
>         ld %r0,16(%r1)
>         ld %r30,-16(%r1)
>         ld %r31,-8(%r1)
>         mtlr %r0
>         .cfi_restore 65
>         .cfi_restore 31
>         .cfi_restore 30
>         blr
>         .p2align 4,,15
> .L11:
>         .cfi_restore_state
>         mr %r3,%r30

It looks reasonable, but we really need to examine the code produced
by the final version of the patch for some examples.

Thanks, David


* Re: [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
  2015-05-22 15:42         ` Ramana Radhakrishnan
  2015-05-22 17:48           ` David Edelsohn
@ 2015-05-22 18:19           ` Jason Merrill
  2015-05-22 19:49             ` Richard Henderson
  2015-05-29 13:32             ` Ramana Radhakrishnan
  1 sibling, 2 replies; 17+ messages in thread
From: Jason Merrill @ 2015-05-22 18:19 UTC (permalink / raw)
  To: Ramana Radhakrishnan, David Edelsohn
  Cc: gcc-patches, Jim Wilson, Steve Ellcey, Richard Henderson,
	Steve Munroe, Torvald Riegel

On 05/22/2015 11:23 AM, Ramana Radhakrishnan wrote:
> On 22/05/15 15:28, Jason Merrill wrote:
>> I do notice that get_guard_bits after build_atomic_load just won't work
>> on non-ARM targets, as it ends up trying to take the address of a value.
>
> So on powerpc where targetm.guard_mask_bit is false - this is what I see.
>
>    &(long long int) __atomic_load_8 (&_ZGVZ1fvE1p, 2)

This is the bit that doesn't make sense to me.  __atomic_load_8 returns 
a value; what does it mean to take its address?  If we're going to load 
more than just the first byte, we should mask off the rest rather than 
trying to mess with its address.
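
In source terms, the contrast is roughly the following (illustrative 
only; `guard' stands in for the real guard word):

   static long long guard;
   /* Today's tree, rendered as source, takes the address of an rvalue,
      which is not a meaningful operation:
        *(char *) &(long long int) __atomic_load_8 (&_ZGVZ1fvE1p, 2)
      Masking the low byte instead keeps everything a value:  */
   bool guard_set
     = (__atomic_load_n (&guard, __ATOMIC_ACQUIRE) & 0xff) != 0;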

It also seems unnecessary to load 8 bytes on any target; we could add a 
function to optabs.c that returns the smallest mode for which there's 
atomic load support?

Jason


* Re: [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
  2015-05-22 18:19           ` Jason Merrill
@ 2015-05-22 19:49             ` Richard Henderson
  2015-05-29 13:32             ` Ramana Radhakrishnan
  1 sibling, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2015-05-22 19:49 UTC (permalink / raw)
  To: Jason Merrill, Ramana Radhakrishnan, David Edelsohn
  Cc: gcc-patches, Jim Wilson, Steve Ellcey, Steve Munroe, Torvald Riegel

On 05/22/2015 10:36 AM, Jason Merrill wrote:
> It also seems unnecessary to load 8 bytes on any target; we could add a
> function to optabs.c that returns the smallest mode for which there's atomic
> load support?

No, while we do use an atomic_load<mode> pattern if it exists, we assume that
all loads no larger than word_mode are atomic.  Thus if the atomic_load<mode>
doesn't exist, we'll issue a normal move.

If we're just trying to load 8 bits, we can do that on *any* target.
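
(In source terms, that byte load would be roughly the following -- an 
illustrative sketch, not the patch itself:)

   static long long guard;    // illustrative stand-in for the guard
   unsigned char g
     = __atomic_load_n ((unsigned char *) &guard, __ATOMIC_ACQUIRE);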


r~


* Re: [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
  2015-05-22 18:19           ` Jason Merrill
  2015-05-22 19:49             ` Richard Henderson
@ 2015-05-29 13:32             ` Ramana Radhakrishnan
  2015-05-29 16:34               ` Richard Henderson
                                 ` (3 more replies)
  1 sibling, 4 replies; 17+ messages in thread
From: Ramana Radhakrishnan @ 2015-05-29 13:32 UTC (permalink / raw)
  To: Jason Merrill, David Edelsohn
  Cc: gcc-patches, Jim Wilson, Steve Ellcey, Richard Henderson,
	Steve Munroe, Torvald Riegel

[-- Attachment #1: Type: text/plain, Size: 3792 bytes --]



On 22/05/15 18:36, Jason Merrill wrote:
> On 05/22/2015 11:23 AM, Ramana Radhakrishnan wrote:
>> On 22/05/15 15:28, Jason Merrill wrote:
>>> I do notice that get_guard_bits after build_atomic_load just won't work
>>> on non-ARM targets, as it ends up trying to take the address of a value.
>>
>> So on powerpc where targetm.guard_mask_bit is false - this is what I see.
>>
>>    &(long long int) __atomic_load_8 (&_ZGVZ1fvE1p, 2)
>
> This is the bit that doesn't make sense to me.  __atomic_load_8 returns
> a value; what does it mean to take its address?  If we're going to load
> more than just the first byte, we should mask off the rest rather than
> trying to mess with its address.
>
> It also seems unnecessary to load 8 bytes on any target; we could add a
> function to optabs.c that returns the smallest mode for which there's
> atomic load support?
>
> Jason
>


Right, I've taken all comments on board -- thanks to those who tested 
the original patch. Here's v2, currently bootstrapped and tested only 
on aarch64-none-linux-gnu.

This patch:

- Turns build_atomic_load into build_atomic_load_byte in order
   to do an atomic load of 1 byte instead of a full word atomic load.
- Restructures get_guard_cond to decide whether to use an atomic load
   or a standard load, depending on whether the generated code will run
   in a multi-threaded context or not.
- Adjusts all callers of get_guard_cond accordingly.

One bit of fallout that I've observed in my testing, and that I'm not 
sure what to do about, is that on *bare-metal* arm-none-eabi targets we 
still emit calls to __sync_synchronize on architecture versions that 
have no barrier instruction, which results in a link error.

While it is tempting to take the easy approach of not emitting the 
call, I suspect that in practice a number of users of the bare-metal 
tools use them for their own RTOSes and other micro-OSes. Thus 
generating barriers at higher architecture levels but not at lower 
ones appears to be a bit dangerous, especially on architectures with 
backwards compatibility (e.g. standard user code built with 
-mcpu=arm7tdmi is still expected to generate code that works on a core 
conforming to a later architecture revision).

I am considering leaving this in the ARM backend, to force people to 
think about what they want to do for thread safety with statics and 
C++ on bare-metal systems. If they really do not want thread safety, 
they can add -fno-threadsafe-statics or provide an appropriate 
implementation of __sync_synchronize for their platforms.
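
For illustration, on a single-core bare-metal system where no other 
core or bus master depends on memory ordering, such a user-provided 
implementation could be as trivial as the following (hypothetical 
user-side code, not part of this patch):

   /* Only valid when nothing else observes memory ordering.  */
   extern "C" void __sync_synchronize (void) { }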

Any thoughts / comments?

regards
Ramana

gcc/cp/ChangeLog:

2015-05-29  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

     PR c++/66192
     * cp-tree.h (get_guard_cond): Adjust declaration.
     * decl.c (expand_static_init): Use atomic load acquire
     and adjust call to get_guard_cond.
     * decl2.c (build_atomic_load_byte): New function.
     (get_guard_cond): Handle thread_safety.
     (one_static_initialization_or_destruction): Adjust call to
     get_guard_cond.

gcc/ChangeLog:

2015-05-29  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

     PR c++/66192
     * config/alpha/alpha.c (TARGET_RELAXED_ORDERING): Delete.
     * config/ia64/ia64.c (TARGET_RELAXED_ORDERING): Likewise.
     * config/rs6000/rs6000.c (TARGET_RELAXED_ORDERING): Likewise.
     * config/sparc/linux.h (SPARC_RELAXED_ORDERING): Likewise.
     * config/sparc/linux64.h (SPARC_RELAXED_ORDERING): Likewise.
     * config/sparc/sparc.c (TARGET_RELAXED_ORDERING): Likewise.
     * config/sparc/sparc.h (SPARC_RELAXED_ORDERING): Likewise.
     * doc/tm.texi: Regenerate.
     * doc/tm.texi.in (TARGET_RELAXED_ORDERING): Delete.
     * target.def (TARGET_RELAXED_ORDERING): Delete.


[-- Attachment #2: relaxed-ordering-rewrite-v2.txt --]
[-- Type: text/plain, Size: 13465 bytes --]

commit e9c757b843b22609ac161e18fc7cd17e8866562d
Author: Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
Date:   Fri May 29 10:55:18 2015 +0100

     Patch v2 - Remove TARGET_RELAXED_ORDERING.
    
    This respun version handles comments from the previous revision.
    
    - Simplifies build_atomic_load into build_atomic_load_byte in order
    to do an atomic load of 1 byte instead of a full word atomic load.
    - Restructures get_guard_cond to decide whether to use an atomic load
    or a standard load, depending on whether the generated code will run
    in a multi-threaded context or not.
    - Adjusts all callers of get_guard_cond accordingly.
    
    Is this better now?
    
    Bootstrapped and regression tested on aarch64-none-linux-gnu
    
    regards
    Ramana

gcc/cp/ChangeLog:

2015-05-29  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

    PR c++/66192
    * cp-tree.h (get_guard_cond): Adjust declaration.
    * decl.c (expand_static_init): Use atomic load acquire
    and adjust call to get_guard_cond.
    * decl2.c (build_atomic_load_byte): New function.
    (get_guard_cond): Handle thread_safety.
    (one_static_initialization_or_destruction): Adjust call to
    get_guard_cond.

gcc/ChangeLog:

2015-05-29  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

    PR c++/66192
    * config/alpha/alpha.c (TARGET_RELAXED_ORDERING): Delete.
    * config/ia64/ia64.c (TARGET_RELAXED_ORDERING): Likewise.
    * config/rs6000/rs6000.c (TARGET_RELAXED_ORDERING): Likewise.
    * config/sparc/linux.h (SPARC_RELAXED_ORDERING): Likewise.
    * config/sparc/linux64.h (SPARC_RELAXED_ORDERING): Likewise.
    * config/sparc/sparc.c (TARGET_RELAXED_ORDERING): Likewise.
    * config/sparc/sparc.h (SPARC_RELAXED_ORDERING): Likewise.
    * doc/tm.texi: Regenerate.
    * doc/tm.texi.in (TARGET_RELAXED_ORDERING): Delete.
    * target.def (TARGET_RELAXED_ORDERING): Delete.


diff --git a/gcc/config/alpha/alpha.c b/gcc/config/alpha/alpha.c
index 1ba99d0..857c9ac 100644
--- a/gcc/config/alpha/alpha.c
+++ b/gcc/config/alpha/alpha.c
@@ -9987,12 +9987,6 @@ alpha_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
 #undef TARGET_EXPAND_BUILTIN_VA_START
 #define TARGET_EXPAND_BUILTIN_VA_START alpha_va_start
 
-/* The Alpha architecture does not require sequential consistency.  See
-   http://www.cs.umd.edu/~pugh/java/memoryModel/AlphaReordering.html
-   for an example of how it can be violated in practice.  */
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING true
-
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE alpha_option_override
 
diff --git a/gcc/config/ia64/ia64.c b/gcc/config/ia64/ia64.c
index c1e2ecd..45ad97a 100644
--- a/gcc/config/ia64/ia64.c
+++ b/gcc/config/ia64/ia64.c
@@ -630,11 +630,6 @@ static const struct attribute_spec ia64_attribute_table[] =
 #define TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P \
   ia64_libgcc_floating_mode_supported_p
 
-/* ia64 architecture manual 4.4.7: ... reads, writes, and flushes may occur
-   in an order different from the specified program order.  */
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING true
-
 #undef TARGET_LEGITIMATE_CONSTANT_P
 #define TARGET_LEGITIMATE_CONSTANT_P ia64_legitimate_constant_p
 #undef TARGET_LEGITIMATE_ADDRESS_P
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a590ef4..ce70ca0 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1620,17 +1620,6 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail
 #endif
 
-/* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors
-   The PowerPC architecture requires only weak consistency among
-   processors--that is, memory accesses between processors need not be
-   sequentially consistent and memory accesses among processors can occur
-   in any order. The ability to order memory accesses weakly provides
-   opportunities for more efficient use of the system bus. Unless a
-   dependency exists, the 604e allows read operations to precede store
-   operations.  */
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING true
-
 #ifdef HAVE_AS_TLS
 #undef TARGET_ASM_OUTPUT_DWARF_DTPREL
 #define TARGET_ASM_OUTPUT_DWARF_DTPREL rs6000_output_dwarf_dtprel
diff --git a/gcc/config/sparc/linux.h b/gcc/config/sparc/linux.h
index 17e1e86..29763c4 100644
--- a/gcc/config/sparc/linux.h
+++ b/gcc/config/sparc/linux.h
@@ -139,12 +139,6 @@ do {									\
 /* Static stack checking is supported by means of probes.  */
 #define STACK_CHECK_STATIC_BUILTIN 1
 
-/* Linux currently uses RMO in uniprocessor mode, which is equivalent to
-   TMO, and TMO in multiprocessor mode.  But they reserve the right to
-   change their minds.  */
-#undef SPARC_RELAXED_ORDERING
-#define SPARC_RELAXED_ORDERING true
-
 #undef NEED_INDICATE_EXEC_STACK
 #define NEED_INDICATE_EXEC_STACK 1
 
diff --git a/gcc/config/sparc/linux64.h b/gcc/config/sparc/linux64.h
index 43da848..efa33fb 100644
--- a/gcc/config/sparc/linux64.h
+++ b/gcc/config/sparc/linux64.h
@@ -253,12 +253,6 @@ do {									\
 /* Static stack checking is supported by means of probes.  */
 #define STACK_CHECK_STATIC_BUILTIN 1
 
-/* Linux currently uses RMO in uniprocessor mode, which is equivalent to
-   TMO, and TMO in multiprocessor mode.  But they reserve the right to
-   change their minds.  */
-#undef SPARC_RELAXED_ORDERING
-#define SPARC_RELAXED_ORDERING true
-
 #undef NEED_INDICATE_EXEC_STACK
 #define NEED_INDICATE_EXEC_STACK 1
 
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index a1562ad..094287f 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -808,9 +808,6 @@ char sparc_hard_reg_printed[8];
 #define TARGET_ATTRIBUTE_TABLE sparc_attribute_table
 #endif
 
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING SPARC_RELAXED_ORDERING
-
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE sparc_option_override
 
diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h
index 106d993..72dd18b 100644
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@@ -106,17 +106,6 @@ extern enum cmodel sparc_cmodel;
 
 #define SPARC_DEFAULT_CMODEL CM_32
 
-/* The SPARC-V9 architecture defines a relaxed memory ordering model (RMO)
-   which requires the following macro to be true if enabled.  Prior to V9,
-   there are no instructions to even talk about memory synchronization.
-   Note that the UltraSPARC III processors don't implement RMO, unlike the
-   UltraSPARC II processors.  Niagara, Niagara-2, and Niagara-3 do not
-   implement RMO either.
-
-   Default to false; for example, Solaris never enables RMO, only ever uses
-   total memory ordering (TMO).  */
-#define SPARC_RELAXED_ORDERING false
-
 /* Do not use the .note.GNU-stack convention by default.  */
 #define NEED_INDICATE_EXEC_STACK 0
 
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 91619e2..aa391c8 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5490,7 +5490,7 @@ extern bool mark_used			        (tree, tsubst_flags_t);
 extern void finish_static_data_member_decl	(tree, tree, bool, tree, int);
 extern tree cp_build_parm_decl			(tree, tree);
 extern tree get_guard				(tree);
-extern tree get_guard_cond			(tree);
+extern tree get_guard_cond			(tree, bool);
 extern tree set_guard				(tree);
 extern tree get_tls_wrapper_fn			(tree);
 extern void mark_needed				(tree);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index a8cb358..8f92667 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -7211,7 +7211,7 @@ expand_static_init (tree decl, tree init)
 	 looks like:
 
 	   static <type> guard;
-	   if (!guard.first_byte) {
+	   if (!__atomic_load (guard.first_byte)) {
 	     if (__cxa_guard_acquire (&guard)) {
 	       bool flag = false;
 	       try {
@@ -7241,16 +7241,11 @@ expand_static_init (tree decl, tree init)
       /* Create the guard variable.  */
       guard = get_guard (decl);
 
-      /* This optimization isn't safe on targets with relaxed memory
-	 consistency.  On such targets we force synchronization in
-	 __cxa_guard_acquire.  */
-      if (!targetm.relaxed_ordering || !thread_guard)
-	{
-	  /* Begin the conditional initialization.  */
-	  if_stmt = begin_if_stmt ();
-	  finish_if_stmt_cond (get_guard_cond (guard), if_stmt);
-	  then_clause = begin_compound_stmt (BCS_NO_SCOPE);
-	}
+      /* Begin the conditional initialization.  */
+      if_stmt = begin_if_stmt ();
+
+      finish_if_stmt_cond (get_guard_cond (guard, thread_guard), if_stmt);
+      then_clause = begin_compound_stmt (BCS_NO_SCOPE);
 
       if (thread_guard)
 	{
@@ -7319,12 +7314,9 @@ expand_static_init (tree decl, tree init)
 	  finish_if_stmt (inner_if_stmt);
 	}
 
-      if (!targetm.relaxed_ordering || !thread_guard)
-	{
-	  finish_compound_stmt (then_clause);
-	  finish_then_clause (if_stmt);
-	  finish_if_stmt (if_stmt);
-	}
+      finish_compound_stmt (then_clause);
+      finish_then_clause (if_stmt);
+      finish_if_stmt (if_stmt);
     }
   else if (DECL_THREAD_LOCAL_P (decl))
     tls_aggregates = tree_cons (init, decl, tls_aggregates);
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index f1b3d0c..36aa8b4 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -3034,6 +3034,25 @@ get_guard (tree decl)
   return guard;
 }
 
+static tree
+build_atomic_load_byte (tree src, HOST_WIDE_INT model)
+{
+  tree ptr_type = build_pointer_type (char_type_node);
+  tree mem_model = build_int_cst (integer_type_node, model);
+  tree t, addr, val;
+  unsigned int size;
+  int fncode;
+
+  size = tree_to_uhwi (TYPE_SIZE_UNIT (char_type_node));
+
+  fncode = BUILT_IN_ATOMIC_LOAD_N + exact_log2 (size) + 1;
+  t = builtin_decl_implicit ((enum built_in_function) fncode);
+
+  addr = build1 (ADDR_EXPR, ptr_type, src);
+  val = build_call_expr (t, 2, addr, mem_model);
+  return val;
+}
+
 /* Return those bits of the GUARD variable that should be set when the
    guarded entity is actually initialized.  */
 
@@ -3060,12 +3079,14 @@ get_guard_bits (tree guard)
    variable has already been initialized.  */
 
 tree
-get_guard_cond (tree guard)
+get_guard_cond (tree guard, bool thread_safe)
 {
   tree guard_value;
 
-  /* Check to see if the GUARD is zero.  */
-  guard = get_guard_bits (guard);
+  if (!thread_safe)
+    guard = get_guard_bits (guard);
+  else
+    guard = build_atomic_load_byte (guard, MEMMODEL_ACQUIRE);
 
   /* Mask off all but the low bit.  */
   if (targetm.cxx.guard_mask_bit ())
@@ -3681,7 +3702,7 @@ one_static_initialization_or_destruction (tree decl, tree init, bool initp)
 	  /* When using __cxa_atexit, we never try to destroy
 	     anything from a static destructor.  */
 	  gcc_assert (initp);
-	  guard_cond = get_guard_cond (guard);
+	  guard_cond = get_guard_cond (guard, false);
 	}
       /* If we don't have __cxa_atexit, then we will be running
 	 destructors from .fini sections, or their equivalents.  So,
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f2f3497..a16cd92 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11395,16 +11395,6 @@ routine for target specific customizations of the system printf
 and scanf formatter settings.
 @end defmac
 
-@deftypevr {Target Hook} bool TARGET_RELAXED_ORDERING
-If set to @code{true}, means that the target's memory model does not
-guarantee that loads which do not depend on one another will access
-main memory in the order of the instruction stream; if ordering is
-important, an explicit memory barrier must be used.  This is true of
-many recent processors which implement a policy of ``relaxed,''
-``weak,'' or ``release'' memory consistency, such as Alpha, PowerPC,
-and ia64.  The default is @code{false}.
-@end deftypevr
-
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN (const_tree @var{typelist}, const_tree @var{funcdecl}, const_tree @var{val})
 If defined, this macro returns the diagnostic message when it is
 illegal to pass argument @var{val} to function @var{funcdecl}
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 35b02b7..93fb41c 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8143,8 +8143,6 @@ routine for target specific customizations of the system printf
 and scanf formatter settings.
 @end defmac
 
-@hook TARGET_RELAXED_ORDERING
-
 @hook TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN
 
 @hook TARGET_INVALID_CONVERSION
diff --git a/gcc/target.def b/gcc/target.def
index f2cb81d..b606b81 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5785,19 +5785,6 @@ for the primary source file, immediately after printing\n\
 this to be done.  The default is false.",
  bool, false)
 
-/* True if the target is allowed to reorder memory accesses unless
-   synchronization is explicitly requested.  */
-DEFHOOKPOD
-(relaxed_ordering,
- "If set to @code{true}, means that the target's memory model does not\n\
-guarantee that loads which do not depend on one another will access\n\
-main memory in the order of the instruction stream; if ordering is\n\
-important, an explicit memory barrier must be used.  This is true of\n\
-many recent processors which implement a policy of ``relaxed,''\n\
-``weak,'' or ``release'' memory consistency, such as Alpha, PowerPC,\n\
-and ia64.  The default is @code{false}.",
- bool, false)
-
 /* Returns true if we should generate exception tables for use with the
    ARM EABI.  The effects the encoding of function exception specifications.  */
 DEFHOOKPOD


* Re: [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
  2015-05-29 13:32             ` Ramana Radhakrishnan
@ 2015-05-29 16:34               ` Richard Henderson
  2015-05-29 16:36               ` Richard Henderson
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2015-05-29 16:34 UTC (permalink / raw)
  To: Ramana Radhakrishnan, Jason Merrill, David Edelsohn
  Cc: gcc-patches, Jim Wilson, Steve Ellcey, Steve Munroe, Torvald Riegel

On 05/29/2015 06:18 AM, Ramana Radhakrishnan wrote:
> 
>     PR c++/66192
>     * config/alpha/alpha.c (TARGET_RELAXED_ORDERING): Likewise.
>     * config/ia64/ia64.c (TARGET_RELAXED_ORDERING): Likewise.
>     * config/rs6000/rs6000.c (TARGET_RELAXED_ORDERING): Likewise.
>     * config/sparc/linux.h (SPARC_RELAXED_ORDERING): Likewise.
>     * config/sparc/linux64.h (SPARC_RELAXED_ORDERING): Likewise.
>     * config/sparc/sparc.c (TARGET_RELAXED_ORDERING): Likewise.
>     * config/sparc/sparc.h (SPARC_RELAXED_ORDERING): Likewise.
>     * doc/tm.texi: Regenerate.
>     * doc/tm.texi.in (TARGET_RELAXED_ORDERING): Delete.
>     * target.def (TARGET_RELAXED_ORDERING): Delete.

Looks good to me.  You do need to poison the macro in system.h.
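
(For reference, the poisoning refers to a one-liner of this form in 
system.h, alongside the other removed target macros:)

   #pragma GCC poison TARGET_RELAXED_ORDERING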


r~


* Re: [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
  2015-05-29 13:32             ` Ramana Radhakrishnan
  2015-05-29 16:34               ` Richard Henderson
@ 2015-05-29 16:36               ` Richard Henderson
  2015-05-29 20:53               ` Jason Merrill
  2015-06-02 14:42               ` David Edelsohn
  3 siblings, 0 replies; 17+ messages in thread
From: Richard Henderson @ 2015-05-29 16:36 UTC (permalink / raw)
  To: Ramana Radhakrishnan, Jason Merrill, David Edelsohn
  Cc: gcc-patches, Jim Wilson, Steve Ellcey, Steve Munroe, Torvald Riegel

On 05/29/2015 06:18 AM, Ramana Radhakrishnan wrote:
> 
> One of the bits of fallout that I've observed in my testing and that I'm not
> sure about what to do is that on *bare-metal* arm-none-eabi targets we still
> put out calls to __sync_synchronize on architecture versions that do not have a
> barrier instruction which will result in a link error.
> 
> While it is tempting to take the easy approach of not putting out the call, I
> suspect in practice a number of users of the bare-metal tools use these for
> their own RTOS's and other micro-OS's. Thus generating barriers at higher
> architecture levels and not generating barriers at lower architecture levels
> appears to be a bit dangerous especially on architectures where there is
> backwards compatibility (i.e. -mcpu=arm7tdmi on standard user code is still
> expected to generate code that works on a core that conforms to a later
> architecture revision).
> 
> I am considering leaving this in the ARM backend to force people to think what
> they want to do about thread safety with statics and C++ on bare-metal systems.
> If they really do not want thread safety they can well add
> -fno-threadsafe-statics or provide an appropriate implementation for
> __sync_synchronize on their platforms.
> 
> Any thoughts / comments ?

That seems reasonable.  It probably warrants some documentation somewhere though.


r~


* Re: [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
  2015-05-29 13:32             ` Ramana Radhakrishnan
  2015-05-29 16:34               ` Richard Henderson
  2015-05-29 16:36               ` Richard Henderson
@ 2015-05-29 20:53               ` Jason Merrill
  2015-06-04  9:46                 ` Ramana Radhakrishnan
  2015-06-02 14:42               ` David Edelsohn
  3 siblings, 1 reply; 17+ messages in thread
From: Jason Merrill @ 2015-05-29 20:53 UTC (permalink / raw)
  To: Ramana Radhakrishnan, David Edelsohn
  Cc: gcc-patches, Jim Wilson, Steve Ellcey, Richard Henderson,
	Steve Munroe, Torvald Riegel

On 05/29/2015 09:18 AM, Ramana Radhakrishnan wrote:
> +static tree
> +build_atomic_load_byte (tree src, HOST_WIDE_INT model)

This function needs a comment.  The C++ changes are OK with that.

Jason


* Re: [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
  2015-05-29 13:32             ` Ramana Radhakrishnan
                                 ` (2 preceding siblings ...)
  2015-05-29 20:53               ` Jason Merrill
@ 2015-06-02 14:42               ` David Edelsohn
  3 siblings, 0 replies; 17+ messages in thread
From: David Edelsohn @ 2015-06-02 14:42 UTC (permalink / raw)
  To: Ramana Radhakrishnan
  Cc: Jason Merrill, gcc-patches, Jim Wilson, Steve Ellcey,
	Richard Henderson, Steve Munroe, Torvald Riegel

On Fri, May 29, 2015 at 9:18 AM, Ramana Radhakrishnan
<ramana.radhakrishnan@foss.arm.com> wrote:

> - Turns build_atomic_load into build_atomic_load_byte in order
>   to do an atomic load of 1 byte instead of a full word atomic load.
> - Restructures get_guard_cond to decide whether to use an atomic load
>   or a standard load depending on whether code generated will be in
>   a multi-threaded context or not.
> - Adjusts all callers of get_guard_cond accordingly.
>
> One of the bits of fallout that I've observed in my testing and that I'm not
> sure about what to do is that on *bare-metal* arm-none-eabi targets we still
> put out calls to __sync_synchronize on architecture versions that do not
> have a barrier instruction which will result in a link error.
>
> While it is tempting to take the easy approach of not putting out the call,
> I suspect in practice a number of users of the bare-metal tools use these
> for their own RTOS's and other micro-OS's. Thus generating barriers at
> higher architecture levels and not generating barriers at lower architecture
> levels appears to be a bit dangerous especially on architectures where there
> is backwards compatibility (i.e. -mcpu=arm7tdmi on standard user code is
> still expected to generate code that works on a core that conforms to a
> later architecture revision).
>
> I am considering leaving this in the ARM backend to force people to think
> what they want to do about thread safety with statics and C++ on bare-metal
> systems. If they really do not want thread safety they can well add
> -fno-threadsafe-statics or provide an appropriate implementation for
> __sync_synchronize on their platforms.
>
> Any thoughts / comments ?
>
> regards
> Ramana
>
> gcc/cp/ChangeLog:
>
> 2015-05-29  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>
>
>     PR c++/66192
>     * cp-tree.h (get_guard_cond): Adjust declaration
>     * decl.c (expand_static_init): Use atomic load acquire
>      and adjust call to get_guard_cond.
>     * decl2.c (build_atomic_load_byte): New function.
>     (get_guard_cond): Handle thread_safety.
>     (one_static_initialization_or_destruction): Adjust call to
>     get_guard_cond.
>
> gcc/ChangeLog:
>
> 2015-05-29  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>
>
>
>     PR c++/66192
>     * config/alpha/alpha.c (TARGET_RELAXED_ORDERING): Likewise.
>     * config/ia64/ia64.c (TARGET_RELAXED_ORDERING): Likewise.
>     * config/rs6000/rs6000.c (TARGET_RELAXED_ORDERING): Likewise.
>     * config/sparc/linux.h (SPARC_RELAXED_ORDERING): Likewise.
>     * config/sparc/linux64.h (SPARC_RELAXED_ORDERING): Likewise.
>     * config/sparc/sparc.c (TARGET_RELAXED_ORDERING): Likewise.
>     * config/sparc/sparc.h (SPARC_RELAXED_ORDERING): Likewise.
>     * doc/tm.texi: Regenerate.
>     * doc/tm.texi.in (TARGET_RELAXED_ORDERING): Delete.
>     * target.def (TARGET_RELAXED_ORDERING): Delete.

Bootstrap on PPC64 BE and LE Linux successful.  The generated code
looks like what we expect, and is much better than the current code
generation.

Thanks, David


* Re: [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
  2015-05-29 20:53               ` Jason Merrill
@ 2015-06-04  9:46                 ` Ramana Radhakrishnan
  0 siblings, 0 replies; 17+ messages in thread
From: Ramana Radhakrishnan @ 2015-06-04  9:46 UTC (permalink / raw)
  To: Jason Merrill, David Edelsohn
  Cc: gcc-patches, Jim Wilson, Steve Ellcey, Richard Henderson,
	Steve Munroe, Torvald Riegel

[-- Attachment #1: Type: text/plain, Size: 1731 bytes --]



On 29/05/15 20:40, Jason Merrill wrote:
> On 05/29/2015 09:18 AM, Ramana Radhakrishnan wrote:
>> +static tree
>> +build_atomic_load_byte (tree src, HOST_WIDE_INT model)
>
> This function needs a comment.  The C++ changes are OK with that.
>
> Jason
>

I'm assuming your review and rth's review constitute an OK.

Given no regressions on aarch64-none-linux-gnu, arm-none-linux-gnueabihf 
and powerpc, I've applied the attached patch.

Ramana

2015-06-04  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

         PR c++/66192
         PR target/66200
         * doc/tm.texi: Regenerate.
         * doc/tm.texi.in (TARGET_RELAXED_ORDERING): Delete.
         * target.def (TARGET_RELAXED_ORDERING): Likewise.
         * config/alpha/alpha.c (TARGET_RELAXED_ORDERING): Likewise.
         * config/ia64/ia64.c (TARGET_RELAXED_ORDERING): Likewise.
         * config/rs6000/rs6000.c (TARGET_RELAXED_ORDERING): Likewise.
         * config/sparc/linux.h (SPARC_RELAXED_ORDERING): Likewise.
         * config/sparc/linux64.h (SPARC_RELAXED_ORDERING): Likewise.
         * config/sparc/sparc.c (TARGET_RELAXED_ORDERING): Likewise.
         * config/sparc/sparc.h (SPARC_RELAXED_ORDERING): Likewise.
         * system.h (TARGET_RELAXED_ORDERING): Poison.

2015-06-04  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

         PR c++/66192
         PR target/66200
         * cp-tree.h (get_guard_cond): Adjust declaration.
         * decl.c (expand_static_init): Use atomic load acquire
         and adjust call to get_guard_cond.
         * decl2.c (build_atomic_load_byte): New function.
         (get_guard_cond): Handle thread_safety.
         (one_static_initialization_or_destruction): Adjust call to
         get_guard_cond.

[-- Attachment #2: rewrite-relaxed-as-atomics-committed.txt --]
[-- Type: text/plain, Size: 13524 bytes --]

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog	(revision 224117)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,18 @@
+2015-06-04  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>
+
+	PR c++/66192
+	PR target/66200
+	* doc/tm.texi: Regenerate.
+	* doc/tm.texi.in (TARGET_RELAXED_ORDERING): Delete.
+	* target.def (TARGET_RELAXED_ORDERING): Likewise.
+	* config/alpha/alpha.c (TARGET_RELAXED_ORDERING): Likewise.
+	* config/ia64/ia64.c (TARGET_RELAXED_ORDERING): Likewise.
+	* config/rs6000/rs6000.c (TARGET_RELAXED_ORDERING): Likewise.
+	* config/sparc/linux.h (SPARC_RELAXED_ORDERING): Likewise.
+	* config/sparc/linux64.h (SPARC_RELAXED_ORDERING): Likewise.
+	* config/sparc/sparc.c (TARGET_RELAXED_ORDERING): Likewise.
+	* config/sparc/sparc.h (SPARC_RELAXED_ORDERING): Likewise.
+
 2015-06-04  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
 
 	* config/aarch64/aarch64.c (aarch64_override_options): Unconditionally
Index: gcc/config/alpha/alpha.c
===================================================================
--- gcc/config/alpha/alpha.c	(revision 224117)
+++ gcc/config/alpha/alpha.c	(working copy)
@@ -9987,12 +9987,6 @@
 #undef TARGET_EXPAND_BUILTIN_VA_START
 #define TARGET_EXPAND_BUILTIN_VA_START alpha_va_start
 
-/* The Alpha architecture does not require sequential consistency.  See
-   http://www.cs.umd.edu/~pugh/java/memoryModel/AlphaReordering.html
-   for an example of how it can be violated in practice.  */
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING true
-
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE alpha_option_override
 
Index: gcc/config/ia64/ia64.c
===================================================================
--- gcc/config/ia64/ia64.c	(revision 224117)
+++ gcc/config/ia64/ia64.c	(working copy)
@@ -630,11 +630,6 @@
 #define TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P \
   ia64_libgcc_floating_mode_supported_p
 
-/* ia64 architecture manual 4.4.7: ... reads, writes, and flushes may occur
-   in an order different from the specified program order.  */
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING true
-
 #undef TARGET_LEGITIMATE_CONSTANT_P
 #define TARGET_LEGITIMATE_CONSTANT_P ia64_legitimate_constant_p
 #undef TARGET_LEGITIMATE_ADDRESS_P
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	(revision 224117)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -1620,17 +1620,6 @@
 #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail
 #endif
 
-/* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors
-   The PowerPC architecture requires only weak consistency among
-   processors--that is, memory accesses between processors need not be
-   sequentially consistent and memory accesses among processors can occur
-   in any order. The ability to order memory accesses weakly provides
-   opportunities for more efficient use of the system bus. Unless a
-   dependency exists, the 604e allows read operations to precede store
-   operations.  */
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING true
-
 #ifdef HAVE_AS_TLS
 #undef TARGET_ASM_OUTPUT_DWARF_DTPREL
 #define TARGET_ASM_OUTPUT_DWARF_DTPREL rs6000_output_dwarf_dtprel
Index: gcc/config/sparc/linux.h
===================================================================
--- gcc/config/sparc/linux.h	(revision 224117)
+++ gcc/config/sparc/linux.h	(working copy)
@@ -139,12 +139,6 @@
 /* Static stack checking is supported by means of probes.  */
 #define STACK_CHECK_STATIC_BUILTIN 1
 
-/* Linux currently uses RMO in uniprocessor mode, which is equivalent to
-   TMO, and TMO in multiprocessor mode.  But they reserve the right to
-   change their minds.  */
-#undef SPARC_RELAXED_ORDERING
-#define SPARC_RELAXED_ORDERING true
-
 #undef NEED_INDICATE_EXEC_STACK
 #define NEED_INDICATE_EXEC_STACK 1
 
Index: gcc/config/sparc/linux64.h
===================================================================
--- gcc/config/sparc/linux64.h	(revision 224117)
+++ gcc/config/sparc/linux64.h	(working copy)
@@ -253,12 +253,6 @@
 /* Static stack checking is supported by means of probes.  */
 #define STACK_CHECK_STATIC_BUILTIN 1
 
-/* Linux currently uses RMO in uniprocessor mode, which is equivalent to
-   TMO, and TMO in multiprocessor mode.  But they reserve the right to
-   change their minds.  */
-#undef SPARC_RELAXED_ORDERING
-#define SPARC_RELAXED_ORDERING true
-
 #undef NEED_INDICATE_EXEC_STACK
 #define NEED_INDICATE_EXEC_STACK 1
 
Index: gcc/config/sparc/sparc.c
===================================================================
--- gcc/config/sparc/sparc.c	(revision 224117)
+++ gcc/config/sparc/sparc.c	(working copy)
@@ -808,9 +808,6 @@
 #define TARGET_ATTRIBUTE_TABLE sparc_attribute_table
 #endif
 
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING SPARC_RELAXED_ORDERING
-
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE sparc_option_override
 
Index: gcc/config/sparc/sparc.h
===================================================================
--- gcc/config/sparc/sparc.h	(revision 224117)
+++ gcc/config/sparc/sparc.h	(working copy)
@@ -106,17 +106,6 @@
 
 #define SPARC_DEFAULT_CMODEL CM_32
 
-/* The SPARC-V9 architecture defines a relaxed memory ordering model (RMO)
-   which requires the following macro to be true if enabled.  Prior to V9,
-   there are no instructions to even talk about memory synchronization.
-   Note that the UltraSPARC III processors don't implement RMO, unlike the
-   UltraSPARC II processors.  Niagara, Niagara-2, and Niagara-3 do not
-   implement RMO either.
-
-   Default to false; for example, Solaris never enables RMO, only ever uses
-   total memory ordering (TMO).  */
-#define SPARC_RELAXED_ORDERING false
-
 /* Do not use the .note.GNU-stack convention by default.  */
 #define NEED_INDICATE_EXEC_STACK 0
 
Index: gcc/cp/ChangeLog
===================================================================
--- gcc/cp/ChangeLog	(revision 224117)
+++ gcc/cp/ChangeLog	(working copy)
@@ -1,3 +1,15 @@
+2015-06-04  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>
+
+	PR c++/66192
+	PR target/66200
+	* cp-tree.h (get_guard_cond): Adjust declaration.
+	* decl.c (expand_static_init): Use atomic load acquire
+	and adjust call to get_guard_cond.
+	* decl2.c (build_atomic_load_byte): New function.
+	(get_guard_cond): Handle thread_safety.
+	(one_static_initialization_or_destruction): Adjust call to
+	get_guard_cond.
+
 2015-06-03  Jason Merrill  <jason@redhat.com>
 
 	PR c++/44282
Index: gcc/cp/cp-tree.h
===================================================================
--- gcc/cp/cp-tree.h	(revision 224117)
+++ gcc/cp/cp-tree.h	(working copy)
@@ -5491,7 +5491,7 @@
 extern void finish_static_data_member_decl	(tree, tree, bool, tree, int);
 extern tree cp_build_parm_decl			(tree, tree);
 extern tree get_guard				(tree);
-extern tree get_guard_cond			(tree);
+extern tree get_guard_cond			(tree, bool);
 extern tree set_guard				(tree);
 extern tree get_tls_wrapper_fn			(tree);
 extern void mark_needed				(tree);
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	(revision 224117)
+++ gcc/cp/decl.c	(working copy)
@@ -7227,7 +7227,7 @@
 	 looks like:
 
 	   static <type> guard;
-	   if (!guard.first_byte) {
+	   if (!__atomic_load (guard.first_byte)) {
 	     if (__cxa_guard_acquire (&guard)) {
 	       bool flag = false;
 	       try {
@@ -7257,17 +7257,12 @@
       /* Create the guard variable.  */
       guard = get_guard (decl);
 
-      /* This optimization isn't safe on targets with relaxed memory
-	 consistency.  On such targets we force synchronization in
-	 __cxa_guard_acquire.  */
-      if (!targetm.relaxed_ordering || !thread_guard)
-	{
-	  /* Begin the conditional initialization.  */
-	  if_stmt = begin_if_stmt ();
-	  finish_if_stmt_cond (get_guard_cond (guard), if_stmt);
-	  then_clause = begin_compound_stmt (BCS_NO_SCOPE);
-	}
+      /* Begin the conditional initialization.  */
+      if_stmt = begin_if_stmt ();
 
+      finish_if_stmt_cond (get_guard_cond (guard, thread_guard), if_stmt);
+      then_clause = begin_compound_stmt (BCS_NO_SCOPE);
+
       if (thread_guard)
 	{
 	  tree vfntype = NULL_TREE;
@@ -7335,12 +7330,9 @@
 	  finish_if_stmt (inner_if_stmt);
 	}
 
-      if (!targetm.relaxed_ordering || !thread_guard)
-	{
-	  finish_compound_stmt (then_clause);
-	  finish_then_clause (if_stmt);
-	  finish_if_stmt (if_stmt);
-	}
+      finish_compound_stmt (then_clause);
+      finish_then_clause (if_stmt);
+      finish_if_stmt (if_stmt);
     }
   else if (DECL_THREAD_LOCAL_P (decl))
     tls_aggregates = tree_cons (init, decl, tls_aggregates);
Index: gcc/cp/decl2.c
===================================================================
--- gcc/cp/decl2.c	(revision 224117)
+++ gcc/cp/decl2.c	(working copy)
@@ -3034,6 +3034,27 @@
   return guard;
 }
 
+/* Return an atomic load of src with the appropriate memory model.  */
+
+static tree
+build_atomic_load_byte (tree src, HOST_WIDE_INT model)
+{
+  tree ptr_type = build_pointer_type (char_type_node);
+  tree mem_model = build_int_cst (integer_type_node, model);
+  tree t, addr, val;
+  unsigned int size;
+  int fncode;
+
+  size = tree_to_uhwi (TYPE_SIZE_UNIT (char_type_node));
+
+  fncode = BUILT_IN_ATOMIC_LOAD_N + exact_log2 (size) + 1;
+  t = builtin_decl_implicit ((enum built_in_function) fncode);
+
+  addr = build1 (ADDR_EXPR, ptr_type, src);
+  val = build_call_expr (t, 2, addr, mem_model);
+  return val;
+}
+
 /* Return those bits of the GUARD variable that should be set when the
    guarded entity is actually initialized.  */
 
@@ -3060,12 +3081,14 @@
    variable has already been initialized.  */
 
 tree
-get_guard_cond (tree guard)
+get_guard_cond (tree guard, bool thread_safe)
 {
   tree guard_value;
 
-  /* Check to see if the GUARD is zero.  */
-  guard = get_guard_bits (guard);
+  if (!thread_safe)
+    guard = get_guard_bits (guard);
+  else
+    guard = build_atomic_load_byte (guard, MEMMODEL_ACQUIRE);
 
   /* Mask off all but the low bit.  */
   if (targetm.cxx.guard_mask_bit ())
@@ -3681,7 +3704,7 @@
 	  /* When using __cxa_atexit, we never try to destroy
 	     anything from a static destructor.  */
 	  gcc_assert (initp);
-	  guard_cond = get_guard_cond (guard);
+	  guard_cond = get_guard_cond (guard, false);
 	}
       /* If we don't have __cxa_atexit, then we will be running
 	 destructors from .fini sections, or their equivalents.  So,
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	(revision 224117)
+++ gcc/doc/tm.texi	(working copy)
@@ -11395,16 +11395,6 @@
 and scanf formatter settings.
 @end defmac
 
-@deftypevr {Target Hook} bool TARGET_RELAXED_ORDERING
-If set to @code{true}, means that the target's memory model does not
-guarantee that loads which do not depend on one another will access
-main memory in the order of the instruction stream; if ordering is
-important, an explicit memory barrier must be used.  This is true of
-many recent processors which implement a policy of ``relaxed,''
-``weak,'' or ``release'' memory consistency, such as Alpha, PowerPC,
-and ia64.  The default is @code{false}.
-@end deftypevr
-
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN (const_tree @var{typelist}, const_tree @var{funcdecl}, const_tree @var{val})
 If defined, this macro returns the diagnostic message when it is
 illegal to pass argument @var{val} to function @var{funcdecl}
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	(revision 224117)
+++ gcc/doc/tm.texi.in	(working copy)
@@ -8143,8 +8143,6 @@
 and scanf formatter settings.
 @end defmac
 
-@hook TARGET_RELAXED_ORDERING
-
 @hook TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN
 
 @hook TARGET_INVALID_CONVERSION
Index: gcc/system.h
===================================================================
--- gcc/system.h	(revision 224117)
+++ gcc/system.h	(working copy)
@@ -964,7 +964,7 @@
 	TARGET_HANDLE_PRAGMA_EXTERN_PREFIX \
 	TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN \
 	TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD \
-	TARGET_MD_ASM_CLOBBERS
+	TARGET_MD_ASM_CLOBBERS TARGET_RELAXED_ORDERING
 
 /* Arrays that were deleted in favor of a functional interface.  */
  #pragma GCC poison built_in_decls implicit_built_in_decls
Index: gcc/target.def
===================================================================
--- gcc/target.def	(revision 224117)
+++ gcc/target.def	(working copy)
@@ -5785,19 +5785,6 @@
 this to be done.  The default is false.",
  bool, false)
 
-/* True if the target is allowed to reorder memory accesses unless
-   synchronization is explicitly requested.  */
-DEFHOOKPOD
-(relaxed_ordering,
- "If set to @code{true}, means that the target's memory model does not\n\
-guarantee that loads which do not depend on one another will access\n\
-main memory in the order of the instruction stream; if ordering is\n\
-important, an explicit memory barrier must be used.  This is true of\n\
-many recent processors which implement a policy of ``relaxed,''\n\
-``weak,'' or ``release'' memory consistency, such as Alpha, PowerPC,\n\
-and ia64.  The default is @code{false}.",
- bool, false)
-
 /* Returns true if we should generate exception tables for use with the
    ARM EABI.  The effects the encoding of function exception specifications.  */
 DEFHOOKPOD

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
@ 2015-05-24 18:55 Uros Bizjak
  0 siblings, 0 replies; 17+ messages in thread
From: Uros Bizjak @ 2015-05-24 18:55 UTC (permalink / raw)
  To: gcc-patches; +Cc: Ramana Radhakrishnan, Richard Henderson

Hello!

> This patch removes the special casing for targets with relaxed memory ordering and handles
> guard accesses with equivalent atomic load acquire operations. In this process we change the
> algorithm to load the guard variable with an atomic load that has ACQUIRE semantics. I'm not
> terribly familiar with the C++ front-end so I'm not sure I've used the appropriate interfaces for
> doing something like this.
>
> This then means that on targets which have weak memory models, the fast path is inlined and can
> directly use a load-acquire instruction where available (and yay! one more hook gone).
>
> Currently bootstrapping and regression testing on AArch64 and ARM (prior to the commit that
> caused PR66241). If this goes in then I'm happy to withdraw part of the patches to trunk for
> AArch64 / ARM that defines TARGET_RELAXED_ORDERING and only propose those hunks to
> the branches.
>
> I'd also request the other target maintainers CC'd to help by testing this on their platforms as I
> do not have access to all of them.

The patch works OK on alphaev68-linux-gnu [1].

[1] https://gcc.gnu.org/ml/gcc-testresults/2015-05/msg03008.html

Uros.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-06-04  9:19 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-22 11:26 [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires Ramana Radhakrishnan
2015-05-22 11:37 ` Ramana Radhakrishnan
2015-05-22 13:49   ` Jason Merrill
2015-05-22 14:15     ` David Edelsohn
2015-05-22 14:42       ` Jason Merrill
2015-05-22 15:42         ` Ramana Radhakrishnan
2015-05-22 17:48           ` David Edelsohn
2015-05-22 18:19           ` Jason Merrill
2015-05-22 19:49             ` Richard Henderson
2015-05-29 13:32             ` Ramana Radhakrishnan
2015-05-29 16:34               ` Richard Henderson
2015-05-29 16:36               ` Richard Henderson
2015-05-29 20:53               ` Jason Merrill
2015-06-04  9:46                 ` Ramana Radhakrishnan
2015-06-02 14:42               ` David Edelsohn
2015-05-22 14:28     ` Ramana Radhakrishnan
2015-05-24 18:55 Uros Bizjak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).