public inbox for gcc-patches@gcc.gnu.org
From: Ramana Radhakrishnan <ramana.radhakrishnan@foss.arm.com>
To: Jason Merrill <jason@redhat.com>, David Edelsohn <dje.gcc@gmail.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	 Jim Wilson <wilson@tuliptree.org>,
	Steve Ellcey <sellcey@mips.com>,
	Richard Henderson <rth@redhat.com>,
	 Steve Munroe <munroesj@linux.vnet.ibm.com>,
	Torvald Riegel <triegel@redhat.com>
Subject: Re: [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
Date: Fri, 29 May 2015 13:32:00 -0000	[thread overview]
Message-ID: <5568673A.6060007@foss.arm.com> (raw)
In-Reply-To: <555F6911.6030205@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 3792 bytes --]



On 22/05/15 18:36, Jason Merrill wrote:
> On 05/22/2015 11:23 AM, Ramana Radhakrishnan wrote:
>> On 22/05/15 15:28, Jason Merrill wrote:
>>> I do notice that get_guard_bits after build_atomic_load just won't work
>>> on non-ARM targets, as it ends up trying to take the address of a value.
>>
>> So on powerpc where targetm.guard_mask_bit is false - this is what I see.
>>
>>    &(long long int) __atomic_load_8 (&_ZGVZ1fvE1p, 2)
>
> This is the bit that doesn't make sense to me.  __atomic_load_8 returns
> a value; what does it mean to take its address?  If we're going to load
> more than just the first byte, we should mask off the rest rather than
> trying to mess with its address.
>
> It also seems unnecessary to load 8 bytes on any target; we could add a
> function to optabs.c that returns the smallest mode for which there's
> atomic load support?
>
> Jason
>


Right, I've taken all comments on board - thanks to those who tested the 
original patch. Here's v2 of the patch, currently bootstrapped and tested 
only on aarch64-none-linux-gnu.

This patch -

- Turns build_atomic_load into build_atomic_load_byte in order
   to do an atomic load of 1 byte instead of a full word atomic load.
- Restructures get_guard_cond to decide whether to use an atomic load
   or a standard load depending on whether code generated will be in
   a multi-threaded context or not.
- Adjusts all callers of get_guard_cond accordingly.
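To make the intent of the restructuring concrete, the guard pattern the patch arranges for the front end to emit can be sketched in plain C++. This is only a hand-written illustration using std::atomic as a stand-in: the guard layout, the names, and the mutex standing in for __cxa_guard_acquire/__cxa_guard_release are all simplifications, not the real Itanium ABI runtime.

```cpp
#include <atomic>
#include <mutex>

// Stand-in for the runtime lock taken by __cxa_guard_acquire.
static std::mutex guard_mutex;

// One guard per function-local static; the low byte is set exactly once,
// after the dynamic initializer has completed.
static std::atomic<unsigned char> guard{0};

static int initializer_runs = 0;  // instrumentation for the example only

int &lazy_static()
{
  static int storage;
  // Fast path: a one-byte acquire load, as build_atomic_load_byte emits.
  // An acquire load is enough to see the fully constructed object on
  // weakly ordered targets, so no full barrier is needed here.
  if (guard.load(std::memory_order_acquire) == 0)
    {
      // Slow path: serialize initializers (the __cxa_guard_acquire role).
      std::lock_guard<std::mutex> lock(guard_mutex);
      if (guard.load(std::memory_order_relaxed) == 0)
        {
          storage = 42;          // run the dynamic initializer
          ++initializer_runs;
          // Publish with a release store (the __cxa_guard_release role),
          // pairing with the acquire load on the fast path.
          guard.store(1, std::memory_order_release);
        }
    }
  return storage;
}
```

Subsequent calls take only the one-byte acquire load on the fast path, which is the whole point of dropping TARGET_RELAXED_ORDERING: the acquire load is correct on all targets, so the conditional fast path no longer needs to be disabled on weakly ordered ones.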

One of the bits of fallout that I've observed in my testing, and that I'm 
not sure how to handle, is that on *bare-metal* arm-none-eabi targets 
we still emit calls to __sync_synchronize on architecture versions 
that do not have a barrier instruction, which will result in a link error.

While it is tempting to take the easy approach of not emitting the 
call, I suspect that in practice a number of users of the bare-metal 
tools use them for their own RTOSes and other micro-OSes. Thus generating 
barriers at higher architecture levels but not at lower ones appears 
to be a bit dangerous, especially on architectures with backwards 
compatibility (i.e. standard user code built with -mcpu=arm7tdmi is 
still expected to generate code that works on a core conforming to a 
later architecture revision).

I am considering leaving this in the ARM backend to force people to 
think about what they want to do about thread safety with statics and 
C++ on bare-metal systems. If they really do not want thread safety, 
they can add -fno-threadsafe-statics or provide an appropriate 
implementation of __sync_synchronize on their platform.
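For reference, such a platform-provided fallback might look like the sketch below. This is a hypothetical single-core stub, not a vetted implementation: on a uniprocessor with no barrier instruction, a compiler-level barrier is often the only meaningful thing to do, while any real SMP target would need an actual hardware barrier instead.

```cpp
// Hypothetical fallback for a single-core bare-metal target whose
// architecture revision has no barrier instruction.  Defining this
// symbol satisfies the out-of-line __sync_synchronize calls the
// compiler emits, so the link succeeds.
extern "C" void __sync_synchronize(void)
{
  // Compiler barrier only: stops the compiler from reordering memory
  // accesses across this point.  No hardware barrier is issued, which
  // is only acceptable on a single core with no other bus masters.
  __asm__ __volatile__("" ::: "memory");
}
```

An RTOS that does run multiple contexts would instead disable interrupts or use whatever synchronization primitive its core actually provides.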

Any thoughts / comments?

regards
Ramana

gcc/cp/ChangeLog:

2015-05-29  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

     PR c++/66192
     * cp-tree.h (get_guard_cond): Adjust declaration.
     * decl.c (expand_static_init): Use atomic load acquire
     and adjust call to get_guard_cond.
     * decl2.c (build_atomic_load_byte): New function.
     (get_guard_cond): Handle thread_safety.
     (one_static_initialization_or_destruction): Adjust call to
     get_guard_cond.

gcc/ChangeLog:

2015-05-29  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

     PR c++/66192
     * config/alpha/alpha.c (TARGET_RELAXED_ORDERING): Delete.
     * config/ia64/ia64.c (TARGET_RELAXED_ORDERING): Likewise.
     * config/rs6000/rs6000.c (TARGET_RELAXED_ORDERING): Likewise.
     * config/sparc/linux.h (SPARC_RELAXED_ORDERING): Likewise.
     * config/sparc/linux64.h (SPARC_RELAXED_ORDERING): Likewise.
     * config/sparc/sparc.c (TARGET_RELAXED_ORDERING): Likewise.
     * config/sparc/sparc.h (SPARC_RELAXED_ORDERING): Likewise.
     * doc/tm.texi: Regenerate.
     * doc/tm.texi.in (TARGET_RELAXED_ORDERING): Delete.
     * target.def (TARGET_RELAXED_ORDERING): Delete.


[-- Attachment #2: relaxed-ordering-rewrite-v2.txt --]
[-- Type: text/plain, Size: 13465 bytes --]

commit e9c757b843b22609ac161e18fc7cd17e8866562d
Author: Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
Date:   Fri May 29 10:55:18 2015 +0100

     Patch v2 - Remove TARGET_RELAXED_ORDERING.
    
    This respun version handles comments from the previous revision.
    
    - Simplifies build_atomic_load into build_atomic_load_byte in order
    to do an atomic load of 1 byte instead of a full word atomic load.
    - Restructures get_guard_cond to decide whether to use an atomic load
    or a standard load depending on whether code generated will be in
    a multi-threaded context or not.
    - Adjusts all callers of get_guard_cond accordingly.
    
    Is this better now?
    
    Bootstrapped and regression tested on aarch64-none-linux-gnu
    
    regards
    Ramana

gcc/cp/ChangeLog:

2015-05-29  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

    PR c++/66192
    * cp-tree.h (get_guard_cond): Adjust declaration.
    * decl.c (expand_static_init): Use atomic load acquire
    and adjust call to get_guard_cond.
    * decl2.c (build_atomic_load_byte): New function.
    (get_guard_cond): Handle thread_safety.
    (one_static_initialization_or_destruction): Adjust call to
    get_guard_cond.

gcc/ChangeLog:

2015-05-29  Ramana Radhakrishnan  <ramana.radhakrishnan@arm.com>

    PR c++/66192
    * config/alpha/alpha.c (TARGET_RELAXED_ORDERING): Delete.
    * config/ia64/ia64.c (TARGET_RELAXED_ORDERING): Likewise.
    * config/rs6000/rs6000.c (TARGET_RELAXED_ORDERING): Likewise.
    * config/sparc/linux.h (SPARC_RELAXED_ORDERING): Likewise.
    * config/sparc/linux64.h (SPARC_RELAXED_ORDERING): Likewise.
    * config/sparc/sparc.c (TARGET_RELAXED_ORDERING): Likewise.
    * config/sparc/sparc.h (SPARC_RELAXED_ORDERING): Likewise.
    * doc/tm.texi: Regenerate.
    * doc/tm.texi.in (TARGET_RELAXED_ORDERING): Delete.
    * target.def (TARGET_RELAXED_ORDERING): Delete.


diff --git a/gcc/config/alpha/alpha.c b/gcc/config/alpha/alpha.c
index 1ba99d0..857c9ac 100644
--- a/gcc/config/alpha/alpha.c
+++ b/gcc/config/alpha/alpha.c
@@ -9987,12 +9987,6 @@ alpha_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
 #undef TARGET_EXPAND_BUILTIN_VA_START
 #define TARGET_EXPAND_BUILTIN_VA_START alpha_va_start
 
-/* The Alpha architecture does not require sequential consistency.  See
-   http://www.cs.umd.edu/~pugh/java/memoryModel/AlphaReordering.html
-   for an example of how it can be violated in practice.  */
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING true
-
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE alpha_option_override
 
diff --git a/gcc/config/ia64/ia64.c b/gcc/config/ia64/ia64.c
index c1e2ecd..45ad97a 100644
--- a/gcc/config/ia64/ia64.c
+++ b/gcc/config/ia64/ia64.c
@@ -630,11 +630,6 @@ static const struct attribute_spec ia64_attribute_table[] =
 #define TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P \
   ia64_libgcc_floating_mode_supported_p
 
-/* ia64 architecture manual 4.4.7: ... reads, writes, and flushes may occur
-   in an order different from the specified program order.  */
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING true
-
 #undef TARGET_LEGITIMATE_CONSTANT_P
 #define TARGET_LEGITIMATE_CONSTANT_P ia64_legitimate_constant_p
 #undef TARGET_LEGITIMATE_ADDRESS_P
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a590ef4..ce70ca0 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1620,17 +1620,6 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail
 #endif
 
-/* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors
-   The PowerPC architecture requires only weak consistency among
-   processors--that is, memory accesses between processors need not be
-   sequentially consistent and memory accesses among processors can occur
-   in any order. The ability to order memory accesses weakly provides
-   opportunities for more efficient use of the system bus. Unless a
-   dependency exists, the 604e allows read operations to precede store
-   operations.  */
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING true
-
 #ifdef HAVE_AS_TLS
 #undef TARGET_ASM_OUTPUT_DWARF_DTPREL
 #define TARGET_ASM_OUTPUT_DWARF_DTPREL rs6000_output_dwarf_dtprel
diff --git a/gcc/config/sparc/linux.h b/gcc/config/sparc/linux.h
index 17e1e86..29763c4 100644
--- a/gcc/config/sparc/linux.h
+++ b/gcc/config/sparc/linux.h
@@ -139,12 +139,6 @@ do {									\
 /* Static stack checking is supported by means of probes.  */
 #define STACK_CHECK_STATIC_BUILTIN 1
 
-/* Linux currently uses RMO in uniprocessor mode, which is equivalent to
-   TMO, and TMO in multiprocessor mode.  But they reserve the right to
-   change their minds.  */
-#undef SPARC_RELAXED_ORDERING
-#define SPARC_RELAXED_ORDERING true
-
 #undef NEED_INDICATE_EXEC_STACK
 #define NEED_INDICATE_EXEC_STACK 1
 
diff --git a/gcc/config/sparc/linux64.h b/gcc/config/sparc/linux64.h
index 43da848..efa33fb 100644
--- a/gcc/config/sparc/linux64.h
+++ b/gcc/config/sparc/linux64.h
@@ -253,12 +253,6 @@ do {									\
 /* Static stack checking is supported by means of probes.  */
 #define STACK_CHECK_STATIC_BUILTIN 1
 
-/* Linux currently uses RMO in uniprocessor mode, which is equivalent to
-   TMO, and TMO in multiprocessor mode.  But they reserve the right to
-   change their minds.  */
-#undef SPARC_RELAXED_ORDERING
-#define SPARC_RELAXED_ORDERING true
-
 #undef NEED_INDICATE_EXEC_STACK
 #define NEED_INDICATE_EXEC_STACK 1
 
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index a1562ad..094287f 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -808,9 +808,6 @@ char sparc_hard_reg_printed[8];
 #define TARGET_ATTRIBUTE_TABLE sparc_attribute_table
 #endif
 
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING SPARC_RELAXED_ORDERING
-
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE sparc_option_override
 
diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h
index 106d993..72dd18b 100644
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@@ -106,17 +106,6 @@ extern enum cmodel sparc_cmodel;
 
 #define SPARC_DEFAULT_CMODEL CM_32
 
-/* The SPARC-V9 architecture defines a relaxed memory ordering model (RMO)
-   which requires the following macro to be true if enabled.  Prior to V9,
-   there are no instructions to even talk about memory synchronization.
-   Note that the UltraSPARC III processors don't implement RMO, unlike the
-   UltraSPARC II processors.  Niagara, Niagara-2, and Niagara-3 do not
-   implement RMO either.
-
-   Default to false; for example, Solaris never enables RMO, only ever uses
-   total memory ordering (TMO).  */
-#define SPARC_RELAXED_ORDERING false
-
 /* Do not use the .note.GNU-stack convention by default.  */
 #define NEED_INDICATE_EXEC_STACK 0
 
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 91619e2..aa391c8 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5490,7 +5490,7 @@ extern bool mark_used			        (tree, tsubst_flags_t);
 extern void finish_static_data_member_decl	(tree, tree, bool, tree, int);
 extern tree cp_build_parm_decl			(tree, tree);
 extern tree get_guard				(tree);
-extern tree get_guard_cond			(tree);
+extern tree get_guard_cond			(tree, bool);
 extern tree set_guard				(tree);
 extern tree get_tls_wrapper_fn			(tree);
 extern void mark_needed				(tree);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index a8cb358..8f92667 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -7211,7 +7211,7 @@ expand_static_init (tree decl, tree init)
 	 looks like:
 
 	   static <type> guard;
-	   if (!guard.first_byte) {
+	   if (!__atomic_load (guard.first_byte)) {
 	     if (__cxa_guard_acquire (&guard)) {
 	       bool flag = false;
 	       try {
@@ -7241,16 +7241,11 @@ expand_static_init (tree decl, tree init)
       /* Create the guard variable.  */
       guard = get_guard (decl);
 
-      /* This optimization isn't safe on targets with relaxed memory
-	 consistency.  On such targets we force synchronization in
-	 __cxa_guard_acquire.  */
-      if (!targetm.relaxed_ordering || !thread_guard)
-	{
-	  /* Begin the conditional initialization.  */
-	  if_stmt = begin_if_stmt ();
-	  finish_if_stmt_cond (get_guard_cond (guard), if_stmt);
-	  then_clause = begin_compound_stmt (BCS_NO_SCOPE);
-	}
+      /* Begin the conditional initialization.  */
+      if_stmt = begin_if_stmt ();
+
+      finish_if_stmt_cond (get_guard_cond (guard, thread_guard), if_stmt);
+      then_clause = begin_compound_stmt (BCS_NO_SCOPE);
 
       if (thread_guard)
 	{
@@ -7319,12 +7314,9 @@ expand_static_init (tree decl, tree init)
 	  finish_if_stmt (inner_if_stmt);
 	}
 
-      if (!targetm.relaxed_ordering || !thread_guard)
-	{
-	  finish_compound_stmt (then_clause);
-	  finish_then_clause (if_stmt);
-	  finish_if_stmt (if_stmt);
-	}
+      finish_compound_stmt (then_clause);
+      finish_then_clause (if_stmt);
+      finish_if_stmt (if_stmt);
     }
   else if (DECL_THREAD_LOCAL_P (decl))
     tls_aggregates = tree_cons (init, decl, tls_aggregates);
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index f1b3d0c..36aa8b4 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -3034,6 +3034,25 @@ get_guard (tree decl)
   return guard;
 }
 
+static tree
+build_atomic_load_byte (tree src, HOST_WIDE_INT model)
+{
+  tree ptr_type = build_pointer_type (char_type_node);
+  tree mem_model = build_int_cst (integer_type_node, model);
+  tree t, addr, val;
+  unsigned int size;
+  int fncode;
+
+  size = tree_to_uhwi (TYPE_SIZE_UNIT (char_type_node));
+
+  fncode = BUILT_IN_ATOMIC_LOAD_N + exact_log2 (size) + 1;
+  t = builtin_decl_implicit ((enum built_in_function) fncode);
+
+  addr = build1 (ADDR_EXPR, ptr_type, src);
+  val = build_call_expr (t, 2, addr, mem_model);
+  return val;
+}
+
 /* Return those bits of the GUARD variable that should be set when the
    guarded entity is actually initialized.  */
 
@@ -3060,12 +3079,14 @@ get_guard_bits (tree guard)
    variable has already been initialized.  */
 
 tree
-get_guard_cond (tree guard)
+get_guard_cond (tree guard, bool thread_safe)
 {
   tree guard_value;
 
-  /* Check to see if the GUARD is zero.  */
-  guard = get_guard_bits (guard);
+  if (!thread_safe)
+    guard = get_guard_bits (guard);
+  else
+    guard = build_atomic_load_byte (guard, MEMMODEL_ACQUIRE);
 
   /* Mask off all but the low bit.  */
   if (targetm.cxx.guard_mask_bit ())
@@ -3681,7 +3702,7 @@ one_static_initialization_or_destruction (tree decl, tree init, bool initp)
 	  /* When using __cxa_atexit, we never try to destroy
 	     anything from a static destructor.  */
 	  gcc_assert (initp);
-	  guard_cond = get_guard_cond (guard);
+	  guard_cond = get_guard_cond (guard, false);
 	}
       /* If we don't have __cxa_atexit, then we will be running
 	 destructors from .fini sections, or their equivalents.  So,
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f2f3497..a16cd92 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11395,16 +11395,6 @@ routine for target specific customizations of the system printf
 and scanf formatter settings.
 @end defmac
 
-@deftypevr {Target Hook} bool TARGET_RELAXED_ORDERING
-If set to @code{true}, means that the target's memory model does not
-guarantee that loads which do not depend on one another will access
-main memory in the order of the instruction stream; if ordering is
-important, an explicit memory barrier must be used.  This is true of
-many recent processors which implement a policy of ``relaxed,''
-``weak,'' or ``release'' memory consistency, such as Alpha, PowerPC,
-and ia64.  The default is @code{false}.
-@end deftypevr
-
 @deftypefn {Target Hook} {const char *} TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN (const_tree @var{typelist}, const_tree @var{funcdecl}, const_tree @var{val})
 If defined, this macro returns the diagnostic message when it is
 illegal to pass argument @var{val} to function @var{funcdecl}
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 35b02b7..93fb41c 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8143,8 +8143,6 @@ routine for target specific customizations of the system printf
 and scanf formatter settings.
 @end defmac
 
-@hook TARGET_RELAXED_ORDERING
-
 @hook TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN
 
 @hook TARGET_INVALID_CONVERSION
diff --git a/gcc/target.def b/gcc/target.def
index f2cb81d..b606b81 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5785,19 +5785,6 @@ for the primary source file, immediately after printing\n\
 this to be done.  The default is false.",
  bool, false)
 
-/* True if the target is allowed to reorder memory accesses unless
-   synchronization is explicitly requested.  */
-DEFHOOKPOD
-(relaxed_ordering,
- "If set to @code{true}, means that the target's memory model does not\n\
-guarantee that loads which do not depend on one another will access\n\
-main memory in the order of the instruction stream; if ordering is\n\
-important, an explicit memory barrier must be used.  This is true of\n\
-many recent processors which implement a policy of ``relaxed,''\n\
-``weak,'' or ``release'' memory consistency, such as Alpha, PowerPC,\n\
-and ia64.  The default is @code{false}.",
- bool, false)
-
 /* Returns true if we should generate exception tables for use with the
    ARM EABI.  The effects the encoding of function exception specifications.  */
 DEFHOOKPOD

