From: Ramana Radhakrishnan <ramana.radhakrishnan@foss.arm.com>
To: Jason Merrill <jason@redhat.com>, David Edelsohn <dje.gcc@gmail.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
Jim Wilson <wilson@tuliptree.org>,
Steve Ellcey <sellcey@mips.com>,
Richard Henderson <rth@redhat.com>,
Steve Munroe <munroesj@linux.vnet.ibm.com>,
Torvald Riegel <triegel@redhat.com>
Subject: Re: [RFC / CFT] PR c++/66192 - Remove TARGET_RELAXED_ORDERING and use load acquires.
Date: Fri, 29 May 2015 13:32:00 -0000 [thread overview]
Message-ID: <5568673A.6060007@foss.arm.com> (raw)
In-Reply-To: <555F6911.6030205@redhat.com>
On 22/05/15 18:36, Jason Merrill wrote:
> On 05/22/2015 11:23 AM, Ramana Radhakrishnan wrote:
>> On 22/05/15 15:28, Jason Merrill wrote:
>>> I do notice that get_guard_bits after build_atomic_load just won't work
>>> on non-ARM targets, as it ends up trying to take the address of a value.
>>
>> So on powerpc where targetm.guard_mask_bit is false - this is what I see.
>>
>> &(long long int) __atomic_load_8 (&_ZGVZ1fvE1p, 2)
>
> This is the bit that doesn't make sense to me. __atomic_load_8 returns
> a value; what does it mean to take its address? If we're going to load
> more than just the first byte, we should mask off the rest rather than
> trying to mess with its address.
>
> It also seems unnecessary to load 8 bytes on any target; we could add a
> function to optabs.c that returns the smallest mode for which there's
> atomic load support?
>
> Jason
>
Right, I've taken all the comments on board. Thanks to those who tested
the original patch; here's v2, currently bootstrapped and tested only on
aarch64-none-linux-gnu.
This patch:
- Turns build_atomic_load into build_atomic_load_byte, so that it
performs an atomic load of a single byte instead of a full-word
atomic load.
- Restructures get_guard_cond to decide whether to use an atomic load
or a plain load, depending on whether the generated code will run in
a multi-threaded context.
- Adjusts all callers of get_guard_cond accordingly.
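To make the resulting code generation concrete, the double-checked pattern
the patch arranges for a guarded static can be sketched in plain C as below.
The fake_cxa_guard_* helpers are hypothetical stand-ins for the Itanium C++
ABI runtime routines (the real ones live in libstdc++ and use a 64-bit guard
word); only the one-byte acquire load on the fast path corresponds directly
to what build_atomic_load_byte emits.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the Itanium C++ ABI guard runtime
   (__cxa_guard_acquire / __cxa_guard_release).  */
static char guard;                     /* guard byte: 0 = uninitialized */
static pthread_mutex_t guard_mu = PTHREAD_MUTEX_INITIALIZER;
static int init_runs;                  /* how many times the initializer ran */

static int
fake_cxa_guard_acquire (char *g)
{
  pthread_mutex_lock (&guard_mu);
  if (*g)                              /* initialized while we waited */
    {
      pthread_mutex_unlock (&guard_mu);
      return 0;
    }
  return 1;                            /* caller must run the initializer */
}

static void
fake_cxa_guard_release (char *g)
{
  /* The release store pairs with the acquire load on the fast path.  */
  __atomic_store_n (g, 1, __ATOMIC_RELEASE);
  pthread_mutex_unlock (&guard_mu);
}

void
get_instance (void)
{
  /* Fast path: a one-byte acquire load of the guard instead of a plain
     load.  Acquire ordering guarantees that if we see the guard set,
     we also see the initialized object's contents.  */
  if (!__atomic_load_n (&guard, __ATOMIC_ACQUIRE))
    {
      if (fake_cxa_guard_acquire (&guard))
        {
          ++init_runs;                 /* the one-time initializer body */
          fake_cxa_guard_release (&guard);
        }
    }
}
```

After the first call, every subsequent call costs only the single acquire
load; on strongly-ordered targets that is no more expensive than a plain
load, which is why the relaxed_ordering hook is no longer needed.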
One piece of fallout I've observed in my testing, and that I'm not sure
what to do about, is that on *bare-metal* arm-none-eabi targets we still
emit calls to __sync_synchronize on architecture versions that do not
have a barrier instruction, which results in a link error.
While it is tempting to take the easy approach of not emitting the
call, I suspect that in practice a number of users of the bare-metal
tools use them for their own RTOSes and other micro-OSes. Generating
barriers at higher architecture levels but not at lower ones therefore
appears dangerous, especially on architectures with backwards
compatibility (i.e. standard user code built with -mcpu=arm7tdmi is
still expected to work on a core that conforms to a later architecture
revision).
I am considering leaving this in the ARM backend to force people to
think about what they want to do about thread safety with statics and
C++ on bare-metal systems. If they really do not want thread safety,
they can add -fno-threadsafe-statics or provide an appropriate
implementation of __sync_synchronize for their platform.
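For that last option, a platform port can satisfy the linker with its own
definition. As a purely hypothetical sketch: on a single-core system where
no other agent (other cores, DMA) observes memory ordering, a compiler-level
barrier may be all that is needed; whether that is actually sufficient
depends entirely on the platform.

```c
/* Hypothetical bare-metal stub: on a strictly single-core system a
   compiler barrier stands in for a hardware barrier, since only the
   compiler could reorder the accesses.  This would be wrong on any
   multi-core or DMA-coherent system.  */
void
__sync_synchronize (void)
{
  __asm__ __volatile__ ("" ::: "memory");
}
```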
Any thoughts / comments?
regards
Ramana
gcc/cp/ChangeLog:
2015-05-29 Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
PR c++/66192
* cp-tree.h (get_guard_cond): Adjust declaration.
* decl.c (expand_static_init): Use atomic load acquire
and adjust call to get_guard_cond.
* decl2.c (build_atomic_load_byte): New function.
(get_guard_cond): Handle thread_safety.
(one_static_initialization_or_destruction): Adjust call to
get_guard_cond.
gcc/ChangeLog:
2015-05-29 Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
PR c++/66192
* config/alpha/alpha.c (TARGET_RELAXED_ORDERING): Delete.
* config/ia64/ia64.c (TARGET_RELAXED_ORDERING): Likewise.
* config/rs6000/rs6000.c (TARGET_RELAXED_ORDERING): Likewise.
* config/sparc/linux.h (SPARC_RELAXED_ORDERING): Likewise.
* config/sparc/linux64.h (SPARC_RELAXED_ORDERING): Likewise.
* config/sparc/sparc.c (TARGET_RELAXED_ORDERING): Likewise.
* config/sparc/sparc.h (SPARC_RELAXED_ORDERING): Likewise.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_RELAXED_ORDERING): Delete.
* target.def (TARGET_RELAXED_ORDERING): Delete.
[-- Attachment #2: relaxed-ordering-rewrite-v2.txt --]
[-- Type: text/plain, Size: 13465 bytes --]
commit e9c757b843b22609ac161e18fc7cd17e8866562d
Author: Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
Date: Fri May 29 10:55:18 2015 +0100
Patch v2 - Remove TARGET_RELAXED_ORDERING.
This respun version handles comments on the previous revision:
- Simplifies build_atomic_load into build_atomic_load_byte, so that it
performs an atomic load of a single byte instead of a full-word
atomic load.
- Restructures get_guard_cond to decide whether to use an atomic load
or a plain load, depending on whether the generated code will run in
a multi-threaded context.
- Adjusts all callers of get_guard_cond accordingly.
Is this better now?
Bootstrapped and regression tested on aarch64-none-linux-gnu.
regards
Ramana
gcc/cp/ChangeLog:
2015-05-29 Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
PR c++/66192
* cp-tree.h (get_guard_cond): Adjust declaration.
* decl.c (expand_static_init): Use atomic load acquire
and adjust call to get_guard_cond.
* decl2.c (build_atomic_load_byte): New function.
(get_guard_cond): Handle thread_safety.
(one_static_initialization_or_destruction): Adjust call to
get_guard_cond.
gcc/ChangeLog:
2015-05-29 Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>
PR c++/66192
* config/alpha/alpha.c (TARGET_RELAXED_ORDERING): Delete.
* config/ia64/ia64.c (TARGET_RELAXED_ORDERING): Likewise.
* config/rs6000/rs6000.c (TARGET_RELAXED_ORDERING): Likewise.
* config/sparc/linux.h (SPARC_RELAXED_ORDERING): Likewise.
* config/sparc/linux64.h (SPARC_RELAXED_ORDERING): Likewise.
* config/sparc/sparc.c (TARGET_RELAXED_ORDERING): Likewise.
* config/sparc/sparc.h (SPARC_RELAXED_ORDERING): Likewise.
* doc/tm.texi: Regenerate.
* doc/tm.texi.in (TARGET_RELAXED_ORDERING): Delete.
* target.def (TARGET_RELAXED_ORDERING): Delete.
diff --git a/gcc/config/alpha/alpha.c b/gcc/config/alpha/alpha.c
index 1ba99d0..857c9ac 100644
--- a/gcc/config/alpha/alpha.c
+++ b/gcc/config/alpha/alpha.c
@@ -9987,12 +9987,6 @@ alpha_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
#undef TARGET_EXPAND_BUILTIN_VA_START
#define TARGET_EXPAND_BUILTIN_VA_START alpha_va_start
-/* The Alpha architecture does not require sequential consistency. See
- http://www.cs.umd.edu/~pugh/java/memoryModel/AlphaReordering.html
- for an example of how it can be violated in practice. */
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING true
-
#undef TARGET_OPTION_OVERRIDE
#define TARGET_OPTION_OVERRIDE alpha_option_override
diff --git a/gcc/config/ia64/ia64.c b/gcc/config/ia64/ia64.c
index c1e2ecd..45ad97a 100644
--- a/gcc/config/ia64/ia64.c
+++ b/gcc/config/ia64/ia64.c
@@ -630,11 +630,6 @@ static const struct attribute_spec ia64_attribute_table[] =
#define TARGET_LIBGCC_FLOATING_MODE_SUPPORTED_P \
ia64_libgcc_floating_mode_supported_p
-/* ia64 architecture manual 4.4.7: ... reads, writes, and flushes may occur
- in an order different from the specified program order. */
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING true
-
#undef TARGET_LEGITIMATE_CONSTANT_P
#define TARGET_LEGITIMATE_CONSTANT_P ia64_legitimate_constant_p
#undef TARGET_LEGITIMATE_ADDRESS_P
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a590ef4..ce70ca0 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1620,17 +1620,6 @@ static const struct attribute_spec rs6000_attribute_table[] =
#define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail
#endif
-/* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors
- The PowerPC architecture requires only weak consistency among
- processors--that is, memory accesses between processors need not be
- sequentially consistent and memory accesses among processors can occur
- in any order. The ability to order memory accesses weakly provides
- opportunities for more efficient use of the system bus. Unless a
- dependency exists, the 604e allows read operations to precede store
- operations. */
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING true
-
#ifdef HAVE_AS_TLS
#undef TARGET_ASM_OUTPUT_DWARF_DTPREL
#define TARGET_ASM_OUTPUT_DWARF_DTPREL rs6000_output_dwarf_dtprel
diff --git a/gcc/config/sparc/linux.h b/gcc/config/sparc/linux.h
index 17e1e86..29763c4 100644
--- a/gcc/config/sparc/linux.h
+++ b/gcc/config/sparc/linux.h
@@ -139,12 +139,6 @@ do { \
/* Static stack checking is supported by means of probes. */
#define STACK_CHECK_STATIC_BUILTIN 1
-/* Linux currently uses RMO in uniprocessor mode, which is equivalent to
- TMO, and TMO in multiprocessor mode. But they reserve the right to
- change their minds. */
-#undef SPARC_RELAXED_ORDERING
-#define SPARC_RELAXED_ORDERING true
-
#undef NEED_INDICATE_EXEC_STACK
#define NEED_INDICATE_EXEC_STACK 1
diff --git a/gcc/config/sparc/linux64.h b/gcc/config/sparc/linux64.h
index 43da848..efa33fb 100644
--- a/gcc/config/sparc/linux64.h
+++ b/gcc/config/sparc/linux64.h
@@ -253,12 +253,6 @@ do { \
/* Static stack checking is supported by means of probes. */
#define STACK_CHECK_STATIC_BUILTIN 1
-/* Linux currently uses RMO in uniprocessor mode, which is equivalent to
- TMO, and TMO in multiprocessor mode. But they reserve the right to
- change their minds. */
-#undef SPARC_RELAXED_ORDERING
-#define SPARC_RELAXED_ORDERING true
-
#undef NEED_INDICATE_EXEC_STACK
#define NEED_INDICATE_EXEC_STACK 1
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index a1562ad..094287f 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -808,9 +808,6 @@ char sparc_hard_reg_printed[8];
#define TARGET_ATTRIBUTE_TABLE sparc_attribute_table
#endif
-#undef TARGET_RELAXED_ORDERING
-#define TARGET_RELAXED_ORDERING SPARC_RELAXED_ORDERING
-
#undef TARGET_OPTION_OVERRIDE
#define TARGET_OPTION_OVERRIDE sparc_option_override
diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h
index 106d993..72dd18b 100644
--- a/gcc/config/sparc/sparc.h
+++ b/gcc/config/sparc/sparc.h
@@ -106,17 +106,6 @@ extern enum cmodel sparc_cmodel;
#define SPARC_DEFAULT_CMODEL CM_32
-/* The SPARC-V9 architecture defines a relaxed memory ordering model (RMO)
- which requires the following macro to be true if enabled. Prior to V9,
- there are no instructions to even talk about memory synchronization.
- Note that the UltraSPARC III processors don't implement RMO, unlike the
- UltraSPARC II processors. Niagara, Niagara-2, and Niagara-3 do not
- implement RMO either.
-
- Default to false; for example, Solaris never enables RMO, only ever uses
- total memory ordering (TMO). */
-#define SPARC_RELAXED_ORDERING false
-
/* Do not use the .note.GNU-stack convention by default. */
#define NEED_INDICATE_EXEC_STACK 0
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 91619e2..aa391c8 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -5490,7 +5490,7 @@ extern bool mark_used (tree, tsubst_flags_t);
extern void finish_static_data_member_decl (tree, tree, bool, tree, int);
extern tree cp_build_parm_decl (tree, tree);
extern tree get_guard (tree);
-extern tree get_guard_cond (tree);
+extern tree get_guard_cond (tree, bool);
extern tree set_guard (tree);
extern tree get_tls_wrapper_fn (tree);
extern void mark_needed (tree);
diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index a8cb358..8f92667 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -7211,7 +7211,7 @@ expand_static_init (tree decl, tree init)
looks like:
static <type> guard;
- if (!guard.first_byte) {
+ if (!__atomic_load (guard.first_byte)) {
if (__cxa_guard_acquire (&guard)) {
bool flag = false;
try {
@@ -7241,16 +7241,11 @@ expand_static_init (tree decl, tree init)
/* Create the guard variable. */
guard = get_guard (decl);
- /* This optimization isn't safe on targets with relaxed memory
- consistency. On such targets we force synchronization in
- __cxa_guard_acquire. */
- if (!targetm.relaxed_ordering || !thread_guard)
- {
- /* Begin the conditional initialization. */
- if_stmt = begin_if_stmt ();
- finish_if_stmt_cond (get_guard_cond (guard), if_stmt);
- then_clause = begin_compound_stmt (BCS_NO_SCOPE);
- }
+ /* Begin the conditional initialization. */
+ if_stmt = begin_if_stmt ();
+
+ finish_if_stmt_cond (get_guard_cond (guard, thread_guard), if_stmt);
+ then_clause = begin_compound_stmt (BCS_NO_SCOPE);
if (thread_guard)
{
@@ -7319,12 +7314,9 @@ expand_static_init (tree decl, tree init)
finish_if_stmt (inner_if_stmt);
}
- if (!targetm.relaxed_ordering || !thread_guard)
- {
- finish_compound_stmt (then_clause);
- finish_then_clause (if_stmt);
- finish_if_stmt (if_stmt);
- }
+ finish_compound_stmt (then_clause);
+ finish_then_clause (if_stmt);
+ finish_if_stmt (if_stmt);
}
else if (DECL_THREAD_LOCAL_P (decl))
tls_aggregates = tree_cons (init, decl, tls_aggregates);
diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c
index f1b3d0c..36aa8b4 100644
--- a/gcc/cp/decl2.c
+++ b/gcc/cp/decl2.c
@@ -3034,6 +3034,25 @@ get_guard (tree decl)
return guard;
}
+static tree
+build_atomic_load_byte (tree src, HOST_WIDE_INT model)
+{
+ tree ptr_type = build_pointer_type (char_type_node);
+ tree mem_model = build_int_cst (integer_type_node, model);
+ tree t, addr, val;
+ unsigned int size;
+ int fncode;
+
+ size = tree_to_uhwi (TYPE_SIZE_UNIT (char_type_node));
+
+ fncode = BUILT_IN_ATOMIC_LOAD_N + exact_log2 (size) + 1;
+ t = builtin_decl_implicit ((enum built_in_function) fncode);
+
+ addr = build1 (ADDR_EXPR, ptr_type, src);
+ val = build_call_expr (t, 2, addr, mem_model);
+ return val;
+}
+
/* Return those bits of the GUARD variable that should be set when the
guarded entity is actually initialized. */
@@ -3060,12 +3079,14 @@ get_guard_bits (tree guard)
variable has already been initialized. */
tree
-get_guard_cond (tree guard)
+get_guard_cond (tree guard, bool thread_safe)
{
tree guard_value;
- /* Check to see if the GUARD is zero. */
- guard = get_guard_bits (guard);
+ if (!thread_safe)
+ guard = get_guard_bits (guard);
+ else
+ guard = build_atomic_load_byte (guard, MEMMODEL_ACQUIRE);
/* Mask off all but the low bit. */
if (targetm.cxx.guard_mask_bit ())
@@ -3681,7 +3702,7 @@ one_static_initialization_or_destruction (tree decl, tree init, bool initp)
/* When using __cxa_atexit, we never try to destroy
anything from a static destructor. */
gcc_assert (initp);
- guard_cond = get_guard_cond (guard);
+ guard_cond = get_guard_cond (guard, false);
}
/* If we don't have __cxa_atexit, then we will be running
destructors from .fini sections, or their equivalents. So,
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f2f3497..a16cd92 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11395,16 +11395,6 @@ routine for target specific customizations of the system printf
and scanf formatter settings.
@end defmac
-@deftypevr {Target Hook} bool TARGET_RELAXED_ORDERING
-If set to @code{true}, means that the target's memory model does not
-guarantee that loads which do not depend on one another will access
-main memory in the order of the instruction stream; if ordering is
-important, an explicit memory barrier must be used. This is true of
-many recent processors which implement a policy of ``relaxed,''
-``weak,'' or ``release'' memory consistency, such as Alpha, PowerPC,
-and ia64. The default is @code{false}.
-@end deftypevr
-
@deftypefn {Target Hook} {const char *} TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN (const_tree @var{typelist}, const_tree @var{funcdecl}, const_tree @var{val})
If defined, this macro returns the diagnostic message when it is
illegal to pass argument @var{val} to function @var{funcdecl}
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 35b02b7..93fb41c 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -8143,8 +8143,6 @@ routine for target specific customizations of the system printf
and scanf formatter settings.
@end defmac
-@hook TARGET_RELAXED_ORDERING
-
@hook TARGET_INVALID_ARG_FOR_UNPROTOTYPED_FN
@hook TARGET_INVALID_CONVERSION
diff --git a/gcc/target.def b/gcc/target.def
index f2cb81d..b606b81 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -5785,19 +5785,6 @@ for the primary source file, immediately after printing\n\
this to be done. The default is false.",
bool, false)
-/* True if the target is allowed to reorder memory accesses unless
- synchronization is explicitly requested. */
-DEFHOOKPOD
-(relaxed_ordering,
- "If set to @code{true}, means that the target's memory model does not\n\
-guarantee that loads which do not depend on one another will access\n\
-main memory in the order of the instruction stream; if ordering is\n\
-important, an explicit memory barrier must be used. This is true of\n\
-many recent processors which implement a policy of ``relaxed,''\n\
-``weak,'' or ``release'' memory consistency, such as Alpha, PowerPC,\n\
-and ia64. The default is @code{false}.",
- bool, false)
-
/* Returns true if we should generate exception tables for use with the
ARM EABI. The effects the encoding of function exception specifications. */
DEFHOOKPOD