Re: C++11 atomic library notes

* Re: C++11 atomic library notes
       [not found] <4E862864.2010607@redhat.com>
@ 2011-10-01  6:56 ` Marc Glisse
  2011-10-01 23:12   ` Andrew MacLeod
  2011-10-03 17:31 ` Richard Henderson
  2011-10-05  7:26 ` Jeffrey Yasskin
  2 siblings, 1 reply; 12+ messages in thread
From: Marc Glisse @ 2011-10-01  6:56 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Lawrence Crowl, Benjamin Kosnik, Richard Henderson, Aldy Hernandez, GCC

On Fri, 30 Sep 2011, Andrew MacLeod wrote:

> I've been working on GCC's C++11 atomic implementation.

Cool!

> In discussions with 
> Lawrence, I've recently discovered a fundamental change in what libstdc++-v3 
> is likely to provide as far as an implementation.
>
> Previously, header files provided a choice between a locked or a lock-free 
> implementation, preferring the lock-free version when available on the 
> architecture and falling back to the locked version in other cases.
>
> Now the thought is to provide lock-free instructions when possible, and fall 
> back to external function calls the rest of the time. These would then be 
> resolved by an application or system library.
>
> If proceeding with that change, it would be convenient to make the same calls 
> that other implementations are going to use, allowing OS or application 
> providers to simply provide a single library with atomic routines that can be 
> used  by multiple C++11 compilers.
>
> Since GCC 4.7 stage 1 is going to end shortly and it would be nice to get the 
> cxx-mem-model branch integrated, I quickly wrote up what the current plan for 
> the branch is regarding these external calls and such and brought up a couple 
> of issues.  It's located in the gcc wiki at: 
> http://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary

"The compiler must ensure that for any given object, it either ALWAYS 
inlines lock free routines, OR calls the external routines. For any given 
object, these cannot be intermixed."

Why? You give an example explaining why it is fine to link 386 and 486 
objects, and I can't see the difference. Not that I'm advocating mixing 
them, just wondering whether it really matters if it happens (by 
accident).

I assume that locks are supposed to be implemented in terms of those 
functions too (it sounds like lock uses atomic which uses lock ;-)

For the atomic version of a user-defined "small" POD type, do you intend 
to query the compiler about the presence of a volatile member to dispatch 
to the right function?

The design looks a lot like:
http://libcxx.llvm.org/atomic_design_a.html
which is good since the main point seems to be to share it between 
implementations. Are there others on board?

-- 
Marc Glisse

* Re: C++11 atomic library notes
  2011-10-01  6:56 ` C++11 atomic library notes Marc Glisse
@ 2011-10-01 23:12   ` Andrew MacLeod
  2011-10-02  8:40     ` Marc Glisse
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew MacLeod @ 2011-10-01 23:12 UTC (permalink / raw)
  To: gcc
  Cc: Marc Glisse, Lawrence Crowl, Benjamin Kosnik, Richard Henderson,
	Aldy Hernandez

On 10/01/2011 02:55 AM, Marc Glisse wrote:
>
>
> "The compiler must ensure that for any given object, it either ALWAYS 
> inlines lock free routines, OR calls the external routines. For any 
> given object, these cannot be intermixed."
>
> Why? You give an example explaining why it is fine to link 386 and 486 
> objects, and I can't see the difference. Not that I'm advocating mixing 
> them, just wondering whether it really matters if it happens (by 
> accident).

If we have an architecture for which we cannot generate one of the
functions, say __atomic_load_16, then it will have to use whatever the
library supplies. If you continue to generate all the rest of the __atomic
builtins for 16 bytes using lock-free instructions, and the call to the
library turns out to be a locked implementation at runtime, then atomic
support for 16-byte objects is broken. The load thinks it's getting a
lock, but none of the other routines pay any attention to locks. So if
one atomic operation requires the library, they all do in order to get
consistent behaviour.
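
A minimal sketch of that rule, assuming a hypothetical wrapper named ConsistentAtomic (not part of GCC or libstdc++): the choice between the lock-free path and the locked path is made once per type, and every operation on that type follows the same choice, so a locked load can never race with a lock-free store.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical illustration only: one policy decision per type, shared by
// every operation on objects of that type.
template <typename T, bool LockFree>
class ConsistentAtomic;

// Lock-free path: every operation uses the hardware atomics.
template <typename T>
class ConsistentAtomic<T, true> {
    std::atomic<T> value_;
public:
    explicit ConsistentAtomic(T v) : value_(v) {}
    T load() const { return value_.load(); }
    void store(T v) { value_.store(v); }
    T fetch_add(T d) { return value_.fetch_add(d); }
};

// Locked path: every operation takes the same lock, so none can bypass it.
template <typename T>
class ConsistentAtomic<T, false> {
    mutable std::mutex lock_;
    T value_;
public:
    explicit ConsistentAtomic(T v) : value_(v) {}
    T load() const { std::lock_guard<std::mutex> g(lock_); return value_; }
    void store(T v) { std::lock_guard<std::mutex> g(lock_); value_ = v; }
    T fetch_add(T d) {
        std::lock_guard<std::mutex> g(lock_);
        T old = value_;
        value_ = old + d;
        return old;
    }
};

// Usage: both policies give the same observable results.
inline void consistent_atomic_demo() {
    ConsistentAtomic<int, true> fast(1);
    ConsistentAtomic<int, false> locked(1);
    assert(fast.fetch_add(2) == 1 && fast.load() == 3);
    assert(locked.fetch_add(2) == 1 && locked.load() == 3);
}
```

The broken case described above would correspond to a class whose load() takes the lock while store() uses a raw hardware instruction; the two specializations here rule that out by construction.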

>
> I assume that locks are supposed to be implemented in terms of those 
> functions too (it sounds like lock uses atomic which uses lock ;-)
>
> For the atomic version of a user-defined "small" POD type, do you 
> intend to query the compiler about the presence of a volatile member 
> to dispatch to the right function?
If it is sorted out that volatile does require something different, then 
we'd need to dispatch differently for volatile.  At the moment I'm not 
doing anything different for them, just asking "the experts" if we need 
to :-)  I suspect we might need to, but it's easier to add something 
later than to remove something that's already in a compiler.
>
> The design looks a lot like:
> http://libcxx.llvm.org/atomic_design_a.html
> which is good since the main point seems to be to share it between 
> implementations. Are there others on board?

I'm not aware that anyone has done anything with an external library 
yet.   Hopefully this will make us all aware of each other and get 
consistent :-).  I hadn't seen that link, I guess the name is a good 
choice :-)

Andrew

* Re: C++11 atomic library notes
  2011-10-01 23:12   ` Andrew MacLeod
@ 2011-10-02  8:40     ` Marc Glisse
  2011-10-02 13:56       ` Andrew MacLeod
  0 siblings, 1 reply; 12+ messages in thread
From: Marc Glisse @ 2011-10-02  8:40 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: gcc, Lawrence Crowl, Benjamin Kosnik, Richard Henderson, Aldy Hernandez

On Sat, 1 Oct 2011, Andrew MacLeod wrote:

> On 10/01/2011 02:55 AM, Marc Glisse wrote:
>> 
>> "The compiler must ensure that for any given object, it either ALWAYS 
>> inlines lock free routines, OR calls the external routines. For any given 
>> object, these cannot be intermixed."
>> 
>> Why? You give an example explaining why it is fine to link 386 and 486 
>> objects, and I can't see the difference. Not that I'm advocating mixing 
>> them, just wondering whether it really matters if it happens (by accident).
>
> If we have an architecture which we cannot generate one of the functions for, 
> say __atomic_load_16, then it will have to use whatever the library supplies. 
> If you continue to generate all the rest of the __atomic builtins for 16 
> bytes using lock-free instructions, and the call to the library turns out to 
> be a locked implementation at runtime, then atomic support for 16-byte 
> objects is broken. The load thinks it's getting a lock, but none of the other 
> routines pay any attention to locks. So if one atomic operation requires 
> the library, they all do in order to get consistent behaviour.

Ah ok, I had understood:
* if __atomic_store_8 is inlined on line 18, it should also be inlined on 
line 42

when instead it is:
* we can't have a locked addition and a lock-free subtraction (hence the 
__atomic_is_lock_free which only takes a size as argument)

Makes perfect sense, thank you for the clarification.

By the way, does it make sense to work atomically on a 16 byte object, and 
also work atomically on its first 8 bytes, thus potentially requiring 
__atomic_is_lock_free not to depend on the size?
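
As a rough illustration of lock-freedom being a property of the type (and hence the size) rather than of an individual call site, the sketch below uses the standard std::atomic<T>::is_lock_free() member instead of calling __atomic_is_lock_free directly. The helper name same_answer_for_same_type is hypothetical, and the types used in the demo are assumed to be small enough that no support library is needed.

```cpp
#include <atomic>
#include <cassert>

// Returns true when two distinct atomic objects of the same type agree on
// whether they are lock-free.
template <typename T>
bool same_answer_for_same_type() {
    std::atomic<T> first{T{}};
    std::atomic<T> second{T{}};
    return first.is_lock_free() == second.is_lock_free();
}

// Usage: the answer is expected to be uniform for each type queried.
inline void lock_free_query_demo() {
    assert(same_answer_for_same_type<char>());
    assert(same_answer_for_same_type<int>());
}
```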

-- 
Marc Glisse

* Re: C++11 atomic library notes
  2011-10-02  8:40     ` Marc Glisse
@ 2011-10-02 13:56       ` Andrew MacLeod
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew MacLeod @ 2011-10-02 13:56 UTC (permalink / raw)
  To: gcc
  Cc: Marc Glisse, Lawrence Crowl, Benjamin Kosnik, Richard Henderson,
	Aldy Hernandez

> On Sat, 1 Oct 2011, Andrew MacLeod wrote:
>
>>
>
> Ah ok, I had understood:
> * if __atomic_store_8 is inlined on line 18, it should also be inlined 
> on line 42
>
> when instead it is:
> * we can't have a locked addition and a lock-free subtraction (hence 
> the __atomic_is_lock_free which only takes a size as argument)
>
> Makes perfect sense, thank you for the clarification.

I added another line in the document to clarify this, thanks.

>
> By the way, does it make sense to work atomically on a 16 byte object, 
> and also work atomically on its first 8 bytes, thus potentially 
> requiring __atomic_is_lock_free not to depend on the size?
>
Not really...  working on the first 8 bytes of the 16-byte atomic object 
'breaks' the atomicity of the 16-byte object.   Volatility of the object 
is the only thing that *may* impact this, AFAICT.

Andrew

* Re: C++11 atomic library notes
       [not found] <4E862864.2010607@redhat.com>
  2011-10-01  6:56 ` C++11 atomic library notes Marc Glisse
@ 2011-10-03 17:31 ` Richard Henderson
  2011-10-03 17:54   ` Andrew MacLeod
  2011-10-05  7:26 ` Jeffrey Yasskin
  2 siblings, 1 reply; 12+ messages in thread
From: Richard Henderson @ 2011-10-03 17:31 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: Lawrence Crowl, Benjamin Kosnik, Aldy Hernandez, GCC

On 09/30/2011 01:36 PM, Andrew MacLeod wrote:
> http://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary


>   __atomic_store (size_t obj_size, T *mem, T val, enum memory_model model)

I don't like this.  I really cannot imagine any situation for which the
compiler can't resolve SIZE to a compile-time constant.  I think it's
pointless to have a dispatch routine that just calls all of

>   __atomic_store_1  (T *mem, T val, enum memory_model model)
>   __atomic_store_2  (T *mem, T val, enum memory_model model)
>   __atomic_store_4  (T *mem, T val, enum memory_model model)
>   __atomic_store_8  (T *mem, T val, enum memory_model model)
>   __atomic_store_16 (T *mem, T val, enum memory_model model)

and so forth.



r~

* Re: C++11 atomic library notes
  2011-10-03 17:31 ` Richard Henderson
@ 2011-10-03 17:54   ` Andrew MacLeod
  2011-10-03 18:10     ` Richard Henderson
  2011-10-03 19:52     ` Joseph S. Myers
  0 siblings, 2 replies; 12+ messages in thread
From: Andrew MacLeod @ 2011-10-03 17:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Lawrence Crowl, Benjamin Kosnik, Aldy Hernandez, GCC

On 10/03/2011 01:31 PM, Richard Henderson wrote:
> On 09/30/2011 01:36 PM, Andrew MacLeod wrote:
>> http://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary
>
>>    __atomic_store (size_t obj_size, T *mem, T val, enum memory_model model)
> I don't like this.  I really cannot imagine any situation for which the
> compiler can't resolve SIZE to a compile-time constant.  I think it's
> pointless to have a dispatch routine that just calls all of
>
It's a library call for arbitrary-sized objects...  C++ can have any 
class declared atomic, so it doesn't have to map to one of those 
optimized lock-free routines.
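
To make the arbitrary-size case concrete, here is a minimal sketch of what a size-generic fallback could look like. Everything in it is an assumption for illustration: the namespace sketch, the names generic_store, generic_load and lock_for, the lock-table size, and the omission of the memory_model argument. It is not the interface proposed on the wiki page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>

namespace sketch {

// A small table of locks; the object's address picks one of them.
constexpr std::size_t kLockCount = 16;

inline std::mutex& lock_for(const void* addr) {
    static std::mutex locks[kLockCount];
    auto bits = reinterpret_cast<std::uintptr_t>(addr);
    return locks[(bits >> 4) % kLockCount];
}

// Hypothetical size-generic store: copies `size` bytes under the lock.
inline void generic_store(std::size_t size, void* mem, const void* val) {
    std::lock_guard<std::mutex> guard(lock_for(mem));
    std::memcpy(mem, val, size);
}

// Hypothetical size-generic load: copies `size` bytes out under the same lock.
inline void generic_load(std::size_t size, const void* mem, void* out) {
    std::lock_guard<std::mutex> guard(lock_for(mem));
    std::memcpy(out, mem, size);
}

}  // namespace sketch

// Usage: a 24-byte object, which matches none of the _1 to _16 routines.
inline void generic_store_demo() {
    char object[24] = {};
    char source[24];
    char result[24] = {};
    std::memset(source, 7, sizeof source);
    sketch::generic_store(sizeof object, object, source);
    sketch::generic_load(sizeof object, object, result);
    assert(std::memcmp(result, source, sizeof result) == 0);
}
```

Because the store and the load for a given address go through the same lock, the pair stays consistent, which is the same per-object requirement discussed earlier in the thread.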


Andrew

* Re: C++11 atomic library notes
  2011-10-03 17:54   ` Andrew MacLeod
@ 2011-10-03 18:10     ` Richard Henderson
  2011-10-03 19:52     ` Joseph S. Myers
  1 sibling, 0 replies; 12+ messages in thread
From: Richard Henderson @ 2011-10-03 18:10 UTC (permalink / raw)
  To: Andrew MacLeod; +Cc: Lawrence Crowl, Benjamin Kosnik, Aldy Hernandez, GCC

On 10/03/2011 10:54 AM, Andrew MacLeod wrote:
> It's a library call for arbitrary-sized objects...  C++ can have any
> class declared atomic, so it doesn't have to map to one of those
> optimized lock-free routines.

Ah, I get it now.  Ew.


r~

* Re: C++11 atomic library notes
  2011-10-03 17:54   ` Andrew MacLeod
  2011-10-03 18:10     ` Richard Henderson
@ 2011-10-03 19:52     ` Joseph S. Myers
  1 sibling, 0 replies; 12+ messages in thread
From: Joseph S. Myers @ 2011-10-03 19:52 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Richard Henderson, Lawrence Crowl, Benjamin Kosnik, Aldy Hernandez, GCC

On Mon, 3 Oct 2011, Andrew MacLeod wrote:

> On 10/03/2011 01:31 PM, Richard Henderson wrote:
> > On 09/30/2011 01:36 PM, Andrew MacLeod wrote:
> > > http://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary
> > 
> > >    __atomic_store (size_t obj_size, T *mem, T val, enum memory_model
> > > model)
> > I don't like this.  I really cannot imagine any situation for which the
> > compiler can't resolve SIZE to a compile-time constant.  I think it's
> > pointless to have a dispatch routine that just calls all of
> > 
> It's a library call for arbitrary-sized objects...  C++ can have any class
> declared atomic, so it doesn't have to map to one of those optimized lock-free
> routines.

Likewise, in C1X you can also apply _Atomic to arbitrary-size structures.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: C++11 atomic library notes
       [not found] <4E862864.2010607@redhat.com>
  2011-10-01  6:56 ` C++11 atomic library notes Marc Glisse
  2011-10-03 17:31 ` Richard Henderson
@ 2011-10-05  7:26 ` Jeffrey Yasskin
  2011-10-05 18:58   ` Andrew MacLeod
  2 siblings, 1 reply; 12+ messages in thread
From: Jeffrey Yasskin @ 2011-10-05  7:26 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Lawrence Crowl, Benjamin Kosnik, Richard Henderson, Aldy Hernandez, GCC

On Fri, Sep 30, 2011 at 1:36 PM, Andrew MacLeod <amacleod@redhat.com> wrote:
> I've been working on GCC's C++11 atomic implementation. In discussions with
> Lawrence, I've recently discovered a fundamental change in what libstdc++-v3
> is likely to provide as far as an implementation.
>
> Previously, header files provided a choice between a locked or a lock-free
> implementation, preferring the lock-free version when available on the
> architecture and falling back to the locked version in other cases.
>
> Now the thought is to provide lock-free instructions when possible, and fall
> back to external function calls the rest of the time. These would then be
> resolved by an application or system library.
>
> If proceeding with that change, it would be convenient to make the same
> calls that other implementations are going to use, allowing OS or
> application providers to simply provide a single library with atomic
> routines that can be used  by multiple C++11 compilers.
>
> Since GCC 4.7 stage 1 is going to end shortly and it would be nice to get
> the cxx-mem-model branch integrated, I quickly wrote up what the current
> plan for the branch is regarding these external calls and such and brought
> up a couple of issues.  It's located in the gcc wiki at:
> http://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary
>
> It's my first cut at it, so hopefully it's mostly correct :-)
>
> If anyone has any interest or input on this subject, the sooner it is
> brought up the better!

I wanted to comment on
http://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary#volatility. Say we have:

typedef pair<int*, int64> DWord;
std::atomic<DWord> shared_var;

void thread1() {
  use(shared_var.load());
}

void thread2() {
  // It's legal to add "volatile" in a pointer or reference cast.
  volatile std::atomic<DWord>* v_shared_var = &shared_var;
  // Now this looks identical to an access to a real volatile object.
  v_shared_var->store(DWord(ptr, val));
}

If, as the document proposes, "16 byte volatile will have to call the
external routines, but 16 byte non-volatiles would be lock-free.", and
the external routines use locked accesses for 16-byte volatile
atomics, then this makes the concurrent accesses to shared_var not
thread-safe. To be thread-safe, we'd have to call the external
routines for every 16-byte atomic, not just the volatile ones, and
those routines would have to use locked accesses uniformly rather than
distinguishing between volatile and non-volatile accesses. Not good.
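
A single-threaded sketch of the aliasing involved, under the assumption that an 8-byte object is used (so it builds without any support library); the names shared_counter, store_through_volatile and load_plain are hypothetical. The conversion to a volatile-qualified pointer is implicit, and both paths reach the same object, so the two must agree on how the object is accessed.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

std::atomic<std::int64_t> shared_counter{0};

// Writes through a volatile-qualified pointer; the conversion from
// std::atomic<T>* to volatile std::atomic<T>* needs no cast.
void store_through_volatile(std::int64_t v) {
    volatile std::atomic<std::int64_t>* vp = &shared_counter;
    vp->store(v);
}

// Reads through the ordinary, non-volatile path.
std::int64_t load_plain() {
    return shared_counter.load();
}

// Usage: the value written via the volatile view is seen via the plain one.
inline void volatile_alias_demo() {
    store_through_volatile(42);
    assert(load_plain() == 42);
}
```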

Even worse, on LL/SC architectures, every lock-free RMW operation
potentially involves multiple loads, so this interpretation of
volatility would prohibit lock-free access to all objects.

I see two ways out:
1) Say that accessing a non-volatile atomic through a volatile
reference or pointer causes undefined behavior. The standard doesn't
say that, and the casts are implicit, so this is icky.
2) Say that volatile atomic accesses may be implemented with more than
one instruction-level access.

(2) is something like how volatile reads of 128-bit structs involve
multiple mov instructions that execute in an arbitrary order. It's
also unlikely to cause problems in existing programs because nobody's
using volatile atomics yet, and they'll only start using them in ways
that work with what compilers implement.

Jeffrey

* Re: C++11 atomic library notes
  2011-10-05  7:26 ` Jeffrey Yasskin
@ 2011-10-05 18:58   ` Andrew MacLeod
  2011-10-05 19:07     ` Jeffrey Yasskin
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew MacLeod @ 2011-10-05 18:58 UTC (permalink / raw)
  To: Jeffrey Yasskin
  Cc: Lawrence Crowl, Benjamin Kosnik, Richard Henderson, Aldy Hernandez, GCC

On 10/05/2011 12:14 AM, Jeffrey Yasskin wrote:
>
> If, as the document proposes, "16 byte volatile will have to call the
> external routines, but 16 byte non-volatiles would be lock-free.", and
> the external routines use locked accesses for 16-byte volatile
> atomics, then this makes the concurrent accesses to shared_var not
> thread-safe. To be thread-safe, we'd have to call the external
> routines for every 16-byte atomic, not just the volatile ones, and
> those routines would have to use locked accesses uniformly rather than
> distinguishing between volatile and non-volatile accesses. Not good.
>

This would seem to support that an object of a given size must be 
consistent, and that volatility is not a basis to segregate behaviour.
Which is good, because that's the result I want but was concerned about :-)
> Even worse, on LL/SC architectures, every lock-free RMW operation
> potentially involves multiple loads, so this interpretation of
> volatility would prohibit lock-free access to all objects.
>
> I see two ways out:
> 1) Say that accessing a non-volatile atomic through a volatile
> reference or pointer causes undefined behavior. The standard doesn't
> say that, and the casts are implicit, so this is icky.
> 2) Say that volatile atomic accesses may be implemented with more than
> one instruction-level access.
>
> (2) is something like how volatile reads of 128-bit structs involve
> multiple mov instructions that execute in an arbitrary order. It's
> also unlikely to cause problems in existing programs because nobody's
> using volatile atomics yet, and they'll only start using them in ways
> that work with what compilers implement.

To clarify, you are suggesting that we say atomic accesses to volatile 
objects may involve more than a single load?

Can we also state that a 'harmless' store may also happen? (i.e., a 0 to 
an existing 0, or some other arbitrary value)   Otherwise I don't know 
how to get a 128-bit atomic load on x86-64 :-P  which then means no 
inlined lock-free atomics on 16-byte values.
It's unpleasant, but...  other suggestions?

Andrew

* Re: C++11 atomic library notes
  2011-10-05 18:58   ` Andrew MacLeod
@ 2011-10-05 19:07     ` Jeffrey Yasskin
  2011-10-05 20:12       ` Andrew MacLeod
  0 siblings, 1 reply; 12+ messages in thread
From: Jeffrey Yasskin @ 2011-10-05 19:07 UTC (permalink / raw)
  To: Andrew MacLeod
  Cc: Lawrence Crowl, Benjamin Kosnik, Richard Henderson, Aldy Hernandez, GCC

On Wed, Oct 5, 2011 at 5:49 AM, Andrew MacLeod <amacleod@redhat.com> wrote:
> On 10/05/2011 12:14 AM, Jeffrey Yasskin wrote:
>> I see two ways out:
>> 1) Say that accessing a non-volatile atomic through a volatile
>> reference or pointer causes undefined behavior. The standard doesn't
>> say that, and the casts are implicit, so this is icky.
>> 2) Say that volatile atomic accesses may be implemented with more than
>> one instruction-level access.
>>
>> (2) is something like how volatile reads of 128-bit structs involve
>> multiple mov instructions that execute in an arbitrary order. It's
>> also unlikely to cause problems in existing programs because nobody's
>> using volatile atomics yet, and they'll only start using them in ways
>> that work with what compilers implement.
>
> To clarify, you are suggesting that we say atomic accesses to volatile
> objects may involve more than a single load?
>
> Can we also state that a 'harmless' store may also happen? (i.e., a 0 to an
> existing 0, or some other arbitrary value)   Otherwise I don't know how to
> get a 128-bit atomic load on x86-64 :-P  which then means no inlined
> lock-free atomics on 16-byte values.
> It's unpleasant, but...  other suggestions?

Yes, that's what I'm suggesting. The rule for 'volatile' from the
language is just that "Accesses to volatile objects are evaluated
strictly according to the rules of the abstract machine." If the
instruction-level implementation for a 16-byte atomic load is
cmpxchg16b, then that's just how the abstract machine is implemented,
and the rule says you have to do that consistently for volatile
objects rather than sometimes optimizing it away. That's my argument
anyway. If there's another standard you're following beyond "kernel
people tend to ask for it," the situation may be trickier.

Jeffrey

P.S. On x86, cmpxchg's description says, "To simplify the interface to
the processor’s bus, the destination operand receives a write cycle
without regard to the result of the comparison. The destination
operand is written back if the comparison fails; otherwise, the source
operand is written into the destination. (The processor never produces
a locked read without also producing a locked write.)", so 128-bit
atomic loads will always write the original value back.
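
The load-by-compare-exchange idea can be sketched in portable C++ as follows. The helper name load_via_cas is hypothetical, and the demo uses a 64-bit value only so that it builds anywhere; on x86-64 the 16-byte case would presumably map onto cmpxchg16b, but that mapping is an assumption about the compiler rather than something this code guarantees.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Atomic load expressed as a compare-exchange. The object must be writable,
// because the operation may store a value back even when it is "only" reading.
template <typename T>
T load_via_cas(std::atomic<T>& object) {
    T expected = T();
    // If the object already equals `expected`, the same value is stored back
    // (the 'harmless' store). Otherwise `expected` is overwritten with the
    // current value. Either way `expected` ends up holding that value.
    object.compare_exchange_strong(expected, expected);
    return expected;
}

// Usage: the returned value matches the stored one and leaves it unchanged.
inline void load_via_cas_demo() {
    std::atomic<std::uint64_t> x{0};
    assert(load_via_cas(x) == 0);
    x.store(99);
    assert(load_via_cas(x) == 99);
    assert(x.load() == 99);
}
```

Note that the function takes a non-const reference: a read implemented this way cannot be applied to an object in read-only memory, which is one practical consequence of the write cycle described in the P.S.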

* Re: C++11 atomic library notes
  2011-10-05 19:07     ` Jeffrey Yasskin
@ 2011-10-05 20:12       ` Andrew MacLeod
  0 siblings, 0 replies; 12+ messages in thread
From: Andrew MacLeod @ 2011-10-05 20:12 UTC (permalink / raw)
  To: Jeffrey Yasskin
  Cc: Lawrence Crowl, Benjamin Kosnik, Richard Henderson, Aldy Hernandez, GCC

On 10/05/2011 10:44 AM, Jeffrey Yasskin wrote:
>
> Yes, that's what I'm suggesting. The rule for 'volatile' from the
> language is just that "Accesses to volatile objects are evaluated
> strictly according to the rules of the abstract machine." If the
> instruction-level implementation for a 16-byte atomic load is
> cmpxchg16b, then that's just how the abstract machine is implemented,
> and the rule says you have to do that consistently for volatile
> objects rather than sometimes optimizing it away. That's my argument
> anyway. If there's another standard you're following beyond "kernel
> people tend to ask for it," the situation may be trickier.

Perfect, I like it.
Andrew
