* C++0x Memory model and gcc @ 2010-05-06 15:43 Andrew MacLeod 2010-05-06 15:51 ` Richard Guenther ` (2 more replies) 0 siblings, 3 replies; 27+ messages in thread From: Andrew MacLeod @ 2010-05-06 15:43 UTC (permalink / raw) To: GCC I've been working for a while on understanding how the new memory model and Atomics work, and what the impacts are on GCC. It would be ideal to get as many of these changes into GCC 4.6 as possible. I've started work on some of the modifications and testing, and the overall impact on GCC shouldn't be *too* bad :-) The plan is to localize the changes as much as possible, and any intrusive bits like optimization changes will be controlled by a flag enabling us to keep the current behaviour when we want it. I've put together a document summarizing how the memory model works, and how I propose to make the changes. I've converted it to wiki pages. Maybe no one will laugh at my choice of document format this time :-) The document is linked off the Atomics wiki page, or directly here: http://gcc.gnu.org/wiki/Atomic/GCCMM It consists mainly of describing the 2 primary aspects of the memory model which affect us - Optimization changes to avoid introducing new data races - Implementation of atomic variables and synchronization modes as well as a new infrastructure to test these types of things. I'm sure I've screwed something up while doing it, and I will proofread it later today again and tweak it further. Please point out anything that isn't clear, or is downright wrong. Especially in the testing methodology since it's all new stuff. Suggestions for improvements on any part of the plan are welcome as well. Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-06 15:43 C++0x Memory model and gcc Andrew MacLeod @ 2010-05-06 15:51 ` Richard Guenther 2010-05-06 16:11 ` Richard Guenther 2010-05-06 15:54 ` Joseph S. Myers 2010-05-06 20:40 ` Ian Lance Taylor 2 siblings, 1 reply; 27+ messages in thread From: Richard Guenther @ 2010-05-06 15:51 UTC (permalink / raw) To: Andrew MacLeod; +Cc: GCC On Thu, May 6, 2010 at 5:43 PM, Andrew MacLeod <amacleod@redhat.com> wrote: > I've been working for a while on understanding how the new memory model and > Atomics work, and what the impacts are on GCC. > > It would be ideal to get as many of these changes into GCC 4.6 as possible. > I've started work on some of the modifications and testing, and the overall > impact on GCC shouldn't be *too* bad :-) > > The plan is to localize the changes as much as possible, and any intrusive > bits like optimization changes will be controlled by a flag enabling us to > keep the current behaviour when we want it. > > I've put together a document summarizing how the memory model works, and how > I propose to make the changes. I've converted it to wiki pages. Maybe no > one will laugh at my choice of document format this time :-) > > The document is linked off the Atomics wiki page, or directly here: > http://gcc.gnu.org/wiki/Atomic/GCCMM > > It consists mainly of describing the 2 primary aspects of the memory model > which affects us > - Optimization changes to avoid introducing new data races > - Implementation of atomic variables and synchronization modes > as well as a new infrastructure to test these types of things. > > I'm sure I've screwed something up while doing it, and I will proofread it > later today again and tweak it further. > > Please point out anything that isn't clear, or is downright wrong. > Especially in the testing methodology since its all new stuff. > Suggestions for improvements on any of the plan are welcome as well. 
First let me say that the C++ memory model is crap when it forces data-races to be avoided for unannotated data like the examples for packed data. Well, I hope that instead of just disabling optimizations you will help to improve their implementation to be able to optimize in a conformant manner. Richard. > Andrew > > > > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-06 15:51 ` Richard Guenther @ 2010-05-06 16:11 ` Richard Guenther 2010-05-06 16:23 ` Andrew MacLeod 0 siblings, 1 reply; 27+ messages in thread From: Richard Guenther @ 2010-05-06 16:11 UTC (permalink / raw) To: Andrew MacLeod; +Cc: GCC On Thu, May 6, 2010 at 5:50 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Thu, May 6, 2010 at 5:43 PM, Andrew MacLeod <amacleod@redhat.com> wrote: >> I've been working for a while on understanding how the new memory model and >> Atomics work, and what the impacts are on GCC. >> >> It would be ideal to get as many of these changes into GCC 4.6 as possible. >> I've started work on some of the modifications and testing, and the overall >> impact on GCC shouldn't be *too* bad :-) >> >> The plan is to localize the changes as much as possible, and any intrusive >> bits like optimization changes will be controlled by a flag enabling us to >> keep the current behaviour when we want it. >> >> I've put together a document summarizing how the memory model works, and how >> I propose to make the changes. I've converted it to wiki pages. Maybe no >> one will laugh at my choice of document format this time :-) >> >> The document is linked off the Atomics wiki page, or directly here: >> http://gcc.gnu.org/wiki/Atomic/GCCMM >> >> It consists mainly of describing the 2 primary aspects of the memory model >> which affects us >> - Optimization changes to avoid introducing new data races >> - Implementation of atomic variables and synchronization modes >> as well as a new infrastructure to test these types of things. >> >> I'm sure I've screwed something up while doing it, and I will proofread it >> later today again and tweak it further. >> >> Please point out anything that isn't clear, or is downright wrong. >> Especially in the testing methodology since its all new stuff. >> Suggestions for improvements on any of the plan are welcome as well. 
> > First let me say that the C++ memory model is crap when it > forces data-races to be avoided for unannotated data like > the examples for packed data. > > Well, I hope that instead of just disabling optimizations you > will help to improve their implementation to be able to optimize > in a conformant manner. And btw, if you are thinking on how to represent the extra data-dependencies required for the consistency models think of how to extend whatever you need in infrastructure for that to also allow FENV dependencies - it's a quite similar problem (FENV query/set are the atomic operations, usual arithmetic is what the dependency is to). It's completely non-trivial (because it's scalar code, not memory accesses). For atomics you should be able to just massage the alias-oracle data-dependence routines (maybe). Richard. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-06 16:11 ` Richard Guenther @ 2010-05-06 16:23 ` Andrew MacLeod 2010-05-07 9:15 ` Richard Guenther 0 siblings, 1 reply; 27+ messages in thread From: Andrew MacLeod @ 2010-05-06 16:23 UTC (permalink / raw) To: Richard Guenther; +Cc: GCC Richard Guenther wrote: > On Thu, May 6, 2010 at 5:50 PM, Richard Guenther > <richard.guenther@gmail.com> wrote: > >> First let me say that the C++ memory model is crap when it >> forces data-races to be avoided for unannotated data like >> the examples for packed data. >> And it isn't consistent across the board, since neighbouring bits normally don't qualify and can introduce data races. I don't like it when a solution has exceptions like that. It is what it is however, and last I heard the plan was for C to adopt the changes as well. >> Well, I hope that instead of just disabling optimizations you >> will help to improve their implementation to be able to optimize >> in a conformant manner. >> I don't want to disable any more than required. SSA names aren't affected since they are local variables only; it's only operations on shared memory, and I am hopeful that I can minimize the restrictions placed on them. Some will be more interesting than others... like CSE... you can still perform CSE on a global as long as you don't introduce a NEW load on some execution path that didn't have one before. What fun. > > And btw, if you are thinking on how to represent the extra > data-dependencies required for the consistency models think > of how to extend whatever you need in infrastructure for that > to also allow FENV dependencies - it's a quite similar problem > (FENV query/set are the atomic operations, usual arithmetic > is what the dependency is to). It's completely non-trivial > (because it's scalar code, not memory accesses). For > atomics you should be able to just massage the alias-oracle > data-dependence routines (maybe). > That's what I'm hoping actually.. Andrew. 
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-06 16:23 ` Andrew MacLeod @ 2010-05-07 9:15 ` Richard Guenther 2010-05-07 13:27 ` Andrew MacLeod 0 siblings, 1 reply; 27+ messages in thread From: Richard Guenther @ 2010-05-07 9:15 UTC (permalink / raw) To: Andrew MacLeod; +Cc: GCC On Thu, May 6, 2010 at 6:22 PM, Andrew MacLeod <amacleod@redhat.com> wrote: > Richard Guenther wrote: >> >> On Thu, May 6, 2010 at 5:50 PM, Richard Guenther >> <richard.guenther@gmail.com> wrote: >> >>> >>> First let me say that the C++ memory model is crap when it >>> forces data-races to be avoided for unannotated data like >>> the examples for packed data. >>> > > And it isn't consistent across the board, since neighbouring bits normally > don't qualify and can introduce data races. I don't like it when a solution > has exceptions like that. It is what it is however, and last I heard the > plan was for C to adopt the changes as well. I would have hoped that only data races between independent objects are covered, thus tmp = a.i; b.j = tmp; would qualify as a load of a and a store to b as far as dependencies are concerned. That would have been consistent with the exceptions for bitfields and much more friendly to architectures with weak support for unaligned accesses. >>> >>> Well, I hope that instead of just disabling optimizations you >>> will help to improve their implementation to be able to optimize >>> in a conformant manner. >>> > > I don't want to disable any more than required. SSA names aren't affected > since they are local variables only, its only operations on shared memory, > and I am hopeful that I can minimize the restrictions placed on them. Some > will be more interesting than others... like CSE... you can still perform > CSE on a global as long as you don't introduce a NEW load on some execution > path that didn't have before. What fun. I don't understand that restriction anyway - how can an extra load cause a data-race if the result is only used when it was used before? 
(You'd need to disable PPRE and GCSE completely if that's really a problem) Thus, if (p) tmp = load; ... if (q) use tmp; how can transforming that to tmp = load; ... if (q) use tmp; ever cause a problem? >> And btw, if you are thinking on how to represent the extra >> data-dependencies required for the consistency models think >> of how to extend whatever you need in infrastructure for that >> to also allow FENV dependencies - it's a quite similar problem >> (FENV query/set are the atomic operations, usual arithmetic >> is what the dependency is to). It's completely non-trivial >> (because it's scalar code, not memory accesses). For >> atomics you should be able to just massage the alias-oracle >> data-dependence routines (maybe). >> > > That's what I'm hoping actually.. We'll see. Richard. > Andrew. > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-07 9:15 ` Richard Guenther @ 2010-05-07 13:27 ` Andrew MacLeod 2010-05-07 14:26 ` Ian Lance Taylor 0 siblings, 1 reply; 27+ messages in thread From: Andrew MacLeod @ 2010-05-07 13:27 UTC (permalink / raw) To: Richard Guenther; +Cc: GCC Richard Guenther wrote: > On Thu, May 6, 2010 at 6:22 PM, Andrew MacLeod <amacleod@redhat.com> wrote: > >> Richard Guenther wrote: >> > > I would have hoped that only data races between independent > objects are covered, thus > > tmp = a.i; > b.j = tmp; > > would qualify as a load of a and a store to b as far as dependencies > are concerned. That would have been consistent with the > exceptions for bitfields and much more friendly to architectures > with weak support for unaligned accesses. > They are independent as far as dependencies within this compilation unit. The problem is if thread number 2 is performing a.j = val b.i = val2 now there are data races on both A and B if we load/store full words and the struct was something like: struct { char i; char j; } a, b; The store to B is particularly unpleasant since you may lose one of the 2 stores. The load data race on A is only in the territory of hardware or software race detectors. >> >> I don't want to disable any more than required. SSA names aren't affected >> since they are local variables only, its only operations on shared memory, >> and I am hopeful that I can minimize the restrictions placed on them. Some >> will be more interesting than others... like CSE... you can still perform >> CSE on a global as long as you don't introduce a NEW load on some execution >> path that didn't have before. What fun. >> > > I don't understand that restriction anyway - how can an extra > load cause a data-race if the result is only used when it was > used before? (You'd need to disable PPRE and GCSE completely > if that's really a problem) > > Thus, > > if (p) > tmp = load; > ... 
> if (q) > use tmp; > > how can transforming that to > > tmp = load; > ... > if (q) > use tmp; > > ever cause a problem? > If the other thread is doing something like: if (!p) load = something then there was no data race before since your thread wasn't performing a load when this thread was storing. i.e., 'p' was being used as the synchronization guard that prevented a data race. When you do the transformation, there is now a potential race that wasn't there before. Some hardware can detect that and trigger an exception, or a software data race detector could trigger, and neither would have triggered before, which means the behaviour is detectably different. That's also why I've separated the loads and stores for handling separately. Under normal circumstances, we want to allow this transformation. If there aren't any detection abilities in play, then the transformation is fine... you can't tell that there was a race. With stores you can actually get different results, so we do need to monitor those. Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
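Andrew's point can be sketched in C (my rendering of the pseudocode above; the names are hypothetical). Both functions return the same values, but the hoisted version performs a load of `shared` on executions where the original performed none, which is exactly what hardware race detection or a software race detector can observe.

```c
int shared;   /* another thread may store here, guarded by !p */

/* Original shape: the load happens only under the guard p, so it
   never overlaps the other thread's store (guarded by !p). */
int guarded_load(int p, int q)
{
    int tmp = 0;
    if (p)
        tmp = shared;
    return q ? tmp : 0;
}

/* After speculative hoisting: value-equivalent, but there is now an
   unconditional load of `shared` that can race with the other
   thread's !p-guarded store. */
int hoisted_load(int p, int q)
{
    int tmp = shared;         /* speculative: executed even when !p */
    return q ? (p ? tmp : 0) : 0;
}
```

Under a "no new loads" rule only the first form is permitted; when no race detection is in play the two are indistinguishable, which is why Andrew wants to allow the hoist in that case.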
* Re: C++0x Memory model and gcc 2010-05-07 13:27 ` Andrew MacLeod @ 2010-05-07 14:26 ` Ian Lance Taylor 0 siblings, 0 replies; 27+ messages in thread From: Ian Lance Taylor @ 2010-05-07 14:26 UTC (permalink / raw) To: Andrew MacLeod; +Cc: Richard Guenther, GCC Andrew MacLeod <amacleod@redhat.com> writes: > They are independent as far as dependencies within this compilation unit. > The problem is if thread number 2 is performing > a.j = val > b.i = val2 > > now there are data races on both A and B if we load/store full words > and the struct was something like: struct { > char i; > char j; > } a, b; > > The store to B is particularly unpleasant since you may lose one of > the 2 stores. The load data race on A is only in the territory of > hardware or software race detectors. In this example, if we do a word access to a, then we are running past the boundaries of the struct. We can only assume that is OK if a is aligned to a word boundary. And if both a and b are aligned to word boundaries, then there is no problem doing a word access to a. So the only potential problem here is if we have two small variables where one is aligned and the other is not. This is an unusual situation because small variables are not normally aligned. We can avoid trouble by forcing an alignment to a word boundary after every aligned variable. Or so it seems to me. Ian ^ permalink raw reply [flat|nested] 27+ messages in thread
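A sketch of Ian's alignment observation (my illustration, using GCC's variable attributes): if each small variable is placed at its own word boundary, a word-wide access to one can never overlap the other, since two distinct objects that both start on 4-byte boundaries cannot share a 4-byte word.

```c
#include <stdint.h>

struct pair { char i; char j; };   /* 2 bytes, alignment 1 by default */

/* Without help, `a` and `b` could share one word, so a word-wide
   read-modify-write of a.i/a.j might clobber b (or lose one of two
   concurrent stores).  Forcing each onto its own word boundary makes
   a word-wide access to `a` self-contained. */
struct pair a __attribute__((aligned(4)));
struct pair b __attribute__((aligned(4)));
```

This is the "forcing an alignment to a word boundary" idea from the message, expressed per-variable; a compiler would do the equivalent in its layout code rather than via source attributes.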
* Re: C++0x Memory model and gcc 2010-05-06 15:43 C++0x Memory model and gcc Andrew MacLeod 2010-05-06 15:51 ` Richard Guenther @ 2010-05-06 15:54 ` Joseph S. Myers 2010-05-06 16:12 ` Andrew MacLeod 2010-05-06 20:40 ` Ian Lance Taylor 2 siblings, 1 reply; 27+ messages in thread From: Joseph S. Myers @ 2010-05-06 15:54 UTC (permalink / raw) To: Andrew MacLeod; +Cc: GCC On Thu, 6 May 2010, Andrew MacLeod wrote: > - Implementation of atomic variables and synchronization modes > as well as a new infrastructure to test these types of things. I presume you've read the long thread starting at <http://gcc.gnu.org/ml/gcc/2009-08/msg00199.html> regarding the issues involved in implementing the atomics (involving compiler and libc cooperation to provide stdatomic.h), and in particular ensuring that code built for one CPU remains safe on later CPU variants that may have more native atomic operations. -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-06 15:54 ` Joseph S. Myers @ 2010-05-06 16:12 ` Andrew MacLeod 0 siblings, 0 replies; 27+ messages in thread From: Andrew MacLeod @ 2010-05-06 16:12 UTC (permalink / raw) To: Joseph S. Myers; +Cc: GCC Joseph S. Myers wrote: > On Thu, 6 May 2010, Andrew MacLeod wrote: > > >> - Implementation of atomic variables and synchronization modes >> as well as a new infrastructure to test these types of things. >> > > I presume you've read the long thread starting at > <http://gcc.gnu.org/ml/gcc/2009-08/msg00199.html> regarding the issues > involved in implementing the atomics (involving compiler and libc > cooperation to provide stdatomic.h), and in particular ensuring that code > built for one CPU remains safe on later CPU variants that may have more > native atomic operations. > > I'm not actually doing the implementation of Atomics themselves right now, Lawrence is looking at that. I'm focusing on the GCC optimization requirements, changes and testing. I'll leave issues like the ones you point out to the guys who like that stuff :-) I couldn't understand a lot of the atomic synchronization stuff from existing documentation, so I figured that part might help others understand it better too. It still gives me a headache. Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-06 15:43 C++0x Memory model and gcc Andrew MacLeod 2010-05-06 15:51 ` Richard Guenther 2010-05-06 15:54 ` Joseph S. Myers @ 2010-05-06 20:40 ` Ian Lance Taylor 2010-05-06 22:02 ` Andrew MacLeod 2010-05-07 13:55 ` Mark Mitchell 2 siblings, 2 replies; 27+ messages in thread From: Ian Lance Taylor @ 2010-05-06 20:40 UTC (permalink / raw) To: Andrew MacLeod; +Cc: GCC Andrew MacLeod <amacleod@redhat.com> writes: > I've been working for a while on understanding how the new memory > model and Atomics work, and what the impacts are on GCC. Thanks for looking at this. One issue I didn't see clearly was how to actually implement this in the compiler. For example, speculated stores are fine for local stack variables, but not for global variables or heap memory. We can implement that in the compiler via a set of tests at each potential speculated store. Or we can implement it via a constraint expressed directly in the IR--perhaps some indicator that this specific store may not merge with conditionals. The latter approach is harder to design but I suspect will be more likely to be reliable over time. The former approach is straightforward to patch into the compiler but can easily degrade as people who don't understand the issues work on the code. I don't agree with your proposed command line options. They seem fine for internal use, but I think very very few users would know when or whether they should use -fno-data-race-stores. I think you should downgrade those options to a --param value, and think about a multi-layered -fmemory-model option. E.g., -fmemory-model=single Assume single threaded execution, which also means no signal handlers. -fmemory-model=fast The user is responsible for all synchronization. Accessing the same memory words from different threads may break unpredictably. -fmemory-model=safe The compiler will do its best to protect you. Ian ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-06 20:40 ` Ian Lance Taylor @ 2010-05-06 22:02 ` Andrew MacLeod 2010-05-07 13:55 ` Mark Mitchell 1 sibling, 0 replies; 27+ messages in thread From: Andrew MacLeod @ 2010-05-06 22:02 UTC (permalink / raw) To: Ian Lance Taylor; +Cc: GCC Ian Lance Taylor wrote: > Andrew MacLeod <amacleod@redhat.com> writes: > > >> I've been working for a while on understanding how the new memory >> model and Atomics work, and what the impacts are on GCC. >> > > Thanks for looking at this. > > One issue I didn't see clearly was how to actually implement this in > the compiler. For example, speculated stores are fine for local stack > variables, but not for global variables or heap memory. We can > implement that in the compiler via a set of tests at each potential > speculated store. Or we can implement it via a constraint expressed > directly in the IR--perhaps some indicator that this specific store > may not merge with conditionals. The latter approach is harder to > design but I suspect will be more likely to be reliable over time. > The former approach is straightforward to patch into the compiler but > can easily degrade as people who don't understand the issues work on > the code. > Which is why the ability to regression test it is so important :-). Right now it's my intention to modify the optimizations based on the flag settings. Some cases will be quite tricky. If we're CSE'ing something in the absence of atomics, and it is shared memory, it is still possible to move it if there is already a load from that location on all paths. So the optimization itself will need to be taught how to figure that out. i.e. if () a_1 = glob else if () b_2 = glob else c_3 = glob we can still common glob and produce tmp_4 = glob if () a_1 = tmp_4 else if () b_2 = tmp_4 else c_3 = tmp_4 all paths loaded glob before, so we can do this safely. 
but if we had: if () a_1 = glob else if () b_2 = notglob else c_3 = glob then we can no longer do anything since we'd be introducing a new load of 'glob' on the path that sets b_2 which wasn't performed before. If there was another load of glob somewhere before the first 'if', then commoning becomes possible again. Some other cases won't be nearly so tricky, thankfully :-). I do think we need to do it in the optimizations because of some of the complex situations which can arise. We can at least try to do a good job and then punt if it gets too hard. Now, thankfully, on most architectures we care about, hardware detection of data race loads isn't an issue. So most of the time it's only new stores that we need to be careful about introducing. I'm hoping the actual impact on codegen is low most of the time. > I don't agree with your proposed command line options. They seem fine > for internal use, but I think very very few users would know when or > whether they should use -fno-data-race-stores. I think you should > I'm fine with alternatives. I'm focused mostly on the internals and I want an individual flag for each of those things to cleanly separate them out. How we expose it I'm ambivalent about as long as testing can turn them on and off individually. There will be people using software data race detectors which may want to be able to turn things on or off from the system default. I think -fmemory-model= with options enabling at a minimum some form of 'off', 'system default', and 'on' would probably work for external exposure. Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
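Andrew's two CSE cases, transcribed from the SSA pseudocode above into plain C (my rendering; the names are hypothetical):

```c
int glob, notglob;

/* Case 1: every path loads glob, so commoning the load to the top of
   the function introduces no access that wasn't already there --
   this stays legal under the memory model. */
int common_ok(int k, int *a, int *b, int *c)
{
    int tmp = glob;              /* safe hoist: all paths loaded glob */
    if (k == 0)      *a = tmp;
    else if (k == 1) *b = tmp;
    else             *c = tmp;
    return tmp;
}

/* Case 2: the middle path loads notglob instead, so hoisting a load
   of glob would introduce one on a path that never had it -- a
   potential new data race, so the commoning must be skipped. */
int common_not_ok(int k, int *a, int *b, int *c)
{
    int tmp;
    if (k == 0)      *a = tmp = glob;
    else if (k == 1) *b = tmp = notglob;
    else             *c = tmp = glob;
    return tmp;
}
```

As the message notes, a load of `glob` dominating the first `if` in the second case would make the commoning legal again, since no path would gain a load it didn't already perform.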
* Re: C++0x Memory model and gcc 2010-05-06 20:40 ` Ian Lance Taylor 2010-05-06 22:02 ` Andrew MacLeod @ 2010-05-07 13:55 ` Mark Mitchell 1 sibling, 0 replies; 27+ messages in thread From: Mark Mitchell @ 2010-05-07 13:55 UTC (permalink / raw) To: Ian Lance Taylor; +Cc: Andrew MacLeod, GCC Ian Lance Taylor wrote: > -fmemory-model=single > Assume single threaded execution, which also means no signal > handlers. > -fmemory-model=fast > The user is responsible for all synchronization. Accessing > the same memory words from different threads may break > unpredictably. > -fmemory-model=safe > The compiler will do its best to protect you. That makes sense to me. I think that's an appropriately user-oriented view of the choices. -- Mark Mitchell CodeSourcery mark@codesourcery.com (650) 331-3385 x713 ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc
@ 2010-05-08 14:38 Jean-Marc Bourguet
2010-05-08 20:18 ` Albert Cohen
0 siblings, 1 reply; 27+ messages in thread
From: Jean-Marc Bourguet @ 2010-05-08 14:38 UTC (permalink / raw)
To: gcc
> -fmemory-model=single
> Assume single threaded execution, which also means no signal
> handlers.
> -fmemory-model=fast
> The user is responsible for all synchronization. Accessing
> the same memory words from different threads may break
> unpredictably.
> -fmemory-model=safe
> The compiler will do its best to protect you.
With that description, I'd think that "safe" lets the user code assume
the sequential consistency model. I'd use -fmemory-model=conformant or
something like that for the model where the compiler assumes that the user
code respects the constraints laid out for it by the standard. As the
constraints put on user code depend on the language -- Java has its
own memory model which AFAIK is more constraining than C++'s, and I think Ada
has its own, but my Ada programming days are too far behind me to comment on
it -- one may prefer some other name.
Yours,
--
Jean-Marc Bourguet
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-08 14:38 Jean-Marc Bourguet @ 2010-05-08 20:18 ` Albert Cohen 2010-05-10 4:40 ` Ian Lance Taylor 0 siblings, 1 reply; 27+ messages in thread From: Albert Cohen @ 2010-05-08 20:18 UTC (permalink / raw) To: gcc Jean-Marc Bourguet wrote: >> -fmemory-model=single >> Assume single threaded execution, which also means no signal >> handlers. >> -fmemory-model=fast >> The user is responsible for all synchronization. Accessing >> the same memory words from different threads may break >> unpredictably. >> -fmemory-model=safe >> The compiler will do its best to protect you. > > With that description, I'd think that "safe" lets the user code assumes > the sequential consistency model. I'd use -fmemory-model=conformant or > something like that for the model where the compiler assumes that the user > code respect the constraint led out for it by the standard. As which > constraints are put on user code depend on the languages -- Java has its > own memory model which AFAIK is more constraining than C++ and I think Ada > has its own but my Ada programming days are too far for me to comment on > it -- one may prefer some other name. I agree. Or even, =c++0x or =gnu++0x On the other hand, I fail to see the difference between =single and =fast, and the explanation about "the same memory word" is not really relevant as memory models typically tell you about concurrent accesses to "different memory words". Albert ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-08 20:18 ` Albert Cohen @ 2010-05-10 4:40 ` Ian Lance Taylor 2010-05-10 17:23 ` Andrew MacLeod 0 siblings, 1 reply; 27+ messages in thread From: Ian Lance Taylor @ 2010-05-10 4:40 UTC (permalink / raw) To: Albert Cohen; +Cc: gcc Albert Cohen <Albert.Cohen@inria.fr> writes: > Jean-Marc Bourguet wrote: >>> -fmemory-model=single >>> Assume single threaded execution, which also means no signal >>> handlers. >>> -fmemory-model=fast >>> The user is responsible for all synchronization. Accessing >>> the same memory words from different threads may break >>> unpredictably. >>> -fmemory-model=safe >>> The compiler will do its best to protect you. >> >> With that description, I'd think that "safe" lets the user code assumes >> the sequential consistency model. I'd use -fmemory-model=conformant or >> something like that for the model where the compiler assumes that the user >> code respect the constraint led out for it by the standard. As which >> constraints are put on user code depend on the languages -- Java has its >> own memory model which AFAIK is more constraining than C++ and I think Ada >> has its own but my Ada programming days are too far for me to comment on >> it -- one may prefer some other name. > > I agree. Or even, =c++0x or =gnu++0x > > On the other hand, I fail to see the differen between =single and > =fast, and the explanation about "the same memory word" is not really > relevant as memory models typically tell you about concurrent accesses > to "different memory words". What I was thinking is that the difference between =single and =fast is that =single permits store speculation. The difference between =fast and =safe/=conformant is that =fast permits writing to a byte by loading a word, changing the byte, and storing the word; in particular, =fast permits write combining in cases where =safe does not. Memory models may not talk about memory words, but they exist nevertheless. 
Ian ^ permalink raw reply [flat|nested] 27+ messages in thread
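The byte-store-via-word rewrite Ian describes looks like this in C (my sketch; it manipulates the integer value, so it is endian-neutral):

```c
#include <stdint.h>

/* Write one byte of a 32-bit word by read-modify-writing the whole
   word.  Under =fast this combining is permitted; under
   =safe/=conformant it is not, because the wide store also rewrites
   the three neighbouring bytes and can silently undo a concurrent
   store to them from another thread. */
void store_byte_via_word(uint32_t *word, unsigned idx, uint8_t val)
{
    uint32_t w = *word;                     /* load the whole word   */
    w &= ~(UINT32_C(0xFF) << (idx * 8));    /* clear the target byte */
    w |= (uint32_t)val << (idx * 8);        /* insert the new byte   */
    *word = w;                              /* store the whole word  */
}
```

The single-threaded result is identical to a plain byte store; the difference only becomes observable when another thread writes the adjacent bytes between the load and the store.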
* Re: C++0x Memory model and gcc 2010-05-10 4:40 ` Ian Lance Taylor @ 2010-05-10 17:23 ` Andrew MacLeod 2010-05-11 6:20 ` Miles Bader 0 siblings, 1 reply; 27+ messages in thread From: Andrew MacLeod @ 2010-05-10 17:23 UTC (permalink / raw) To: gcc On 05/10/2010 12:39 AM, Ian Lance Taylor wrote: > Albert Cohen<Albert.Cohen@inria.fr> writes: > >> >> I agree. Or even, =c++0x or =gnu++0x >> >> On the other hand, I fail to see the differen between =single and >> =fast, and the explanation about "the same memory word" is not really >> relevant as memory models typically tell you about concurrent accesses >> to "different memory words". >> > What I was thinking is that the difference between =single and =fast > is that =single permits store speculation. The difference between > =fast and =safe/=conformant is that =fast permits writing to a byte by > loading a word, changing the byte, and storing the word; in > particular, =fast permits write combining in cases where =safe does > not. > > Memory models may not talk about memory words, but they exist > nevertheless. > > Ian > I've changed the documentation and code to the --params suggestion and the following, for now. We can work out the exact wording and other options later. -fmemory-model=c++0x - Disable data races as per architectural requirements to match the standard. -fmemory-model=safe - Disable all data race introductions. (enforce all 4 internal restrictions.) -fmemory-model=single - Enable all data race introductions, as they are today. (relax all 4 internal restrictions.) Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-10 17:23 ` Andrew MacLeod @ 2010-05-11 6:20 ` Miles Bader 2010-05-11 12:49 ` Andrew MacLeod 0 siblings, 1 reply; 27+ messages in thread From: Miles Bader @ 2010-05-11 6:20 UTC (permalink / raw) To: gcc Andrew MacLeod <amacleod@redhat.com> writes: > -fmemory-model=single - Enable all data races introductions, as they > are today. (relax all 4 internal restrictions.) One could still use this mode with a multi-threaded program as long as explicit synchronization is done, right? -Miles -- Road, n. A strip of land along which one may pass from where it is too tiresome to be to where it is futile to go. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-11 6:20 ` Miles Bader @ 2010-05-11 12:49 ` Andrew MacLeod 2010-05-12 5:21 ` Miles Bader 0 siblings, 1 reply; 27+ messages in thread From: Andrew MacLeod @ 2010-05-11 12:49 UTC (permalink / raw) To: Miles Bader; +Cc: gcc Miles Bader wrote: > Andrew MacLeod <amacleod@redhat.com> writes: > >> -fmemory-model=single - Enable all data races introductions, as they >> are today. (relax all 4 internal restrictions.) >> > > One could still use this mode with a multi-threaded program as long as > explicit synchronization is done, right? > Right. It's just a single-processor memory model, so it doesn't limit any optimizations. Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-11 12:49 ` Andrew MacLeod @ 2010-05-12 5:21 ` Miles Bader 2010-05-12 13:10 ` Andrew MacLeod 0 siblings, 1 reply; 27+ messages in thread From: Miles Bader @ 2010-05-12 5:21 UTC (permalink / raw) To: gcc Andrew MacLeod <amacleod@redhat.com> writes: >>> -fmemory-model=single - Enable all data race introductions, as they >>> are today. (relax all 4 internal restrictions.) >> >> One could still use this mode with a multi-threaded program as long as >> explicit synchronization is done, right? > > Right. It's just a single-processor memory model, so it doesn't limit > any optimizations. Hmm, though now that I think about it, I'm not exactly sure what I mean by "explicit synchronization". Standard libraries (boost threads, the upcoming std::thread) provide things like mutexes and condition variables, but does using those guarantee that the right things happen with any shared data-structures they're used to coordinate...? Thanks, -Miles -- Vote, v. The instrument and symbol of a freeman's power to make a fool of himself and a wreck of his country. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-12 5:21 ` Miles Bader @ 2010-05-12 13:10 ` Andrew MacLeod 2010-05-17 13:12 ` Michael Matz 0 siblings, 1 reply; 27+ messages in thread From: Andrew MacLeod @ 2010-05-12 13:10 UTC (permalink / raw) To: Miles Bader; +Cc: gcc Miles Bader wrote: > Andrew MacLeod <amacleod@redhat.com> writes: > >>>> -fmemory-model=single - Enable all data race introductions, as they >>>> are today. (relax all 4 internal restrictions.) >>>> >>> One could still use this mode with a multi-threaded program as long as >>> explicit synchronization is done, right? >>> >> Right. It's just a single-processor memory model, so it doesn't limit >> any optimizations. >> > > Hmm, though now that I think about it, I'm not exactly sure what I mean > by "explicit synchronization". Standard libraries (boost threads, the > upcoming std::thread) provide things like mutexes and > condition variables, but does using those guarantee that the right > things happen with any shared data-structures they're used to > coordinate...? > > Well, you get the same thing you get today. Any synchronization done via a function call will tend to be correct since we never move shared memory operations across calls. Depending on your application, the types of data races the options deal with may not be an issue. Using the options will eliminate having to think whether they are issues or not at a (hopefully) small cost. Since the atomic operations are being built into the compiler, the intent is to eventually optimize and inline them for speed... and in the best case, simply result in a load or store. That's further work of course, but these options are laying some of the groundwork. Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
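[Editor's note: a hedged illustration of the "best case" Andrew describes, written against the interface eventually standardized as std::atomic (an assumption on my part; the thread predates the final library). A relaxed atomic access on most targets inlines to an ordinary load or store, with no call and no barrier.]

```cpp
#include <atomic>

std::atomic<int> flag{0};

// With memory_order_relaxed there is nothing to synchronize beyond the
// atomicity of the access itself; once the builtin is inlined, common
// hardware needs only a single ordinary load or store instruction.
int  read_flag()       { return flag.load(std::memory_order_relaxed); }
void write_flag(int v) { flag.store(v, std::memory_order_relaxed); }
```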
* Re: C++0x Memory model and gcc 2010-05-12 13:10 ` Andrew MacLeod @ 2010-05-17 13:12 ` Michael Matz 2010-05-17 14:05 ` Ian Lance Taylor 2010-05-17 14:09 ` Andrew MacLeod 0 siblings, 2 replies; 27+ messages in thread From: Michael Matz @ 2010-05-17 13:12 UTC (permalink / raw) To: Andrew MacLeod; +Cc: Miles Bader, gcc Hi, On Wed, 12 May 2010, Andrew MacLeod wrote: > Well, you get the same thing you get today. Any synchronization done > via a function call will tend to be correct since we never move shared > memory operations across calls. Depending on your application, the > types of data races the options deal with may not be an issue. Using > the options will eliminate having to think whether they are issues or > not at a (hopefully) small cost. > > Since the atomic operations are being built into the compiler, the > intent is to eventually optimize and inline them for speed... and in the > best case, simply result in a load or store. That's further work of > course, but these options are laying some of the groundwork. Are you and the other proponents of that memory model seriously proposing it as an alternative to explicit locking via atomic builtins (that map to some form of atomic instructions)? Ciao, Michael. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-17 13:12 ` Michael Matz @ 2010-05-17 14:05 ` Ian Lance Taylor 2010-05-17 14:24 ` Michael Matz 2010-05-17 14:09 ` Andrew MacLeod 1 sibling, 1 reply; 27+ messages in thread From: Ian Lance Taylor @ 2010-05-17 14:05 UTC (permalink / raw) To: Michael Matz; +Cc: Andrew MacLeod, Miles Bader, gcc Michael Matz <matz@suse.de> writes: > On Wed, 12 May 2010, Andrew MacLeod wrote: > >> Well, you get the same thing you get today. Any synchronization done >> via a function call will tend to be correct since we never move shared >> memory operations across calls. Depending on your application, the >> types of data races the options deal with may not be an issue. Using >> the options will eliminate having to think whether they are issues or >> not at a (hopefully) small cost. >> >> Since the atomic operations are being built into the compiler, the >> intent is to eventually optimize and inline them for speed... and in the >> best case, simply result in a load or store. That's further work of >> course, but these options are laying some of the groundwork. > > Are you and the other proponents of that memory model seriously proposing > it as an alternative to explicit locking via atomic builtins (that map to > some form of atomic instructions)? I'm not sure what you mean here. Do you have an alternative way to implement the C++0x proposed standard? Or are you questioning the approach taken by the standard? Ian ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-17 14:05 ` Ian Lance Taylor @ 2010-05-17 14:24 ` Michael Matz 2010-05-17 20:22 ` Ian Lance Taylor 0 siblings, 1 reply; 27+ messages in thread From: Michael Matz @ 2010-05-17 14:24 UTC (permalink / raw) To: Ian Lance Taylor; +Cc: Andrew MacLeod, Miles Bader, gcc Hi, On Mon, 17 May 2010, Ian Lance Taylor wrote: > >> Since the atomic operations are being built into the compiler, the > >> intent is to eventually optimize and inline them for speed... and in > >> the best case, simply result in a load or store. That's further work > >> of course, but these options are laying some of the groundwork. > > > > Are you and the other proponents of that memory model seriously > > proposing it as an alternative to explicit locking via atomic builtins > > (that map to some form of atomic instructions)? > > I'm not sure what you mean here. Do you have an alternative way to > implement the C++0x proposed standard? I actually see no way to implement the proposed memory model on common hardware, except by emitting locked instructions and memory barriers for all memory accesses to potentially shared (and hence all non-stack) data. And even then it only works on a subset of types, namely those for which the hardware provides such instructions with the associated guarantees. > Or are you questioning the approach taken by the standard? I do, yes. Ciao, Michael. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-17 14:24 ` Michael Matz @ 2010-05-17 20:22 ` Ian Lance Taylor 0 siblings, 0 replies; 27+ messages in thread From: Ian Lance Taylor @ 2010-05-17 20:22 UTC (permalink / raw) To: Michael Matz; +Cc: Andrew MacLeod, Miles Bader, gcc Michael Matz <matz@suse.de> writes: > On Mon, 17 May 2010, Ian Lance Taylor wrote: > >> >> Since the atomic operations are being built into the compiler, the >> >> intent is to eventually optimize and inline them for speed... and in >> >> the best case, simply result in a load or store. That's further work >> >> of course, but these options are laying some of the groundwork. >> > >> > Are you and the other proponents of that memory model seriously >> > proposing it as an alternative to explicit locking via atomic builtins >> > (that map to some form of atomic instructions)? >> >> I'm not sure what you mean here. Do you have an alternative way to >> implement the C++0x proposed standard? > > I actually see no way to implement the proposed memory model on common > hardware, except by emitting locked instructions and memory barriers for > all memory accesses to potentially shared (and hence all non-stack) data. > And even then it only works on a subset of types, namely those for which > the hardware provides such instructions with the associated guarantees. I'm sure the C++ standards committee would like to hear a case for why the proposal is unusable. The standard has not yet been voted out. As far as I understand the proposal, though, your statement turns out not to be the case. Those locked instructions and memory barriers are only required for loads and stores to atomic types, not to all types. Ian ^ permalink raw reply [flat|nested] 27+ messages in thread
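[Editor's note: Ian's point can be sketched as follows, in an example not taken from the thread. Only objects declared atomic pay the cost of ordering; ordinary objects continue to compile to plain, freely optimizable loads and stores.]

```cpp
#include <atomic>

int plain_counter = 0;               // ordinary object: no barriers emitted
std::atomic<int> atomic_counter{0};  // atomic object: ordering enforced

void bump() {
    ++plain_counter;  // plain load/add/store, freely reorderable
    atomic_counter.fetch_add(1, std::memory_order_seq_cst);  // atomic RMW
}
```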
* Re: C++0x Memory model and gcc 2010-05-17 13:12 ` Michael Matz 2010-05-17 14:05 ` Ian Lance Taylor @ 2010-05-17 14:09 ` Andrew MacLeod 2010-05-17 14:55 ` Michael Matz 1 sibling, 1 reply; 27+ messages in thread From: Andrew MacLeod @ 2010-05-17 14:09 UTC (permalink / raw) To: Michael Matz; +Cc: Miles Bader, gcc Michael Matz wrote: > Hi, > > On Wed, 12 May 2010, Andrew MacLeod wrote: > > >> Well, you get the same thing you get today. Any synchronization done >> via a function call will tend to be correct since we never move shared >> memory operations across calls. Depending on your application, the >> types of data races the options deal with may not be an issue. Using >> the options will eliminate having to think whether they are issues or >> not at a (hopefully) small cost. >> >> Since the atomic operations are being built into the compiler, the >> intent is to eventually optimize and inline them for speed... and in the >> best case, simply result in a load or store. That's further work of >> course, but these options are laying some of the groundwork. >> > > Are you and the other proponents of that memory model seriously proposing > it as an alternative to explicit locking via atomic builtins (that map to > some form of atomic instructions)? > > Proposing what as an alternative? These optimization restrictions defined by the memory model are there to create predictable memory behaviour across threads. This is applicable when you use the atomic built-ins for locking. Especially in the case when the atomic operation is inlined. One goal is to have unoptimized program behaviour be consistent with the optimized version. If the optimizers introduce new data races, there is a potential behaviour difference. Lock-free data structures which utilize the atomic built-ins but do not require explicit locking are potential applications built on top of that. Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-17 14:09 ` Andrew MacLeod @ 2010-05-17 14:55 ` Michael Matz 2010-05-17 16:36 ` Andrew MacLeod 0 siblings, 1 reply; 27+ messages in thread From: Michael Matz @ 2010-05-17 14:55 UTC (permalink / raw) To: Andrew MacLeod; +Cc: Miles Bader, gcc Hi, On Mon, 17 May 2010, Andrew MacLeod wrote: > > > Well, you get the same thing you get today. Any synchronization > > > done via a function call will tend to be correct since we never move > > > shared memory operations across calls. Depending on your > > > application, the types of data races the options deal with may not > > > be an issue. Using the options will eliminate having to think > > > whether they are issues or not at a (hopefully) small cost. > > > > > > Since the atomic operations are being built into the compiler, the > > > intent is to eventually optimize and inline them for speed... and in > > > the best case, simply result in a load or store. That's further work > > > of course, but these options are laying some of the groundwork. > > > > > > > Are you and the other proponents of that memory model seriously > > proposing it as an alternative to explicit locking via atomic builtins > > (that map to some form of atomic instructions)? > > Proposing what as an alternative? The guarantees you seem to want to establish by the proposed memory model. Possibly I misunderstood. I'm not 100% sure on the guarantees you want to establish. The proposed model seems to merge multiple concepts together, all related to memory access ordering and atomicity, but with different scope and difficulty to guarantee. The mail to which I reacted seemed to me to imply that you would believe the guarantees from the memory model alone would relieve users from writing explicit atomic instructions for data synchronization. If you didn't imply that, then I'm also interested to learn what other advantages you expect to derive from the guarantees. 
And third, I'm interested to learn how you intend to actually guarantee the guarantees given by the model. So, in short, I'd like to know What (guarantees are established), Why (are those sensible and useful), and How (are those intended to be implemented) I've tried to find this in the Wiki, but it only states some broad goals it seems (not introduce data races). I also find the papers of Boehm somewhat lacking when it comes to how to actually implement the whole model on hardware, especially because he himself acknowledges the obvious problems on real hardware, like: * load/store reorder buffers, * store->load forwarding, * cache-line granularity even for strict coherency models, * existence of weak coherency machines (have to acquire the whole cache line for an exclusive write) * general slowness of locked (or atomic) instructions compared to normal stores/loads * existence of store granularity on some hardware (we don't even have to enter bit-field business, alpha e.g. has only 64 bit accesses) But for all of these to be relevant questions we first need to know what exactly are the intended guarantees of that model; say from the perspective of observable behaviour from other threads. > These optimization restrictions defined by the memory model are there to > create predictable memory behaviour across threads. With or without use of atomics? I.e. is the memory behaviour supposed to be predictable also in absence of all mentions of explicitly written atomic builtins? And you need to define predictable. Predictable == behaves according to rules. What are the rules? > This is applicable when you use the atomic built-ins for locking. > Especially in the case when the atomic operation is inlined. One goal > is to have unoptimized program behaviour be consistent with the > optimized version. We have that now (because atomics are memory barriers), so that's probably not why the model was devised. Ciao, Michael. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-17 14:55 ` Michael Matz @ 2010-05-17 16:36 ` Andrew MacLeod 0 siblings, 0 replies; 27+ messages in thread From: Andrew MacLeod @ 2010-05-17 16:36 UTC (permalink / raw) To: Michael Matz; +Cc: Miles Bader, gcc Michael Matz wrote: > Hi, > > On Mon, 17 May 2010, Andrew MacLeod wrote: > > The guarantees you seem to want to establish by the proposed memory model. > Possibly I misunderstood. > > I'm not 100% sure on the guarantees you want to establish. The proposed > model seems to merge multiple concepts together, all related to > memory access ordering and atomicity, but with different scope and > difficulty to guarantee. > I think the standard is excessively confusing, and overly academic. I even find the term memory model adds to the confusion. Some effort was clearly involved in defining behaviour for hardware which does not yet exist but which the language is "prepared" for. I was particularly unhappy that they merged the whole synchronization thing to an atomic load or store, at least originally. I would hazard a guess that it evolved to this state based on an observation that synchronization is almost inevitably required when an atomic is being accessed. That's just a guess, however. However, there is some fundamental goodness in it once you sort through it. Let's see if I can paraphrase normal uses and map them to the standard :-) The normal case would be when you have a system-wide lock, and when you acquire the lock, you expect everything which occurred before the lock to be completed, e.g.:

  process1: otherglob = 2; global = 10; set atomic_lock(1);
  process2: wait (atomic_lock() == 1); print (global);

You expect 'global' in process 2 to always be 10. You are in effect using the lock as a ready flag for global. In order for that to happen in a consistent manner, there is more involved than just waiting for the lock.
If process 1 and 2 are running on different machines, process 1 will have to flush its cache all the way to memory, and process 2 will have to wait for that to complete and become visible before it can proceed with allowing the proper value of global to be loaded. Otherwise the results will not be as expected. That's the synchronization model which maps to the default or 'sequentially consistent' C++ model. The cache flushing and whatever else is required is built into the library routines for performing atomic loads and stores. There is no mechanism to specify that this lock is for the value of 'global', so the standard extends the definition of the lock to say it applies to *all* shared memory before the atomic lock value is set. So

  process3: wait (atomic_lock() == 1); print (otherglob);

will also work properly. This memory model will always involve some form of synchronization instructions, and potentially waiting on other hardware to complete. I don't know much about this, but I'm told machines are starting to provide instructions to accomplish this type of synchronization. The obvious conclusion is that once the hardware starts to be able to do this synchronization with a few instructions, the entire library call to set or read an atomic and perform synchronization may be inlinable without having a call of any kind, just straight-line instructions. At this point, the optimizer will need to understand that those instructions are barriers. If you are using an atomic variable simply as a variable, and don't care about the synchronization aspects (i.e., you just want to always see a valid value for the variable), then that maps to the 'relaxed' mode. There may be some academic babble about certain provisions, but this is effectively what it boils down to. The relaxed mode is what you use when you don't care about all that memory flushing and just want to see the values of the atomic itself.
So this is the fastest model, but don't depend on the values of other shared variables. This is also what you get when you use the basic atomic store and load macros in C. The sequential mode has the possibility of being VERY slow if you have a widely distributed system. That's where the third mode comes in, the release/acquire model. Proper utilization of it can remove many of the waits present in the sequential model since different processes don't have to wait for *all* cache flushes, just ones directly related to a specific atomic variable in a specific other process. The model is provided to allow code to run more efficiently, but requires a better understanding of the subtleties of multi-processor side effects in the code you write. I still don't really get it completely, but I'm not implementing the synchronization parts, so I only need to understand some of it :-) It is possible to optimize these operations, i.e., you can do CSE and dead store elimination, which can also help the code run faster. That comes later, though. The optimization flags I'm currently working on are orthogonal to all this, even though they use the term memory model. When a program is written for multi-processing, the programmer usually attempts to write it such that there are no data races, otherwise there may be inconsistencies during execution. If a program has been developed and is data race free, the flags are meant to guarantee that the resulting code will also be data race free, regardless of whether optimization is on or off. Does that make anything clearer? It's true that a bunch of these things are all intertwined, and that's one of the reasons it comes across as being so complicated. It's up to the library guys to make whatever process synchronization is required happen; I leave that to them. They say they have a handle on it, we'll see. When they do, then we might get to inline it and do some interesting things. Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
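[Editor's note: the ready-flag pattern Andrew walks through maps onto the release/acquire mode like this. A minimal C++11-style sketch, not from the thread; the variable names follow the thread's pseudocode, and everything else is an assumption.]

```cpp
#include <atomic>

int global = 0;
int otherglob = 0;          // plain shared data, no atomics needed
std::atomic<int> ready{0};  // the "atomic_lock" from the example

void process1() {
    otherglob = 2;
    global = 10;
    ready.store(1, std::memory_order_release);  // publishes the writes above
}

int process2() {
    // The acquire load pairs with the release store: once the flag reads 1,
    // all writes made by process1 before the store are guaranteed visible.
    while (ready.load(std::memory_order_acquire) != 1) { /* spin */ }
    return global;  // guaranteed to be 10
}
```

With seq_cst in place of release/acquire the code is still correct but may additionally pay for a single total order over all atomic operations; with relaxed, the loop would still terminate but the value of global would no longer be guaranteed.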
end of thread, other threads:[~2010-05-17 20:22 UTC | newest] Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-05-06 15:43 C++0x Memory model and gcc Andrew MacLeod 2010-05-06 15:51 ` Richard Guenther 2010-05-06 16:11 ` Richard Guenther 2010-05-06 16:23 ` Andrew MacLeod 2010-05-07 9:15 ` Richard Guenther 2010-05-07 13:27 ` Andrew MacLeod 2010-05-07 14:26 ` Ian Lance Taylor 2010-05-06 15:54 ` Joseph S. Myers 2010-05-06 16:12 ` Andrew MacLeod 2010-05-06 20:40 ` Ian Lance Taylor 2010-05-06 22:02 ` Andrew MacLeod 2010-05-07 13:55 ` Mark Mitchell 2010-05-08 14:38 Jean-Marc Bourguet 2010-05-08 20:18 ` Albert Cohen 2010-05-10 4:40 ` Ian Lance Taylor 2010-05-10 17:23 ` Andrew MacLeod 2010-05-11 6:20 ` Miles Bader 2010-05-11 12:49 ` Andrew MacLeod 2010-05-12 5:21 ` Miles Bader 2010-05-12 13:10 ` Andrew MacLeod 2010-05-17 13:12 ` Michael Matz 2010-05-17 14:05 ` Ian Lance Taylor 2010-05-17 14:24 ` Michael Matz 2010-05-17 20:22 ` Ian Lance Taylor 2010-05-17 14:09 ` Andrew MacLeod 2010-05-17 14:55 ` Michael Matz 2010-05-17 16:36 ` Andrew MacLeod
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).