* C++0x Memory model and gcc @ 2010-05-06 15:43 Andrew MacLeod 2010-05-06 15:51 ` Richard Guenther ` (2 more replies) 0 siblings, 3 replies; 27+ messages in thread From: Andrew MacLeod @ 2010-05-06 15:43 UTC (permalink / raw) To: GCC I've been working for a while on understanding how the new memory model and Atomics work, and what the impacts are on GCC. It would be ideal to get as many of these changes into GCC 4.6 as possible. I've started work on some of the modifications and testing, and the overall impact on GCC shouldn't be *too* bad :-) The plan is to localize the changes as much as possible, and any intrusive bits like optimization changes will be controlled by a flag enabling us to keep the current behaviour when we want it. I've put together a document summarizing how the memory model works, and how I propose to make the changes. I've converted it to wiki pages. Maybe no one will laugh at my choice of document format this time :-) The document is linked off the Atomics wiki page, or directly here: http://gcc.gnu.org/wiki/Atomic/GCCMM It consists mainly of describing the 2 primary aspects of the memory model which affect us - Optimization changes to avoid introducing new data races - Implementation of atomic variables and synchronization modes as well as a new infrastructure to test these types of things. I'm sure I've screwed something up while doing it, and I will proofread it later today again and tweak it further. Please point out anything that isn't clear, or is downright wrong. Especially in the testing methodology since it's all new stuff. Suggestions for improvements on any part of the plan are welcome as well. Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-06 15:43 C++0x Memory model and gcc Andrew MacLeod @ 2010-05-06 15:51 ` Richard Guenther 2010-05-06 16:11 ` Richard Guenther 2010-05-06 15:54 ` Joseph S. Myers 2010-05-06 20:40 ` Ian Lance Taylor 2 siblings, 1 reply; 27+ messages in thread From: Richard Guenther @ 2010-05-06 15:51 UTC (permalink / raw) To: Andrew MacLeod; +Cc: GCC On Thu, May 6, 2010 at 5:43 PM, Andrew MacLeod <amacleod@redhat.com> wrote: > I've been working for a while on understanding how the new memory model and > Atomics work, and what the impacts are on GCC. > > It would be ideal to get as many of these changes into GCC 4.6 as possible. > I've started work on some of the modifications and testing, and the overall > impact on GCC shouldn't be *too* bad :-) > > The plan is to localize the changes as much as possible, and any intrusive > bits like optimization changes will be controlled by a flag enabling us to > keep the current behaviour when we want it. > > I've put together a document summarizing how the memory model works, and how > I propose to make the changes. I've converted it to wiki pages. Maybe no > one will laugh at my choice of document format this time :-) > > The document is linked off the Atomics wiki page, or directly here: > http://gcc.gnu.org/wiki/Atomic/GCCMM > > It consists mainly of describing the 2 primary aspects of the memory model > which affects us > - Optimization changes to avoid introducing new data races > - Implementation of atomic variables and synchronization modes > as well as a new infrastructure to test these types of things. > > I'm sure I've screwed something up while doing it, and I will proofread it > later today again and tweak it further. > > Please point out anything that isn't clear, or is downright wrong. > Especially in the testing methodology since its all new stuff. > Suggestions for improvements on any of the plan are welcome as well. 
First let me say that the C++ memory model is crap when it forces data-races to be avoided for unannotated data like the examples for packed data. Well, I hope that instead of just disabling optimizations you will help to improve their implementation to be able to optimize in a conformant manner. Richard. > Andrew > > > > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-06 15:51 ` Richard Guenther @ 2010-05-06 16:11 ` Richard Guenther 2010-05-06 16:23 ` Andrew MacLeod 0 siblings, 1 reply; 27+ messages in thread From: Richard Guenther @ 2010-05-06 16:11 UTC (permalink / raw) To: Andrew MacLeod; +Cc: GCC On Thu, May 6, 2010 at 5:50 PM, Richard Guenther <richard.guenther@gmail.com> wrote: > On Thu, May 6, 2010 at 5:43 PM, Andrew MacLeod <amacleod@redhat.com> wrote: >> I've been working for a while on understanding how the new memory model and >> Atomics work, and what the impacts are on GCC. >> >> It would be ideal to get as many of these changes into GCC 4.6 as possible. >> I've started work on some of the modifications and testing, and the overall >> impact on GCC shouldn't be *too* bad :-) >> >> The plan is to localize the changes as much as possible, and any intrusive >> bits like optimization changes will be controlled by a flag enabling us to >> keep the current behaviour when we want it. >> >> I've put together a document summarizing how the memory model works, and how >> I propose to make the changes. I've converted it to wiki pages. Maybe no >> one will laugh at my choice of document format this time :-) >> >> The document is linked off the Atomics wiki page, or directly here: >> http://gcc.gnu.org/wiki/Atomic/GCCMM >> >> It consists mainly of describing the 2 primary aspects of the memory model >> which affects us >> - Optimization changes to avoid introducing new data races >> - Implementation of atomic variables and synchronization modes >> as well as a new infrastructure to test these types of things. >> >> I'm sure I've screwed something up while doing it, and I will proofread it >> later today again and tweak it further. >> >> Please point out anything that isn't clear, or is downright wrong. >> Especially in the testing methodology since its all new stuff. >> Suggestions for improvements on any of the plan are welcome as well. 
> > First let me say that the C++ memory model is crap when it > forces data-races to be avoided for unannotated data like > the examples for packed data. > > Well, I hope that instead of just disabling optimizations you > will help to improve their implementation to be able to optimize > in a conformant manner. And btw, if you are thinking on how to represent the extra data-dependencies required for the consistency models think of how to extend whatever you need in infrastructure for that to also allow FENV dependencies - it's a quite similar problem (FENV query/set are the atomic operations, usual arithmetic is what the dependency is to). It's completely non-trivial (because it's scalar code, not memory accesses). For atomics you should be able to just massage the alias-oracle data-dependence routines (maybe). Richard. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-06 16:11 ` Richard Guenther @ 2010-05-06 16:23 ` Andrew MacLeod 2010-05-07 9:15 ` Richard Guenther 0 siblings, 1 reply; 27+ messages in thread From: Andrew MacLeod @ 2010-05-06 16:23 UTC (permalink / raw) To: Richard Guenther; +Cc: GCC Richard Guenther wrote: > On Thu, May 6, 2010 at 5:50 PM, Richard Guenther > <richard.guenther@gmail.com> wrote: > >> First let me say that the C++ memory model is crap when it >> forces data-races to be avoided for unannotated data like >> the examples for packed data. >> And it isn't consistent across the board, since neighbouring bits normally don't qualify and can introduce data races. I don't like it when a solution has exceptions like that. It is what it is however, and last I heard the plan was for C to adopt the changes as well. >> Well, I hope that instead of just disabling optimizations you >> will help to improve their implementation to be able to optimize >> in a conformant manner. >> I don't want to disable any more than required. SSA names aren't affected since they are local variables only; it's only operations on shared memory, and I am hopeful that I can minimize the restrictions placed on them. Some will be more interesting than others... like CSE... you can still perform CSE on a global as long as you don't introduce a NEW load on some execution path that didn't have one before. What fun. > > And btw, if you are thinking on how to represent the extra > data-dependencies required for the consistency models think > of how to extend whatever you need in infrastructure for that > to also allow FENV dependencies - it's a quite similar problem > (FENV query/set are the atomic operations, usual arithmetic > is what the dependency is to). It's completely non-trivial > (because it's scalar code, not memory accesses). For > atomics you should be able to just massage the alias-oracle > data-dependence routines (maybe). > That's what I'm hoping actually.. Andrew. 
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-06 16:23 ` Andrew MacLeod @ 2010-05-07 9:15 ` Richard Guenther 2010-05-07 13:27 ` Andrew MacLeod 0 siblings, 1 reply; 27+ messages in thread From: Richard Guenther @ 2010-05-07 9:15 UTC (permalink / raw) To: Andrew MacLeod; +Cc: GCC On Thu, May 6, 2010 at 6:22 PM, Andrew MacLeod <amacleod@redhat.com> wrote: > Richard Guenther wrote: >> >> On Thu, May 6, 2010 at 5:50 PM, Richard Guenther >> <richard.guenther@gmail.com> wrote: >> >>> >>> First let me say that the C++ memory model is crap when it >>> forces data-races to be avoided for unannotated data like >>> the examples for packed data. >>> > > And it isn't consistent across the board, since neighbouring bits normally > don't qualify and can introduce data races. I don't like it when a solution > has exceptions like that. It is what it is however, and last I heard the > plan was for C to adopt the changes as well. I would have hoped that only data races between independent objects are covered, thus tmp = a.i; b.j = tmp; would qualify as a load of a and a store to b as far as dependencies are concerned. That would have been consistent with the exceptions for bitfields and much more friendly to architectures with weak support for unaligned accesses. >>> >>> Well, I hope that instead of just disabling optimizations you >>> will help to improve their implementation to be able to optimize >>> in a conformant manner. >>> > > I don't want to disable any more than required. SSA names aren't affected > since they are local variables only, its only operations on shared memory, > and I am hopeful that I can minimize the restrictions placed on them. Some > will be more interesting than others... like CSE... you can still perform > CSE on a global as long as you don't introduce a NEW load on some execution > path that didn't have before. What fun. I don't understand that restriction anyway - how can an extra load cause a data-race if the result is only used when it was used before? 
(You'd need to disable PPRE and GCSE completely if that's really a problem) Thus, if (p) tmp = load; ... if (q) use tmp; how can transforming that to tmp = load; ... if (q) use tmp; ever cause a problem? >> And btw, if you are thinking on how to represent the extra >> data-dependencies required for the consistency models think >> of how to extend whatever you need in infrastructure for that >> to also allow FENV dependencies - it's a quite similar problem >> (FENV query/set are the atomic operations, usual arithmetic >> is what the dependency is to). It's completely non-trivial >> (because it's scalar code, not memory accesses). For >> atomics you should be able to just massage the alias-oracle >> data-dependence routines (maybe). >> > > That's what I'm hoping actually.. We'll see. Richard. > Andrew. > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-07 9:15 ` Richard Guenther @ 2010-05-07 13:27 ` Andrew MacLeod 2010-05-07 14:26 ` Ian Lance Taylor 0 siblings, 1 reply; 27+ messages in thread From: Andrew MacLeod @ 2010-05-07 13:27 UTC (permalink / raw) To: Richard Guenther; +Cc: GCC Richard Guenther wrote: > On Thu, May 6, 2010 at 6:22 PM, Andrew MacLeod <amacleod@redhat.com> wrote: > >> Richard Guenther wrote: >> > > I would have hoped that only data races between independent > objects are covered, thus > > tmp = a.i; > b.j = tmp; > > would qualify as a load of a and a store to b as far as dependencies > are concerned. That would have been consistent with the > exceptions for bitfields and much more friendly to architectures > with weak support for unaligned accesses. > They are independent as far as dependencies within this compilation unit. The problem is if thread number 2 is performing a.j = val b.i = val2 now there are data races on both A and B if we load/store full words and the struct was something like: struct { char i; char j; } a, b; The store to B is particularly unpleasant since you may lose one of the 2 stores. The load data race on A is only in the territory of hardware or software race detectors. >> >> I don't want to disable any more than required. SSA names aren't affected >> since they are local variables only, its only operations on shared memory, >> and I am hopeful that I can minimize the restrictions placed on them. Some >> will be more interesting than others... like CSE... you can still perform >> CSE on a global as long as you don't introduce a NEW load on some execution >> path that didn't have before. What fun. >> > > I don't understand that restriction anyway - how can an extra > load cause a data-race if the result is only used when it was > used before? (You'd need to disable PPRE and GCSE completely > if that's really a problem) > > Thus, > > if (p) > tmp = load; > ... 
> if (q) > use tmp; > > how can transforming that to > > tmp = load; > ... > if (q) > use tmp; > > ever cause a problem? > If the other thread is doing something like: if (!p) load = something then there was no data race before since your thread wasn't performing a load when this thread was storing. i.e., 'p' was being used as the synchronization guard that prevented a data race. When you do the transformation, there is now a potential race that wasn't there before. Some hardware can detect that and trigger an exception, or a software data race detector could trigger, and neither would have triggered before, which means the behaviour is detectably different. That's also why I've separated the loads and stores for handling separately. Under normal circumstances, we want to allow this transformation. If there aren't any detection abilities in play, then the transformation is fine... you can't tell that there was a race. With stores you can actually get different results, so we do need to monitor those. Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
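Andrew's point can be sketched in C (my rendering of the pseudocode above; the names are hypothetical). Both functions return the same values, but the hoisted version performs a load of `shared` on executions where the original performed none, which is exactly what hardware race detection or a software race detector can observe.

```c
int shared;   /* another thread may store here, guarded by !p */

/* Original shape: the load happens only under the guard p, so it
   never overlaps the other thread's store (guarded by !p). */
int guarded_load(int p, int q)
{
    int tmp = 0;
    if (p)
        tmp = shared;
    return q ? tmp : 0;
}

/* After speculative hoisting: value-equivalent, but there is now an
   unconditional load of `shared` that can race with the other
   thread's !p-guarded store. */
int hoisted_load(int p, int q)
{
    int tmp = shared;         /* speculative: executed even when !p */
    return q ? (p ? tmp : 0) : 0;
}
```

Under a "no new loads" rule only the first form is permitted; when no race detection is in play the two are indistinguishable, which is why Andrew wants to allow the hoist in that case.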
* Re: C++0x Memory model and gcc 2010-05-07 13:27 ` Andrew MacLeod @ 2010-05-07 14:26 ` Ian Lance Taylor 0 siblings, 0 replies; 27+ messages in thread From: Ian Lance Taylor @ 2010-05-07 14:26 UTC (permalink / raw) To: Andrew MacLeod; +Cc: Richard Guenther, GCC Andrew MacLeod <amacleod@redhat.com> writes: > They are independent as far as dependencies within this compilation unit. > The problem is if thread number 2 is performing > a.j = val > b.i = val2 > > now there are data races on both A and B if we load/store full words > and the struct was something like: struct { > char i; > char j; > } a, b; > > The store to B is particularly unpleasant since you may lose one of > the 2 stores. The load data race on A is only in the territory of > hardware or software race detectors. In this example, if we do a word access to a, then we are running past the boundaries of the struct. We can only assume that is OK if a is aligned to a word boundary. And if both a and b are aligned to word boundaries, then there is no problem doing a word access to a. So the only potential problem here is if we have two small variables where one is aligned and the other is not. This is an unusual situation because small variables are not normally aligned. We can avoid trouble by forcing an alignment to a word boundary after every aligned variable. Or so it seems to me. Ian ^ permalink raw reply [flat|nested] 27+ messages in thread
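A sketch of Ian's alignment observation (my illustration, using GCC's variable attributes): if each small variable is placed at its own word boundary, a word-wide access to one can never overlap the other, since two distinct objects that both start on 4-byte boundaries cannot share a 4-byte word.

```c
#include <stdint.h>

struct pair { char i; char j; };   /* 2 bytes, alignment 1 by default */

/* Without help, `a` and `b` could share one word, so a word-wide
   read-modify-write of a.i/a.j might clobber b (or lose one of two
   concurrent stores).  Forcing each onto its own word boundary makes
   a word-wide access to `a` self-contained. */
struct pair a __attribute__((aligned(4)));
struct pair b __attribute__((aligned(4)));
```

This is the "forcing an alignment to a word boundary" idea from the message, expressed per-variable; a compiler would do the equivalent in its layout code rather than via source attributes.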
* Re: C++0x Memory model and gcc 2010-05-06 15:43 C++0x Memory model and gcc Andrew MacLeod 2010-05-06 15:51 ` Richard Guenther @ 2010-05-06 15:54 ` Joseph S. Myers 2010-05-06 16:12 ` Andrew MacLeod 2010-05-06 20:40 ` Ian Lance Taylor 2 siblings, 1 reply; 27+ messages in thread From: Joseph S. Myers @ 2010-05-06 15:54 UTC (permalink / raw) To: Andrew MacLeod; +Cc: GCC On Thu, 6 May 2010, Andrew MacLeod wrote: > - Implementation of atomic variables and synchronization modes > as well as a new infrastructure to test these types of things. I presume you've read the long thread starting at <http://gcc.gnu.org/ml/gcc/2009-08/msg00199.html> regarding the issues involved in implementing the atomics (involving compiler and libc cooperation to provide stdatomic.h), and in particular ensuring that code built for one CPU remains safe on later CPU variants that may have more native atomic operations. -- Joseph S. Myers joseph@codesourcery.com ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-06 15:54 ` Joseph S. Myers @ 2010-05-06 16:12 ` Andrew MacLeod 0 siblings, 0 replies; 27+ messages in thread From: Andrew MacLeod @ 2010-05-06 16:12 UTC (permalink / raw) To: Joseph S. Myers; +Cc: GCC Joseph S. Myers wrote: > On Thu, 6 May 2010, Andrew MacLeod wrote: > > >> - Implementation of atomic variables and synchronization modes >> as well as a new infrastructure to test these types of things. >> > > I presume you've read the long thread starting at > <http://gcc.gnu.org/ml/gcc/2009-08/msg00199.html> regarding the issues > involved in implementing the atomics (involving compiler and libc > cooperation to provide stdatomic.h), and in particular ensuring that code > built for one CPU remains safe on later CPU variants that may have more > native atomic operations. > > I'm not actually doing the implementation of Atomics themselves right now, Lawrence is looking at that. I'm focusing on the GCC optimization requirements, changes and testing. I'll leave issues like the ones you point out to the guys who like that stuff :-) I couldn't understand a lot of the atomic synchronization stuff from existing documentation, so I figured that part might help others understand it better too. It still gives me a headache. Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-06 15:43 C++0x Memory model and gcc Andrew MacLeod 2010-05-06 15:51 ` Richard Guenther 2010-05-06 15:54 ` Joseph S. Myers @ 2010-05-06 20:40 ` Ian Lance Taylor 2010-05-06 22:02 ` Andrew MacLeod 2010-05-07 13:55 ` Mark Mitchell 2 siblings, 2 replies; 27+ messages in thread From: Ian Lance Taylor @ 2010-05-06 20:40 UTC (permalink / raw) To: Andrew MacLeod; +Cc: GCC Andrew MacLeod <amacleod@redhat.com> writes: > I've been working for a while on understanding how the new memory > model and Atomics work, and what the impacts are on GCC. Thanks for looking at this. One issue I didn't see clearly was how to actually implement this in the compiler. For example, speculated stores are fine for local stack variables, but not for global variables or heap memory. We can implement that in the compiler via a set of tests at each potential speculated store. Or we can implement it via a constraint expressed directly in the IR--perhaps some indicator that this specific store may not merge with conditionals. The latter approach is harder to design but I suspect will be more likely to be reliable over time. The former approach is straightforward to patch into the compiler but can easily degrade as people who don't understand the issues work on the code. I don't agree with your proposed command line options. They seem fine for internal use, but I think very very few users would know when or whether they should use -fno-data-race-stores. I think you should downgrade those options to a --param value, and think about a multi-layered -fmemory-model option. E.g., -fmemory-model=single Assume single threaded execution, which also means no signal handlers. -fmemory-model=fast The user is responsible for all synchronization. Accessing the same memory words from different threads may break unpredictably. -fmemory-model=safe The compiler will do its best to protect you. Ian ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-06 20:40 ` Ian Lance Taylor @ 2010-05-06 22:02 ` Andrew MacLeod 2010-05-07 13:55 ` Mark Mitchell 1 sibling, 0 replies; 27+ messages in thread From: Andrew MacLeod @ 2010-05-06 22:02 UTC (permalink / raw) To: Ian Lance Taylor; +Cc: GCC Ian Lance Taylor wrote: > Andrew MacLeod <amacleod@redhat.com> writes: > > >> I've been working for a while on understanding how the new memory >> model and Atomics work, and what the impacts are on GCC. >> > > Thanks for looking at this. > > One issue I didn't see clearly was how to actually implement this in > the compiler. For example, speculated stores are fine for local stack > variables, but not for global variables or heap memory. We can > implement that in the compiler via a set of tests at each potential > speculated store. Or we can implement it via a constraint expressed > directly in the IR--perhaps some indicator that this specific store > may not merge with conditionals. The latter approach is harder to > design but I suspect will be more likely to be reliable over time. > The former approach is straightforward to patch into the compiler but > can easily degrade as people who don't understand the issues work on > the code. > Which is why the ability to regression test it is so important :-). Right now it's my intention to modify the optimizations based on the flag settings. Some cases will be quite tricky. If we're CSE'ing something in the absence of atomics, and it is shared memory, it is still possible to move it if there is already a load from that location on all paths. So the optimization itself will need to be taught how to figure that out. i.e. if () a_1 = glob else if () b_2 = glob else c_3 = glob we can still common glob and produce tmp_4 = glob if () a_1 = tmp_4 else if () b_2 = tmp_4 else c_3 = tmp_4 all paths loaded glob before, so we can do this safely. 
but if we had: if () a_1 = glob else if () b_2 = notglob else c_3 = glob then we can no longer do anything since we'd be introducing a new load of 'glob' on the path that sets b_2 which wasn't performed before. If there was another load of glob somewhere before the first 'if', then commoning becomes possible again. Some other cases won't be nearly so tricky, thankfully :-). I do think we need to do it in the optimizations because of some of the complex situations which can arise. We can at least try to do a good job and then punt if it gets too hard. Now, thankfully, on most architectures we care about, hardware detection of data race loads isn't an issue. So most of the time it's only new stores that we need to be careful about introducing. I'm hoping the actual impact on codegen is low most of the time. > I don't agree with your proposed command line options. They seem fine > for internal use, but I think very very few users would know when or > whether they should use -fno-data-race-stores. I think you should > I'm fine with alternatives. I'm focused mostly on the internals and I want an individual flag for each of those things to cleanly separate them out. How we expose it I'm ambivalent about as long as testing can turn them on and off individually. There will be people using software data race detectors which may want to be able to turn things on or off from the system default. I think -fmemory-model= with options enabling at a minimum some form of 'off', 'system default', and 'on' would probably work for external exposure. Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
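Andrew's two CSE cases, transcribed from the SSA pseudocode above into plain C (my rendering; the names are hypothetical):

```c
int glob, notglob;

/* Case 1: every path loads glob, so commoning the load to the top of
   the function introduces no access that wasn't already there --
   this stays legal under the memory model. */
int common_ok(int k, int *a, int *b, int *c)
{
    int tmp = glob;              /* safe hoist: all paths loaded glob */
    if (k == 0)      *a = tmp;
    else if (k == 1) *b = tmp;
    else             *c = tmp;
    return tmp;
}

/* Case 2: the middle path loads notglob instead, so hoisting a load
   of glob would introduce one on a path that never had it -- a
   potential new data race, so the commoning must be skipped. */
int common_not_ok(int k, int *a, int *b, int *c)
{
    int tmp;
    if (k == 0)      *a = tmp = glob;
    else if (k == 1) *b = tmp = notglob;
    else             *c = tmp = glob;
    return tmp;
}
```

As the message notes, a load of `glob` dominating the first `if` in the second case would make the commoning legal again, since no path would gain a load it didn't already perform.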
* Re: C++0x Memory model and gcc 2010-05-06 20:40 ` Ian Lance Taylor 2010-05-06 22:02 ` Andrew MacLeod @ 2010-05-07 13:55 ` Mark Mitchell 1 sibling, 0 replies; 27+ messages in thread From: Mark Mitchell @ 2010-05-07 13:55 UTC (permalink / raw) To: Ian Lance Taylor; +Cc: Andrew MacLeod, GCC Ian Lance Taylor wrote: > -fmemory-model=single > Assume single threaded execution, which also means no signal > handlers. > -fmemory-model=fast > The user is responsible for all synchronization. Accessing > the same memory words from different threads may break > unpredictably. > -fmemory-model=safe > The compiler will do its best to protect you. That makes sense to me. I think that's an appropriately user-oriented view of the choices. -- Mark Mitchell CodeSourcery mark@codesourcery.com (650) 331-3385 x713 ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc
@ 2010-05-08 14:38 Jean-Marc Bourguet
2010-05-08 20:18 ` Albert Cohen
0 siblings, 1 reply; 27+ messages in thread
From: Jean-Marc Bourguet @ 2010-05-08 14:38 UTC (permalink / raw)
To: gcc
> -fmemory-model=single
> Assume single threaded execution, which also means no signal
> handlers.
> -fmemory-model=fast
> The user is responsible for all synchronization. Accessing
> the same memory words from different threads may break
> unpredictably.
> -fmemory-model=safe
> The compiler will do its best to protect you.
With that description, I'd think that "safe" lets the user code assume
the sequential consistency model. I'd use -fmemory-model=conformant or
something like that for the model where the compiler assumes that the user
code respects the constraints laid out for it by the standard. As the
constraints put on user code depend on the language -- Java has its
own memory model which AFAIK is more constraining than C++'s, and I think Ada
has its own, but my Ada programming days are too far behind me to comment on
it -- one may prefer some other name.
Yours,
--
Jean-Marc Bourguet
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-08 14:38 Jean-Marc Bourguet @ 2010-05-08 20:18 ` Albert Cohen 2010-05-10 4:40 ` Ian Lance Taylor 0 siblings, 1 reply; 27+ messages in thread From: Albert Cohen @ 2010-05-08 20:18 UTC (permalink / raw) To: gcc Jean-Marc Bourguet wrote: >> -fmemory-model=single >> Assume single threaded execution, which also means no signal >> handlers. >> -fmemory-model=fast >> The user is responsible for all synchronization. Accessing >> the same memory words from different threads may break >> unpredictably. >> -fmemory-model=safe >> The compiler will do its best to protect you. > > With that description, I'd think that "safe" lets the user code assumes > the sequential consistency model. I'd use -fmemory-model=conformant or > something like that for the model where the compiler assumes that the user > code respect the constraint led out for it by the standard. As which > constraints are put on user code depend on the languages -- Java has its > own memory model which AFAIK is more constraining than C++ and I think Ada > has its own but my Ada programming days are too far for me to comment on > it -- one may prefer some other name. I agree. Or even, =c++0x or =gnu++0x On the other hand, I fail to see the difference between =single and =fast, and the explanation about "the same memory word" is not really relevant as memory models typically tell you about concurrent accesses to "different memory words". Albert ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-08 20:18 ` Albert Cohen @ 2010-05-10 4:40 ` Ian Lance Taylor 2010-05-10 17:23 ` Andrew MacLeod 0 siblings, 1 reply; 27+ messages in thread From: Ian Lance Taylor @ 2010-05-10 4:40 UTC (permalink / raw) To: Albert Cohen; +Cc: gcc Albert Cohen <Albert.Cohen@inria.fr> writes: > Jean-Marc Bourguet wrote: >>> -fmemory-model=single >>> Assume single threaded execution, which also means no signal >>> handlers. >>> -fmemory-model=fast >>> The user is responsible for all synchronization. Accessing >>> the same memory words from different threads may break >>> unpredictably. >>> -fmemory-model=safe >>> The compiler will do its best to protect you. >> >> With that description, I'd think that "safe" lets the user code assumes >> the sequential consistency model. I'd use -fmemory-model=conformant or >> something like that for the model where the compiler assumes that the user >> code respect the constraint led out for it by the standard. As which >> constraints are put on user code depend on the languages -- Java has its >> own memory model which AFAIK is more constraining than C++ and I think Ada >> has its own but my Ada programming days are too far for me to comment on >> it -- one may prefer some other name. > > I agree. Or even, =c++0x or =gnu++0x > > On the other hand, I fail to see the differen between =single and > =fast, and the explanation about "the same memory word" is not really > relevant as memory models typically tell you about concurrent accesses > to "different memory words". What I was thinking is that the difference between =single and =fast is that =single permits store speculation. The difference between =fast and =safe/=conformant is that =fast permits writing to a byte by loading a word, changing the byte, and storing the word; in particular, =fast permits write combining in cases where =safe does not. Memory models may not talk about memory words, but they exist nevertheless. 
Ian ^ permalink raw reply [flat|nested] 27+ messages in thread
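The byte-store-via-word rewrite Ian describes looks like this in C (my sketch; it manipulates the integer value, so it is endian-neutral):

```c
#include <stdint.h>

/* Write one byte of a 32-bit word by read-modify-writing the whole
   word.  Under =fast this combining is permitted; under
   =safe/=conformant it is not, because the wide store also rewrites
   the three neighbouring bytes and can silently undo a concurrent
   store to them from another thread. */
void store_byte_via_word(uint32_t *word, unsigned idx, uint8_t val)
{
    uint32_t w = *word;                     /* load the whole word   */
    w &= ~(UINT32_C(0xFF) << (idx * 8));    /* clear the target byte */
    w |= (uint32_t)val << (idx * 8);        /* insert the new byte   */
    *word = w;                              /* store the whole word  */
}
```

The single-threaded result is identical to a plain byte store; the difference only becomes observable when another thread writes the adjacent bytes between the load and the store.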
* Re: C++0x Memory model and gcc 2010-05-10 4:40 ` Ian Lance Taylor @ 2010-05-10 17:23 ` Andrew MacLeod 2010-05-11 6:20 ` Miles Bader 0 siblings, 1 reply; 27+ messages in thread From: Andrew MacLeod @ 2010-05-10 17:23 UTC (permalink / raw) To: gcc On 05/10/2010 12:39 AM, Ian Lance Taylor wrote: > Albert Cohen<Albert.Cohen@inria.fr> writes: > >> >> I agree. Or even, =c++0x or =gnu++0x >> >> On the other hand, I fail to see the differen between =single and >> =fast, and the explanation about "the same memory word" is not really >> relevant as memory models typically tell you about concurrent accesses >> to "different memory words". >> > What I was thinking is that the difference between =single and =fast > is that =single permits store speculation. The difference between > =fast and =safe/=conformant is that =fast permits writing to a byte by > loading a word, changing the byte, and storing the word; in > particular, =fast permits write combining in cases where =safe does > not. > > Memory models may not talk about memory words, but they exist > nevertheless. > > Ian > I've changed the documentation and code to the --params suggestion and the following, for now. We can work out the exact wording and other options later. -fmemory-model=c++0x - Disable data races as per architectural requirements to match the standard. -fmemory-model=safe - Disable all data race introductions. (enforce all 4 internal restrictions.) -fmemory-model=single - Enable all data race introductions, as they are today. (relax all 4 internal restrictions.) Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-10 17:23 ` Andrew MacLeod @ 2010-05-11 6:20 ` Miles Bader 2010-05-11 12:49 ` Andrew MacLeod 0 siblings, 1 reply; 27+ messages in thread From: Miles Bader @ 2010-05-11 6:20 UTC (permalink / raw) To: gcc Andrew MacLeod <amacleod@redhat.com> writes: > -fmemory-model=single - Enable all data races introductions, as they > are today. (relax all 4 internal restrictions.) One could still use this mode with a multi-threaded program as long as explicit synchronization is done, right? -Miles -- Road, n. A strip of land along which one may pass from where it is too tiresome to be to where it is futile to go. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-11 6:20 ` Miles Bader @ 2010-05-11 12:49 ` Andrew MacLeod 2010-05-12 5:21 ` Miles Bader 0 siblings, 1 reply; 27+ messages in thread From: Andrew MacLeod @ 2010-05-11 12:49 UTC (permalink / raw) To: Miles Bader; +Cc: gcc Miles Bader wrote: > Andrew MacLeod <amacleod@redhat.com> writes: > >> -fmemory-model=single - Enable all data races introductions, as they >> are today. (relax all 4 internal restrictions.) >> > > One could still use this mode with a multi-threaded program as long as > explicit synchronization is done, right? > Right. It's just a single-processor memory model, so it doesn't limit any optimizations. Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-11 12:49 ` Andrew MacLeod @ 2010-05-12 5:21 ` Miles Bader 2010-05-12 13:10 ` Andrew MacLeod 0 siblings, 1 reply; 27+ messages in thread From: Miles Bader @ 2010-05-12 5:21 UTC (permalink / raw) To: gcc Andrew MacLeod <amacleod@redhat.com> writes: >>> -fmemory-model=single - Enable all data race introductions, as they >>> are today. (relax all 4 internal restrictions.) >> >> One could still use this mode with a multi-threaded program as long as >> explicit synchronization is done, right? > > Right. It's just a single-processor memory model, so it doesn't limit > any optimizations. Hmm, though now that I think about it, I'm not exactly sure what I mean by "explicit synchronization". Standard libraries (boost threads, the upcoming std::thread) provide things like mutexes and condition variables, but does using those guarantee that the right things happen with any shared data-structures they're used to coordinate...? Thanks, -Miles -- Vote, v. The instrument and symbol of a freeman's power to make a fool of himself and a wreck of his country. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-12 5:21 ` Miles Bader @ 2010-05-12 13:10 ` Andrew MacLeod 2010-05-17 13:12 ` Michael Matz 0 siblings, 1 reply; 27+ messages in thread From: Andrew MacLeod @ 2010-05-12 13:10 UTC (permalink / raw) To: Miles Bader; +Cc: gcc Miles Bader wrote: > Andrew MacLeod <amacleod@redhat.com> writes: > >>>> -fmemory-model=single - Enable all data race introductions, as they >>>> are today. (relax all 4 internal restrictions.) >>>> >>> One could still use this mode with a multi-threaded program as long as >>> explicit synchronization is done, right? >>> >> Right. It's just a single-processor memory model, so it doesn't limit >> any optimizations. >> > > Hmm, though now that I think about it, I'm not exactly sure what I mean > by "explicit synchronization". Standard libraries (boost threads, the > upcoming std::thread) provide things like mutexes and > condition variables, but does using those guarantee that the right > things happen with any shared data-structures they're used to > coordinate...? > > Well, you get the same thing you get today. Any synchronization done via a function call will tend to be correct since we never move shared memory operations across calls. Depending on your application, the types of data races the options deal with may not be an issue. Using the options will eliminate having to think whether they are issues or not at a (hopefully) small cost. Since the atomic operations are being built into the compiler, the intent is to eventually optimize and inline them for speed... and in the best case, simply result in a load or store. That's further work of course, but these options are laying some of the groundwork. Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
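[Editor's note: a hedged illustration of the "best case" Andrew describes, written against the interface eventually standardized as std::atomic (an assumption on my part; the thread predates the final library). A relaxed atomic access on most targets inlines to an ordinary load or store, with no call and no barrier.]

```cpp
#include <atomic>

std::atomic<int> flag{0};

// With memory_order_relaxed there is nothing to synchronize beyond the
// atomicity of the access itself; once the builtin is inlined, common
// hardware needs only a single ordinary load or store instruction.
int  read_flag()       { return flag.load(std::memory_order_relaxed); }
void write_flag(int v) { flag.store(v, std::memory_order_relaxed); }
```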
* Re: C++0x Memory model and gcc 2010-05-12 13:10 ` Andrew MacLeod @ 2010-05-17 13:12 ` Michael Matz 2010-05-17 14:05 ` Ian Lance Taylor 2010-05-17 14:09 ` Andrew MacLeod 0 siblings, 2 replies; 27+ messages in thread From: Michael Matz @ 2010-05-17 13:12 UTC (permalink / raw) To: Andrew MacLeod; +Cc: Miles Bader, gcc Hi, On Wed, 12 May 2010, Andrew MacLeod wrote: > Well, you get the same thing you get today. Any synchronization done > via a function call will tend to be correct since we never move shared > memory operations across calls. Depending on your application, the > types of data races the options deal with may not be an issue. Using > the options will eliminate having to think whether they are issues or > not at a (hopefully) small cost. > > Since the atomic operations are being built into the compiler, the > intent is to eventually optimize and inline them for speed... and in the > best case, simply result in a load or store. That's further work of > course, but these options are laying some of the groundwork. Are you and the other proponents of that memory model seriously proposing it as an alternative to explicit locking via atomic builtins (that map to some form of atomic instructions)? Ciao, Michael. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-17 13:12 ` Michael Matz @ 2010-05-17 14:05 ` Ian Lance Taylor 2010-05-17 14:24 ` Michael Matz 2010-05-17 14:09 ` Andrew MacLeod 1 sibling, 1 reply; 27+ messages in thread From: Ian Lance Taylor @ 2010-05-17 14:05 UTC (permalink / raw) To: Michael Matz; +Cc: Andrew MacLeod, Miles Bader, gcc Michael Matz <matz@suse.de> writes: > On Wed, 12 May 2010, Andrew MacLeod wrote: > >> Well, you get the same thing you get today. Any synchronization done >> via a function call will tend to be correct since we never move shared >> memory operations across calls. Depending on your application, the >> types of data races the options deal with may not be an issue. Using >> the options will eliminate having to think whether they are issues or >> not at a (hopefully) small cost. >> >> Since the atomic operations are being built into the compiler, the >> intent is to eventually optimize and inline them for speed... and in the >> best case, simply result in a load or store. That's further work of >> course, but these options are laying some of the groundwork. > > Are you and the other proponents of that memory model seriously proposing > it as an alternative to explicit locking via atomic builtins (that map to > some form of atomic instructions)? I'm not sure what you mean here. Do you have an alternative way to implement the C++0x proposed standard? Or are you questioning the approach taken by the standard? Ian ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-17 14:05 ` Ian Lance Taylor @ 2010-05-17 14:24 ` Michael Matz 2010-05-17 20:22 ` Ian Lance Taylor 0 siblings, 1 reply; 27+ messages in thread From: Michael Matz @ 2010-05-17 14:24 UTC (permalink / raw) To: Ian Lance Taylor; +Cc: Andrew MacLeod, Miles Bader, gcc Hi, On Mon, 17 May 2010, Ian Lance Taylor wrote: > >> Since the atomic operations are being built into the compiler, the > >> intent is to eventually optimize and inline them for speed... and in > >> the best case, simply result in a load or store. That's further work > >> of course, but these options are laying some of the groundwork. > > > > Are you and the other proponents of that memory model seriously > > proposing it as an alternative to explicit locking via atomic builtins > > (that map to some form of atomic instructions)? > > I'm not sure what you mean here. Do you have an alternative way to > implement the C++0x proposed standard? I actually see no way to implement the proposed memory model on common hardware, except by emitting locked instructions and memory barriers for all memory accesses to potentially shared (and hence all non-stack) data. And even then it only works on a subset of types, namely those for which the hardware provides such instructions with the associated guarantees. > Or are you questioning the approach taken by the standard? I do, yes. Ciao, Michael. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-17 14:24 ` Michael Matz @ 2010-05-17 20:22 ` Ian Lance Taylor 0 siblings, 0 replies; 27+ messages in thread From: Ian Lance Taylor @ 2010-05-17 20:22 UTC (permalink / raw) To: Michael Matz; +Cc: Andrew MacLeod, Miles Bader, gcc Michael Matz <matz@suse.de> writes: > On Mon, 17 May 2010, Ian Lance Taylor wrote: > >> >> Since the atomic operations are being built into the compiler, the >> >> intent is to eventually optimize and inline them for speed... and in >> >> the best case, simply result in a load or store. That's further work >> >> of course, but these options are laying some of the groundwork. >> > >> > Are you and the other proponents of that memory model seriously >> > proposing it as an alternative to explicit locking via atomic builtins >> > (that map to some form of atomic instructions)? >> >> I'm not sure what you mean here. Do you have an alternative way to >> implement the C++0x proposed standard? > > I actually see no way to implement the proposed memory model on common > hardware, except by emitting locked instructions and memory barriers for > all memory accesses to potentially shared (and hence all non-stack) data. > And even then it only works on a subset of types, namely those for which > the hardware provides such instructions with the associated guarantees. I'm sure the C++ standards committee would like to hear a case for why the proposal is unusable. The standard has not yet been voted out. As far as I understand the proposal, though, your statement turns out not to be the case. Those locked instructions and memory barriers are only required for loads and stores to atomic types, not to all types. Ian ^ permalink raw reply [flat|nested] 27+ messages in thread
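[Editor's note: Ian's point can be sketched as follows, in an example not taken from the thread. Only objects declared atomic pay the cost of ordering; ordinary objects continue to compile to plain, freely optimizable loads and stores.]

```cpp
#include <atomic>

int plain_counter = 0;               // ordinary object: no barriers emitted
std::atomic<int> atomic_counter{0};  // atomic object: ordering enforced

void bump() {
    ++plain_counter;  // plain load/add/store, freely reorderable
    atomic_counter.fetch_add(1, std::memory_order_seq_cst);  // atomic RMW
}
```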
* Re: C++0x Memory model and gcc 2010-05-17 13:12 ` Michael Matz 2010-05-17 14:05 ` Ian Lance Taylor @ 2010-05-17 14:09 ` Andrew MacLeod 2010-05-17 14:55 ` Michael Matz 1 sibling, 1 reply; 27+ messages in thread From: Andrew MacLeod @ 2010-05-17 14:09 UTC (permalink / raw) To: Michael Matz; +Cc: Miles Bader, gcc Michael Matz wrote: > Hi, > > On Wed, 12 May 2010, Andrew MacLeod wrote: > > >> Well, you get the same thing you get today. Any synchronization done >> via a function call will tend to be correct since we never move shared >> memory operations across calls. Depending on your application, the >> types of data races the options deal with may not be an issue. Using >> the options will eliminate having to think whether they are issues or >> not at a (hopefully) small cost. >> >> Since the atomic operations are being built into the compiler, the >> intent is to eventually optimize and inline them for speed... and in the >> best case, simply result in a load or store. That's further work of >> course, but these options are laying some of the groundwork. >> > > Are you and the other proponents of that memory model seriously proposing > it as an alternative to explicit locking via atomic builtins (that map to > some form of atomic instructions)? > > Proposing what as an alternative? These optimization restrictions defined by the memory model are there to create predictable memory behaviour across threads. This is applicable when you use the atomic built-ins for locking. Especially in the case when the atomic operation is inlined. One goal is to have unoptimized program behaviour be consistent with the optimized version. If the optimizers introduce new data races, there is a potential behaviour difference. Lock-free data structures which utilize the atomic built-ins but do not require explicit locking are potential applications built on top of that. Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-17 14:09 ` Andrew MacLeod @ 2010-05-17 14:55 ` Michael Matz 2010-05-17 16:36 ` Andrew MacLeod 0 siblings, 1 reply; 27+ messages in thread From: Michael Matz @ 2010-05-17 14:55 UTC (permalink / raw) To: Andrew MacLeod; +Cc: Miles Bader, gcc Hi, On Mon, 17 May 2010, Andrew MacLeod wrote: > > > Well, you get the same thing you get today. Any synchronization > > > done via a function call will tend to be correct since we never move > > > shared memory operations across calls. Depending on your > > > application, the types of data races the options deal with may not > > > be an issue. Using the options will eliminate having to think > > > whether they are issues or not at a (hopefully) small cost. > > > > > > Since the atomic operations are being built into the compiler, the > > > intent is to eventually optimize and inline them for speed... and in > > > the best case, simply result in a load or store. That's further work > > > of course, but these options are laying some of the groundwork. > > > > > > > Are you and the other proponents of that memory model seriously > > proposing it as an alternative to explicit locking via atomic builtins > > (that map to some form of atomic instructions)? > > Proposing what as an alternative? The guarantees you seem to want to establish by the proposed memory model. Possibly I misunderstood. I'm not 100% sure on the guarantees you want to establish. The proposed model seems to merge multiple concepts together, all related to memory access ordering and atomicity, but with different scope and difficulty to guarantee. The mail to which I reacted seemed to me to imply that you would believe the guarantees from the memory model alone would relieve users from writing explicit atomic instructions for data synchronization. If you didn't imply that, then I'm also interested to learn what other advantages you expect to derive from the guarantees. 
And third, I'm interested to learn how you intend to actually guarantee the guarantees given by the model. So, in short, I'd like to know What (guarantees are established), Why (are those sensible and useful), and How (are those intended to be implemented) I've tried to find this in the Wiki, but it only states some broad goals it seems (not introduce data races). I also find the papers of Boehm somewhat lacking when it comes to how to actually implement the whole model on hardware, especially because he himself acknowledges the obvious problems on real hardware, like: * load/store reorder buffers, * store->load forwarding, * cache-line granularity even for strict coherency models, * existence of weak coherency machines (have to acquire the whole cache line for an exclusive write) * general slowness of locked (or atomic) instructions compared to normal stores/loads * existence of store granularity on some hardware (we don't even have to enter bit-field business, alpha e.g. has only 64 bit accesses) But for all of these to be relevant questions we first need to know what exactly are the intended guarantees of that model; say from the perspective of observable behaviour from other threads. > These optimization restrictions defined by the memory model are there to > create predictable memory behaviour across threads. With or without use of atomics? I.e. is the memory behaviour supposed to be predictable also in absence of all mentions of explicitly written atomic builtins? And you need to define predictable. Predictable == behaves according to rules. What are the rules? > This is applicable when you use the atomic built-ins for locking. > Especially in the case when the atomic operation is inlined. One goal > is to have unoptimized program behaviour be consistent with the > optimized version. We have that now (because atomics are memory barriers), so that's probably not why the model was devised. Ciao, Michael. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: C++0x Memory model and gcc 2010-05-17 14:55 ` Michael Matz @ 2010-05-17 16:36 ` Andrew MacLeod 0 siblings, 0 replies; 27+ messages in thread From: Andrew MacLeod @ 2010-05-17 16:36 UTC (permalink / raw) To: Michael Matz; +Cc: Miles Bader, gcc Michael Matz wrote: > Hi, > > On Mon, 17 May 2010, Andrew MacLeod wrote: > > The guarantees you seem to want to establish by the proposed memory model. > Possibly I misunderstood. > > I'm not 100% sure on the guarantees you want to establish. The proposed > model seems to merge multiple concepts together, all related to > memory access ordering and atomicity, but with different scope and > difficulty to guarantee. > I think the standard is excessively confusing, and overly academic. I even find the term memory model adds to the confusion. Some effort was clearly involved in defining behaviour for hardware which does not yet exist but which the language is "prepared" for. I was particularly unhappy that they merged the whole synchronization thing to an atomic load or store, at least originally. I would hazard a guess that it evolved to this state based on an observation that synchronization is almost inevitably required when an atomic is being accessed. That's just a guess, however. However, there is some fundamental goodness in it once you sort through it. Let's see if I can paraphrase normal uses and map them to the standard :-) The normal case would be when you have a system-wide lock, and when you acquire the lock, you expect everything which occurred before the lock to be completed, e.g.:

  process1: otherglob = 2; global = 10; set atomic_lock(1);
  process2: wait (atomic_lock() == 1); print (global);

You expect 'global' in process 2 to always be 10. You are in effect using the lock as a ready flag for global. In order for that to happen in a consistent manner, there is more involved than just waiting for the lock.
If process 1 and 2 are running on different machines, process 1 will have to flush its cache all the way to memory, and process 2 will have to wait for that to complete and become visible before it can proceed with allowing the proper value of global to be loaded. Otherwise the results will not be as expected. That's the synchronization model which maps to the default or 'sequentially consistent' C++ model. The cache flushing and whatever else is required is built into the library routines for performing atomic loads and stores. There is no mechanism to specify that this lock is for the value of 'global', so the standard extends the definition of the lock to say it applies to *all* shared memory before the atomic lock value is set. So

  process3: wait (atomic_lock() == 1); print (otherglob);

will also work properly. This memory model will always involve some form of synchronization instructions, and potentially waiting on other hardware to complete. I don't know much about this, but I'm told machines are starting to provide instructions to accomplish this type of synchronization. The obvious conclusion is that once the hardware starts to be able to do this synchronization with a few instructions, the entire library call to set or read an atomic and perform synchronization may be inlinable without having a call of any kind, just straight-line instructions. At this point, the optimizer will need to understand that those instructions are barriers. If you are using an atomic variable simply as a variable, and don't care about the synchronization aspects (i.e., you just want to always see a valid value for the variable), then that maps to the 'relaxed' mode. There may be some academic babble about certain provisions, but this is effectively what it boils down to. The relaxed mode is what you use when you don't care about all that memory flushing and just want to see the values of the atomic itself.
So this is the fastest model, but don't depend on the values of other shared variables. This is also what you get when you use the basic atomic store and load macros in C. The sequential mode has the possibility of being VERY slow if you have a widely distributed system. That's where the third mode comes in, the release/acquire model. Proper utilization of it can remove many of the waits present in the sequential model since different processes don't have to wait for *all* cache flushes, just ones directly related to a specific atomic variable in a specific other process. The model is provided to allow code to run more efficiently, but requires a better understanding of the subtleties of multi-processor side effects in the code you write. I still don't really get it completely, but I'm not implementing the synchronization parts, so I only need to understand some of it :-) It is possible to optimize these operations, i.e., you can do CSE and dead store elimination, which can also help the code run faster. That comes later, though. The optimization flags I'm currently working on are orthogonal to all this, even though they use the term memory model. When a program is written for multi-processing, the programmer usually attempts to write it such that there are no data races, otherwise there may be inconsistencies during execution. If a program has been developed and is data race free, the flags are meant to guarantee that the resulting code will also be data race free, regardless of whether optimization is on or off. Does that make anything clearer? It's true that a bunch of these things are all intertwined, and that's one of the reasons it comes across as being so complicated. It's up to the library guys to make whatever process synchronization is required happen; I leave that to them. They say they have a handle on it, we'll see. When they do, then we might get to inline it and do some interesting things. Andrew ^ permalink raw reply [flat|nested] 27+ messages in thread
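[Editor's note: the ready-flag pattern Andrew walks through maps onto the release/acquire mode like this. A minimal C++11-style sketch, not from the thread; the variable names follow the thread's pseudocode, and everything else is an assumption.]

```cpp
#include <atomic>

int global = 0;
int otherglob = 0;          // plain shared data, no atomics needed
std::atomic<int> ready{0};  // the "atomic_lock" from the example

void process1() {
    otherglob = 2;
    global = 10;
    ready.store(1, std::memory_order_release);  // publishes the writes above
}

int process2() {
    // The acquire load pairs with the release store: once the flag reads 1,
    // all writes made by process1 before the store are guaranteed visible.
    while (ready.load(std::memory_order_acquire) != 1) { /* spin */ }
    return global;  // guaranteed to be 10
}
```

With seq_cst in place of release/acquire the code is still correct but may additionally pay for a single total order over all atomic operations; with relaxed, the loop would still terminate but the value of global would no longer be guaranteed.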
end of thread, other threads:[~2010-05-17 20:22 UTC | newest] Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-05-06 15:43 C++0x Memory model and gcc Andrew MacLeod 2010-05-06 15:51 ` Richard Guenther 2010-05-06 16:11 ` Richard Guenther 2010-05-06 16:23 ` Andrew MacLeod 2010-05-07 9:15 ` Richard Guenther 2010-05-07 13:27 ` Andrew MacLeod 2010-05-07 14:26 ` Ian Lance Taylor 2010-05-06 15:54 ` Joseph S. Myers 2010-05-06 16:12 ` Andrew MacLeod 2010-05-06 20:40 ` Ian Lance Taylor 2010-05-06 22:02 ` Andrew MacLeod 2010-05-07 13:55 ` Mark Mitchell 2010-05-08 14:38 Jean-Marc Bourguet 2010-05-08 20:18 ` Albert Cohen 2010-05-10 4:40 ` Ian Lance Taylor 2010-05-10 17:23 ` Andrew MacLeod 2010-05-11 6:20 ` Miles Bader 2010-05-11 12:49 ` Andrew MacLeod 2010-05-12 5:21 ` Miles Bader 2010-05-12 13:10 ` Andrew MacLeod 2010-05-17 13:12 ` Michael Matz 2010-05-17 14:05 ` Ian Lance Taylor 2010-05-17 14:24 ` Michael Matz 2010-05-17 20:22 ` Ian Lance Taylor 2010-05-17 14:09 ` Andrew MacLeod 2010-05-17 14:55 ` Michael Matz 2010-05-17 16:36 ` Andrew MacLeod
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).