Implementing C++1x and C1x atomics

public inbox for gcc@gcc.gnu.org

* Implementing C++1x and C1x atomics
       [not found]               ` <84fc9c000908121127o6588fe52u581fc62bfb8cba9e@mail.gmail.com>
@ 2009-08-12 21:07                 ` Joseph S. Myers
  2009-08-13  8:44                   ` Lawrence Crowl
  0 siblings, 1 reply; 19+ messages in thread
From: Joseph S. Myers @ 2009-08-12 21:07 UTC (permalink / raw)
  To: Richard Guenther
  Cc: Boehm, Hans, Andrew Haley, Paolo Bonzini, gcc, libc-alpha

(moved from gcc-patches to gcc and libc-alpha)

On Wed, 12 Aug 2009, Richard Guenther wrote:

> On Wed, Aug 12, 2009 at 8:24 PM, Boehm, Hans<hans.boehm@hp.com> wrote:
> > [Partially replying to myself]
> >> From:  Boehm, Hans
> >>
> >> At the risk of asking a stupid question, shouldn't all the
> >> code inside gcc gradually migrate towards using the C++0x
> >> (and probably C1x) atomics, which seem to be generally
> >> supported by gcc 4.4?
> >>
> >> There are known issues with __sync (no atomic loads and
> >> stores, underspecified ordering), which is why there wasn't
> >> much of an effort to push the __sync interface into C++0x.
> >>
> >> Hans
> >>
> > OK.  That was largely a stupid question, since we're talking about the 
> > compiler implementation of those primitives, which presumably are 
> > shared with the atomic<T> implementation?
> 
> I'm not aware of a proper implementation of the C++1x atomics or the memory
> model for gcc.

And a proper implementation would surely have underlying built-in functions, 
added to the __sync_* set.
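
To make the gap concrete: the existing __sync_* set has no plain load or
store, so code that wants one has to borrow a read-modify-write builtin.
The following is only a sketch of that workaround; the sync_load_u32 and
sync_store_u32 names are made up for illustration and are not part of
GCC or any proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Atomic load emulated with a read-modify-write: add zero and return
   the previous value.  __sync builtins act as full barriers, so this
   is far stronger (and costlier) than a relaxed or acquire load. */
static inline uint32_t sync_load_u32(uint32_t *p)
{
    assert(p);
    return __sync_fetch_and_add(p, 0);
}

/* Atomic store emulated with a compare-and-swap loop. */
static inline void sync_store_u32(uint32_t *p, uint32_t value)
{
    assert(p);
    uint32_t expected = sync_load_u32(p);
    for (;;) {
        uint32_t seen = __sync_val_compare_and_swap(p, expected, value);
        if (seen == expected)
            break;            /* the swap took effect */
        expected = seen;      /* retry with the value actually observed */
    }
}
```

Both helpers always pay for a full barrier, which is one reason a
dedicated set of built-ins with explicit ordering arguments looks
preferable to stretching __sync_* further.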

The C1x atomics specification 
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1349.htm> does not 
mention any amendment to the list of headers to be provided by 
freestanding implementations (clause 4 paragraph 6), which suggests that 
providing this header is actually in the domain of libc, not the compiler 
(which accords with it providing (generic) library functions, which the 
headers for freestanding implementations do not).  It seems clear however 
that close cooperation with the compiler will be needed in the 
implementation to ensure that the right semantics are available from the 
compiler for the header to use.

The information the compiler should provide the library would include 
which operations are available built-in or in libgcc and which the C 
library has to emulate with locks (if e.g. a particular architecture or 
subarchitecture does not have 64-bit atomic operations).  Given how many 
variations there are on what targets support and how, and given that 
what's supported may depend on -march etc. options passed when compiling 
code that includes the header, I think a single version of the header for 
all targets, that contains code conditional on various predefined macros, 
would be a good ideal to aim for.
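
A minimal sketch of what "code conditional on predefined macros" could
look like, using the __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 macro that GCC
predefines when the selected target has a native 8-byte compare-and-swap.
Everything named example_* is hypothetical, and the spinlock fallback is
only a stand-in for whatever libc would really provide.

```c
#include <assert.h>
#include <stdint.h>

#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
/* The target selected by -march etc. has a native 8-byte CAS. */
#define EXAMPLE_ATOMIC64_INLINE 1
static inline uint64_t example_fetch_add_u64(uint64_t *p, uint64_t n)
{
    assert(p);
    return __sync_fetch_and_add(p, n);
}
#else
/* No native 8-byte CAS: emulate under a lock.  A static lock in a
   header gives each translation unit its own lock, which is wrong
   for real use; the lock would have to live in one shared library. */
#define EXAMPLE_ATOMIC64_INLINE 0
static volatile int example_lock;
static inline uint64_t example_fetch_add_u64(uint64_t *p, uint64_t n)
{
    assert(p);
    while (__sync_lock_test_and_set(&example_lock, 1))
        ;                               /* spin until acquired */
    uint64_t old = *p;
    *p = old + n;
    __sync_lock_release(&example_lock);
    return old;
}
#endif
```

The same source then compiles to an inline instruction or to the
emulation depending only on the options in effect when the header is
included.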

The compiler would also need to meet the underlying memory model 
requirements - avoiding optimizations that break the memory model 
assumptions (writes to locations that may not be written in the memory 
model, in particular) unless given an option to say it doesn't need to 
follow the model.
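
As a hypothetical illustration of such a write (not a claim about what
any particular GCC pass does today), compare a conditional store with a
"store first, undo later" rewrite.  The two functions agree in a
single-threaded program, but the second writes *x even when cond is
false, so a concurrent update to *x by another thread can be lost.

```c
#include <assert.h>

/* As written: *x is stored to only when cond is true. */
static void set_if(int cond, int *x)
{
    assert(x);
    if (cond)
        *x = 1;
}

/* Hypothetical rewritten form: store unconditionally, then undo. */
static void set_if_speculated(int cond, int *x)
{
    assert(x);
    int saved = *x;
    *x = 1;             /* a store the source never asked for */
    if (!cond)
        *x = saved;     /* may overwrite another thread's update */
}
```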

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Implementing C++1x and C1x atomics
  2009-08-12 21:07                 ` Implementing C++1x and C1x atomics Joseph S. Myers
@ 2009-08-13  8:44                   ` Lawrence Crowl
  2009-08-13 13:23                     ` Joseph S. Myers
  0 siblings, 1 reply; 19+ messages in thread
From: Lawrence Crowl @ 2009-08-13  8:44 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Richard Guenther, Boehm, Hans, Andrew Haley, Paolo Bonzini, gcc,
	libc-alpha

On 8/12/09, Joseph S. Myers <joseph@codesourcery.com> wrote:
> (moved from gcc-patches to gcc and libc-alpha)
>
> On Wed, 12 Aug 2009, Richard Guenther wrote:
> > On Aug 12, 2009, Boehm, Hans<hans.boehm@hp.com> wrote:
> > > [Partially replying to myself]
> > > > From:  Boehm, Hans
> > > >
> > > > At the risk of asking a stupid question, shouldn't all the
> > > > code inside gcc gradually migrate towards using the C++0x
> > > > (and probably C1x) atomics, which seem to be generally
> > > > supported by gcc 4.4?
> > > >
> > > > There are known issues with __sync (no atomic loads and
> > > > stores, underspecified ordering), which is why there wasn't
> > > > much of an effort to push the __sync interface into C++0x.
> > >
> > > OK.  That was largely a stupid question, since we're talking
> > > about the compiler implementation of those primitives, which
> > > presumably are shared with the atomic<T> implementation?

I worked very hard to make C and C++ able to share the atomics.  If for
some reason we cannot share them, I want to know so that I can
fix it.

> > I'm not aware of a proper implementation of the C++1x atomics
> > or the memory model for gcc.

The atomic types and operations themselves are doable in relatively
short order, but the full memory model will take a while because
it requires that some optimizations change, which means auditing
the compiler back ends.

> And a proper implementation would surely have underlying built-in
> functions, added to the __sync_* set.

Or perhaps different names, but yes.

> The C1x atomics specification
> <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1349.htm> does
> not mention any amendment to the list of headers to be provided by
> freestanding implementations (clause 4 paragraph 6), which suggests
> that providing this header is actually in the domain of libc, not
> the compiler (which accords with it providing (generic) library
> functions, which the headers for freestanding implementations
> do not).  It seems clear however that close cooperation with the
> compiler will be needed in the implementation to ensure that the
> right semantics are available from the compiler for the header
> to use.

I think the C++ specification currently has the same problem.
The atomics are really a compiler issue.

> The information the compiler should provide the library would
> include which operations are available built-in or in libgcc
> and which the C library has to emulate with locks (if e.g. a
> particular architecture or subarchitecture does not have 64-bit
> atomic operations).  Given how many variations there are on what
> targets support and how, and given that what's supported may
> depend on -march etc. options passed when compiling code that
> includes the header, I think a single version of the header for
> all targets, that contains code conditional on various predefined
> macros, would be a good ideal to aim for.

The compiler must recognize all the atomic operations, so that it can
respect their implications, which means that they must be intrinsics.
When there is direct instruction support for all the operations of
a type, it should emit that code.  When not, the compiler should
emit a call into a shared library that comes with the system.
You really do not want to take the risk of getting two different
implementations of the atomics.  They will fail to synchronize with
each other, and result in intermittent concurrency bugs.  Yuck.

> The compiler would also need to meet the underlying memory model
> requirements - avoiding optimizations that break the memory model
> assumptions (writes to locations that may not be written in the
> memory model, in particular) unless given an option to say it
> doesn't need to follow the model.

My recommendation is to just do a good enough job on _changing_
the optimizations so that you don't need an option.  It would
probably avoid some rather elusive bugs when someone links in a
library compiled the wrong way.

-- 
Lawrence Crowl

* Re: Implementing C++1x and C1x atomics
  2009-08-13  8:44                   ` Lawrence Crowl
@ 2009-08-13 13:23                     ` Joseph S. Myers
  2009-08-14  0:03                       ` Lawrence Crowl
  0 siblings, 1 reply; 19+ messages in thread
From: Joseph S. Myers @ 2009-08-13 13:23 UTC (permalink / raw)
  To: Lawrence Crowl
  Cc: Richard Guenther, Boehm, Hans, Andrew Haley, Paolo Bonzini, gcc,
	libc-alpha

On Wed, 12 Aug 2009, Lawrence Crowl wrote:

> > The C1x atomics specification
> > <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1349.htm> does
> > not mention any amendment to the list of headers to be provided by
> > freestanding implementations (clause 4 paragraph 6), which suggests
> > that providing this header is actually in the domain of libc, not
> > the compiler (which accords with it providing (generic) library
> > functions, which the headers for freestanding implementations
> > do not).  It seems clear however that close cooperation with the
> > compiler will be needed in the implementation to ensure that the
> > right semantics are available from the compiler for the header
> > to use.
> 
> I think the C++ specification currently has the same problem.
> The atomics are really a compiler issue.

In that it defines functions, <stdatomic.h> is unlike all the headers 
presently required of freestanding implementations, and seems much more 
like <tgmath.h> - a header that involves library functions, depends on the 
library to some extent (if you implement the ISO 24747 functions then that 
has additions to <tgmath.h> as well as <math.h>) but is nevertheless 
almost entirely compiler-specific.

> > The information the compiler should provide the library would
> > include which operations are available built-in or in libgcc
> > and which the C library has to emulate with locks (if e.g. a
> > particular architecture or subarchitecture does not have 64-bit
> > atomic operations).  Given how many variations there are on what
> > targets support and how, and given that what's supported may
> > depend on -march etc. options passed when compiling code that
> > includes the header, I think a single version of the header for
> > all targets, that contains code conditional on various predefined
> > macros, would be a good ideal to aim for.
> 
> The compiler must recognize all the atomic operations, so that it can
> respect their implications, which means that they must be intrinsics.

It is of course valid for a user program to wrap a call to an atomic 
operation in a call to its own function that calls an intrinsic - and a 
macro or inline function in the header could just as much use intrinsics 
in some cases and fall back to library functions, or more complicated 
sequences of intrinsics, in other cases.  It doesn't seem immediately 
obvious whether the compiler should provide intrinsics for each case and 
handle falling back to the library as needed or whether the header should 
handle fallback.  If the compiler provides all the intrinsics, it needs to 
agree with the library on what all the underlying functions are.

> When there is direct instruction support for all the operations of
> a type, it should emit that code.  When not, the compiler should
> emit a call into a shared library that comes with the system.

Existing practice is to use static-only libgcc functions for __sync_* 
where they go in libgcc (for ARM, SH and PA GNU/Linux, where kernel help 
is needed for atomic operations in some cases), not shared libraries.  
But those are cases where atomic operations are supported with kernel help 
rather than ones where you are trying to make a userspace emulation with 
locks - and it's the kernel's job to make the kernel-supported 
implementations interoperate with native hardware instructions.  "comes 
with the system" could mean either libgcc or libc (or a vDSO provided by 
the kernel).

> You really do not want to take the risk of getting two different
> implementations of the atomics.  They will fail to synchronize with
> each other, and result in intermittent concurrency bugs.  Yuck.

Is the point here that it's problematic to use real atomic operations on 
some subarchitectures that support them for a given type but emulations 
using locks for other subarchitectures, because they will not interoperate 
properly?  That does seem a good point, that which types direct atomic 
operations are used on must be considered part of the platform ABI, and so 
if newer hardware adds 64-bit atomic operations (say) they must still not 
be used if some other code (accessing the same object) might be using 
emulations.
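
A small sketch of why the two do not interoperate.  The lock-based
emulation is split into two halves so that the bad interleaving can be
replayed deterministically in one thread; all names here are
illustrative only.

```c
#include <assert.h>
#include <stdint.h>

static volatile int emu_lock;   /* known only to the emulation */

/* Implementation A: inline atomic instruction; never looks at emu_lock. */
static void inc_native(uint32_t *p)
{
    assert(p);
    __sync_fetch_and_add(p, 1);
}

/* Implementation B, first half: take the lock and read the value. */
static uint32_t emu_begin(uint32_t *p)
{
    assert(p);
    while (__sync_lock_test_and_set(&emu_lock, 1))
        ;
    return *p;
}

/* Implementation B, second half: write old + 1 and drop the lock. */
static void emu_end(uint32_t *p, uint32_t old)
{
    assert(p);
    *p = old + 1;
    __sync_lock_release(&emu_lock);
}

/* Replay: B reads, A increments in between, B writes back. */
static uint32_t replay_lost_update(void)
{
    uint32_t counter = 0;
    uint32_t old = emu_begin(&counter);  /* B sees 0 under its lock */
    inc_native(&counter);                /* A: 0 -> 1, lock ignored */
    emu_end(&counter, old);              /* B stores 0 + 1 = 1      */
    return counter;                      /* 1 after two increments  */
}
```

The lock protects B only against other users of the same lock, so A's
increment is silently overwritten.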

Since static libgcc is generally useful and so programs may end up using 
functions from more than one version of libgcc your suggestion of putting 
functions in a shared library to avoid this issue would imply:

* The out-of-line functions go in shared libc.

* The header therefore comes with libc.

* The header never uses an inline operation when compiling for a 
particular subarchitecture unless the corresponding version of libc, when 
executing on hardware capable of executing code for that subarchitecture, 
will always use an atomic operation that interoperates correctly with the 
header.  (libc might need in some cases to determine the hardware in use 
at runtime.)

So whether an operation is inlined would be a function both of what the 
compiler knows the hardware supports and what the header knows about what 
libc will do at runtime.  (But much of the complexity only arises when a 
single ABI supports hardware with different sets of atomic operations.)

(Using functions present in libgcc_s but not static libgcc might be 
possible here as an alternative to using libc, but libc can more readily 
access HWCAP information for hardware identification at runtime than 
libgcc can.)

> > The compiler would also need to meet the underlying memory model
> > requirements - avoiding optimizations that break the memory model
> > assumptions (writes to locations that may not be written in the
> > memory model, in particular) unless given an option to say it
> > doesn't need to follow the model.
> 
> My recommendation is to just do a good enough job on _changing_
> the optimizations so that you don't need an option.  It would
> probably avoid some rather elusive bugs when someone links in a
> library compiled the wrong way.

I'm thinking of such an option as being one to use when building a 
single-threaded program, not for use when building libraries.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: Implementing C++1x and C1x atomics
  2009-08-13 13:23                     ` Joseph S. Myers
@ 2009-08-14  0:03                       ` Lawrence Crowl
  2009-08-14  2:31                         ` Joseph S. Myers
  0 siblings, 1 reply; 19+ messages in thread
From: Lawrence Crowl @ 2009-08-14  0:03 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Richard Guenther, Boehm, Hans, Andrew Haley, Paolo Bonzini, gcc,
	libc-alpha

On 8/12/09, Joseph S. Myers <joseph@codesourcery.com> wrote:
> On Wed, 12 Aug 2009, Lawrence Crowl wrote:
> > > The C1x atomics specification
> > > <http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1349.htm>
> > > does not mention any amendment to the list of headers to be
> > > provided by freestanding implementations (clause 4 paragraph
> > > 6), which suggests that providing this header is actually in
> > > the domain of libc, not the compiler (which accords with it
> > > providing (generic) library functions, which the headers for
> > > freestanding implementations do not).  It seems clear however
> > > that close cooperation with the compiler will be needed in
> > > the implementation to ensure that the right semantics are
> > > available from the compiler for the header to use.
> >
> > I think the C++ specification currently has the same problem.
> > The atomics are really a compiler issue.
>
> In that it defines functions, <stdatomic.h> is unlike all the
> headers presently required of freestanding implementations,

But <exception>, <new>, and <typeinfo> all define functions.

> and seems much more like <tgmath.h> - a header that involves
> library functions, depends on the library to some extent (if
> you implement the ISO 24747 functions then that has additions
> to <tgmath.h> as well as <math.h>) but is nevertheless almost
> entirely compiler-specific.

We agree that it is almost entirely compiler-specific.

> > > The information the compiler should provide the library would
> > > include which operations are available built-in or in libgcc
> > > and which the C library has to emulate with locks (if e.g. a
> > > particular architecture or subarchitecture does not have 64-bit
> > > atomic operations).  Given how many variations there are on
> > > what targets support and how, and given that what's supported
> > > may depend on -march etc. options passed when compiling code
> > > that includes the header, I think a single version of the
> > > header for all targets, that contains code conditional on
> > > various predefined macros, would be a good ideal to aim for.
> >
> > The compiler must recognize all the atomic operations, so that
> > it can respect their implications, which means that they must
> > be intrinsics.
>
> It is of course valid for a user program to wrap a call to an
> atomic operation in a call to its own function that calls an
> intrinsic - and a macro or inline function in the header could
> just as much use intrinsics in some cases and fall back to library
> functions, or more complicated sequences of intrinsics, in other
> cases.  It doesn't seem immediately obvious whether the compiler
> should provide intrinsics for each case and handle falling back to
> the library as needed or whether the header should handle fallback.

I think it is best if the compiler provides all of the intrinsics.
This approach will enable a smoother upgrade to newer processors.

> If the compiler provides all the intrinsics, it needs to agree
> with the library on what all the underlying functions are.

We really need to clarify what "the library" is here.  If you mean
the C++ standard library, as delivered by Dinkumware, RogueWave or
STLport, then I think the answer is definitely not.  Those libraries
should just use the atomic header as though they were client code.

If you mean the OS-supplied platform-dependent library, then I
think the answer is yes.  The names of the routines that back up
the intrinsics should be part of the platform ABI.

> > When there is direct instruction support for all the operations
> > of a type, it should emit that code.  When not, the compiler
> > should emit a call into a shared library that comes with
> > the system.
>
> Existing practice is to use static-only libgcc functions for
> __sync_* where they go in libgcc (for ARM, SH and PA GNU/Linux,
> where kernel help is needed for atomic operations in some cases),
> not shared libraries.  But those are cases where atomic operations
> are supported with kernel help rather than ones where you are
> trying to make a userspace emulation with locks - and it's
> the kernel's job to make the kernel-supported implementations
> interoperate with native hardware instructions.  "comes with the
> system" could mean either libgcc or libc (or a vDSO provided by
> the kernel).
>
> > You really do not want to take the risk of getting two different
> > implementations of the atomics.  They will fail to synchronize
> > with each other, and result in intermittent concurrency bugs.
> > Yuck.
>
> Is the point here that it's problematic to use real atomic
> operations on some subarchitectures that support them for a given
> type but emulations using locks for other subarchitectures, because
> they will not interoperate properly?  That does seem a good point,
> that which types direct atomic operations are used on must be
> considered part of the platform ABI, and so if newer hardware adds
> 64-bit atomic operations (say) they must still not be used if some
> other code (accessing the same object) might be using emulations.

That is the point.  Any type that is not known to have a direct
implementation, should be implemented on the platform via a library.
That way all uses of that type will share the same implementation,
which may be busy-waiting on older machines and atomic instructions
on newer machines.

> Since static libgcc is generally useful and so programs may end
> up using functions from more than one version of libgcc your
> suggestion of putting functions in a shared library to avoid this
> issue would imply:

> * The out-of-line functions go in shared libc.

Yes.

> * The header therefore comes with libc.

I don't think we need a header.  These calls are directly generated
by the compiler, not referenced by the user.

> * The header never uses an inline operation when compiling for a
> particular subarchitecture unless the corresponding version of
> libc, when executing on hardware capable of executing code for
> that subarchitecture, will always use an atomic operation that
> interoperates correctly with the header.  (libc might need in
> some cases to determine the hardware in use at runtime.)

I'm not quite following that.  In any event, since I don't see
the need for a header, I think it is moot.

> So whether an operation is inlined would be a function both
> of what the compiler knows the hardware supports and what the
> header knows about what libc will do at runtime.  (But much of
> the complexity only arises when a single ABI supports hardware
> with different sets of atomic operations.)

The i386 architecture will run on later processors.  A dynamic
library on those later processors can implement the atomics via
atomic instructions rather than locking.

> (Using functions present in libgcc_s but not static libgcc might
> be possible here as an alternative to using libc, but libc can
> more readily access HWCAP information for hardware identification
> at runtime than libgcc can.)

My preference would be to put all this information in libc.  We want
all compilers, regardless of vendor, to use the same functions on
a given platform.

> > > The compiler would also need to meet the underlying memory
> > > model requirements - avoiding optimizations that break the
> > > memory model assumptions (writes to locations that may not
> > > be written in the memory model, in particular) unless given
> > > an option to say it doesn't need to follow the model.
> >
> > My recommendation is to just do a good enough job on _changing_
> > the optimizations so that you don't need an option.  It would
> > probably avoid some rather elusive bugs when someone links in
> > a library compiled the wrong way.
>
> I'm thinking of such an option as being one to use when building
> a single-threaded program, not for use when building libraries.

That is the scenario that works.  However, my experience is that
if there is no option to get back to the old optimization, there
is more incentive to do a good job on the new optimization.  :-)

Furthermore, fewer options yield a better user experience, and
very few users will fine-tune their options, so it is usually better
for the compiler implementors to do more work to avoid the option.

-- 
Lawrence Crowl

* Re: Implementing C++1x and C1x atomics
  2009-08-14  0:03                       ` Lawrence Crowl
@ 2009-08-14  2:31                         ` Joseph S. Myers
  2009-08-14  7:14                           ` Lawrence Crowl
  0 siblings, 1 reply; 19+ messages in thread
From: Joseph S. Myers @ 2009-08-14  2:31 UTC (permalink / raw)
  To: Lawrence Crowl
  Cc: Richard Guenther, Boehm, Hans, Andrew Haley, Paolo Bonzini, gcc,
	libc-alpha

On Thu, 13 Aug 2009, Lawrence Crowl wrote:

> > In that it defines functions, <stdatomic.h> is unlike all the
> > headers presently required of freestanding implementations,
> 
> But <exception>, <new>, and <typeinfo> all define functions.

I'm not familiar with the C++ requirements for freestanding 
implementations, so am just comparing with the requirements for C.

> If you mean the OS-supplied platform-dependent library, then I
> think the answer is yes.  The names of the routines that back up
> the intrinsics should be part of the platform ABI.

I believe two relevant points are:

* An implementation with kernel help (such as those for ARM, SH and PA 
GNU/Linux presently in libgcc), that is guaranteed to interoperate 
correctly with atomic instructions added in later subarchitectures or 
present in some subarchitectures, can be considered equivalent to a 
hardware instruction for most purposes; in particular, there is no need 
for programs to use only one such implementation and having them in libgcc 
is fine.  It's only lock-based implementations that might have 
interoperation problems that need to go in libc.

* libc only needs to export these functions for types that lack the 
operations in hardware on at least some subarchitectures.  This will mean 
that the libc ABI does not generally need to contain the 1-byte, 2-byte or 
4-byte operations, but on some targets it will need to export functions 
for 8-byte operations.  These functions will in general have 
target-specific definitions, and certainly would appear in the 
target-specific Versions files.

> > * The header therefore comes with libc.
> 
> I don't think we need a header.  These calls are directly generated
> by the compiler, not referenced by the user.

The header I am referring to is the C header <stdatomic.h> that C1x users 
wanting atomic operations should be using.

> > * The header never uses an inline operation when compiling for a
> > particular subarchitecture unless the corresponding version of
> > libc, when executing on hardware capable of executing code for
> > that subarchitecture, will always use an atomic operation that
> > interoperates correctly with the header.  (libc might need in
> > some cases to determine the hardware in use at runtime.)
> 
> I'm not quite following that.  In any event, since I don't see
> the need for a header, I think it is moot.

Suppose you have an architecture X.  Processors A, B and C for this 
architecture do not have 8-byte atomic operations, so glibc 2.12 provides 
a fallback lock-based implementation in the port to X.  GCC 4.6, 
targeting X (processors A, B and C), together with the stdatomic.h 
header, generates code using the fallback functions, and everything works 
OK.

Now a processor D for this architecture comes out.  All code for A, B and 
C will work on D, but D also has 8-byte atomic operations.  GCC 4.7, with 
-march=D, generates code that uses these operations inline.  If code built 
with GCC 4.7 -march=D, and code built with GCC 4.6 or without -march=D, 
are used together with the glibc 2.12 shared library, both implementations 
of the atomic operations are now used and things don't work.

glibc 2.13 changes the out-of-line implementation to test at runtime 
whether it is running on D, and use the new instruction instead of the 
lock-based implementation if so (probably using STT_GNU_IFUNC so this test 
is only run the first time the symbol is resolved).  That new glibc will 
now work with objects built with either 4.6 or 4.7.
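
A portable sketch of that runtime selection, using a lazily resolved
function pointer in place of a real STT_GNU_IFUNC symbol.  The probe
example_running_on_d() and the other example_* names are hypothetical;
here the probe simply reflects what the compiler targets, where a real
libc would consult HWCAP or similar.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t (*fetch_add_fn)(uint64_t *, uint64_t);

static volatile int fallback_lock;

/* Lock-based fallback for processors A, B and C. */
static uint64_t fetch_add_locked(uint64_t *p, uint64_t n)
{
    while (__sync_lock_test_and_set(&fallback_lock, 1))
        ;
    uint64_t old = *p;
    *p = old + n;
    __sync_lock_release(&fallback_lock);
    return old;
}

#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
/* Native version, usable only when running on processor D. */
static uint64_t fetch_add_native(uint64_t *p, uint64_t n)
{
    return __sync_fetch_and_add(p, n);
}
#endif

/* Hypothetical hardware probe. */
static int example_running_on_d(void)
{
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
    return 1;
#else
    return 0;
#endif
}

static fetch_add_fn resolve_fetch_add(void)
{
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
    if (example_running_on_d())
        return fetch_add_native;
#endif
    return fetch_add_locked;
}

static fetch_add_fn resolved;

/* The out-of-line entry point.  With an ifunc the dynamic linker
   would run the resolver once; this lazy check only imitates that
   and is not itself safe against concurrent first calls. */
uint64_t example_out_of_line_fetch_add_8(uint64_t *p, uint64_t n)
{
    assert(p);
    if (!resolved)
        resolved = resolve_fetch_add();
    return resolved(p, n);
}
```

Every caller in the process goes through the same entry point, so all
of them end up on the same implementation whichever one is selected.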

But on GNU/Linux - unlike BSDs, say - it is expected that the compiler, 
libc and kernel versions can be updated more or less independently, and 
that it should be possible to use a newer compiler to build code that will 
run with an older C library.  So the case of GCC 4.7 with glibc 2.12 needs 
to work.  This means that code built with GCC 4.7 against the 
<stdatomic.h> header provided with glibc 2.12 must not use the 8-byte 
atomic instruction that GCC 4.7 knows how to use, because glibc 2.12 will 
not use it in the out-of-line implementation at runtime.

Are you proposing to avoid this issue by saying that the platform ABI for 
GNU/Linux on an X processor is that the 8-byte operations must never be 
inlined, and so making GCC not use the inline operations with -march=D 
(for GNU/Linux - it might be different for another OS)?  That would work, 
but I don't think it's necessary (and these are operations you'd really 
like to use as few instructions as possible, so avoiding shared library 
overhead if a single inline instruction will do) if you require programs 
to go via the standard <stdatomic.h> header.  You could have libc provide 
<stdatomic.h> that does

#include <bits/stdatomic.h>

#ifndef __atomic_whatever_8
#define __atomic_whatever_8 __builtin_atomic_whatever_8
#endif

(and then uses __atomic_whatever_8 in the implementation of the 
type-generic macro), and <bits/stdatomic.h> would in glibc 2.12 for X do

#define __atomic_whatever_8 __out_of_line_atomic_whatever_8

(repeated for each type that may lack atomic instructions on X)

and in 2.13, knowing what implementations are present in libc, it could 
instead do

#ifndef __arch_D__
#define __atomic_whatever_8 __out_of_line_atomic_whatever_8
#endif

and adjust that condition in future versions if there are other future 
variants, not defining __arch_D__, for which libc uses a hardware atomic 
operation.
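
Put together in one file, the layering might look like the sketch
below.  The two header sections are simulated inline, EXAMPLE_ARCH_D
stands in for __arch_D__, and the out-of-line function is a
non-atomic stand-in that merely counts calls; none of these names
exist in glibc or GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the function libc would export. */
static int out_of_line_calls;
static uint64_t example_out_of_line_fetch_add_8(uint64_t *p, uint64_t n)
{
    assert(p);
    ++out_of_line_calls;
    uint64_t old = *p;
    *p = old + n;
    return old;
}

/* --- simulated <bits/stdatomic.h>, as shipped by this libc --- */
#ifndef EXAMPLE_ARCH_D
#define example_fetch_add_8 example_out_of_line_fetch_add_8
#endif

/* --- simulated generic <stdatomic.h> --- */
#ifndef example_fetch_add_8
#define example_fetch_add_8(p, n) __sync_fetch_and_add((p), (n))
#endif
```

Built without EXAMPLE_ARCH_D, every use of example_fetch_add_8 goes
out of line; the inline builtin is chosen only when the libc-provided
part leaves the macro undefined.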

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: Implementing C++1x and C1x atomics
  2009-08-14  2:31                         ` Joseph S. Myers
@ 2009-08-14  7:14                           ` Lawrence Crowl
  2009-08-14  9:37                             ` Joseph S. Myers
  0 siblings, 1 reply; 19+ messages in thread
From: Lawrence Crowl @ 2009-08-14  7:14 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Richard Guenther, Boehm, Hans, Andrew Haley, Paolo Bonzini, gcc,
	libc-alpha

On 8/13/09, Joseph S. Myers <joseph@codesourcery.com> wrote:
> On Thu, 13 Aug 2009, Lawrence Crowl wrote:
> > > In that it defines functions, <stdatomic.h> is unlike all the
> > > headers presently required of freestanding implementations,
> >
> > But <exception>, <new>, and <typeinfo> all define functions.
>
> I'm not familiar with the C++ requirements for freestanding
> implementations, so am just comparing with the requirements for C.
>
> > If you mean the OS-supplied platform-dependent library, then I
> > think the answer is yes.  The names of the routines that back up
> > the intrinsics should be part of the platform ABI.
>
> I believe two relevant points are:
>
> * An implementation with kernel help (such as those for ARM,
> SH and PA GNU/Linux presently in libgcc), that is guaranteed
> to interoperate correctly with atomic instructions added in
> later subarchitectures or present in some subarchitectures,
> can be considered equivalent to a hardware instruction for most
> purposes; in particular, there is no need for programs to use
> only one such implementation and having them in libgcc is fine.
> It's only lock-based implementations that might have interoperation
> problems that need to go in libc.

Yes.

>
> * libc only needs to export these functions for types that lack
> the operations in hardware on at least some subarchitectures.
> This will mean that the libc ABI does not generally need to
> contain the 1-byte, 2-byte or 4-byte operations, but on some
> targets it will need to export functions for 8-byte operations.
> These functions will in general have target-specific definitions,
> and certainly would appear in the target-specific Versions files.

Yes.  However, we still have supported targets that are pretty weak.
"i386 < i486 < i586".  If one can link two object files compiled for
these targets into the same program, you have the synchronization
problem.

> > > * The header therefore comes with libc.
> >
> > I don't think we need a header.  These calls are directly
> > generated by the compiler, not referenced by the user.
>
> The header I am referring to is the C header <stdatomic.h> that
> C1x users wanting atomic operations should be using.

Okay, let's call this the language-defined compiler header.

> > > * The header never uses an inline operation when compiling for a
> > > particular subarchitecture unless the corresponding version of
> > > libc, when executing on hardware capable of executing code for
> > > that subarchitecture, will always use an atomic operation that
> > > interoperates correctly with the header.  (libc might need in
> > > some cases to determine the hardware in use at runtime.)
> >
> > I'm not quite following that.  In any event, since I don't see
> > the need for a header, I think it is moot.
>
> Suppose you have an architecture X.  Processors A, B and C for this
> architecture do not have 8-byte atomic operations, so glibc 2.12
> provides a fallback lock-based implementation in the port to X.
> GCC 4.6, targeting X (processors A, B and C), together with the
> stdatomic.h header, generates code using the fallback functions,
> and everything works OK.

I tend to think of these not as fallback functions, but as
platform functions, which can be optimized through inlining in
some circumstances.
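
For illustration only, a lock-based platform function for an 8-byte
fetch-and-add might look roughly like the sketch below.  The name
fallback_fetch_add_8 and the single global lock are assumptions made
for the example (they appear in neither glibc nor GCC), and the lock
is written with the atomic_flag type from C11 <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One global spinlock guarding every 8-byte fallback operation.
   (Assumption: a real libc would more likely hash the object's
   address into a table of locks.) */
static atomic_flag fallback_lock = ATOMIC_FLAG_INIT;

/* Hypothetical out-of-line routine: add delta to *obj and return the
   old value.  It is atomic only with respect to other callers that
   take the same lock. */
uint64_t fallback_fetch_add_8(uint64_t *obj, uint64_t delta)
{
    while (atomic_flag_test_and_set_explicit(&fallback_lock,
                                             memory_order_acquire))
        ;  /* spin until the lock is free */
    uint64_t old = *obj;
    *obj = old + delta;
    atomic_flag_clear_explicit(&fallback_lock, memory_order_release);
    return old;
}
```

Inlining, in this picture, means the compiler replaces the call with
a hardware instruction, which is only safe if nothing else still
relies on the lock.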

> Now a processor D for this architecture comes out.  All code for A,
> B and C will work on D, but D also has 8-byte atomic operations.
> GCC 4.7, with -march=D, generates code that uses these operations
> inline.  If code built with GCC 4.7 -march=D, and code built
> with GCC 4.6 or without -march=D, are used together with the
> glibc 2.12 shared library, both implementations of the atomic
> operations are now used and things don't work.

Here is where the heavy platform nature of the atomics comes in.
The installed shared glibc 2.12 on the system with a D processor
must have been built with -march=D.  If so, then all operations
share the same implementation, and everything works.

> glibc 2.13 changes the out-of-line implementation to test at
> runtime whether it is running on D, and use the new instruction
> instead of the lock-based implementation if so (probably using
> STT_GNU_IFUNC so this test is only run the first time the symbol
> is resolved).  That new glibc will now work with objects built
> with either 4.6 or 4.7.

Well, okay, but there would be less exposure to problems if -march=D
implied that the library was compiled with -march=D (or better).


> But on GNU/Linux - unlike BSDs, say - it is expected that the
> compiler, libc and kernel versions can be updated more or less
> independently, and that it should be possible to use a newer
> compiler to build code that will run with an older C library.
> So the case of GCC 4.7 with glibc 2.12 needs to work.  This means
> that code built with GCC 4.7 against the <stdatomic.h> header
> provided with glibc 2.12 must not use the 8-byte atomic instruction
> that GCC 4.7 knows how to use, because glibc 2.12 will not use
> it in the out-of-line implementation at runtime.

I am suggesting that certain highly processor-dependent routines
should be updated with the processor.  That is, I don't think the
taxonomy is quite right.

> Are you proposing to avoid this issue by saying that the
> platform ABI for GNU/Linux on an X processor is that the 8-byte
> operations must never be inlined, and so making GCC not use the
> inline operations with -march=D (for GNU/Linux - it might be
> different for another OS)?

No, I am proposing that an object compiled with -march=D should
fail to load on a system that doesn't have both a >=D processor
and a >=D library.

> That would work, but I don't think it's necessary (and these
> are operations you'd really like to use as few instructions as
> possible, so avoiding shared library overhead if a single inline
> instruction will do) if you require programs to go via the standard
> <stdatomic.h> header.

The overhead in cycles of atomic operations is often high.  With the
exception of a load, the minimum cycle count of the right instruction
sequence may well be greater than the call wrapping it, so I am
not terribly concerned about atomic operations being implemented
out of line.

> You could have libc provide <stdatomic.h> that does
>
> #include <bits/stdatomic.h>
>
> #ifndef __atomic_whatever_8
> #define __atomic_whatever_8 __builtin_atomic_whatever_8
> #endif
>
> (and then uses __atomic_whatever_8 in the implementation of the
> type-generic macro), and <bits/stdatomic.h> would in glibc 2.12
> for X do
>
> #define __atomic_whatever_8 __out_of_line_atomic_whatever_8
>
> (repeated for each type that may lack atomic instructions on X)
>
> and in 2.13, knowing what implementations are present in libc,
> it could instead do
>
> #ifndef __arch_D__
> #define __atomic_whatever_8 __out_of_line_atomic_whatever_8
> #endif
>
> and adjust that condition in future versions if there are other
> future variants, not defining __arch_D__, for which libc uses a
> hardware atomic operation.

I am suggesting something different.  I am suggesting that the
<stdatomic.h> header always generate __builtin_atomic... and that
the conversion from RTL to assembler generate either the instructions
or the call.

The target of the call should never appear in any header.  As a
consequence, it needs to be part of the platform ABI.  (I'm hoping
we mean the same by platform, but I'm beginning to doubt that.)

We also need the __builtin_atomic... operations in the intermediate
language so that the compiler can recognize the operations' effect
on the memory model and (sometime later) optimize those operations.

-- 
Lawrence Crowl


* Re: Implementing C++1x and C1x atomics
  2009-08-14  7:14                           ` Lawrence Crowl
@ 2009-08-14  9:37                             ` Joseph S. Myers
  2009-08-14 20:52                               ` Lawrence Crowl
  0 siblings, 1 reply; 19+ messages in thread
From: Joseph S. Myers @ 2009-08-14  9:37 UTC (permalink / raw)
  To: Lawrence Crowl
  Cc: Richard Guenther, Boehm, Hans, Andrew Haley, Paolo Bonzini, gcc,
	libc-alpha

On Thu, 13 Aug 2009, Lawrence Crowl wrote:

>   Now a processor D for this architecture comes out.  All code for A,
>   B and C will work on D, but D also has 8-byte atomic operations.
>   GCC 4.7, with -march=D, generates code that uses these operations
>   inline.  If code built with GCC 4.7 -march=D, and code built
>   with GCC 4.6 or without -march=D, are used together with the
>   glibc 2.12 shared library, both implementations of the atomic
>   operations are now used and things don't work.
> 
> Here is where the heavy platform nature of the atomics comes in.
> The installed shared glibc 2.12 on the system with a D processor
> must have been built with -march=D.  If so, then all operations
> share the same implementation, and everything works.

The practicalities of GNU/Linux distribution deployment mean that you 
don't want to have many different versions of libc built for different 
processors and that you don't want to require users of code optimized for 
a new processor to have a correspondingly new or newly built libc.  This 
is why STT_GNU_IFUNC is useful; in theory you can build libc for every 
processor variant, in practice it's better to have one or very few 
binaries of it and have those dynamically use optimized versions of the 
few functions for which CPU optimization makes a major difference.
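
A rough sketch of that kind of run-time selection, written as a
portable function pointer rather than with STT_GNU_IFUNC itself, might
look like this.  Every name is hypothetical, cpu_is_D_or_later() is a
hard-wired stand-in for a real hardware probe, and the native path
assumes GCC or Clang on a target where the 8-byte __atomic_fetch_add
builtin is available.

```c
#include <stdint.h>

typedef uint64_t (*fetch_add_8_fn)(uint64_t *, uint64_t);

/* Hypothetical probe.  A real libc would ask the hardware or the
   kernel; here the answer is simply hard-wired to "yes". */
static int cpu_is_D_or_later(void) { return 1; }

/* Stand-in for the lock-based version (the lock itself is omitted
   for brevity, so this is NOT atomic as written). */
static uint64_t locked_fetch_add_8(uint64_t *obj, uint64_t delta)
{
    uint64_t old = *obj;
    *obj = old + delta;
    return old;
}

/* Native version, using the compiler's __atomic builtin. */
static uint64_t native_fetch_add_8(uint64_t *obj, uint64_t delta)
{
    return __atomic_fetch_add(obj, delta, __ATOMIC_SEQ_CST);
}

static uint64_t resolve_then_call(uint64_t *obj, uint64_t delta);

/* Starts out pointing at the resolver; rewritten on the first call. */
static fetch_add_8_fn impl = resolve_then_call;

/* Runs once: pick an implementation, remember it, then forward.
   (Not thread-safe as written; with IFUNC the dynamic linker does
   this step when the symbol is resolved.) */
static uint64_t resolve_then_call(uint64_t *obj, uint64_t delta)
{
    impl = cpu_is_D_or_later() ? native_fetch_add_8 : locked_fetch_add_8;
    return impl(obj, delta);
}

/* The one entry point every object file calls. */
uint64_t platform_fetch_add_8(uint64_t *obj, uint64_t delta)
{
    return impl(obj, delta);
}
```

Because every caller goes through the same entry point, all of them
end up on the same implementation on a given machine, whichever
compiler built them.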

> > glibc 2.13 changes the out-of-line implementation to test at
> > runtime whether it is running on D, and use the new instruction
> > instead of the lock-based implementation if so (probably using
> > STT_GNU_IFUNC so this test is only run the first time the symbol
> > is resolved).  That new glibc will now work with objects built
> > with either 4.6 or 4.7.
> 
> Well, okay, but there would be less exposure to problems if -march=D
> implied that the library was compiled with -march=D (or better).

It doesn't.  It doesn't imply that processor D had even been invented when 
the C library was built.  You can reasonably build a program with a new 
compiler, against an old libc, that uses its own dynamic choice of code 
for different CPUs, will run on a range of distributions and will use code 
good for the particular CPU it is run on; it simply needs to avoid 
inlining these particular operations, while exploiting all other new 
features of D.

> > But on GNU/Linux - unlike BSDs, say - it is expected that the
> > compiler, libc and kernel versions can be updated more or less
> > independently, and that it should be possible to use a newer
> > compiler to build code that will run with an older C library.
> > So the case of GCC 4.7 with glibc 2.12 needs to work.  This means
> > that code built with GCC 4.7 against the <stdatomic.h> header
> > provided with glibc 2.12 must not use the 8-byte atomic instruction
> > that GCC 4.7 knows how to use, because glibc 2.12 will not use
> > it in the out-of-line implementation at runtime.
> 
> I am suggesting that certain highly processor-dependent routines
> should be updated with the processor.  That is, I don't think the
> taxonomy is quite right.

I am saying it is desirable not to have to update those with the 
processor; at most, to have to update the kernel for a new processor and 
keep the same userspace distribution.  Users routinely run distributions 
predating their hardware (copying an installation from one system to 
another, or keeping the same distribution deployed across multiple 
systems); they should be able to exploit the features of the new hardware 
in their own binaries as far as possible (meaning -march=D should work 
fine, just not inline these operations) while doing so.

> > Are you proposing to avoid this issue by saying that the
> > platform ABI for GNU/Linux on an X processor is that the 8-byte
> > operations must never be inlined, and so making GCC not use the
> > inline operations with -march=D (for GNU/Linux - it might be
> > different for another OS)?
> 
> No, I am proposing that an object compiled with -march=D should
> fail to load on a system that doesn't have both a >=D processor
> and a >=D library.

Whereas I say -march=D should not cause the operations to be inlined 
unless the library (whose headers are used at compile time) is D-aware.  
Which means the library communicates to the compiler whether it is 
D-aware.  The simplest way of doing this is through macro definitions such 
as those I suggested, but it could also use pragmas to declare properties 
of the out-of-line functions it provides.  Those pragmas could even go in 
the stdc-predef.h header, implicitly preincluded, that I have proposed for 
other reasons, and thereby affect all uses of the relevant compiler 
intrinsics, if you wish to ensure that users can always use the intrinsics 
directly if they wish.  (Note that the stdc-predef.h header has not so far 
been accepted by glibc.)

/* in stdc-predef.h (or stdatomic.h if you don't like stdc-predef.h) */
#ifdef __arch_D__
#pragma GCC atomic_8byte_ok_to_inline
#endif

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Implementing C++1x and C1x atomics
  2009-08-14  9:37                             ` Joseph S. Myers
@ 2009-08-14 20:52                               ` Lawrence Crowl
  2009-08-14 21:06                                 ` Joseph S. Myers
  0 siblings, 1 reply; 19+ messages in thread
From: Lawrence Crowl @ 2009-08-14 20:52 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Richard Guenther, Boehm, Hans, Andrew Haley, Paolo Bonzini, gcc,
	libc-alpha

On 8/13/09, Joseph S. Myers <joseph@codesourcery.com> wrote:
> On Thu, 13 Aug 2009, Lawrence Crowl wrote:
> > > Now a processor D for this architecture comes out.  All code
> > > for A, B and C will work on D, but D also has 8-byte atomic
> > > operations.  GCC 4.7, with -march=D, generates code that
> > > uses these operations inline.  If code built with GCC 4.7
> > > -march=D, and code built with GCC 4.6 or without -march=D,
> > > are used together with the glibc 2.12 shared library, both
> > > implementations of the atomic operations are now used and
> > > things don't work.
> >
> > Here is where the heavy platform nature of the atomics comes in.
> > The installed shared glibc 2.12 on the system with a D processor
> > must have been built with -march=D.  If so, then all operations
> > share the same implementation, and everything works.
>
> The practicalities of GNU/Linux distribution deployment mean that
> you don't want to have many different versions of libc built for
> different processors and that you don't want to require users
> of code optimized for a new processor to have a correspondingly
> new or newly built libc.  This is why STT_GNU_IFUNC is useful;
> in theory you can build libc for every processor variant, in
> practice it's better to have one or very few binaries of it and
> have those dynamically use optimized versions of the few functions
> for which CPU optimization makes a major difference.

I was thinking of STT_GNU_IFUNC as an optimization on producing
multiple binaries.

> > > glibc 2.13 changes the out-of-line implementation to test
> > > at runtime whether it is running on D, and use the new
> > > instruction instead of the lock-based implementation if so
> > > (probably using STT_GNU_IFUNC so this test is only run the
> > > first time the symbol is resolved).  That new glibc will now
> > > work with objects built with either 4.6 or 4.7.
> >
> > Well, okay, but there would be less exposure to problems if
> > -march=D implied that the library was compiled with -march=D
> > (or better).
>
> It doesn't.  It doesn't imply that processor D had even been
> invented when the C library was built.  You can reasonably build
> a program with a new compiler, against an old libc, that uses its
> own dynamic choice of code for different CPUs, will run on a range
> of distributions and will use code good for the particular CPU
> it is run on; it simply needs to avoid inlining these particular
> operations, while exploiting all other new features of D.

The atomics need a consistency of implementation, which makes them
particularly important to have in the system.

> > > But on GNU/Linux - unlike BSDs, say - it is expected that the
> > > compiler, libc and kernel versions can be updated more or less
> > > independently, and that it should be possible to use a newer
> > > compiler to build code that will run with an older C library.
> > > So the case of GCC 4.7 with glibc 2.12 needs to work.  This
> > > means that code built with GCC 4.7 against the <stdatomic.h>
> > > header provided with glibc 2.12 must not use the 8-byte atomic
> > > instruction that GCC 4.7 knows how to use, because glibc 2.12
> > > will not use it in the out-of-line implementation at runtime.
> >
> > I am suggesting that certain highly processor-dependent routines
> > should be updated with the processor.  That is, I don't think
> > the taxonomy is quite right.
>
> I am saying it is desirable not to have to update those with
> the processor; at most, to have to update the kernel for
> a new processor and keep the same userspace distribution.
> Users routinely run distributions predating their hardware
> (copying an installation from one system to another, or keeping the
> same distribution deployed across multiple systems); they should
> be able to exploit the features of the new hardware in their own
> binaries as far as possible (meaning -march=D should work fine,
> just not inline these operations) while doing so.

So, if -march=D should not imply inlining of the atomic operations,
we need another option that does.  That other option in turn
must require the dynamic library use compatible implementations.
(I'd really like to see errors caught by the loader.)

> > > Are you proposing to avoid this issue by saying that the
> > > platform ABI for GNU/Linux on an X processor is that the
> > > 8-byte operations must never be inlined, and so making GCC
> > > not use the inline operations with -march=D (for GNU/Linux -
> > > it might be different for another OS)?
> >
> > No, I am proposing that an object compiled with -march=D should
> > fail to load on a system that doesn't have both a >=D processor
> > and a >=D library.
>
> Whereas I say -march=D should not cause the operations to be
> inlined unless the library (whose headers are used at compile time)
> is D-aware.  Which means the library communicates to the compiler
> whether it is D-aware.  The simplest way of doing this is through
> macro definitions such as those I suggested, but it could also
> use pragmas to declare properties of the out-of-line functions it
> provides.  Those pragmas could even go in the stdc-predef.h header,
> implicitly preincluded, that I have proposed for other reasons,
> and thereby affect all uses of the relevant compiler intrinsics,
> if you wish to ensure that users can always use the intrinsics
> directly if they wish.  (Note that the stdc-predef.h header has
> not so far been accepted by glibc.)
>
> /* in stdc-predef.h (or stdatomic.h if you don't like stdc-predef.h) */
> #ifdef __arch_D__
> #pragma GCC atomic_8byte_ok_to_inline
> #endif

I must say the header approach really worries me.  The problem is
that the library that you compile under may not be the library that
you execute under.  I shudder to think what shipping preprocessor
output from one system to another will do.

I have no objection to "don't worry about coordinating" -march=D,
but as you say, the atomics cannot use that option.  My conclusion
is that we need another option.

-- 
Lawrence Crowl


* Re: Implementing C++1x and C1x atomics
  2009-08-14 20:52                               ` Lawrence Crowl
@ 2009-08-14 21:06                                 ` Joseph S. Myers
  2009-08-14 21:25                                   ` Lawrence Crowl
  0 siblings, 1 reply; 19+ messages in thread
From: Joseph S. Myers @ 2009-08-14 21:06 UTC (permalink / raw)
  To: Lawrence Crowl
  Cc: Richard Guenther, Boehm, Hans, Andrew Haley, Paolo Bonzini, gcc,
	libc-alpha

On Fri, 14 Aug 2009, Lawrence Crowl wrote:

> So, if -march=D should not imply inlining of the atomic operations,
> we need another option that does.  That other option in turn
> must require the dynamic library use compatible implementations.
> (I'd really like to see errors caught by the loader.)

The loader doesn't catch you using a processor extension that adds new 
call-preserved registers when setjmp/longjmp haven't been updated to 
handle those registers (one existing case that I think unavoidably does 
require a new libc to use a feature of a new architecture).  Fortunately, 
most people producing ABIs for such processor extensions realise it's best 
to make the new registers call-clobbered so that setjmp/longjmp and 
unwinding don't need to handle them.

> > /* in stdc-predef.h (or stdatomic.h if you don't like stdc-predef.h) */
> > #ifdef __arch_D__
> > #pragma GCC atomic_8byte_ok_to_inline
> > #endif
> 
> I must say the header approach really worries me.  The problem is
> that the library that you compile under may not be the library that
> you execute under.  I shudder to think what shipping preprocessor
> output from one system to another will do.

It is required when using glibc that:

(A) Code compiled against headers from libc version X must be run with 
libc version X or later.

(B) Code compiled against headers from libc version X must be linked into 
an executable or shared library against a binary of libc version X 
(exactly).  .o and .a files are not necessarily compatible from one 
version to the next, since compiling against headers from version X means 
the object needs the versions of functions that are provided by libc 
version X, but the symbol references are only bound to particular symbol 
versions at the point at which you link with libc.so.  (It's been a long 
time since there was major breakage of this sort affecting lots of static 
libraries - I think maybe the 2.2-to-2.3 transition - but the principle 
remains that symbol versioning does not assure compatibility for .o and .a 
files, only for executables and shared libraries.)

As long as you follow (A) and each libc version knows about all the atomic 
instruction cases the previous one knew about, having libc declare to the 
compiler when it's safe to inline atomic operations (with the default in 
the absence of such a declaration from libc being only to inline 
operations if they are present on all subarchitectures and there is no 
lock-based libc implementation with which to be incompatible) should work 
fine.
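
Rule (A) amounts to a simple version comparison.  The sketch below is
only an illustration of that rule; the struct and function names are
made up for the example and are not a glibc interface.

```c
/* A libc version as a (major, minor) pair, e.g. {2, 12}. */
struct libc_version {
    int major;
    int minor;
};

/* Rule (A): the libc found at run time must be the same version as,
   or newer than, the one whose headers the code was compiled against.
   Returns 1 if the rule is satisfied and 0 otherwise. */
int satisfies_rule_a(struct libc_version headers,
                     struct libc_version runtime)
{
    if (runtime.major != headers.major)
        return runtime.major > headers.major;
    return runtime.minor >= headers.minor;
}
```

With glibc, the compile-time pair is available as __GLIBC__ and
__GLIBC_MINOR__, and the run-time version string is returned by
gnu_get_libc_version() from <gnu/libc-version.h>.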

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Implementing C++1x and C1x atomics
  2009-08-14 21:06                                 ` Joseph S. Myers
@ 2009-08-14 21:25                                   ` Lawrence Crowl
  2009-08-15  5:53                                     ` Joseph S. Myers
  0 siblings, 1 reply; 19+ messages in thread
From: Lawrence Crowl @ 2009-08-14 21:25 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Richard Guenther, Boehm, Hans, Andrew Haley, Paolo Bonzini, gcc,
	libc-alpha

On 8/14/09, Joseph S. Myers <joseph@codesourcery.com> wrote:
> On Fri, 14 Aug 2009, Lawrence Crowl wrote:
> > So, if -march=D should not imply inlining of the atomic
> > operations, we need another option that does.  That other
> > option in turn must require the dynamic library use compatible
> > implementations.  (I'd really like to see errors caught by
> > the loader.)
>
> The loader doesn't catch you using a processor extension that
> adds new call-preserved registers when setjmp/longjmp haven't
> been updated to handle those registers (one existing case that
> I think unavoidably does require a new libc to use a feature of
> a new architecture).  Fortunately, most people producing ABIs
> for such processor extensions realise it's best to make the new
> registers call-clobbered so that setjmp/longjmp and unwinding
> don't need to handle them.
>
> > > /* in stdc-predef.h (or stdatomic.h if you don't like stdc-predef.h) */
> > > #ifdef __arch_D__
> > > #pragma GCC atomic_8byte_ok_to_inline
> > > #endif
> >
> > I must say the header approach really worries me.  The problem is
> > that the library that you compile under may not be the library
> > that you execute under.  I shudder to think what shipping
> > preprocessor output from one system to another will do.
>
> It is required when using glibc that:
>
> (A) Code compiled against headers from libc version X must be
> run with libc version X or later.
>
> (B) Code compiled against headers from libc version X must be
> linked into an executable or shared library against a binary of
> libc version X (exactly).  .o and .a files are not necessarily
> compatible from one version to the next, since compiling against
> headers from version X means the object needs the versions of
> functions that are provided by libc version X, but the symbol
> references are only bound to particular symbol versions at the
> point at which you link with libc.so.  (It's been a long time
> since there was major breakage of this sort affecting lots of
> static libraries - I think maybe the 2.2-to-2.3 transition -
> but the principle remains that symbol versioning does not assure
> compatibility for .o and .a files, only for executables and
> shared libraries.)
>
> As long as you follow (A) and each libc version knows about
> all the atomic instruction cases the previous one knew about,
> having libc declare to the compiler when it's safe to inline
> atomic operations (with the default in the absence of such a
> declaration from libc being only to inline operations if they are
> present on all subarchitectures and there is no lock-based libc
> implementation with which to be incompatible) should work fine.

So, suppose I compile my program A, using libc version X, on
a processor of type D, which permits me to inline the atomic
operations.  Then suppose that I execute A on a processor of type E,
which has libc version X, but which supports fewer atomic operations
and thus requires a locking implementation.  I have met all the
versioning requirements.  What happens?

-- 
Lawrence Crowl


* Re: Implementing C++1x and C1x atomics
  2009-08-14 21:25                                   ` Lawrence Crowl
@ 2009-08-15  5:53                                     ` Joseph S. Myers
  2009-08-18  0:22                                       ` Lawrence Crowl
  0 siblings, 1 reply; 19+ messages in thread
From: Joseph S. Myers @ 2009-08-15  5:53 UTC (permalink / raw)
  To: Lawrence Crowl
  Cc: Richard Guenther, Boehm, Hans, Andrew Haley, Paolo Bonzini, gcc,
	libc-alpha

On Fri, 14 Aug 2009, Lawrence Crowl wrote:

> So, suppose I compile my program A, using libc version X, on
> a processor of type D, which permits me to inline the atomic
> operations.  Then suppose that I execute A on a processor of type E,
> which has libc version X, but which supports fewer atomic operations
> and thus requires a locking implementation.  I have met all the
> versioning requirements.  What happens?

(I will suppose that by "compile ... on a processor of type D" you mean 
compile with the compiler told to target D (explicitly with -march=D, or 
implicitly with -march=native, or with a compiler configured 
--with-arch=D), since the processor of the target is in general 
independent of that of the host.)

You get SIGILL when the atomic instruction that is present on D but not E 
is executed.  That a program built for one processor (D) but executed on 
another (E), whose features are not a superset of those of D, will get 
SIGILL, or execute incorrectly if E does something else on encountering 
that instruction, is not in any way specific to atomic instructions; it 
applies to every architecture with more than one supported variant.  Ways 
for the kernel or dynamic linker to detect such incompatibilities may be 
useful, but would apply to this issue in general, not specifically to 
atomic operations.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Implementing C++1x and C1x atomics
  2009-08-15  5:53                                     ` Joseph S. Myers
@ 2009-08-18  0:22                                       ` Lawrence Crowl
  2009-08-18  1:53                                         ` Joseph S. Myers
  0 siblings, 1 reply; 19+ messages in thread
From: Lawrence Crowl @ 2009-08-18  0:22 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Richard Guenther, Boehm, Hans, Andrew Haley, Paolo Bonzini, gcc,
	libc-alpha

On 8/14/09, Joseph S. Myers <joseph@codesourcery.com> wrote:
> On Fri, 14 Aug 2009, Lawrence Crowl wrote:
> > So, suppose I compile my program A, using libc version X, on
> > a processor of type D, which permits me to inline the atomic
> > operations.  Then suppose that I execute A on a processor of
> > type E, which has libc version X, but which supports fewer
> > atomic operations and thus requires a locking implementation.
> > I have met all the versioning requirements.  What happens?
>
> (I will suppose that by "compile ... on a processor of type D" you
> mean compile with the compiler told to target D (explicitly with
> -march=D, or implicitly with -march=native, or with a compiler
> configured --with-arch=D), since the processor of the target is
> in general independent of that of the host.)

Yes.

> You get SIGILL when the atomic instruction that is present on D
> but not E is executed.  That a program built for one processor
> (D) but executed on another (E), whose features are not a superset
> of those of D, will get SIGILL, or execute incorrectly if E does
> something else on encountering that instruction, is not in any way
> specific to atomic instructions; it applies to every architecture
> with more than one supported variant.

Yuck.

> Ways for the kernel or dynamic linker to detect such
> incompatibilities may be useful, but would apply to this issue
> in general, not specifically to atomic operations.

The difference with the atomics is that if an application uses the
D instructions, we also need the dynamic library to also use the
D instructions (on a D or later processor).  How do we ensure that?

If we cannot, then I am concerned that we would be able to inline
no atomic operations without dropping support for the 80386 as
a subset of the later processors.  The same situation applies to
other processor families.

-- 
Lawrence Crowl


* Re: Implementing C++1x and C1x atomics
  2009-08-18  0:22                                       ` Lawrence Crowl
@ 2009-08-18  1:53                                         ` Joseph S. Myers
  2009-08-19 23:51                                           ` Lawrence Crowl
  0 siblings, 1 reply; 19+ messages in thread
From: Joseph S. Myers @ 2009-08-18  1:53 UTC (permalink / raw)
  To: Lawrence Crowl
  Cc: Richard Guenther, Boehm, Hans, Andrew Haley, Paolo Bonzini, gcc,
	libc-alpha

On Mon, 17 Aug 2009, Lawrence Crowl wrote:

> > Ways for the kernel or dynamic linker to detect such
> > incompatibilities may be useful, but would apply to this issue
> > in general, not specifically to atomic operations.
> 
> The difference with the atomics is that if an application uses the
> D instructions, we also need the dynamic library to also use the
> D instructions (on a D or later processor).  How do we ensure that?

I have suggested that the library inform the compiler, via its headers 
(whatever the details - pragmas, macros, etc. - and whether or not the 
header in question is implicitly preincluded) of whether the library will 
be using these instructions, with the compiler making safe assumptions if 
the library does not give it this information.  (The information passed 
from the library to the compiler would be an assertion that that library 
version, and all later versions, when used on a D or later processor, will 
always use the D instructions or later instructions that safely 
interoperate with them.)
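
One possible shape for such an assertion, with a safe default when
the library says nothing, is sketched below.  The macro
LIBC_PROMISES_NATIVE_ATOMIC8 and the function libc_fetch_add_8 are
hypothetical names invented for the example, __arch_D__ is the
placeholder used earlier in this thread, and the inline path assumes
the GCC/Clang __atomic_fetch_add builtin.

```c
#include <stdint.h>

/* A D-aware libc header would define this hypothetical macro.  It is
   left undefined here to model an older libc that makes no promise. */
/* #define LIBC_PROMISES_NATIVE_ATOMIC8 1 */

/* Stand-in for libc's out-of-line routine (its locking is omitted
   for brevity, so this is NOT atomic as written). */
uint64_t libc_fetch_add_8(uint64_t *obj, uint64_t delta)
{
    uint64_t old = *obj;
    *obj = old + delta;
    return old;
}

/* Inline only when the target has the instruction AND the library
   has promised to interoperate; otherwise call into the library. */
#if defined(LIBC_PROMISES_NATIVE_ATOMIC8) && defined(__arch_D__)
#define ATOMIC8_INLINED 1
#define my_fetch_add_8(obj, delta) \
    __atomic_fetch_add((obj), (delta), __ATOMIC_SEQ_CST)
#else
#define ATOMIC8_INLINED 0
#define my_fetch_add_8(obj, delta) libc_fetch_add_8((obj), (delta))
#endif
```

The point of the default is that a new compiler paired with old
headers sees no promise and so keeps calling the library.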

> If we cannot, then I am concerned that we would be able to inline
> no atomic operations without dropping support for the 80386 as
> a subset of the later processors.  The same situation applies to
> other processor families.

80386 support is already dropped (effectively) in glibc, and has been for 
quite some time; you have to use -march=i486 or later to build glibc for 
IA32, or it will fail to link with missing atomic operations.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Implementing C++1x and C1x atomics
  2009-08-18  1:53                                         ` Joseph S. Myers
@ 2009-08-19 23:51                                           ` Lawrence Crowl
  2009-08-20  0:48                                             ` Joseph S. Myers
  2009-08-21  2:30                                             ` Implementing C++1x and C1x atomics (really an aside on SFENCE) Boehm, Hans
  0 siblings, 2 replies; 19+ messages in thread
From: Lawrence Crowl @ 2009-08-19 23:51 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Richard Guenther, Boehm, Hans, Andrew Haley, Paolo Bonzini, gcc,
	libc-alpha

I am quoting from several different messages.

On 8/17/09, Joseph S. Myers <joseph@codesourcery.com> wrote:
> (A) Code compiled against headers from libc version X must be run
> with libc version X or later.

What is the symptom of failing to meet this constraint?

> (B) Code compiled against headers from libc version X must be linked
> into an executable or shared library against a binary of libc version X
> (exactly).

Note, however, that gcc is used on some systems that do not have the
second constraint.

> > So, suppose I compile my program A, using libc version X, on
> > a processor of type D, which permits me to inline the atomic
> > operations.  Then suppose that I execute A on a processor of
> > type E, which has libc version X, but which supports fewer
> > atomic operations and thus requires a locking implementation.
> > I have met all the versioning requirements.  What happens?
>
> You get SIGILL when the atomic instruction that is present on D
> but not E is executed.  That a program built for one processor
> (D) but executed on another (E), whose features are not a superset
> of those of D, will get SIGILL, or execute incorrectly if E does
> something else on encountering that instruction, is not in any way
> specific to atomic instructions; it applies to every architecture
> with more than one supported variant.

There is a very similar scenario that is problematic.  Suppose that
the application A uses only instructions from E.  It will execute every
instruction correctly, but because the application uses a non-locking
implementation and the library uses a locking implementation, there
will be synchronization failure.

Such failures will be very difficult to detect.  The problem is more
serious than running an instruction on an earlier processor, which
usually fails visibly with SIGILL.
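
A minimal sketch of this failure mode, assuming a hypothetical out-of-line
library routine (libc_atomic_add_locked) that guards the object with a
global spin lock, and an application that inlines GCC's existing
__sync_fetch_and_add builtin.  The function and lock names are made up
for illustration.  Each path is correct on its own, but the inlined path
never takes the lock, so the two cannot safely be mixed on one object:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock used by the library's out-of-line atomics. */
static volatile int libc_atomic_lock = 0;

/* Library side: lock-based read-modify-write. */
static int libc_atomic_add_locked(int *obj, int delta)
{
    assert(obj != NULL);
    while (__sync_lock_test_and_set(&libc_atomic_lock, 1))
        ;                               /* spin until the lock is free */
    int old = *obj;                     /* plain load, protected only by the lock */
    *obj = old + delta;                 /* plain store, protected only by the lock */
    __sync_lock_release(&libc_atomic_lock);
    return old;
}

/* Application side: inlined lock-free add. */
static int app_atomic_add_inline(int *obj, int delta)
{
    assert(obj != NULL);
    /* Never looks at libc_atomic_lock, so on another thread it can land
       between the library's load and store and have its update lost. */
    return __sync_fetch_and_add(obj, delta);
}
```

Run from a single thread the two routines agree, which is part of why
the mismatch would be hard to detect in testing.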

I suppose that atomic types and any linker warnings could be developed
separately for quite some time.

> > > Ways for the kernel or dynamic linker to detect such
> > > incompatibilities may be useful, but would apply to this issue
> > > in general, not specifically to atomic operations.
> >
> > The difference with the atomics is that if an application uses the
> > D instructions, we also need the dynamic library to also use the
> > D instructions (on a D or later processor).  How do we ensure that?
>
> I have suggested that the library inform the compiler, via its headers
> (whatever the details - pragmas, macros, etc. - and whether or not
> the header in question is implicitly preincluded) of whether the
> library will be using these instructions, with the compiler making
> safe assumptions if the library does not give it this information.
> (The information passed from the library to the compiler would be an
> assertion that that library version, and all later versions, when
> used on a D or later processor, will always use the D instructions
> or later instructions that safely interoperate with them.)

Is this header provided with the compiler or the operating system?
In part, I am concerned about gcc on non-Linux systems.

The information from the library header could be as simple as the
architecture level that the library uses to implement atomics.
This requires a fairly strong mapping between the architecture and
which types are implemented in a lock-free manner.
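
As a rough sketch of how such a header might look to the compiler,
assume a hypothetical macro LIBC_ATOMIC_LOCK_FREE_BYTES by which the
library promises lock-free atomics up to a given size, and a second
hypothetical macro TARGET_ATOMIC_INLINE_BYTES standing in for what the
-march setting allows.  Neither name exists in any real libc or in gcc:

```c
#include <assert.h>
#include <stddef.h>

/* Pretend the libc header said this.  Remove the line to model a libc
   that gives the compiler no information. */
#define LIBC_ATOMIC_LOCK_FREE_BYTES 4

#ifdef LIBC_ATOMIC_LOCK_FREE_BYTES
#define LIBRARY_LIMIT LIBC_ATOMIC_LOCK_FREE_BYTES
#else
#define LIBRARY_LIMIT 0     /* safe assumption: library may lock everything */
#endif

/* Hypothetical: widest atomic the selected target can do inline. */
#define TARGET_ATOMIC_INLINE_BYTES 8

/* Inline only when library and target both allow it; otherwise the
   operation must go through the out-of-line library function. */
static int may_inline_atomic(size_t size)
{
    assert(size > 0);
    return size <= LIBRARY_LIMIT && size <= TARGET_ATOMIC_INLINE_BYTES;
}
```

With these values a 4-byte atomic may be inlined, while an 8-byte atomic
must call the library even though the target could do it inline.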

> > If we cannot, then I am concerned that we would be able to inline
> > no atomic operations without dropping support for the 80386 as
> > a subset of the later processors.  The same situation applies to
> > other processor families.
>
> 80386 support is already dropped (effectively) in glibc, and has been
> for quite some time; you have to use -march=i486 or later to build
> glibc for IA32, or it will fail to link with missing atomic operations.

The problem is that gcc does support 80386.  It also supports other
processors that have less-than-complete support for concurrency.  Even
within the x86 line, additional capability arrives in several layers.

  8086        LOCK XCHG
  80486       CMPXCHG XADD
  Pentium     CMPXCHG8B
  SSE         SFENCE
  SSE2        MFENCE
  late AMD64  CMPXCHG16B

So, we do not get to ignore the problem as a relic of 80386.

Hm.  We also need that mapping between architecture and type, which
probably needs to indicate a lock-free implementation for each operation.
Alexander Terekhov did a large chunk of that.
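
To make the idea of such a mapping concrete, here is a small sketch for
compare-and-swap only, built from the x86 table above.  The enum and
function names are hypothetical, and a real mapping would need one entry
per operation, not just one per level:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the x86 levels listed above. */
enum x86_level {
    X86_8086, X86_80486, X86_PENTIUM, X86_SSE, X86_SSE2, X86_CMPXCHG16B
};

/* Widest compare-and-swap, in bytes, at each level; 0 means none. */
static const unsigned char cas_bytes[] = {
    [X86_8086]       = 0,   /* LOCK XCHG: exchange, but no compare-and-swap */
    [X86_80486]      = 4,   /* CMPXCHG, XADD */
    [X86_PENTIUM]    = 8,   /* CMPXCHG8B */
    [X86_SSE]        = 8,   /* SFENCE adds ordering, not width */
    [X86_SSE2]       = 8,   /* MFENCE adds ordering, not width */
    [X86_CMPXCHG16B] = 16,  /* late AMD64 */
};

/* Nonzero if a compare-and-swap on an object of `size` bytes can be
   lock-free at the given level (power-of-two sizes only). */
static int cas_is_lock_free(enum x86_level level, size_t size)
{
    assert((unsigned)level <= X86_CMPXCHG16B);
    return size != 0 && (size & (size - 1)) == 0
        && size <= cas_bytes[level];
}
```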

-- 
Lawrence Crowl

* Re: Implementing C++1x and C1x atomics
  2009-08-19 23:51                                           ` Lawrence Crowl
@ 2009-08-20  0:48                                             ` Joseph S. Myers
  2009-08-20  8:16                                               ` Lawrence Crowl
  2009-08-21  2:30                                             ` Implementing C++1x and C1x atomics (really an aside on SFENCE) Boehm, Hans
  1 sibling, 1 reply; 19+ messages in thread
From: Joseph S. Myers @ 2009-08-20  0:48 UTC (permalink / raw)
  To: Lawrence Crowl
  Cc: Richard Guenther, Boehm, Hans, Andrew Haley, Paolo Bonzini, gcc,
	libc-alpha

On Wed, 19 Aug 2009, Lawrence Crowl wrote:

> I am quoting from several different messages.
> 
> On 8/17/09, Joseph S. Myers <joseph@codesourcery.com> wrote:
> > (A) Code compiled against headers from libc version X must be run
> > with libc version X or later.
> 
> What is the symptom of failing to meet this constraint?

Depending on what features the code uses and when you pass it to the 
static linker, there may be errors from the static linker or the dynamic 
linker, misbehavior at runtime (e.g. if you pass a new flag to a function 
that doesn't know about that flag in the old libc) or no symptoms.

> > > So, suppose I compile my program A, using libc version X, on
> > > a processor of type D, which permits me to inline the atomic
> > > operations.  Then suppose that I execute A on a processor of
> > > type E, which has libc version X, but which supports fewer
> > > atomic operations and thus requires a locking implementation.
> > > I have met all the versioning requirements.  What happens?
> >
> > You get SIGILL when the atomic instruction that is present on D
> > but not E is executed.  That a program built for one processor
> > (D) but executed on another (E), whose features are not a superset
> > of those of D, will get SIGILL, or execute incorrectly if E does
> > something else on encountering that instruction, is not in any way
> > specific to atomic instructions; it applies to every architecture
> > with more than one supported variant.
> 
> There is a very similar scenario, that is problematic.  Suppose that
> the application A uses only instructions from E.  It will execute every
> instruction correctly, but because the application uses a non-locking
> implementation and the library uses a locking implementation, there
> will be synchronization failure.

If you built with -march=E, the library should not have told the compiler 
that it could inline the atomic operations.

> > I have suggested that the library inform the compiler, via its headers
> > (whatever the details - pragmas, macros, etc. - and whether or not
> > the header in question is implicitly preincluded) of whether the
> > library will be using these instructions, with the compiler making
> > safe assumptions if the library does not give it this information.
> > (The information passed from the library to the compiler would be an
> > assertion that that library version, and all later versions, when
> > used on a D or later processor, will always use the D instructions
> > or later instructions that safely interoperate with them.)
> 
> Is this header provided with the compiler or the operating system?
> In part, I am concerned about gcc on non-Linux systems.

The header would be provided by the provider of the library which includes 
any out-of-line atomic functions that may be lock-based, which we had 
previously concluded would be libc (rather than libgcc) on Linux systems.  
(We also concluded that the set of such functions and their names would be 
considered part of the platform ABI for that system.)  If the OS provides 
a header that is not suitable for GCC then it may need fixing with 
fixincludes.

This works best for C++ if we do not try to support the C++ atomics (or at 
least those that might need lock-based implementations) in the absence of 
libc support for the C atomics; certainly it seems lock-based 
implementations in libstdc++ must be avoided so that they do not conflict 
with such implementations in libc.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: Implementing C++1x and C1x atomics
  2009-08-20  0:48                                             ` Joseph S. Myers
@ 2009-08-20  8:16                                               ` Lawrence Crowl
  0 siblings, 0 replies; 19+ messages in thread
From: Lawrence Crowl @ 2009-08-20  8:16 UTC (permalink / raw)
  To: Joseph S. Myers
  Cc: Richard Guenther, Boehm, Hans, Andrew Haley, Paolo Bonzini, gcc,
	libc-alpha

On 8/19/09, Joseph S. Myers <joseph@codesourcery.com> wrote:
> On Wed, 19 Aug 2009, Lawrence Crowl wrote:
>
>  > I am quoting from several different messages.
>  >
>  > On 8/17/09, Joseph S. Myers <joseph@codesourcery.com> wrote:
>  > > (A) Code compiled against headers from libc version X must be run
>  > > with libc version X or later.
>  >
>  > What is the symptom of failing to meet this constraint?
>
>
> Depending on what features the code uses and when you pass it to the
>  static linker, there may be errors from the static linker or the dynamic
>  linker, misbehavior at runtime (e.g. if you pass a new flag to a function
>  that doesn't know about that flag in the old libc) or no symptoms.
>
>
>  > > > So, suppose I compile my program A, using libc version X, on
>  > > > a processor of type D, which permits me to inline the atomic
>  > > > operations.  Then suppose that I execute A on a processor of
>  > > > type E, which has libc version X, but which supports fewer
>  > > > atomic operations and thus requires a locking implementation.
>  > > > I have met all the versioning requirements.  What happens?
>  > >
>  > > You get SIGILL when the atomic instruction that is present on D
>  > > but not E is executed.  That a program built for one processor
>  > > (D) but executed on another (E), whose features are not a superset
>  > > of those of D, will get SIGILL, or execute incorrectly if E does
>  > > something else on encountering that instruction, is not in any way
>  > > specific to atomic instructions; it applies to every architecture
>  > > with more than one supported variant.
>  >
>  > There is a very similar scenario, that is problematic.  Suppose that
>  > the application A uses only instructions from E.  It will execute every
>  > instruction correctly, but because the application uses a non-locking
>  > implementation and the library uses a locking implementation, there
>  > will be synchronization failure.
>
>
> If you built with -march=E, the library should not have told the compiler
>  that it could inline the atomic operations.
>
>
> > > I have suggested that the library inform the compiler, via its headers
> > > (whatever the details - pragmas, macros, etc. - and whether or not
> > > the header in question is implicitly preincluded) of whether the
> > > library will be using these instructions, with the compiler making
> > > safe assumptions if the library does not give it this information.
> > > (The information passed from the library to the compiler would be an
> > > assertion that that library version, and all later versions, when
> > > used on a D or later processor, will always use the D instructions
> > > or later instructions that safely interoperate with them.)
> >
> > Is this header provided with the compiler or the operating system?
> > In part, I am concerned about gcc on non-Linux systems.
>
> The header would be provided by the provider of the library which
> includes any out-of-line atomic functions that may be lock-based,
> which we had previously concluded would be libc (rather than
> libgcc) on Linux systems.  (We also concluded that the set of
> such functions and their names would be considered part of the
> platform ABI for that system.)  If the OS provides a header that
> is not suitable for GCC then it may need fixing with fixincludes.

Okay.  Just to be clear, any lock-based implementation should exist
only in the library, never inlined.

> This works best for C++ if we do not try to support the C++ atomics
> (or at least those that might need lock-based implementations)
> in the absence of libc support for the C atomics; certainly it
> seems lock-based implementations in libstdc++ must be avoided to
> avoid conflict with such implementations in libc.

Agreed 100%.  The intent was that all the C++-specific stuff be
implemented in terms of the C-specific stuff, independent of the
platform.

-- 
Lawrence Crowl

* RE: Implementing C++1x and C1x atomics (really an aside on SFENCE)
  2009-08-19 23:51                                           ` Lawrence Crowl
  2009-08-20  0:48                                             ` Joseph S. Myers
@ 2009-08-21  2:30                                             ` Boehm, Hans
  2009-09-09 22:52                                               ` Lawrence Crowl
  1 sibling, 1 reply; 19+ messages in thread
From: Boehm, Hans @ 2009-08-21  2:30 UTC (permalink / raw)
  To: Lawrence Crowl, Joseph S. Myers
  Cc: Richard Guenther, Andrew Haley, Paolo Bonzini, gcc, libc-alpha

> -----Original Message-----
> From: Lawrence Crowl [mailto:crowl@google.com] 
> The problem is that gcc does support 80386.  It also supports 
> other processors that have less-than-complete support for 
> concurrency.  Just in the x86 line, we get some additional 
> capability in many new layers.
> 
>   8086        LOCK XCHG
>   80486       CMPXCHG XADD
>   Pentium     CMPXCHG8B
>   SSE         SFENCE
Aside to an interesting discussion:

I believe the current conclusion is that SFENCE should be ignored, except for library or compiler-generated code that uses non-temporal/coalescing stores, which I believe are also a recent addition.  Normal stores are ordered anyway, so it's not needed.  Thus you are faced with a choice of either (a) implementing fences on the assumption that ordinary code may contain non-temporal stores, or (b) making sure that non-temporal stores are always surrounded by the appropriate fences.  This is really an important ABI issue, but it's something that I believe no ABI currently specifies.  Our conclusion in earlier discussions among a different group of people was that (b) made more sense, since non-temporal stores of various kinds seemed to be largely confined to a few library routines.

It would be really nice if everyone somehow managed to agree on this.  Inconsistency here, probably even between Windows and Linux, seems likely to result in really subtle bugs.

Note that this also affects correctness of spinlock implementations, not just atomics.  A simple store to release a lock doesn't work if the critical section may contain unfenced non-temporal stores.
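
A sketch of what convention (b) asks of code that uses non-temporal
stores, using the SSE2 intrinsics _mm_stream_si32 and _mm_sfence.  The
routine name fill_nontemporal is hypothetical, and the plain-store
fallback is only there so the sketch builds on targets without SSE2:

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <xmmintrin.h>   /* _mm_sfence */
#include <emmintrin.h>   /* _mm_stream_si32 */
#endif

/* Fill dst with value, using non-temporal stores where available, and
   fence on both sides so callers may treat this like ordinary stores. */
static void fill_nontemporal(int *dst, size_t n, int value)
{
    assert(dst != NULL || n == 0);
#if defined(__SSE2__)
    _mm_sfence();                   /* conservative: keep earlier stores ahead */
    for (size_t i = 0; i < n; i++)
        _mm_stream_si32(&dst[i], value);
    _mm_sfence();                   /* order the streaming stores before later stores */
#else
    for (size_t i = 0; i < n; i++)  /* plain stores: no fence needed here */
        dst[i] = value;
#endif
}
```

The fences are paid once per call rather than once per store, so the
cost is amortized over the whole run of non-temporal stores.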

Hans

>   SSE2        MFENCE
>   late AMD64  CMPXCHG16B
> 
> So, we do not get to ignore the problem as a relic of 80386.
> 

* Re: Implementing C++1x and C1x atomics (really an aside on SFENCE)
  2009-08-21  2:30                                             ` Implementing C++1x and C1x atomics (really an aside on SFENCE) Boehm, Hans
@ 2009-09-09 22:52                                               ` Lawrence Crowl
  2009-09-09 23:42                                                 ` Boehm, Hans
  0 siblings, 1 reply; 19+ messages in thread
From: Lawrence Crowl @ 2009-09-09 22:52 UTC (permalink / raw)
  To: Boehm, Hans
  Cc: Joseph S. Myers, Richard Guenther, Andrew Haley, Paolo Bonzini,
	gcc, libc-alpha

On 8/20/09, Boehm, Hans <hans.boehm@hp.com> wrote:
> > -----Original Message-----
> > From: Lawrence Crowl [mailto:crowl@google.com]
> > The problem is that gcc does support 80386.  It also supports
> > other processors that have less-than-complete support for
> > concurrency.  Just in the x86 line, we get some additional
> > capability in many new layers.
> >
> >   8086        LOCK XCHG
> >   80486       CMPXCHG XADD
> >   Pentium     CMPXCHG8B
> >   SSE         SFENCE
>
> Aside to an interesting discussion:
>
> I believe the current conclusion is that SFENCE should be ignored,
> except for library or compiler-generated code that uses
> non-temporal/coalescing stores, which I believe are also a recent
> addition.  Normal stores are ordered anyway, so it's not needed.
> Thus you are faced with a choice of either (a) implementing fences
> on the assumption that ordinary code may contain non-temporal stores,
> or (b) making sure that non-temporal stores are always surrounded by
> the appropriate fences.  This is really an important ABI issue, but
> it's something that I believe no ABI currently specifies.  Our
> conclusion in earlier discussions among a different group of people
> was that (b) made more sense, since non-temporal stores of various
> kinds seemed to be largely confined to a few library routines.

Hm.  I would expect that given the C++0x memory model, compilers could
be much more aggressive about using non-temporal stores, potentially
improving performance substantially.  That is, it may be better to
accept a slightly less efficient ABI for today's compilers to gain a
more efficient ABI for tomorrow's compilers.

> It would be really nice if everyone somehow managed to agree on this.
> Inconsistency here, probably even between Windows and Linux, seems
> likely to result in really subtle bugs.
>
> Note that this also affects correctness of spinlock implementations,
> not just atomics.  A simple store to release a lock doesn't work if
> the critical section may contain unfenced non-temporal stores.

Yes, but the spinning acquire doesn't require the fence, only the
release does.  So, is this additional instruction a performance problem?

>
> >   SSE2        MFENCE
> >   late AMD64  CMPXCHG16B
> >
> > So, we do not get to ignore the problem as a relic of 80386.


This email seems to have gotten side-tracked by my filters.  Sorry
for the delay.

-- 
Lawrence Crowl

* RE: Implementing C++1x and C1x atomics (really an aside on SFENCE)
  2009-09-09 22:52                                               ` Lawrence Crowl
@ 2009-09-09 23:42                                                 ` Boehm, Hans
  0 siblings, 0 replies; 19+ messages in thread
From: Boehm, Hans @ 2009-09-09 23:42 UTC (permalink / raw)
  To: Lawrence Crowl
  Cc: Joseph S. Myers, Richard Guenther, Andrew Haley, Paolo Bonzini,
	gcc, libc-alpha

> From: Lawrence Crowl [mailto:crowl@google.com] 
> 
> On 8/20/09, Boehm, Hans <hans.boehm@hp.com> wrote:
> > > -----Original Message-----
> > > From: Lawrence Crowl [mailto:crowl@google.com]
> > > The problem is that gcc does support 80386.  It also supports
> > > other processors that have less-than-complete support for
> > > concurrency.  Just in the x86 line, we get some additional
> > > capability in many new layers.
> > >
> > >   8086        LOCK XCHG
> > >   80486       CMPXCHG XADD
> > >   Pentium     CMPXCHG8B
> > >   SSE         SFENCE
> >
> > Aside to an interesting discussion:
> >
> > I believe the current conclusion is that SFENCE should be ignored, 
> > except for library or compiler-generated code that uses 
> > non-temporal/coalescing stores, which I believe are also a recent 
> > addition.  Normal stores are ordered anyway, so it's not needed.
> > Thus you are faced with a choice of either (a) implementing fences
> > on the assumption that ordinary code may contain non-temporal stores,
> > or (b) making sure that non-temporal stores are always surrounded by
> > the appropriate fences.  This is really an important ABI issue, but
> > it's something that I believe no ABI currently specifies.  Our
> > conclusion in earlier discussions among a different group of people
> > was that (b) made more sense, since non-temporal stores of various
> > kinds seemed to be largely confined to a few library routines.
> 
> Hm.  I would expect that given the C++0x memory model, 
> compilers could be much more aggressive about using 
> non-temporal stores, potentially improving performance 
> substantially.  That is, it may be better to accept a 
> slightly less efficient ABI for today's compilers to gain a 
> more efficient ABI for tomorrow's compilers.
> 
> > It would be really nice if everyone somehow managed to agree on this.
> > Inconsistency here, probably even between Windows and Linux, seems
> > likely to result in really subtle bugs.
> >
> > Note that this also affects correctness of spinlock implementations,
> > not just atomics.  A simple store to release a lock doesn't work if
> > the critical section may contain unfenced non-temporal stores.
> 
> Yes, but the spinning acquire doesn't require the fence, only
> the release.  So, is this additional instruction a
> performance problem?
> 
I haven't looked at this terribly systematically.  I do know that in Pentium 4 days, sfence was tremendously expensive (basically equivalent to mfence or cmpxchg, i.e. 100+ cycles), even in contexts in which it was a no-op.  Thus ABI convention (a) roughly doubles the (already very high) cost of an uncontended spin-lock on a Pentium 4.  I suspect that got better on later implementations, but I'm not sure by how much.

I think the only nontemporal stores on X86 are vector instructions.  I would guess that for many applications neither these nor spin-lock times matter a lot, and for most of the rest, these vector instructions won't make up for the cost of doubling spin-lock execution times.  If you do manage to automatically generate non-temporal stores at all, you will usually generate a bunch of them between potential synchronization operations, so that you can amortize the sfence.  As I recall, we did look briefly during earlier discussions, and didn't find them used much even in hand-crafted libc code.

But this is all hand-waving and guessing.  Certainly real measurements would be much better.

The most important issue of course is that we need to stick to one convention or the other.  Currently a lot of code seems to assume that an X86 spin lock can be released with a simple store, so invalidating that would be tricky, especially since sfence was a fairly recent introduction.
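
For concreteness, the two conventions differ only in the release path of
a spin lock.  The sketch below uses GCC's existing
__sync_lock_test_and_set and __sync_lock_release builtins; the type and
function names are hypothetical, and the SFENCE is compiled in only
where SSE is available:

```c
#include <assert.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* _mm_sfence */
#endif

typedef volatile int demo_spinlock;   /* hypothetical: 0 = free, 1 = held */

static void spin_lock(demo_spinlock *l)
{
    assert(l != 0);
    while (__sync_lock_test_and_set(l, 1))
        ;                             /* acquire: no sfence under either convention */
}

/* Convention (b): critical sections contain no unfenced non-temporal
   stores, so a release store alone is enough on x86. */
static void spin_unlock_b(demo_spinlock *l)
{
    __sync_lock_release(l);
}

/* Convention (a): ordinary code may contain non-temporal stores, so
   fence them before the lock can be seen as free. */
static void spin_unlock_a(demo_spinlock *l)
{
#if defined(__SSE__)
    _mm_sfence();
#endif
    __sync_lock_release(l);
}
```

Code compiled with spin_unlock_b is only correct if every critical
section it protects follows convention (b), which is why mixing the two
conventions is the real hazard.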

Hans

end of thread, other threads:[~2009-09-09 23:42 UTC | newest]

Thread overview: 19+ messages
     [not found] <4A82E93B.5010504@redhat.com>
     [not found] ` <Pine.LNX.4.64.0908121648100.15254@digraph.polyomino.org.uk>
     [not found]   ` <4A82F34B.2080404@redhat.com>
     [not found]     ` <4A82F47A.7060708@gnu.org>
     [not found]       ` <4A82F5C0.5000300@redhat.com>
     [not found]         ` <4A82F88D.7030708@redhat.com>
     [not found]           ` <238A96A773B3934685A7269CC8A8D042577A01E71A@GVW0436EXB.americas.hpqcorp.net>
     [not found]             ` <238A96A773B3934685A7269CC8A8D042577A01E754@GVW0436EXB.americas.hpqcorp.net>
     [not found]               ` <84fc9c000908121127o6588fe52u581fc62bfb8cba9e@mail.gmail.com>
2009-08-12 21:07                 ` Implementing C++1x and C1x atomics Joseph S. Myers
2009-08-13  8:44                   ` Lawrence Crowl
2009-08-13 13:23                     ` Joseph S. Myers
2009-08-14  0:03                       ` Lawrence Crowl
2009-08-14  2:31                         ` Joseph S. Myers
2009-08-14  7:14                           ` Lawrence Crowl
2009-08-14  9:37                             ` Joseph S. Myers
2009-08-14 20:52                               ` Lawrence Crowl
2009-08-14 21:06                                 ` Joseph S. Myers
2009-08-14 21:25                                   ` Lawrence Crowl
2009-08-15  5:53                                     ` Joseph S. Myers
2009-08-18  0:22                                       ` Lawrence Crowl
2009-08-18  1:53                                         ` Joseph S. Myers
2009-08-19 23:51                                           ` Lawrence Crowl
2009-08-20  0:48                                             ` Joseph S. Myers
2009-08-20  8:16                                               ` Lawrence Crowl
2009-08-21  2:30                                             ` Implementing C++1x and C1x atomics (really an aside on SFENCE) Boehm, Hans
2009-09-09 22:52                                               ` Lawrence Crowl
2009-09-09 23:42                                                 ` Boehm, Hans