This patch implements builtins for an atomic exchange with full, acquire, and release memory barrier semantics. It is similar to __sync_lock_test_and_set(), but the target does not have the option of providing reduced functionality that can only store the value 1. Also, unlike __sync_lock_test_and_set(), all three memory barrier variants are available.

The compiler falls back to a full barrier if the user requests an acquire or release variant that the target does not provide. If no variant is available at all, we fall back to a compare-and-swap loop with a full barrier at the end.

The real reason for this patch is to implement atomic stores in the C++ runtime library. The current implementation can incorrectly move prior stores past an atomic store, which invalidates the happens-before promise of the sequentially consistent model.

I am attaching the corresponding patch to libstdc++ to show how I intend to use the builtin. This is not an official submission for the C++ library bits, as I have not yet fully tested the library. I will submit those separately.

In a followup patch I will implement acq/rel/full variants for all the __sync_* builtins, which we can use for the atomic loads and for some of the OpenMP atomics Jakub has been working on.

Oh yeah, I would gladly accept patterns/patches for other architectures :).

Tested on x86-64 Linux. OK for mainline?
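
For illustration, the compare-and-swap fallback mentioned above behaves roughly like the sketch below. This is only an assumption about the shape of the loop, written with the existing __sync_val_compare_and_swap() and __sync_synchronize() builtins. The helper name swap_full_fallback is hypothetical and is not the builtin the patch adds.

```c
/* Hypothetical helper, not part of the patch: atomically store NEWVAL
   in *PTR and return the previous contents, using a compare-and-swap
   loop followed by a full barrier.  */
static inline long
swap_full_fallback (volatile long *ptr, long newval)
{
  long expected;
  long seen = *ptr;

  do
    {
      expected = seen;
      /* Returns the value found in *PTR.  The store of NEWVAL took
         place only if that value equals EXPECTED.  */
      seen = __sync_val_compare_and_swap (ptr, expected, newval);
    }
  while (seen != expected);

  /* Full barrier at the end, matching the fallback described above.  */
  __sync_synchronize ();

  return seen;
}
```

Unlike __sync_lock_test_and_set(), a loop of this shape can store any value, not just 1, which is the property the atomic store in the library depends on.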