I've tried to implement a software transactional memory algorithm.  The
characteristics are:

* Single writer, multiple readers.

* Two copies of the data.

* A 64-bit version counter tracks modifications.  The lowest bit of the
  counter tells the reader which copy of the data to read.

* The writer increments the counter by 2, modifies the STM-protected
  data, and increments the counter by 1.

* The reader loads the counter, reads the STM-protected data, loads the
  counter for a second time, and retries if the counter does not match.

I've attached a model implementation.  The glibc implementation has a
wrapper around the counter updates to support 32-bit implementations as
well.

In both implementations, the writer uses release MO stores for the
version updates, and the reader uses acquire MO loads.  The stores and
loads of the STM data itself are unordered (not even volatile).

It turns out that this does not work: in the reader, loads of the
STM-protected data can be reordered past the final acquire MO load of
the STM version.  As a result, the reader can observe incoherent data.
In practice, I have only seen this on powerpc64le, where the *compiler*
performed the reordering:

  _dl_find_object miscompilation on powerpc64le

Empirically, my stm.c model does not exhibit this behavior.

To fix this, I think it is sufficient to add an acquire fence just
before the second version load in the reader.  (A release fence would
not help here; release fences only order earlier writes, not the data
loads that need to stay before the second version load.)  However, from
a C memory model perspective, I don't quite see what this fence would
synchronize with.  8-/

And of course, once there is one concurrency bug, there might be others
as well.  Do I need to change the writer to use acquire-release MO for
the version updates?

I think there should be a canned recipe for this scenario (single
writer, two data copies), but I couldn't find one.

Thanks,
Florian
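
P.S.  For anyone who doesn't want to open the attachment, here is a
minimal self-contained sketch of the scheme described above.  The names
(stm_write/stm_read) and the int payload are made up for illustration;
the attached stm.c is the authoritative version.  The reader includes
the acquire fence as the proposed fix, marked with a comment:

```c
#include <stdatomic.h>
#include <stdint.h>

struct stm
{
  _Atomic uint64_t version;   /* Low bit selects the active copy.  */
  int copies[2];              /* Simplified STM-protected payload.  */
};

/* Single writer: publish a new value.  */
static void
stm_write (struct stm *s, int value)
{
  uint64_t ver = atomic_load_explicit (&s->version, memory_order_relaxed);
  /* Increment by 2: signal that an update is in progress.  */
  atomic_store_explicit (&s->version, ver + 2, memory_order_release);
  /* Modify the inactive copy (plain, unordered store, as in the model).  */
  s->copies[!(ver & 1)] = value;
  /* Increment by 1: flip the low bit, making the new copy active.  */
  atomic_store_explicit (&s->version, ver + 3, memory_order_release);
}

/* Readers: retry until two version loads agree.  */
static int
stm_read (struct stm *s)
{
  for (;;)
    {
      uint64_t ver1 = atomic_load_explicit (&s->version,
                                            memory_order_acquire);
      /* Plain, unordered load of the active copy, as in the model.  */
      int value = s->copies[ver1 & 1];
      /* Proposed fix: keep the data load above from being reordered
         past the second version load below.  */
      atomic_thread_fence (memory_order_acquire);
      uint64_t ver2 = atomic_load_explicit (&s->version,
                                            memory_order_acquire);
      if (ver1 == ver2)
        return value;
    }
}
```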