public inbox for gcc-bugs@sourceware.org
* [Bug target/95750] New: [x86] Use dummy atomic insn instead of mfence in __atomic_thread_fence(seq_cst)
@ 2020-06-18 19:32 andysem at mail dot ru
From: andysem at mail dot ru @ 2020-06-18 19:32 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95750

            Bug ID: 95750
           Summary: [x86] Use dummy atomic insn instead of mfence in
                    __atomic_thread_fence(seq_cst)
           Product: gcc
           Version: 10.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: andysem at mail dot ru
  Target Milestone: ---

Currently, __atomic_thread_fence(seq_cst) on x86 and x86-64 generates an mfence
instruction. A dummy atomic instruction (a lock-prefixed instruction, or xchg
with a memory operand) would provide the same sequential consistency guarantees
while being more efficient on most current CPUs. The mfence instruction
additionally orders non-temporal stores, but that is irrelevant here: seq_cst
atomic operations do not order non-temporal stores anyway.
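To make this concrete, here is a minimal sketch (GNU C, assuming GCC or Clang on x86/x86-64) of the pattern a seq_cst fence exists for: a store followed by a load of a different location, as in one side of a Dekker-style handshake. x86 may let such a load complete before the earlier store is visible to other threads, and a full fence between them forbids that. The function names are hypothetical; fence_mfence spells out by hand the mfence that, per this report, GCC 10 emits for the builtin.

```c
#include <assert.h>

/* Portable GNU C spelling of the fence; per this report GCC 10 lowers it
   to mfence on x86/x86-64. */
static void fence_builtin(void)
{
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
}

/* The same fence written out as the instruction GCC 10 emits. The "memory"
   clobber also stops the compiler from moving memory accesses across it. */
static void fence_mfence(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST); /* non-x86 fallback */
#endif
}

/* One side of a store-buffering (Dekker-style) handshake: publish our flag,
   then read the other thread's flag. x86-TSO allows the later load to be
   satisfied before the earlier store becomes globally visible; only a full
   fence between the two (mfence, or a lock-prefixed instruction) rules that
   out. */
static int announce_then_check(int *my_flag, const int *other_flag,
                               void (*fence)(void))
{
    __atomic_store_n(my_flag, 1, __ATOMIC_RELAXED);
    fence();
    return __atomic_load_n(other_flag, __ATOMIC_RELAXED);
}
```

If two threads each run announce_then_check with their roles swapped, the fence guarantees that at least one of them sees the other's flag as 1; the proposal only changes which instruction provides that guarantee.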

Regarding performance, some data is available in Agner Fog's instruction
tables:

https://www.agner.org/optimize/

Also, there is this article:

https://shipilev.net/blog/2014/on-the-fence-with-dependencies/

TL;DR: There is a benefit on every CPU except Atom, where there is no
difference.
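A crude way to compare the two instruction sequences on a particular machine is a loop that times each fence in isolation. This is a hypothetical micro-benchmark sketch (POSIX clock_gettime and GNU C inline asm, x86/x86-64 assumed), not the methodology used in the sources above. It includes indirect-call overhead, and a fence timed in an empty loop can behave differently from one surrounded by real loads and stores, so its numbers are indicative only.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime under strict -std modes */
#include <assert.h>
#include <time.h>

/* Fence implemented with mfence (what GCC 10 emits). */
static void fence_mfence(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Fence implemented as a dummy locked RMW on a private stack byte. */
static void fence_lock_not(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned char dummy = 0;
    __asm__ __volatile__("lock; notb %0" : "+m"(dummy) : : "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

/* Average wall-clock nanoseconds per call of `fence` over `iters` calls,
   measured with the monotonic clock. */
static double ns_per_fence(void (*fence)(void), long iters)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; ++i)
        fence();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}
```

Call ns_per_fence for each variant with a large iteration count (several million, say) and repeat the runs a few times, since frequency scaling and other load on the machine add noise.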

Regarding the dummy instruction and the target memory location, here are some
considerations:

- The lock-prefixed instruction should preferably not alter flags or registers
and should require as few registers as possible.
- The memory location should not be shared with other threads.
- The memory location should be likely to be in cache.
- The memory location should not alias existing data on the stack, so that we
don't introduce a false data dependency on previous or subsequent instructions.

Based on the above, a good candidate is "lock not" on a dummy variable at the
top of the stack. Such a variable is addressable through esp/rsp, is likely to
be in hot memory, and is likely thread-private.
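A minimal sketch of the proposed instruction, assuming GNU C inline asm on x86/x86-64. The function name is hypothetical, and this is not claimed to be the exact code used by Boost.Atomic, MSVC or GCC. Declaring the dummy as a fresh local lets the compiler give it its own stack slot near the top of the stack, which matches the considerations above: the slot is thread-private, likely hot in cache, and does not alias other live data.

```c
#include <assert.h>

/* Hypothetical name: seq_cst fence implemented as a dummy lock-prefixed
   instruction instead of mfence (x86/x86-64 only). */
static inline void seq_cst_fence_dummy_rmw(void)
{
#if defined(__x86_64__) || defined(__i386__)
    /* Fresh local: its own stack slot, private to this thread and holding
       no other live data. Initialized only so memory checkers such as
       valgrind don't report a read of uninitialized memory; the value
       itself is irrelevant. */
    unsigned char dummy = 0;
    /* "lock not": a locked read-modify-write that leaves flags and
       general-purpose registers untouched. The "memory" clobber also keeps
       the compiler from moving other memory accesses across the fence. */
    __asm__ __volatile__("lock; notb %0" : "+m"(dummy) : : "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST); /* non-x86 fallback */
#endif
}
```

It is a drop-in replacement wherever __atomic_thread_fence(__ATOMIC_SEQ_CST) separates a store from a later load of another location. NOT is chosen over, say, ADD or OR because it modifies no flags and needs no immediate or scratch register.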

I've implemented this optimization in Boost.Atomic, and a similar optimization
has been made in Microsoft's STL for MSVC:

https://github.com/microsoft/STL/pull/740



Thread overview: 18+ messages
2020-06-18 19:32 [Bug target/95750] New: [x86] Use dummy atomic insn instead of mfence in __atomic_thread_fence(seq_cst) andysem at mail dot ru
2020-06-18 19:36 ` [Bug target/95750] " pinskia at gcc dot gnu.org
2020-06-18 19:39 ` andysem at mail dot ru
2020-06-18 21:08 ` ubizjak at gmail dot com
2020-06-18 21:16 ` ubizjak at gmail dot com
2020-06-18 22:01 ` andysem at mail dot ru
2020-06-18 22:04 ` andysem at mail dot ru
2020-06-19  9:14 ` ubizjak at gmail dot com
2020-06-19  9:35 ` jakub at gcc dot gnu.org
2020-06-19  9:49 ` ubizjak at gmail dot com
2020-06-19 12:47 ` ubizjak at gmail dot com
2020-07-20  9:55 ` ubizjak at gmail dot com
2020-07-20 18:37 ` cvs-commit at gcc dot gnu.org
2020-07-21 18:22 ` cvs-commit at gcc dot gnu.org
2020-07-23 17:22 ` josephcsible at gmail dot com
2020-07-23 20:42 ` ubizjak at gmail dot com
2020-07-24 14:00 ` cvs-commit at gcc dot gnu.org
2020-12-10  9:42 ` ubizjak at gmail dot com
