From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Jonathan Wakely <jwakely.gcc@gmail.com>,
	Peter Zijlstra <peterz@infradead.org>
Cc: Olivier Dion <odion@efficios.com>,
	rnk@google.com, Alan Stern <stern@rowland.harvard.edu>,
	Andrea Parri <parri.andrea@gmail.com>,
	Will Deacon <will@kernel.org>, Boqun Feng <boqun.feng@gmail.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	David Howells <dhowells@redhat.com>,
	Jade Alglave <j.alglave@ucl.ac.uk>,
	Luc Maranget <luc.maranget@inria.fr>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Nathan Chancellor <nathan@kernel.org>,
	Nick Desaulniers <ndesaulniers@google.com>,
	Tom Rix <trix@redhat.com>,
	linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
	gcc@gcc.gnu.org, llvm@lists.linux.dev
Subject: Re: [RFC] Bridging the gap between the Linux Kernel Memory Consistency Model (LKMM) and C11/C++11 atomics
Date: Fri, 7 Jul 2023 11:31:46 -0400	[thread overview]
Message-ID: <ecf1b725-1351-3460-4a99-7a03f15af152@efficios.com> (raw)
In-Reply-To: <CAH6eHdQQWO2AYQRXnAATt6nvcyDjKj-_5Ktt2ze6F158hBon=Q@mail.gmail.com>

On 7/4/23 06:23, Jonathan Wakely wrote:
> On Tue, 4 Jul 2023 at 10:47, Peter Zijlstra wrote:
>>
>> On Mon, Jul 03, 2023 at 03:20:31PM -0400, Olivier Dion wrote:
>>
>>>    int x = 0;
>>>    int y = 0;
>>>    int r0, r1;
>>>
>>>    int dummy;
>>>
>>>    void t0(void)
>>>    {
>>>            __atomic_store_n(&x, 1, __ATOMIC_RELAXED);
>>>
>>>            __atomic_exchange_n(&dummy, 1, __ATOMIC_SEQ_CST);
>>>            __atomic_thread_fence(__ATOMIC_SEQ_CST);
>>>
>>>            r0 = __atomic_load_n(&y, __ATOMIC_RELAXED);
>>>    }
>>>
>>>    void t1(void)
>>>    {
>>>            __atomic_store_n(&y, 1, __ATOMIC_RELAXED);
>>>            __atomic_thread_fence(__ATOMIC_SEQ_CST);
>>>            r1 = __atomic_load_n(&x, __ATOMIC_RELAXED);
>>>    }
>>>
>>>    // BUG_ON(r0 == 0 && r1 == 0)
>>>
>>> On x86-64 (gcc 13.1 -O2) we get:
>>>
>>>    t0():
>>>            movl    $1, x(%rip)
>>>            movl    $1, %eax
>>>            xchgl   dummy(%rip), %eax
>>>            lock orq $0, (%rsp)       ;; Redundant with previous exchange.
>>>            movl    y(%rip), %eax
>>>            movl    %eax, r0(%rip)
>>>            ret
>>>    t1():
>>>            movl    $1, y(%rip)
>>>            lock orq $0, (%rsp)
>>>            movl    x(%rip), %eax
>>>            movl    %eax, r1(%rip)
>>>            ret
>>
>> So I would expect the compilers to do better here. It should know those
>> __atomic_thread_fence() thingies are superfluous and simply not emit
>> them. This could even be done as a peephole pass later, where it sees
>> consecutive atomic ops and the second being a no-op.
> 
> Right, I don't see why we need a whole set of new built-ins that say
> "this fence isn't needed if the adjacent atomic op already implies a
> fence". If the adjacent atomic op already implies a fence for a given
> ISA, then the compiler should already be able to elide the explicit
> fence.
> 
> So just write your code with the explicit fence, and rely on the
> compiler to optimize it properly. Admittedly, today's compilers don't
> do that optimization well, but they also don't support your proposed
> built-ins, so you're going to have to wait for compilers to make
> improvements either way.

Emitting the redundant fences is the plan we have for liburcu.  The
current situation unfortunately requires users to choose between
generating inefficient code with C11 and implementing their own inline
assembler until the compilers catch up.
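
For illustration, here is a minimal sketch of that choice (hypothetical
helper name, not actual liburcu code): either take the portable C11 path
and accept the possibly redundant fence, or hand-roll per-architecture
assembler:

   /* Hypothetical helper; illustrative only. */
   static inline void full_fence(void)
   {
   #if defined(USE_PORTABLE_C11)
           /* Portable, but today's compilers emit this fence even when it
            * sits right next to a lock-prefixed RMW on x86. */
           __atomic_thread_fence(__ATOMIC_SEQ_CST);
   #elif defined(__x86_64__)
           /* Hand-rolled: any lock-prefixed instruction is a full barrier
            * on x86, so a lock-prefixed no-op suffices. */
           __asm__ __volatile__ ("lock; addl $0,-4(%%rsp)" ::: "memory", "cc");
   #else
   # error "no fence implementation for this architecture"
   #endif
   }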

> 
> https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html
> discusses that compilers could (and should) optimize around atomics
> better.

Our understanding of the C11/C++11 memory model is that it aims to
define the weakest possible guarantees for each ordering so that it can
be implemented as efficiently as possible on weakly ordered
architectures.  In practice, however, writing portable code against that
model forces the programmer to insert memory fences which are redundant
on strongly ordered architectures.

We want something that can apply across procedures from different
modules: e.g. a mutex lock operation (glibc) has acquire semantics
implemented with an RMW operation that the caller could promote to a
full fence.  Peephole optimizations cannot do this because they focus on
a single basic block.  PRE can apply across procedures, but it would
rely on LTO and possibly on function annotations across modules.  I am
not aware of any progress in that research area in the past 6 years. [1-2]
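
To make the cross-module case concrete, a minimal sketch (illustrative
only; the function and variable names are hypothetical): the acquire RMW
is buried inside glibc's pthread_mutex_lock(), so a caller that needs
full ordering before a later relaxed load has to add its own fence
today, even though promoting the RMW inside the lock would be enough on
targets where that RMW is already a full barrier:

   #include <pthread.h>

   static int flag;

   int read_flag_after_lock(pthread_mutex_t *lock)
   {
           int r;

           pthread_mutex_lock(lock);    /* acquire RMW inside another module */
           /* Caller-side promotion to a full fence; redundant on targets
            * where the lock's RMW is already a full barrier (e.g. x86). */
           __atomic_thread_fence(__ATOMIC_SEQ_CST);
           r = __atomic_load_n(&flag, __ATOMIC_RELAXED);
           pthread_mutex_unlock(lock);
           return r;
   }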

The new atomic builtins we propose let users express their intent to the
compiler more precisely, allowing for better code generation.  This
reduces the number of redundant fences emitted without having to rely on
optimizations.
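
As an illustration of the intent (hypothetical macro name, not the
proposed builtin API), this is roughly what the caller-side promotion
looks like when spelled by hand today; the proposed builtins would let
the compiler make this per-target decision itself:

   /* Hypothetical macro standing in for a "fence after a seq_cst RMW"
    * builtin: on x86 the adjacent lock-prefixed RMW is already a full
    * barrier, so nothing is emitted; elsewhere the fence is kept. */
   #if defined(__x86_64__) || defined(__i386__)
   # define fence_after_seq_cst_rmw()  do { } while (0)
   #else
   # define fence_after_seq_cst_rmw()  __atomic_thread_fence(__ATOMIC_SEQ_CST)
   #endif

   static int dummy, x, y, r0;

   void t0_promoted(void)
   {
           __atomic_store_n(&x, 1, __ATOMIC_RELAXED);
           __atomic_exchange_n(&dummy, 1, __ATOMIC_SEQ_CST);
           fence_after_seq_cst_rmw();   /* nothing on x86, dmb ish on Arm */
           r0 = __atomic_load_n(&y, __ATOMIC_RELAXED);
   }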

It should be noted that the builtin extensions we propose are not
entirely free.  Here are the downsides we perceive in introducing those
APIs:

- They add complexity to the atomic builtins API.

- They add constraints which must be taken into account by future
   architecture-specific backend optimizations, for example the (broken)
   xchg RELEASE | RELAXED -> store transformation on x86 (Clang) [3];
   see the sketch after this list.

   If an atomic op class (e.g. RMW) can be weakened to a less strongly
   ordered instruction by the architecture backend, then the emission of
   a before/after-fence associated with that class of atomic op must be
   pessimistic and assume the weakest instruction pattern which can be
   generated.
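
A sketch of why (reconstructed from the description of [3]; illustrative
only, not the exact reproducer):

   static int x, y, r1;

   void t1_fragile(void)
   {
           /* If a backend lowers this RELEASE exchange to a plain store on
            * x86 (the transformation described in [3]), it is no longer a
            * lock-prefixed instruction and thus no longer a full barrier... */
           (void)__atomic_exchange_n(&x, 1, __ATOMIC_RELEASE);
           /* ...so eliding a fence here on the assumption that "an x86 RMW
            * is always a full barrier" would no longer be safe. */
           r1 = __atomic_load_n(&y, __ATOMIC_RELAXED);
   }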

Clang already optimizes atomics and redundant fences to some extent.
Its redundant-fence optimization appears to be limited to a peephole,
which does not leverage the fact that lock-prefixed atomic operations
act as implicit full fences on x86.  Perhaps this is low-hanging fruit
for optimization.

We have not observed any similar optimizations in gcc as of today, which
appears to be a concern for many users. [4-7]

Thanks,

Mathieu

[1] https://dl.acm.org/doi/10.1145/3033019.3033021
[2] https://reviews.llvm.org/D5758
[3] https://github.com/llvm/llvm-project/issues/60418
[4] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86056
[5] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68622
[6] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86072
[7] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63273

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


