Re: GCC libatomic ABI specification draft

From: Torvald Riegel <triegel@redhat.com>
To: Bin Fan <bin.x.fan@oracle.com>
Cc: "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>,
	Richard Henderson <rth@redhat.com>,
	       Jakub Jelinek <jakub@redhat.com>
Subject: Re: GCC libatomic ABI specification draft
Date: Tue, 17 Jan 2017 17:00:00 -0000
Message-ID: <1484672405.5606.420.camel@redhat.com>
In-Reply-To: <c2d90b1a-0b0e-0460-d28f-a02ad15f5ab3@oracle.com>

On Thu, 2016-11-17 at 12:12 -0800, Bin Fan wrote:
> On 11/14/2016 4:34 PM, Bin Fan wrote:
> > Hi All,
> >
> > I have an updated version of libatomic ABI specification draft. Please 
> > take a look to see if it matches GCC implementation. The purpose of 
> > this document is to establish an official GCC libatomic ABI, and allow 
> > compatible compiler and runtime implementations on the affected 
> > platforms.

Thanks for the update, and sorry for the late reply.  Comments below.

> > - Rewrite section 3 to replace "lock-free" operations with "hardware 
> > backed" instructions. The digest of this section is: 1) inlineable 
> > atomics must be implemented with the hardware backed atomic 
> > instructions. 2) for non-inlineable atomics, the compiler must 
> > generate a runtime call, and the runtime support function is free to 
> > use any implementation.

OK.

I still think that using hardware-backed instructions for a particular
type requires that there is a true atomic load instruction for that
type.  Emulating a load with an idempotent store (eg, cmpxchg16b) is not
useful, overall.

One could argue that an idempotent atomic HW store such as a cmpxchg16b
in a loop is indeed lock-free.  However, IMO the intention behind
"lock-free" atomics in C and C++ is to offer atomics that are both
lock-free *and* as fast as one would assume for a fully HW-backed
solution for atomic accesses.  This includes that loads must be cheaper
than stores, in particular under contention / concurrent accesses by
several threads.
I believe that "fast" is much more often the motivation for using
lock-free atomics than the actual "lock-free" property, that is, the
progress guarantee (which isn't even lock-free but obstruction-free,
see below).  If we do see a sufficiently strong need for lock-free
atomics, we should build something just for that (eg, if we remove the
address-free requirement, we can support lock-free (in the
progress-guarantee sense) operations for a lot more types).

Also, while that previous issue is "just" a performance issue, the fact
that we could issue a store when atomic_load() is called is a
correctness issue, I think.
One example is volatile atomic loads; while C/C++ don't really
constrain what a volatile load needs to be in the underlying
implementation, I think most users would assume that a load really means
a hardware load instruction of some sort, and nothing else.  cmpxchg16b
conflicts with such an assumption.
Another example is read-only mapped memory, where a load that is
implemented as a store would fault.
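
To make the difference concrete, here is a minimal sketch using the GCC
__atomic builtins.  It uses a 64-bit value so that it builds anywhere; that
is an assumption made for portability, standing in for the 16-byte case that
would need cmpxchg16b on x86_64.  The names load_plain and load_via_cas are
made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A true atomic load: it only reads, so it accepts a pointer to const. */
uint64_t load_plain(const uint64_t *p)
{
    assert(p != NULL);
    return __atomic_load_n(p, __ATOMIC_SEQ_CST);
}

/* A load emulated with compare-and-swap (hypothetical helper).
   It needs a pointer to writable memory because the CAS may store. */
uint64_t load_via_cas(uint64_t *p)
{
    uint64_t expected = 0;
    assert(p != NULL);
    /* If *p == 0, this stores 0 over 0 (an idempotent store) and succeeds.
       Otherwise it fails and copies the current value into expected. */
    __atomic_compare_exchange_n(p, &expected, 0, false,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;
}
```

Both functions return the same value, but the second one cannot be given a
pointer to const without a cast, which mirrors the read-only mapping problem.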

Bottom line: we shouldn't rely solely on cmpxchg16b and similar.
(Though this doesn't necessarily mean that there can't be compiler flags
that enable its use.)

I think the ABI should set a baseline for each architecture, and the
baseline decides whether something is inlinable or not.  Thus, the
x86_64 ABI would make __int128 operations not inlinable (because of the
issues with cmpxchg16b, see above).

If users want to use capabilities beyond the baseline, they can choose
to use flags that alter/extend the ABI.  For example, if they use a flag
that explicitly enables the use of cmpxchg16b for atomics, they also
need to use a libatomic implementation built in the same way (if
possible).  This then creates a new ABI(-variant), basically.

I ran a few tests on my x86_64 machine a few weeks ago, and I didn't
see cmpxchg16b being used.  IIRC, I also looked at libatomic and didn't
see it (but I don't remember for sure).  Either way, if I am wrong and
we are using cmpxchg16b for loads, this should be fixed.
Ideally, this should be fixed before the stage 3 deadline this Friday.
Such a fix might potentially break existing uses, but the earlier we fix
this, the better.

Section 3 Rationale, alternative 1: I'm wondering if the example is
correct.  For a 4-byte-aligned type of size 3, the implementation cannot
simply use 4-byte hardware-backed atomics because this will inevitably
touch the 4th byte, I think, and the implementation can't know whether
this is padding or not.  Or do we expect that things like packed structs
are disallowed?
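
A small sketch of the layout I have in mind, assuming a C11 compiler.  The
struct tagged_rgb and the helper access_covers_tag are hypothetical and only
serve to show that the 4th byte need not be padding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: a 3-byte object at a 4-byte-aligned address,
   immediately followed by an unrelated member rather than by padding. */
struct tagged_rgb {
    _Alignas(4) unsigned char rgb[3];  /* the 3-byte object */
    unsigned char tag;                 /* unrelated data in the 4th byte */
};

static_assert(sizeof(struct tagged_rgb) == 4, "no padding in this layout");
static_assert(offsetof(struct tagged_rgb, tag) == 3, "tag is the 4th byte");

/* Returns nonzero if an access of access_size bytes that starts at rgb
   also covers the tag member. */
int access_covers_tag(size_t access_size)
{
    size_t start = offsetof(struct tagged_rgb, rgb);
    size_t tag_off = offsetof(struct tagged_rgb, tag);
    return start <= tag_off && tag_off < start + access_size;
}
```

A 3-byte access stays inside rgb, while a 4-byte hardware access also reads
and possibly writes tag.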

N3.1:  Why do you assume that 8-byte HW atomics are available on i386?
Because cmpxchg8b is available for CPUs that are the lowest i?86 we
still intend to support?

I'd also use "hardware-backed" instead of "hardware backed".

> > - The Rationale section in section 3 is also revised to remove the 
> > mentioning of "lock-free", but there is not major change of concept.
> >
> > - Add note N3.1 to emphasize the assumption of general hardware 
> > supported atomic instruction
> >
> > - Add note N3.2 to discuss the issues of cmpxchg16b

See above.

> > - Add a paragraph in section 4.1 to specify memory_order_consume must 
> > be implemented through memory_order_acquire. Section 4.2 emphasizes it 
> > again.
> >
> > - The specification of each runtime functions mostly maps to the 
> > corresponding generic functions in the C11 standard. Two functions are 
> > worth noting:
> > 1) C11 atomic_compare_exchange compares and updates the "value" while 
> > __atomic_compare_exchange functions in this ABI compare and update the 
> > "memory", which implies the memcmp and memcpy semantics.

In Section 4, parts about atomic_compare_exchange: should there be a
back-reference to the memcmp point made earlier in the document?
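
For reference, the memcmp/memcpy semantics can be modeled as below.  This is
a non-atomic sketch of the observable behavior only, not how libatomic
implements it; model_compare_exchange is a made-up name.  Because the
comparison is over raw bytes, padding bytes take part in it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Non-atomic model: compares and updates the "memory", byte by byte.
   On success, size bytes of desired are copied into mem.
   On failure, the current contents of mem are copied into expected. */
bool model_compare_exchange(void *mem, void *expected,
                            const void *desired, size_t size)
{
    assert(mem != NULL && expected != NULL && desired != NULL);
    if (memcmp(mem, expected, size) == 0) {
        memcpy(mem, desired, size);
        return true;
    }
    memcpy(expected, mem, size);
    return false;
}
```

A caller that retries after a failure compares against the bytes that were
just copied out, so the second attempt succeeds if nothing else changed.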

> > 2) The specification of __atomic_is_lock_free allows both a per-object 
> > result and a per-type result. A per-type implementation could pass 
> > NULL, or a faked address as the address of the object. A per-object 
> > implementation could pass the actual address of the object.

The __atomic_is_lock_free description should specify that "lock-free"
refers to the definition of "lock-free" in C++14, which includes
"address-free".  I'm referring to C++14 specifically because this
contains an update which is relevant for (1) LL/SC-based architectures
(ie, that "lock-free" is actually what is called obstruction-free in the
literature) and (2) for any libatomic implementation that wants to use
HW atomics for things like the example in Section 3's Rationale,
alternative 1 (see above).
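
The two calling styles from the draft look roughly like this with the GCC
builtin.  This is only a sketch: the wrapper names are made up, and it uses
int because I would expect the compiler to answer that query on common
targets without calling into libatomic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-type query: no object address is supplied, so the answer has to
   hold for every suitably aligned object of this size. */
bool int_lock_free_per_type(void)
{
    return __atomic_is_lock_free(sizeof(int), 0);
}

/* Per-object query: the actual address is supplied, so an implementation
   may take the alignment of this particular object into account. */
bool int_lock_free_per_object(const int *obj)
{
    assert(obj != NULL);
    return __atomic_is_lock_free(sizeof(int), obj);
}
```

For a naturally aligned int, both queries should agree.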

This ABI also needs to specify how hardware-backed atomics are
implemented on a particular architecture.  For example, on architectures
where there is more than one choice for how to implement certain memory
orders (eg, ARM), the ABI should pick a certain mapping.  I guess this
should be a note in Section 4, maybe as a separate subsection and/or an
additional note around the memory_order enum description; I'd keep the
note about implementing something equivalent to C11/C++11 semantics.
What we would document is something like the possible mappings discussed
here: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
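
As an illustration of why the mapping has to be fixed, consider the
store-buffering pattern below, written with C11 atomics.  This is a sketch;
the function and variable names are made up.  An implementation can put the
expensive barrier either on seq_cst stores or on seq_cst loads, and each
choice is correct on its own.  If the two functions were compiled with
different choices, neither side might execute a barrier between its store
and its load, and the forbidden outcome could appear.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int x, y;  /* both start at 0 */

/* Thread 0: seq_cst store to x, then seq_cst load from y. */
int sb_thread0(void)
{
    atomic_store_explicit(&x, 1, memory_order_seq_cst);
    return atomic_load_explicit(&y, memory_order_seq_cst);
}

/* Thread 1: seq_cst store to y, then seq_cst load from x. */
int sb_thread1(void)
{
    atomic_store_explicit(&y, 1, memory_order_seq_cst);
    return atomic_load_explicit(&x, memory_order_seq_cst);
}

/* With seq_cst semantics, both threads reading 0 is not allowed. */
void sb_check(int r0, int r1)
{
    assert(!(r0 == 0 && r1 == 0));
}
```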

There are typos in Section 2.4.
