public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Adhemerval Zanella <azanella@linux.vnet.ibm.com>
To: libc-alpha@sourceware.org
Subject: Re: Future atomic operation cleanup
Date: Fri, 29 Aug 2014 13:32:00 -0000	[thread overview]
Message-ID: <540080DF.6030205@linux.vnet.ibm.com> (raw)
In-Reply-To: <1409318874.6045.73.camel@triegel.csb>

On 29-08-2014 10:27, Torvald Riegel wrote:
> On Fri, 2014-08-29 at 08:10 -0300, Adhemerval Zanella wrote:
>> On 29-08-2014 06:55, Torvald Riegel wrote:
>>> On Thu, 2014-08-28 at 13:27 -0300, Adhemerval Zanella wrote:
>>>> On 28-08-2014 12:37, Torvald Riegel wrote:
>>>>> On Thu, 2014-08-28 at 10:57 -0300, Adhemerval Zanella wrote:
>>>>>> On 22-08-2014 11:00, Richard Henderson wrote:
>>>>>>> On 08/22/2014 06:50 AM, Adhemerval Zanella wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Following comments from my first patch to optimize single-thread internal
>>>>>>>> glibc locking/atomics [1], I have changed the implementation to use now
>>>>>>>> relaxed atomics instead.  Addresing the concerns raised in last discussion, 
>>>>>>>> the primitives are still signal-safe (although not thread-safe), so if future
>>>>>>>> malloc implementation is changed to be async-safe, it won't require to a
>>>>>>>> adjust powerpc atomics.
>>>>>>>>
>>>>>>>> For catomic_and and catomic_or I follow the definition at 'include/atomic.h'
>>>>>>>> (which powerpc is currently using) and implemented the atomics with acquire
>>>>>>>> semantics.  The new implementation also is simpler.
>>>>>>>>
>>>>>>>> On synthetic benchmarks it shows an improvement of 5-10% for malloc
>>>>>>>> calls and an performance increase of 7-8% in 483.xalancbmk from
>>>>>>>> speccpu2006 (number from an POWER8 machine).
>>>>>>>>
>>>>>>>> Checked on powerpc64, powerpc32 and powerpc64le.
>>>>>>> Wow, that's a lot of boiler-plate.
>>>>>>>
>>>>>>> When development opens again, can we simplify all of these atomic operations by
>>>>>>> assuming compiler primitives?  That is, use the __atomic builtins for gcc 4.8
>>>>>>> and later, fall back to the __sync builtins for earlier gcc, and completely
>>>>>>> drop support for truly ancient compilers that support neither.
>>>>>>>
>>>>>>> As a bonus we'd get to unify the implementations across all of the targets.
>>>>>>>
>>>>>>>
>>>>>>> r~
>>>>>>>
>>>>>> I also agree we should move to more a unified implementation (in fact, I
>>>>>> plan to get rid of powerpc lowlevellock.h when devel opens again).  However
>>>>>> I really don't want to either wait or reimplement all the custom atomic to push
>>>>>> this optimization... 
>>>>> I believe that, unless the caveat Joseph mentioned actually applies to
>>>>> the archs you're concerned about, using the compiler builtins and doing
>>>>> the unification would simplify your patch considerably.  It would also
>>>>> avoid having to iterate over the changes of your patch again once we do
>>>>> all of the unification.
>>>>>
>>>>>> I think such change will require a lot of iteration and testing, which is not
>>>>>> the intend of this patch.
>>>>> If we move to C11-like atomics, then those will certainly go in
>>>>> incrementally, and exist in parallel with the current atomics for a
>>>>> while (until we reviewed all existing code using the old atomics and
>>>>> moved it over to the new atomics).
>>>>>
>>>>> If we use the compiler builtins, we'd also have to test less because we
>>>>> have no additional custom atomics implementation we need to maintain.
>>>>>
>>>> I do agree moving to a compiler specific is the best way, although for current
>>>> status where we support GCC 4.4 we still need hand crafted assembly for atomics.
>>>> Also, the GCC atomics are only support for 4.7+ afaik [1] and C11 for 4.9.
>>> Note that I'm not arguing for using C11 at this point, but using atomics
>>> that are very much like the atomic_*_explicit operations.  The function
>>> names can be different, but functionality, parameter ordering, etc.,
>>> would be similar.
>>>
>>>> We can make what Joseph suggested and move minimum compiler required for GCC 4.7.
>>>>
>>>> But if we intend to actually add mininum GCC 4.7, I can work on changing all the
>>>> atomics to GCC builtins.
>>> Another option would be to only enable the new atomics that we don't
>>> have right now (e.g., memory_order_relaxed) when building with 4.7 or
>>> higher, and falling back to some of the custom assembly ones otherwise.
>>> This would mean no performance regressions but the optimizations would
>>> only be effective for 4.7 or higher.  memory_order_relaxed loads/stores
>>> could be an exception, because I think we need to make them explicity,
>>> and we don't want to slow down code that now uses plain nonatomic
>>> loads/stores in cases where it should actually be memory_order_relaxed
>>> accesses.
>>>
>>>
>> I don't like this conditional enable because it will require more logic
>> to check for compiler version and add different code generation depending
>> of this. If the idea is to simplify and cleanup code I think it is better
>> to either add custom arch-specific code that will build regardless of
>> compiler or use compiler builtins and get rid off hand-craft assembly.
>> I see no point in enabling both.
> *If* we can make 4.7 a requirement, then I'm certainly all for using
> just the builtins on architectures where they are available and get
> inlined.
>
>
And that is exactly what I am asking: are we willing to raise GCC requirements
for 2.21?  If we will, I see no issue on rework the patch to use GCC builtins.
If we won't, I see to no point in trying to adjust this patch to use conditionally
the GCC builtins for GCC 4.7+.

  reply	other threads:[~2014-08-29 13:32 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-22 13:50 [PATCH v2] PowerPC: libc single-thread lock optimization Adhemerval Zanella
2014-08-22 14:00 ` Future atomic operation cleanup Richard Henderson
2014-08-22 14:50   ` Joseph S. Myers
2014-08-22 17:26   ` Torvald Riegel
2014-08-28 13:57   ` Adhemerval Zanella
2014-08-28 15:37     ` Torvald Riegel
2014-08-28 16:28       ` Adhemerval Zanella
2014-08-29  9:55         ` Torvald Riegel
2014-08-29 11:10           ` Adhemerval Zanella
2014-08-29 13:28             ` Torvald Riegel
2014-08-29 13:32               ` Adhemerval Zanella [this message]
2014-08-29 16:33                 ` Chris Metcalf
2014-10-23 17:12                   ` Carlos O'Donell
2014-08-29 21:52                 ` Roland McGrath
2014-09-05 17:51                   ` Torvald Riegel
2016-03-11 18:36                 ` [PATCH v3] PowerPC: libc single-thread lock optimization Tulio Magno Quites Machado Filho
2016-03-11 18:39                   ` Florian Weimer
2016-03-11 21:12                     ` Tulio Magno Quites Machado Filho
2016-03-28 17:36                   ` [PING][PATCH " Tulio Magno Quites Machado Filho
2016-04-07 11:24                   ` [PATCH " Torvald Riegel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=540080DF.6030205@linux.vnet.ibm.com \
    --to=azanella@linux.vnet.ibm.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).