From: Adhemerval Zanella <azanella@linux.vnet.ibm.com>
To: libc-alpha@sourceware.org
Subject: Re: Future atomic operation cleanup
Date: Fri, 29 Aug 2014 13:32:00 -0000 [thread overview]
Message-ID: <540080DF.6030205@linux.vnet.ibm.com> (raw)
In-Reply-To: <1409318874.6045.73.camel@triegel.csb>
On 29-08-2014 10:27, Torvald Riegel wrote:
> On Fri, 2014-08-29 at 08:10 -0300, Adhemerval Zanella wrote:
>> On 29-08-2014 06:55, Torvald Riegel wrote:
>>> On Thu, 2014-08-28 at 13:27 -0300, Adhemerval Zanella wrote:
>>>> On 28-08-2014 12:37, Torvald Riegel wrote:
>>>>> On Thu, 2014-08-28 at 10:57 -0300, Adhemerval Zanella wrote:
>>>>>> On 22-08-2014 11:00, Richard Henderson wrote:
>>>>>>> On 08/22/2014 06:50 AM, Adhemerval Zanella wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Following comments from my first patch to optimize single-thread internal
>>>>>>>> glibc locking/atomics [1], I have changed the implementation to use now
>>>>>>>> relaxed atomics instead. Addresing the concerns raised in last discussion,
>>>>>>>> the primitives are still signal-safe (although not thread-safe), so if future
>>>>>>>> malloc implementation is changed to be async-safe, it won't require to a
>>>>>>>> adjust powerpc atomics.
>>>>>>>>
>>>>>>>> For catomic_and and catomic_or I follow the definition at 'include/atomic.h'
>>>>>>>> (which powerpc is currently using) and implemented the atomics with acquire
>>>>>>>> semantics. The new implementation also is simpler.
>>>>>>>>
>>>>>>>> On synthetic benchmarks it shows an improvement of 5-10% for malloc
>>>>>>>> calls and an performance increase of 7-8% in 483.xalancbmk from
>>>>>>>> speccpu2006 (number from an POWER8 machine).
>>>>>>>>
>>>>>>>> Checked on powerpc64, powerpc32 and powerpc64le.
>>>>>>> Wow, that's a lot of boiler-plate.
>>>>>>>
>>>>>>> When development opens again, can we simplify all of these atomic operations by
>>>>>>> assuming compiler primitives? That is, use the __atomic builtins for gcc 4.8
>>>>>>> and later, fall back to the __sync builtins for earlier gcc, and completely
>>>>>>> drop support for truly ancient compilers that support neither.
>>>>>>>
>>>>>>> As a bonus we'd get to unify the implementations across all of the targets.
>>>>>>>
>>>>>>>
>>>>>>> r~
>>>>>>>
>>>>>> I also agree we should move to more a unified implementation (in fact, I
>>>>>> plan to get rid of powerpc lowlevellock.h when devel opens again). However
>>>>>> I really don't want to either wait or reimplement all the custom atomic to push
>>>>>> this optimization...
>>>>> I believe that, unless the caveat Joseph mentioned actually applies to
>>>>> the archs you're concerned about, using the compiler builtins and doing
>>>>> the unification would simplify your patch considerably. It would also
>>>>> avoid having to iterate over the changes of your patch again once we do
>>>>> all of the unification.
>>>>>
>>>>>> I think such change will require a lot of iteration and testing, which is not
>>>>>> the intend of this patch.
>>>>> If we move to C11-like atomics, then those will certainly go in
>>>>> incrementally, and exist in parallel with the current atomics for a
>>>>> while (until we reviewed all existing code using the old atomics and
>>>>> moved it over to the new atomics).
>>>>>
>>>>> If we use the compiler builtins, we'd also have to test less because we
>>>>> have no additional custom atomics implementation we need to maintain.
>>>>>
>>>> I do agree moving to a compiler specific is the best way, although for current
>>>> status where we support GCC 4.4 we still need hand crafted assembly for atomics.
>>>> Also, the GCC atomics are only support for 4.7+ afaik [1] and C11 for 4.9.
>>> Note that I'm not arguing for using C11 at this point, but using atomics
>>> that are very much like the atomic_*_explicit operations. The function
>>> names can be different, but functionality, parameter ordering, etc.,
>>> would be similar.
>>>
>>>> We can make what Joseph suggested and move minimum compiler required for GCC 4.7.
>>>>
>>>> But if we intend to actually add mininum GCC 4.7, I can work on changing all the
>>>> atomics to GCC builtins.
>>> Another option would be to only enable the new atomics that we don't
>>> have right now (e.g., memory_order_relaxed) when building with 4.7 or
>>> higher, and falling back to some of the custom assembly ones otherwise.
>>> This would mean no performance regressions but the optimizations would
>>> only be effective for 4.7 or higher. memory_order_relaxed loads/stores
>>> could be an exception, because I think we need to make them explicity,
>>> and we don't want to slow down code that now uses plain nonatomic
>>> loads/stores in cases where it should actually be memory_order_relaxed
>>> accesses.
>>>
>>>
>> I don't like this conditional enable because it will require more logic
>> to check for compiler version and add different code generation depending
>> of this. If the idea is to simplify and cleanup code I think it is better
>> to either add custom arch-specific code that will build regardless of
>> compiler or use compiler builtins and get rid off hand-craft assembly.
>> I see no point in enabling both.
> *If* we can make 4.7 a requirement, then I'm certainly all for using
> just the builtins on architectures where they are available and get
> inlined.
>
>
And that is exactly what I am asking: are we willing to raise GCC requirements
for 2.21? If we will, I see no issue on rework the patch to use GCC builtins.
If we won't, I see to no point in trying to adjust this patch to use conditionally
the GCC builtins for GCC 4.7+.
next prev parent reply other threads:[~2014-08-29 13:32 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-08-22 13:50 [PATCH v2] PowerPC: libc single-thread lock optimization Adhemerval Zanella
2014-08-22 14:00 ` Future atomic operation cleanup Richard Henderson
2014-08-22 14:50 ` Joseph S. Myers
2014-08-22 17:26 ` Torvald Riegel
2014-08-28 13:57 ` Adhemerval Zanella
2014-08-28 15:37 ` Torvald Riegel
2014-08-28 16:28 ` Adhemerval Zanella
2014-08-29 9:55 ` Torvald Riegel
2014-08-29 11:10 ` Adhemerval Zanella
2014-08-29 13:28 ` Torvald Riegel
2014-08-29 13:32 ` Adhemerval Zanella [this message]
2014-08-29 16:33 ` Chris Metcalf
2014-10-23 17:12 ` Carlos O'Donell
2014-08-29 21:52 ` Roland McGrath
2014-09-05 17:51 ` Torvald Riegel
2016-03-11 18:36 ` [PATCH v3] PowerPC: libc single-thread lock optimization Tulio Magno Quites Machado Filho
2016-03-11 18:39 ` Florian Weimer
2016-03-11 21:12 ` Tulio Magno Quites Machado Filho
2016-03-28 17:36 ` [PING][PATCH " Tulio Magno Quites Machado Filho
2016-04-07 11:24 ` [PATCH " Torvald Riegel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=540080DF.6030205@linux.vnet.ibm.com \
--to=azanella@linux.vnet.ibm.com \
--cc=libc-alpha@sourceware.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).