From: Ross Johnson <rpj@ise.canberra.edu.au>
To: pthreads-win32@sources.redhat.com
Cc: Alexander Terekhov <TEREKHOV@de.ibm.com>
Subject: Re: mutexes: "food for thought"
Date: Fri, 08 Oct 2004 12:49:00 -0000
Message-ID: <41668CD1.7090207@ise.canberra.edu.au>
In-Reply-To: <OF1BDED071.2BB52616-ONC1256DC3.005FFB57-C1256DC3.0061B143@de.ibm.com>
Hi all,
Summary: mutex speedups.
Belatedly getting around to Alexander Terekhov's sketched
enhancements to mutexes (below), I've rewritten the mutex
routines in pthreads-win32. The main objective was to remove
the need for the extra critical section (wait_cs) in the
unlock and timedlock routines.
However, I couldn't get my translation of Alexander's logic
to work, so I have applied Ulrich Drepper's futex based mutex
algorithms (specifically 'Mutex2') from his paper at
http://people.redhat.com/drepper/futex.pdf instead. Some of
the other ideas in Alexander's sketch, such as postponing
full initialisation of statically declared mutexes until the
slow sections of mutex operations (shown as DCSI() in the
sketch below), have not been included yet, and may not be,
because doing so would require applications to be recompiled
before they could use the new dll. Postponing would save a
compare op in each call to lock, timedlock or trylock.
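
[For readers without the paper to hand, the following is a rough
sketch of a Mutex2-style lock in portable C++. It is not the code in
CVS. The names Mutex2Sketch, AutoResetEvent, cmpxchg and
run_counter_demo are made up for illustration, and the condition
variable based AutoResetEvent is only an assumed stand-in for a Win32
auto-reset event, since there is no futex on Windows.]

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Assumed stand-in for a Win32 auto-reset event (hypothetical class).
class AutoResetEvent {
    std::mutex m_;
    std::condition_variable cv_;
    bool signaled_ = false;
public:
    void set() {
        { std::lock_guard<std::mutex> g(m_); signaled_ = true; }
        cv_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> g(m_);
        cv_.wait(g, [this] { return signaled_; });
        signaled_ = false;  // auto-reset: one waiter consumes the signal
    }
};

// Returns the value seen before the operation, in the style of
// InterlockedCompareExchange (argument order differs).
inline int cmpxchg(std::atomic<int>& v, int expected, int desired) {
    v.compare_exchange_strong(expected, desired);
    return expected;
}

class Mutex2Sketch {
    // 0: free, 1: locked, 2: locked and there may be waiters
    std::atomic<int> state_{0};
    AutoResetEvent event_;
public:
    bool try_lock() { return cmpxchg(state_, 0, 1) == 0; }

    void lock() {
        int c = cmpxchg(state_, 0, 1);
        if (c == 0) return;  // fast path: lock was free
        do {
            // Mark the lock contended before sleeping, unless it
            // has been freed in the meantime.
            if (c == 2 || cmpxchg(state_, 1, 2) != 0)
                event_.wait();
        } while ((c = cmpxchg(state_, 0, 2)) != 0);
    }

    void unlock() {
        int prev = state_.fetch_sub(1);
        assert(prev == 1 || prev == 2);
        if (prev != 1) {      // contended: release and wake one waiter
            state_.store(0);
            event_.set();
        }
    }
};

// Usage: several threads increment a plain counter under the lock.
int run_counter_demo(int threads, int iters) {
    Mutex2Sketch mx;
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                mx.lock();
                ++counter;
                mx.unlock();
            }
        });
    for (auto& th : pool) th.join();
    return counter;
}
```

[Note that neither lock() nor unlock() needs a second critical
section around the wait, which was the point of the rewrite. A woken
waiter always retakes the lock in state 2, so the next unlock keeps
waking whoever is still asleep.]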
The new code is in CVS if anyone wants to inspect/try it.
The modified pthreads-win32 dll passes the full testsuite
and has achieved some very significant speedups - at least
on my single processor machine. In particular, the rwlock7.c
test, which intensively exercises reader/writer locks, runs
approximately 3 times faster than previously. [The
reader/writer locks in pthreads-win32 are built from
pthreads-win32 mutexes and condition variables.]
Further speedups were attempted by inlining the [many] calls
to InterlockedCompareExchange(). This uses the library's own
assembler version of this routine (X86 only), which was
originally included for Win9x systems. Surprisingly, this
canceled out almost all of the speed gains just made. It
turns out that the 'lock' prefix on the cmpxchg instruction
is what costs the time, even on single processor systems, as
a Google search later confirmed. See:
http://gcc.gnu.org/ml/java/2001-03/msg00122.html
Interestingly though, the Windows version of
InterlockedCompareExchange() doesn't appear to use the
'lock' prefix on single processor systems: calling it is
only marginally slower (by approximately 10%) than the new
pthreads-win32 dll with inlined CMPXCHG minus the 'lock'
prefix. I assume the difference is subroutine call overhead.
So, rather than build a separate dll for SMP systems,
inlining is currently turned off, sacrificing the 10% speed
gain for binary portability.
[If anyone wants to turn inlining on - after checking out
the code from CVS, change the "#if 0" to "#if 1" at the
bottom of ptw32_InterlockedCompareExchange.c, and build the
dll by running "nmake clean VC-inlined", or "make clean
GC-inlined" for MinGW.]
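
[To make the 'lock' prefix discussion concrete, here is a minimal
sketch of the two compare-and-swap variants as inline assembler. It
assumes GCC or Clang; the helper names cas_locked and cas_unlocked are
hypothetical and this is not the library's own
ptw32_InterlockedCompareExchange routine. On non-x86 targets both
helpers simply fall back to a compiler builtin.]

```cpp
#include <cassert>

#if defined(__i386__) || defined(__x86_64__)

// CMPXCHG with the LOCK prefix: atomic even when other processors
// access *dest at the same time. Returns the previous value of *dest.
inline long cas_locked(volatile long* dest, long exchange, long comparand) {
    long previous;
    __asm__ __volatile__("lock; cmpxchg %2, %1"
                         : "=a"(previous), "+m"(*dest)
                         : "r"(exchange), "0"(comparand)
                         : "memory", "cc");
    return previous;
}

// The same instruction without LOCK. An interrupt cannot split a
// single instruction, so this is still atomic between threads on a
// uniprocessor, but it is NOT safe on SMP systems.
inline long cas_unlocked(volatile long* dest, long exchange, long comparand) {
    long previous;
    __asm__ __volatile__("cmpxchg %2, %1"
                         : "=a"(previous), "+m"(*dest)
                         : "r"(exchange), "0"(comparand)
                         : "memory", "cc");
    return previous;
}

#else

// Other architectures: no LOCK prefix exists, so both helpers use the
// compiler's atomic builtin (assumption: GCC/Clang __sync builtins).
inline long cas_locked(volatile long* dest, long exchange, long comparand) {
    return __sync_val_compare_and_swap(dest, comparand, exchange);
}
inline long cas_unlocked(volatile long* dest, long exchange, long comparand) {
    return cas_locked(dest, exchange, comparand);
}

#endif
```

[Both helpers take their arguments in the same order as
InterlockedCompareExchange(destination, exchange, comparand) and
return the value found at the destination before the operation. A dll
built around the unprefixed form would only be correct on single
processor machines, which is why a separate SMP build would be needed.]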
With all changes included, performance of pthreads mutexes
is approaching, and in the case of trylock, apparently
exceeding the performance of Win32 Critical Section calls -
based on tests\benchtest1.c. But, by avoiding Win32 Critical
Sections, there is now a possibility that pthreads-win32
mutexes can exist in process shared memory, which may then
allow PROCESS_SHARED mutexes and other objects to be
implemented.
Unless I'm mistaken, the one negative about all of this is
that threads are no longer guaranteed strict FIFO access to
the lock. That is, a thread newly requesting the lock can
sometimes steal the lock from an already waiting thread.
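
[One interleaving that shows the stealing can be stepped through by
hand. The sketch below replays it in a single thread, assuming a
Mutex2-style lock word (0: free, 1: locked, 2: locked with possible
waiters). The names StealTrace, cmpxchg and simulate_lock_steal are
hypothetical and the "threads" A, B and C exist only in the comments.]

```cpp
#include <atomic>
#include <cassert>

struct StealTrace {
    bool newcomer_got_lock;  // did late-arriving thread C get the lock?
    bool waiter_got_lock;    // did already-waiting thread B get it?
    int  final_state;        // lock word at the end of the replay
};

// Returns the value seen before the operation.
inline int cmpxchg(std::atomic<int>& v, int expected, int desired) {
    v.compare_exchange_strong(expected, desired);
    return expected;
}

StealTrace simulate_lock_steal() {
    std::atomic<int> state{0};

    // Thread A takes the free lock on the fast path (0 -> 1).
    cmpxchg(state, 0, 1);

    // Thread B finds the lock held, marks it contended (1 -> 2) and
    // would now block waiting for the wake-up event.
    cmpxchg(state, 1, 2);

    // A unlocks: it sees 2, so it frees the lock and signals B.
    if (state.fetch_sub(1) != 1)
        state.store(0);

    // Before B gets scheduled, thread C arrives and wins the fast path.
    bool c_got = (cmpxchg(state, 0, 1) == 0);

    // B finally runs and retries (0 -> 2), but the lock is taken again.
    bool b_got = (cmpxchg(state, 0, 2) == 0);

    // B re-marks the lock contended (1 -> 2) and goes back to waiting.
    if (!b_got)
        cmpxchg(state, 1, 2);

    int final_state = state.load();
    assert(final_state == 1 || final_state == 2);
    return StealTrace{c_got, b_got, final_state};
}
```

[In this replay C gets the lock although B asked first, and B ends up
waiting again with the lock word back at 2, so C's unlock will wake
it. Nothing is lost, but nothing bounds how often B can be overtaken.]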
Regards.
Ross
Alexander Terekhov wrote 1 year ago:
>G'Day,
>
>here's "ala futex based" mutex stuff using XCHG.
>
>No need for CAS. I hope that it will work just fine.
>
>Can you see any harmful race condition(s) here?
>
>TIA.
>
>#define SWAP_BASED_MUTEX_FOR_WINDOWS_INITIALIZER { 0, 0 }
>
>struct swap_based_mutex_for_windows {
>
>  atomic<int> m_lock_status; // -1: free, 0: locked, 1: lock-contention
>  atomic<auto_reset_event *> m_retry_event; // DCSI'd
>
>  void DCSI(); // double-checked serialized initialization
>  void slow_lock();
>  bool slow_trylock();
>  bool slow_timedlock(absolute_timeout const & timeout);
>  void release_one_waiter_if_any();
>
>  void lock() {
>    if (m_lock_status.swap(0, msync::acq) >= 0) slow_lock();
>  }
>
>  bool trylock() {
>    return (m_lock_status.swap(0, msync::acq) < 0) ? true : slow_trylock();
>  }
>
>  bool timedlock(absolute_timeout const & timeout) {
>    return (m_lock_status.swap(0, msync::acq) < 0) ? true : slow_timedlock(timeout);
>  }
>
>  void unlock() {
>    if (m_lock_status.swap(-1, msync::rel) > 0) release_one_waiter_if_any();
>  }
>
>};
>
>void swap_based_mutex_for_windows::slow_lock() {
>  DCSI();
>  while (m_lock_status.swap(1, msync::acq) >= 0)
>    m_retry_event.load(msync::none)->wait();
>}
>
>bool swap_based_mutex_for_windows::slow_trylock() {
>  DCSI();
>  return m_lock_status.swap(1, msync::acq) < 0;
>}
>
>bool swap_based_mutex_for_windows::slow_timedlock(absolute_timeout const & timeout) {
>  DCSI();
>  while (m_lock_status.swap(1, msync::acq) >= 0)
>    if (!m_retry_event.load(msync::none)->timedwait(timeout)) return false;
>  return true;
>}
>
>void swap_based_mutex_for_windows::release_one_waiter_if_any() {
>  m_retry_event.load(msync::none)->set();
>}
>
>void swap_based_mutex_for_windows::DCSI() {
>  if (!m_retry_event.load(msync::none)) {
>    named_windows_mutex_trick guard(this);
>    if (!m_retry_event.load(msync::none)) {
>      m_retry_event.store(new auto_reset_event(), msync::rel);
>      m_lock_status.store(-1, msync::rel);
>    }
>  }
>}
>
>regards,
>alexander.
>
>P.S. I've never run it. Just a sketch.
>
>