public inbox for gcc-bugs@sourceware.org
* [Bug libobjc/47031] New: libobjc uses mutexes for properties
@ 2010-12-21 11:42 js-gcc at webkeks dot org
  2010-12-21 11:47 ` [Bug libobjc/47031] " nicola at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: js-gcc at webkeks dot org @ 2010-12-21 11:42 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47031

           Summary: libobjc uses mutexes for properties
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libobjc
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: js-gcc@webkeks.org


In trunk, libobjc uses objc_mutex_t for properties. This means that each time
you set a property, a lock is acquired in kernel space, even though most of the
time the lock is not held at all, and when it is, only for a very short time.
As properties are used quite often, a spinlock should be used here; mutexes
will be a performance problem.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Bug libobjc/47031] libobjc uses mutexes for properties
  2010-12-21 11:42 [Bug libobjc/47031] New: libobjc uses mutexes for properties js-gcc at webkeks dot org
@ 2010-12-21 11:47 ` nicola at gcc dot gnu.org
  2010-12-29 16:11 ` nicola at gcc dot gnu.org
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: nicola at gcc dot gnu.org @ 2010-12-21 11:47 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47031

Nicola Pero <nicola at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2010.12.21 11:47:22
                 CC|                            |nicola at gcc dot gnu.org
     Ever Confirmed|0                           |1

--- Comment #1 from Nicola Pero <nicola at gcc dot gnu.org> 2010-12-21 11:47:22 UTC ---
Yes, and it would be good to fix it for 4.6.

Patches are welcome. ;-)

Thanks



* [Bug libobjc/47031] libobjc uses mutexes for properties
  2010-12-21 11:42 [Bug libobjc/47031] New: libobjc uses mutexes for properties js-gcc at webkeks dot org
  2010-12-21 11:47 ` [Bug libobjc/47031] " nicola at gcc dot gnu.org
@ 2010-12-29 16:11 ` nicola at gcc dot gnu.org
  2011-01-01 12:07 ` js-gcc at webkeks dot org
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: nicola at gcc dot gnu.org @ 2010-12-29 16:11 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47031

--- Comment #2 from Nicola Pero <nicola at gcc dot gnu.org> 2010-12-29 16:10:51 UTC ---
I'm actually not very convinced by this any more; we probably need some
benchmarks. ;-)

The problem is that property accessors are basically general purpose routines
that may be used in the most varied situations.

So, we have very little control or knowledge over when and how they are used --

 * we don't know how many CPUs or cores the user has

 * we don't know how many threads the user is starting

 * we don't know how many threads are sharing a CPU or core

 * we don't know how intensively the user is using the property accessors

Spinlocks are appropriate when certain conditions are met; but in this case,
it seems impossible to be confident that these are met.  A user may write a
program with 3 or 4 threads running on his 1 CPU/core machine, which constantly
read/write an atomic synthesized property to synchronize between themselves. 
Why not; but then, spinlocks would actually degrade performance instead of
improving it.

Traditional locks may be slower in a low-contention case, but work
consistently OK in all conditions.

For me, the key problem is that:

 * spinlocks are better/faster if there is low contention and very little
chance that two threads enter the critical region (inside the accessors) at the
same time.

 * the difference in performance between mutexes and spinlocks only matters for
overall program performance if the accessors are called very often.

But these two things conflict with each other ;-)

For example, if a spinlock makes the accessor 2x as fast as with a mutex, but
the program only spends 0.1% of its time calling the accessors, then the
difference in performance on the whole program would be of the order of 0.05%;
then, we prefer a mutex since the performance is more consistent and it has no
"worst-case" scenarios.

If the program spends more (say, 10%) of its time calling the accessors, then
the difference in performance would matter (it would be something like 5%), but
because the program is spending so much time in accessors, if the program is
multi-threaded there is high contention, and spinlocks don't perform well any
more - in fact, the worst-case scenarios (where lots of CPU is wasted spinning
and making no progress) may appear. (keep in mind that as we're sharing locks
across different objects/properties, even if the different threads are calling
accessors of different objects/properties, the locks would still be contended).

The only case where spinlocks really help is if the program spends lots of time
calling accessors, and is not multi-threaded.  In which case, the programmer
could get a huge speed-up by simply declaring the properties non-atomic.

So, I'm not sure there is a good case for spinlocks.

It may be good to try some benchmarks to get a feeling for the difference in
performance between mutexes and spinlocks.  Would using spinlocks make
accessors 2x faster ? 10x faster ? 10% faster ?

Thanks



* [Bug libobjc/47031] libobjc uses mutexes for properties
  2010-12-21 11:42 [Bug libobjc/47031] New: libobjc uses mutexes for properties js-gcc at webkeks dot org
  2010-12-21 11:47 ` [Bug libobjc/47031] " nicola at gcc dot gnu.org
  2010-12-29 16:11 ` nicola at gcc dot gnu.org
@ 2011-01-01 12:07 ` js-gcc at webkeks dot org
  2011-01-07 18:11 ` nicola at gcc dot gnu.org
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: js-gcc at webkeks dot org @ 2011-01-01 12:07 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47031

--- Comment #3 from js-gcc at webkeks dot org <js-gcc at webkeks dot org> 2011-01-01 12:06:56 UTC ---
> The problem is that property accessors are basically general purpose routines
> that may be used in the most varied situations.

It does not matter very much in which situation a property is used. To choose
which type of lock to use, the only thing that matters is what is done while
the lock is held. In this case, no call into kernel space is made at all, and
only a small operation is done. Switching to kernel space for a mutex is
already far more expensive than what we do inside the lock. If I had to guess,
I'd say the kernel-space switch costs at least 100 times more than the work we
do.

> So, we have very little control or knowledge over when and how they are used --

Which we don't care about at all.

>  * we don't know how many CPUs or cores the user has

Does not really matter. If we have two cores, the spinlock can give control to
another thread after 10 spins using sched_yield().

So, if we only have one core and a thread spins because it is waiting for
another thread to release the lock, then we waste at most 10 spins. This is
the worst-case scenario.

If we have more than one core, another thread will most likely release the
lock before we have even spun 10 times.

So, no matter how many cores, it does not perform worse than a mutex (at least
not in a measurable way), while on systems with many cores, it's a huge
improvement. Plus changing a property is something that's so fast that we most
likely will never encounter a locked spinlock. That'd only happen if the
scheduler gave control to another thread before the property was changed.

So, with spinlocks, in 99% of the cases, it's not even measurable.
With mutexes, in 100% of the cases, it IS measurable.
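The spin-then-yield scheme argued for above can be sketched in plain C. This is
an illustrative sketch, not libobjc code: the names (gs_spinlock_*) are made up,
the spin count of 10 is just the figure used in this discussion, and it assumes
the GCC __sync atomic builtins are available.

```c
#include <sched.h>

#define SPIN_TRIES 10                   /* "after 10 spins", as argued above */

typedef struct { volatile int held; } gs_spinlock_t;
#define GS_SPINLOCK_INIT { 0 }

static void gs_spinlock_lock(gs_spinlock_t *l)
{
    for (;;) {
        int i;
        for (i = 0; i < SPIN_TRIES; i++) {
            /* Atomically set to 1 and return the old value; 0 means we won. */
            if (__sync_lock_test_and_set(&l->held, 1) == 0)
                return;
        }
        /* Still held after SPIN_TRIES attempts: stop burning CPU and let
           the holder (possibly on the same core) run. */
        sched_yield();
    }
}

static void gs_spinlock_unlock(gs_spinlock_t *l)
{
    __sync_lock_release(&l->held);      /* release-store of 0 */
}
```

On one core, a waiter wastes at most SPIN_TRIES test-and-sets before yielding;
on several cores, the holder usually releases the lock before the inner loop
finishes.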

>  * we don't know how many threads the user is starting
>  * we don't know how many threads are sharing a CPU or core

We don't really care about them, I think.

>  * we don't know how intensively the user is using the property accessors

So, because we don't know how intensively the user is using properties, we will
make them slow on purpose?

> Spinlocks are appropriate when certain conditions are met; but in this case,
> it seems impossible to be confident that these are met. 

Which conditions are not met in your opinion? Please list the conditions that
you think are not met, as Apple clearly thinks they are all met. And so do I.

> A user may write a
> program with 3 or 4 threads running on his 1 CPU/core machine, which constantly
> read/write an atomic synthesized property to synchronize between themselves. 
> Why not; but then, spinlocks would actually degrade performance instead of
> improving it.

This is actually why you call sched_yield() after 10 spins. It prevents a
thread from being stuck spinning while another thread could release the lock.


> Traditional locks may be slower in a low-contention case, but work
> consistently OK in all conditions.

Yes, they are the same in all conditions because they are always more complex
and slower ;).

> * spinlocks are better/faster if there is low contention and very little
> chance that two threads enter the critical region (inside the accessors) at the
> same time.

This is the case here.

> * the difference in performance between mutexes and spinlocks only matters in
> the program performance if the accessors are called very often.

If you init a lot of objects and those initialize, let's say, 30 variables
using properties, this means that 30 locks are acquired and released, although
no other thread could possibly access them. But you still do 30
userland/kernel-space switches. For a single object! Now create 1000 objects.

With spinlocks, there won't be a single userland/kernel-space switch!

Just to demonstrate that we are talking about something which really can make a
huge difference…

I think the percentages you list cannot be used at all, as we don't have
applications that just do some math calculations and then quit. We don't want
something slow just because it might only be a small part of the program. We
want everything to be as fast as possible. Otherwise it all adds up and makes
for a poor user experience in interactive applications. Apple demonstrated this
quite well if you compare how sluggish it felt a few years ago with how well it
feels now that they have started optimizing the small stuff as well.

> The only case where spinlocks really help is if the program spends lots of time
> calling accessors, and is not multi-threaded.  In which case, the programmer
> could get a huge speed-up by simply declaring the properties non-atomic.

Even in a threaded environment, it would make a huge difference. It's unlikely
the lock is held; only when it is held do you need extra CPU time. But with
mutexes, merely checking whether the lock is held already switches to
kernel space.

> Would using spinlocks make
> accessors 2x faster ? 10x faster ? 10% faster ?

My guess is that usually the spinlock is not held, so I could imagine a factor
of 100 or even 1000. I remember a test from a while ago where I tried just
locking and releasing a mutex vs. a spinlock around a single arithmetic
operation. While the spinlock version took only a few seconds, the mutex
version still had not finished after a few hours, at which point I aborted it.



* [Bug libobjc/47031] libobjc uses mutexes for properties
  2010-12-21 11:42 [Bug libobjc/47031] New: libobjc uses mutexes for properties js-gcc at webkeks dot org
                   ` (2 preceding siblings ...)
  2011-01-01 12:07 ` js-gcc at webkeks dot org
@ 2011-01-07 18:11 ` nicola at gcc dot gnu.org
  2011-01-07 18:30 ` nicola at gcc dot gnu.org
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: nicola at gcc dot gnu.org @ 2011-01-07 18:11 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47031

--- Comment #4 from Nicola Pero <nicola at gcc dot gnu.org> 2011-01-07 17:56:01 UTC ---
Jonathan

thanks for your comments - they are very useful and we certainly want to look
at performance. ;-)

I'm not terribly convinced by using spinlocks in this context, but I'm happy to
be convinced otherwise. :-)

---

To understand what the order of magnitude is, I created a short program that
would loop and run 100m synthesized atomic, id setters.  I used GCC 4.6 to
compile, and gnustep-make/gnustep-base from trunk:

 * with the standard libobjc (objc_mutex_lock/objc_mutex_unlock), it takes
approximately 16.8 seconds to do the 100m setter calls

 * if I replace objc_mutex_lock with pthread_mutex_lock and objc_mutex_unlock
with pthread_mutex_unlock, it takes approximately 15.3 seconds (this basically
tests how fast we can go if we rework the objc_mutex... wrapper API)

 * if I comment out all the objc_mutex_lock/objc_mutex_unlock entirely, it
takes approximately 8.7 seconds.

This means that, assuming that spinlocks are infinitely faster than mutexes, in
the best possible conditions they would speed up the accessors (or, at least,
the setter, but the getter should be the same) by a maximum of about 2x. 
That's a great speedup by the way, but it's important to keep in mind that the
maximum speedup we can theoretically ever get for accessors is a 2x, not a 10x
or 100x. ;-)

---

Anyway, the key problem is that spinlocks are more wasteful (and unpredictable
?) than mutexes if there is contention and if the ratio of active threads vs.
active cores is not favourable.  Inside a kernel, when working on a specific
part, it is easy to assess these factors.  But libobjc is providing a
general-purpose, portable user-space facility for having atomic, synchronized
access to an unspecified property of an ObjC object.  It is hard to see how you
can guarantee anything about contention or ratio of active threads vs. active
cores in that context. :-(

I guess I'd feel a bit more confident if we were to experiment and benchmark
the worst-case scenario of spinlocks ... ie, how bad can they really go
compared to mutexes and how easy/hard it is for them to go bad in the case of
accessors.

Anyway, I understand you think spinlocks will always be faster than mutexes in
this case, because "My guess is that usually the spinlock is not held", but the
problem is that that is a guess, and it depends on how the accessors are used.
;-)

In particular --

 * it matters how often you call the accessors; don't forget that in the getter 
a method call (-retain) is inside the critical zone.  Even ignoring that the 
method call may trigger the execution of +initialize for the class (which would 
then execute an unbounded amount of user code inside the critical zone, but which 
is fairly theoretical as the object has most likely been allocated using +alloc 
before getting a -retain, meaning +initialize should have already been done), a 
method call still constitutes about 25% of the time taken by the getter 
(ignoring locking overheads), so if your program spends 100% of the time 
calling a getter from two threads, any thread has up to a 25% chance of hitting 
the lock held by the other thread;

 * it matters how many threads you have.  If each thread spends 10% of its time 
calling a bunch of getters with the same lock, then each spends 2.5% of its 
time inside the critical zone.  If you have 2 threads, you have 2.5% of chance 
that your thread will try to enter the critical zone while the other thread is 
inside it.  But if you have 100 threads, the chances will be much much 
higher! (having 100s of threads is pretty common in server software where you 
are processing large numbers of independent requests simultaneously; they 
should be loosely coupled, so there is little synchronization to do; but in the 
case of synthesized property accessors, unfortunately, the locks are shared by 
completely unrelated objects, so even if the 100 threads are completely 
unrelated, as soon as they use atomic synthesized accessors, they will be using 
and competing for the same locks, exactly as if they were constantly 
synchronizing access to some shared resources!  Are spinlocks still appropriate
in this context ?)

 * it matters how many active cores you have.  If a thread tries to enter the
critical zone while another thread is holding the lock, then with mutexes it
doesn't matter if the other thread is active or suspended - there will be a
context switch in both cases; but with spinlocks it does.  If the other thread
is running on a different, active core, the first thread should waste a bit of
CPU and then enter it OK.  But if the second thread is running on the same
core, it will never make any progress during the spinning, and the first thread
will waste CPU time before being suspended - in this case it will be slower and
more CPU-wasteful than a mutex.

 * I also wonder, with all the CPUs that GCC supports, whether the GCC atomic
builtins are efficient on all platforms; if on a platform the builtins require
function calls they may actually make the spinlocks less attractive compared to
mutexes (we can obviously work around this by using spinlocks on the two or
three major platforms where we know they work well, and using mutexes on the
other ones).

Maybe the right way forward would be to experiment with some worst-case 
benchmarks for spinlocks vs mutexes.  Spinlocks can speed up accessors by up to 
2x in the right conditions.  How much can they speed them down in the wrong 
conditions ?  Eg, if you start up 100 threads on a 1 core machine, and have
them all access a getter concurrently for 1m times, how much slower would
spinlocks cause the program to execute ?
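A scaled-down harness for that kind of worst-case experiment might look like
the sketch below (thread and iteration counts are placeholders; to compare the
two approaches, swap the mutex for a spinlock implementation and time both):

```c
#include <pthread.h>

#define NTHREADS 8
#define ITERS    10000

static pthread_mutex_t getter_lock = PTHREAD_MUTEX_INITIALIZER;
static long hits;

/* Each thread hammers the same lock-protected "getter". */
static void *reader(void *arg)
{
    int i;
    (void)arg;
    for (i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&getter_lock);
        hits++;                 /* stand-in for the getter's critical section */
        pthread_mutex_unlock(&getter_lock);
    }
    return NULL;
}

/* Run all threads to completion and return the total number of "getter"
   calls observed; with correct locking this is exactly NTHREADS * ITERS. */
static long run_contended(void)
{
    pthread_t t[NTHREADS];
    int i;
    hits = 0;
    for (i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, reader, NULL);
    for (i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return hits;
}
```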

I would have no problem changing my mind and supporting the use of spinlocks if 
there is some evidence that they can't go *too* wrong ;-)

At the moment, I feel the performance trade-offs are unclear and we'd be using
spinlocks only because Apple is using them in their runtime.

Thanks



* [Bug libobjc/47031] libobjc uses mutexes for properties
  2010-12-21 11:42 [Bug libobjc/47031] New: libobjc uses mutexes for properties js-gcc at webkeks dot org
                   ` (3 preceding siblings ...)
  2011-01-07 18:11 ` nicola at gcc dot gnu.org
@ 2011-01-07 18:30 ` nicola at gcc dot gnu.org
  2011-01-07 18:44 ` js-gcc at webkeks dot org
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: nicola at gcc dot gnu.org @ 2011-01-07 18:30 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47031

--- Comment #5 from Nicola Pero <nicola at gcc dot gnu.org> 2011-01-07 18:23:10 UTC ---
I tried the same benchmark using pthread_spin_lock() and pthread_spin_unlock();
it takes about 11.4 seconds.

So, that means the time required to execute an accessor can go down to about 
67% of the original time by using a pthread spinlock instead of 
objc_mutex_lock.  The spinlock is indeed quite fast, as locking+unlocking seems 
to take about the same time as an ObjC method call ?  That's excellent, but if 
we were to optimize objc_mutex_lock so that it's as fast as a pthread_mutex, 
then the spinlock accessor would take about 75% of the mutex accessor, so in 
the end we are reducing the accessor overhead only by 25% in the best case, 
while introducing significant unknowns for the worst case.

I guess a manual spinlock implementation could be faster than the pthread one, 
but I expect pthreads to have a good one, given that locks are the reason their 
project exists, so I don't expect we can do much better than 
they do.  Still, maybe we can inline the function calls, and get it down to 10
seconds or so.

Of course, I'm testing on Linux, where IIRC uncontended mutexes are fast as
they don't actually do the kernel call unless the lock is actually already
locked.

Given that Linux is our primary target platform, the case for spinlocks does
not seem strong.  Presumably Apple uses spinlocks because their primary target
platform (Darwin) doesn't have fast mutexes in the way that Linux does ?

It would be interesting to test the worst-case scenario for spinlocks to 
complete the picture.

Thanks



* [Bug libobjc/47031] libobjc uses mutexes for properties
  2010-12-21 11:42 [Bug libobjc/47031] New: libobjc uses mutexes for properties js-gcc at webkeks dot org
                   ` (4 preceding siblings ...)
  2011-01-07 18:30 ` nicola at gcc dot gnu.org
@ 2011-01-07 18:44 ` js-gcc at webkeks dot org
  2011-01-08 13:43 ` nicola at gcc dot gnu.org
  2011-01-08 16:34 ` js-gcc at webkeks dot org
  7 siblings, 0 replies; 9+ messages in thread
From: js-gcc at webkeks dot org @ 2011-01-07 18:44 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47031

--- Comment #6 from js-gcc at webkeks dot org <js-gcc at webkeks dot org> 2011-01-07 18:43:15 UTC ---
> This means that, assuming that spinlocks are infinitely faster than mutexes, in
> the best possible conditions they would speed up the accessors (or, at least,
> the setter, but the getter should be the same) by a maximum of about 2x. 
> That's a great speedup by the way, but it's important to keep in mind that the
> maximum speedup we can theoretically ever get for accessors is a 2x, not a 10x
> or 100x. ;-)

That's interesting. Which compiler flags did you use? Did you try
-fomit-frame-pointer? Did you do IMP caching, or did you dispatch the method
again every time? With which flags did you build the rest of the runtime?

Remember: The dispatch in the current implementation is not very efficient and
the significance of this could change a lot once the dispatch is optimized.

> Anyway, the key problem is that spinlocks are more wasteful (and unpredictable
> ?) than mutexes if there is contention and if the ratio of active threads vs.
> active cores is not favourable. 

Well, not really:

Usually, the lock is not held. If it is, you do a little trick: You spin 10
times and if you still could not get the lock, it's likely the current thread
is blocking another thread from releasing the spinlock. Again, quite unlikely,
as the spinlock is only held for an extremely short amount of time. However, if
it happens that after 10 spins you still could not get the lock, you call
sched_yield() to NOT waste resources.

So, in the worst case, you waste 10 spins. That's basically 10 compares. That's
nothing compared to a user/kernelspace switch, which is often 10 times more.
Especially on architectures like SPARC64, this is extremely expensive. And it's
extremely unlikely the OS gives control to another thread before the lock is
released again. So, this almost never happens, and we can assume that the
spinlock is almost always faster here.

> It is hard to see how you
> can guarantee anything about contention or ratio of active threads vs. active
> cores in that context. :-(

See the explanation above and in the comment before. I listed all the
situations that can happen there, including multicore and singlecore machines.

> I guess I'd feel a bit more confident if we were to experiment and benchmark
> the worst-case scenario of spinlocks ... ie, how bad can they really go
> compared to mutexes and how easy/hard it is for them to go bad in the case of
> accessors.

If you give control to another thread after 10 spins, I doubt it will behave
much worse than mutexes. And again, this is very unlikely to happen. But feel
free to benchmark :). I'd be especially interested in benchmarks on platforms
where context switches are cheap and benchmarks on platforms where context
switches are extremely expensive.

> Anyway, I understand you think spinlocks will always be faster than mutexes in
> this case, because "My guess is that usually the spinlock is not held", but the
> problem is that that is a guess, and it depends on how the accessors are used.
> ;-)

Feel free to print "Conflict!" or something when that happens and try it in a
real-world scenario ;).

For the large list you created, I'm not sure what I should reply, because I'm
not sure whether those are questions or whether you regard them as facts, and
because I think I already commented on most of it before.

One thing though: "having 100s of threads is pretty common in server software
where you are processing large numbers of independent requests simultaneously"
- this is considered bad design. Stuff like select(), epoll() etc. exists ;).
You usually should have num_cores threads for best performance.

> But if the second thread is running on the same
> core, it will never make any progress during the spinning, and the first thread
> will waste CPU time before being suspended - in this case it will be slower and
> more CPU-wasteful than a mutex.

See above about the maximum spin count and sched_yield().

> * I also wonder, with all the CPUs that GCC support, if the GCC atomic
> builtins are efficient on all platforms; if on a platform the builtins require
> function calls they may actually make the spinlocks less attractive compared to
> mutexes (we can obviously work around this by using spinlocks on the two or
> three major platforms where we know they work well, and using mutexes on the
> other ones).

Do you know of any architectures where GCC needs to emit a function call? I've
seen GCC calling library functions for atomic ops - but this was always on
architectures which didn't support atomic operations at all. What I do is check
for atomic ops, then for spinlocks in pthread, and if all that fails, fall
back to mutexes.
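A compile-time fallback chain along those lines might look like the following
sketch. The feature tests shown are simplified stand-ins (a real build would
probe them in configure), and the prop_lock_* names are made up:

```c
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

#if defined(__GNUC__)
/* Best case: GCC atomic builtins, a hand-rolled spin-then-yield lock. */
typedef volatile int prop_lock_t;
#define PROP_LOCK_INIT 0
static void prop_lock(prop_lock_t *l)
{
    while (__sync_lock_test_and_set(l, 1))
        sched_yield();                   /* don't busy-wait forever */
}
static void prop_unlock(prop_lock_t *l) { __sync_lock_release(l); }
#elif defined(_POSIX_SPIN_LOCKS) && _POSIX_SPIN_LOCKS > 0
/* Second choice: POSIX spinlocks (need pthread_spin_init at runtime). */
typedef pthread_spinlock_t prop_lock_t;
static void prop_lock(prop_lock_t *l)   { pthread_spin_lock(l); }
static void prop_unlock(prop_lock_t *l) { pthread_spin_unlock(l); }
#else
/* Last resort: plain mutexes - slow, but portable and known to work. */
typedef pthread_mutex_t prop_lock_t;
#define PROP_LOCK_INIT PTHREAD_MUTEX_INITIALIZER
static void prop_lock(prop_lock_t *l)   { pthread_mutex_lock(l); }
static void prop_unlock(prop_lock_t *l) { pthread_mutex_unlock(l); }
#endif
```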

> I would have no problem changing my mind and supporting the use of spinlocks if 
> there is some evidence that they can't go *too* wrong ;-)

I guess you should just test "hybrid spin locks" where you give control to
the kernel after a certain spin count ;).

> At the moment, I feel the performance trade-offs are unclear and we'd be using
> spinlocks only because Apple is using them in their runtime.

This would also mean better compatibility :P. Everybody assumes it's spinlocks!
:P



* [Bug libobjc/47031] libobjc uses mutexes for properties
  2010-12-21 11:42 [Bug libobjc/47031] New: libobjc uses mutexes for properties js-gcc at webkeks dot org
                   ` (5 preceding siblings ...)
  2011-01-07 18:44 ` js-gcc at webkeks dot org
@ 2011-01-08 13:43 ` nicola at gcc dot gnu.org
  2011-01-08 16:34 ` js-gcc at webkeks dot org
  7 siblings, 0 replies; 9+ messages in thread
From: nicola at gcc dot gnu.org @ 2011-01-08 13:43 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47031

--- Comment #7 from Nicola Pero <nicola at gcc dot gnu.org> 2011-01-08 11:39:38 UTC ---

> Usually, the lock is not held. If it is, you do a little trick: You spin 10
> times and if you still could not get the lock, it's likely the current thread
> is blocking another thread from releasing the spinlock. Again, quite unlikely,
> as the spinlock is only held for an extremely short amount of time. However,
> if it happens that after 10 spins you still could not get the lock, you call
> sched_yield() to NOT waste resources.
>
> So, in the worst case, you waste 10 spins. That's basically 10 compares. 
> That's nothing compared to a user/kernelspace switch, which is often 10 times 
> more.

Well, but locking a mutex on Linux is implemented on top of futexes and does 
not require a user/kernelspace switch unless the lock is already held (in which 
case a spinlock requires a switch too). ;-)

So, basically on Linux the standard mutexes are already optimized and perform 
not as fast, but almost as fast, as spinlocks in the uncontended case, but 
without the problems of spinlocks in the contended case (my benchmarks confirm 
that; there is nothing like the 10x difference you mention in the uncontended 
case). :-)

Maybe you benchmarked or used other platforms in the past; and you may have a 
very good point there.  If objc_mutex_lock() and objc_mutex_unlock() do always 
perform a system call each on some systems, the mutex-protected accessor 
could be so much slower (100x ?) than the spinlock-protected accessor (in the 
uncontended case) that it may make sense to multiply the number of accessor 
locks (say, to 64) to reduce the chance of contention and then use spinlocks 
there. :-)
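The "multiply the number of accessor locks" idea can be sketched like this
(pool size and hash are illustrative values, not what libobjc actually does):

```c
#include <pthread.h>
#include <stdint.h>

#define POOL_SIZE 64    /* "say, to 64" */

/* One pool of pre-initialized mutexes shared by all accessors
   (the [0 ... N] designated-range initializer is a GCC extension). */
static pthread_mutex_t accessor_locks[POOL_SIZE] = {
    [0 ... POOL_SIZE - 1] = PTHREAD_MUTEX_INITIALIZER
};

/* Pick a lock by hashing the object address, so unrelated objects
   usually end up on different locks and the chance of contention drops
   roughly by a factor of POOL_SIZE. */
static pthread_mutex_t *accessor_lock_for(const void *object)
{
    uintptr_t h = (uintptr_t)object >> 4;   /* drop alignment bits */
    return &accessor_locks[h % POOL_SIZE];
}
```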

On the other hand, mutexes are easy to port, have been ported and are known
to work well out of the box, so in terms of maintenance of other platforms I 
wouldn't mind sticking with them for all the other, less-used platforms too.  
They may not be fast, but at least they always work. ;-)

It would still be good to try a worst-case benchmark of spinlocks in the highly 
contended case.  I am assuming the performance would be really really bad, but
then I may just be wrong. ;-)

Thanks



* [Bug libobjc/47031] libobjc uses mutexes for properties
  2010-12-21 11:42 [Bug libobjc/47031] New: libobjc uses mutexes for properties js-gcc at webkeks dot org
                   ` (6 preceding siblings ...)
  2011-01-08 13:43 ` nicola at gcc dot gnu.org
@ 2011-01-08 16:34 ` js-gcc at webkeks dot org
  7 siblings, 0 replies; 9+ messages in thread
From: js-gcc at webkeks dot org @ 2011-01-08 16:34 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47031

--- Comment #8 from js-gcc at webkeks dot org <js-gcc at webkeks dot org> 2011-01-08 16:14:28 UTC ---
Yeah, but Linux is just one of the many OSes supported by GCC. And I don't know
of any other OS that uses futexes for pthread mutexes.

> It would still be good to try a worst-case benchmark of spinlocks in the highly 
> contended case.  I am assuming the performance would be really really bad, but
> then I may just be wrong. ;-)

As I said: I doubt it, as it's only 10 spins and then the control is given to
another thread.

Benchmarking mutexes, futexes and spinlocks in the highly contended case
would be interesting. I guess all three are almost equal in this case, but
differ a lot in the less contended case.



end of thread, other threads:[~2011-01-08 16:14 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-21 11:42 [Bug libobjc/47031] New: libobjc uses mutexes for properties js-gcc at webkeks dot org
2010-12-21 11:47 ` [Bug libobjc/47031] " nicola at gcc dot gnu.org
2010-12-29 16:11 ` nicola at gcc dot gnu.org
2011-01-01 12:07 ` js-gcc at webkeks dot org
2011-01-07 18:11 ` nicola at gcc dot gnu.org
2011-01-07 18:30 ` nicola at gcc dot gnu.org
2011-01-07 18:44 ` js-gcc at webkeks dot org
2011-01-08 13:43 ` nicola at gcc dot gnu.org
2011-01-08 16:34 ` js-gcc at webkeks dot org
