From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 26621 invoked by alias); 7 Jan 2011 17:56:13 -0000
Received: (qmail 26606 invoked by uid 22791); 7 Jan 2011 17:56:11 -0000
X-SWARE-Spam-Status: No, hits=-2.8 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00,TW_BJ
X-Spam-Check-By: sourceware.org
Received: from localhost (HELO gcc.gnu.org) (127.0.0.1) by sourceware.org (qpsmtpd/0.43rc1) with ESMTP; Fri, 07 Jan 2011 17:56:05 +0000
From: "nicola at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug libobjc/47031] libobjc uses mutexes for properties
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: libobjc
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: nicola at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Date: Fri, 07 Jan 2011 18:11:00 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2011-01/txt/msg00628.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47031

--- Comment #4 from Nicola Pero 2011-01-07 17:56:01 UTC ---
Jonathan, thanks for your comments - they are very useful and we certainly
want to look at performance. ;-)

I'm not terribly convinced by using spinlocks in this context, but I'm happy
to be convinced otherwise. :-)

---

To understand what the order of magnitude is, I created a short program that
loops and runs 100 million synthesized atomic 'id' setter calls (a rough
sketch of it is below).  I used GCC 4.6 to compile, and gnustep-make and
gnustep-base from trunk:

 * with the standard libobjc (objc_mutex_lock/objc_mutex_unlock), it takes
   approximately 16.8 seconds to do the 100 million setter calls

 * if I replace objc_mutex_lock with pthread_mutex_lock and objc_mutex_unlock
   with pthread_mutex_unlock, it takes approximately 15.3 seconds (this
   basically tests how fast we can go if we rework the objc_mutex... wrapper
   API)

 * if I comment out all the objc_mutex_lock/objc_mutex_unlock calls entirely,
   it takes approximately 8.7 seconds.

This means that, even assuming spinlocks are infinitely faster than mutexes,
in the best possible conditions they would speed up the accessors (or at
least the setter, but the getter should behave the same) by a maximum of
about 2x.  That's a great speedup, by the way, but it's important to keep in
mind that the maximum speedup we can theoretically ever get for accessors is
2x, not 10x or 100x. ;-)
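To make the numbers concrete, here is roughly the kind of test program I
used (a sketch from memory, not the exact program; the class and property
names are invented).  The point is simply that every iteration goes through
a synthesized atomic setter:

#import <Foundation/Foundation.h>
#include <stdio.h>
#include <sys/time.h>

/* An object with a synthesized setter; 'retain' plus the default
   'atomic' means the setter goes through the property locks in
   libobjc.  */
@interface TestObject : NSObject
{
  id child;
}
@property (retain) id child;
@end

@implementation TestObject
@synthesize child;
@end

int main (void)
{
  NSAutoreleasePool *pool = [NSAutoreleasePool new];
  TestObject *object = [TestObject new];
  id other = [NSObject new];
  struct timeval start, end;
  unsigned long i;

  gettimeofday (&start, NULL);
  for (i = 0; i < 100000000UL; i++)
    {
      /* Each iteration calls the synthesized atomic setter.  */
      [object setChild: other];
    }
  gettimeofday (&end, NULL);

  printf ("%f seconds\n",
          (end.tv_sec - start.tv_sec)
          + (end.tv_usec - start.tv_usec) / 1000000.0);

  [pool release];
  return 0;
}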
---

Anyway, the key problem is that spinlocks are more wasteful (and less
predictable?) than mutexes when there is contention and the ratio of active
threads to active cores is not favourable.  Inside a kernel, when working on
a specific part, it is easy to assess these factors.  But libobjc is
providing a general-purpose, portable, user-space facility for atomic,
synchronized access to an unspecified property of an ObjC object.  It is
hard to see how you can guarantee anything about contention, or about the
ratio of active threads to active cores, in that context. :-(

I guess I'd feel a bit more confident if we were to experiment and benchmark
the worst-case scenario of spinlocks - ie, how badly they can really go
compared to mutexes, and how easy or hard it is for them to go bad in the
case of accessors.

Anyway, I understand you think spinlocks will always be faster than mutexes
in this case, because "My guess is that usually the spinlock is not held",
but the problem is that that is a guess, and it depends on how the accessors
are used. ;-)  In particular --

 * it matters how often you call the accessors; don't forget that in the
   getter a method call (-retain) is inside the critical zone.  Even
   ignoring that the method call may trigger the execution of +initialize
   for the class (which would then execute an unbounded amount of user code
   inside the critical zone, but which is fairly theoretical, as the object
   has most likely been allocated using +alloc before getting a -retain,
   meaning +initialize should already have run), a method call still
   accounts for about 25% of the time taken by the getter (ignoring locking
   overheads).  So if your program spends 100% of its time calling a getter
   from two threads, either thread has up to a 25% chance of hitting the
   lock while it is held by the other thread;

 * it matters how many threads you have.  If each thread spends 10% of its
   time calling a bunch of getters that share the same lock, then each
   spends 2.5% of its time inside the critical zone (25% of that 10%).  If
   you have 2 threads, there is a 2.5% chance that your thread will try to
   enter the critical zone while the other thread is inside it.  But if you
   have 100 threads, the chances are much, much higher!  (Having hundreds of
   threads is pretty common in server software where you are processing
   large numbers of independent requests simultaneously; the requests should
   be loosely coupled, so there is little synchronization to do; but in the
   case of synthesized property accessors, unfortunately, the locks are
   shared by completely unrelated objects, so even if the 100 threads are
   completely unrelated, as soon as they use atomic synthesized accessors
   they will be using and competing for the same locks, exactly as if they
   were constantly synchronizing access to some shared resource!  Are
   spinlocks still appropriate in this context?)

 * it matters how many active cores you have.  If a thread tries to enter
   the critical zone while another thread is holding the lock, then with
   mutexes it doesn't matter whether the other thread is running or
   suspended - there will be a context switch in both cases; but with
   spinlocks it does matter.  If the other thread is running on a different,
   active core, the first thread should waste a bit of CPU spinning and then
   get in fine.  But if the other thread is scheduled on the same core, it
   can make no progress while the first one spins, so the first thread will
   waste CPU time before being suspended - in this case a spinlock is both
   slower and more CPU-wasteful than a mutex.

 * I also wonder, with all the CPUs that GCC supports, whether the GCC
   atomic builtins are efficient on all platforms; if on some platform the
   builtins require function calls, they may actually make spinlocks less
   attractive compared to mutexes (we can obviously work around this by
   using spinlocks on the two or three major platforms where we know they
   work well, and mutexes on the other ones).  A toy example of such a
   spinlock, built on the __sync builtins, is sketched right after this
   list.
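For reference, this is roughly the kind of spinlock I understand is being
proposed, built on the GCC __sync atomic builtins (just a toy sketch to make
the discussion concrete, not a patch; the names are invented):

/* A toy spinlock built on the GCC __sync atomic builtins.  */
typedef volatile int toy_spinlock_t;

static inline void
toy_spin_lock (toy_spinlock_t *lock)
{
  /* Atomically set the lock to 1; keep spinning for as long as the
     previous value was already 1, ie for as long as another thread
     is holding the lock.  */
  while (__sync_lock_test_and_set (lock, 1))
    ;
}

static inline void
toy_spin_unlock (toy_spinlock_t *lock)
{
  /* Atomically write 0 back, releasing the lock.  */
  __sync_lock_release (lock);
}

An accessor guarded by it would then take the lock, do its work (including
the -retain call that, in the getter, sits inside the critical zone) and
release the lock.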
Maybe the right way forward would be to experiment with some worst-case
benchmarks for spinlocks vs mutexes.  Spinlocks can speed accessors up by up
to 2x in the right conditions; how much can they slow them down in the wrong
conditions?  Eg, if you start up 100 threads on a 1-core machine and have
them all access a getter concurrently 1 million times, how much slower would
spinlocks make the program?

I would have no problem changing my mind and supporting the use of spinlocks
if there is some evidence that they can't go *too* wrong. ;-)  At the
moment, I feel the performance trade-offs are unclear, and we'd be using
spinlocks only because Apple is using them in their runtime.

Thanks