From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 1023 invoked by alias); 23 Apr 2010 14:17:58 -0000
Received: (qmail 972 invoked by uid 48); 23 Apr 2010 14:17:43 -0000
Date: Fri, 23 Apr 2010 14:17:00 -0000
Message-ID: <20100423141743.971.qmail@sourceware.org>
X-Bugzilla-Reason: CC
References:
Subject: [Bug libgomp/43706] scheduling two threads on one core leads to starvation
In-Reply-To:
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "singler at kit dot edu"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2010-04/txt/msg02479.txt.bz2

------- Comment #13 from singler at kit dot edu 2010-04-23 14:17 -------
The default spin count is not 2,000,000 cycles, but even 20,000,000. As
commented in libgomp/env.c, this is supposed to correspond to 200ms. The
timings we see here are even larger, but the number of cycles is just a rough
estimate.

Throttling the spin count when libgomp is aware of too many threads is a good
idea, but it is just a heuristic. If there are other processes, the cores
might be loaded anyway, and libgomp has little chance of figuring that out.
This gets even more difficult when multiple programs use libgomp at the same
time.

So I would like the non-throttling value to be chosen more conservatively,
better balancing worst-case behavior in difficult situations against best-case
behavior on an unloaded machine.

There are algorithms in the libstdc++ parallel mode that show speedups for
less than 1ms of sequential running time (when taking threads from the pool),
so users will accept a parallelization overhead for such small computing
times. However, if they are then hit by a 200ms penalty, this results in
catastrophic slowdowns. Calling such short-lived parallel regions several
times will make this very noticeable, although it need not be.

So IMHO, by default, the spinning should take about as long as rescheduling a
thread takes (one that has already been migrated to another core), thereby
making things at most twice as bad as in the best case. From my experience,
this is a matter of a few milliseconds, so I propose to lower the default
spin count to something like 10,000, at most 100,000. I think that spinning
for longer than a usual time slice, as happens now, is questionable anyway.

Are nested threads taken into account when deciding on whether to throttle or
not?


-- 

singler at kit dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|FIXED                       |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706
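
The "at most twice as bad" argument above can be illustrated with a small cost model. This is only a sketch under stated assumptions and not libgomp code: the function names, the microsecond units, and the idea of charging a single fixed rescheduling cost are all hypothetical simplifications. The model compares a waiter that spins for a bounded time and then blocks against an idealized waiter that knows in advance when the awaited event will arrive.

```c
#include <assert.h>

/* Hypothetical cost model, not libgomp code. All times are in microseconds
   and count only the time a waiting thread wastes. */

/* Waiter that spins for at most spin_us, then gives up and blocks. */
double spin_then_block_cost(double arrival_us, double spin_us, double resched_us)
{
    if (arrival_us <= spin_us)
        return arrival_us;           /* event arrived while still spinning */
    return spin_us + resched_us;     /* spun in vain, then paid for blocking */
}

/* Idealized waiter that knows the arrival time in advance and simply
   picks the cheaper of spinning and blocking. */
double ideal_cost(double arrival_us, double resched_us)
{
    return arrival_us < resched_us ? arrival_us : resched_us;
}

/* Worst ratio between the two costs, over arrival times sampled in
   1 us steps up to twice the sum of both parameters. */
double worst_ratio(double spin_us, double resched_us)
{
    assert(spin_us >= 1.0 && resched_us >= 1.0);
    double worst = 1.0;
    double limit = 2.0 * (spin_us + resched_us);
    for (double t = 1.0; t <= limit; t += 1.0) {
        double r = spin_then_block_cost(t, spin_us, resched_us)
                   / ideal_cost(t, resched_us);
        if (r > worst)
            worst = r;
    }
    return worst;
}
```

Assuming a rescheduling cost of 2ms (an illustrative value for "a few milliseconds"), worst_ratio(2000.0, 2000.0) evaluates to 2 in this model, whereas a 200ms spin, worst_ratio(200000.0, 2000.0), comes out at roughly 101.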