public inbox for gcc-bugs@sourceware.org
* [Bug libgomp/43706]  New: scheduling two threads on one core leads to starvation
@ 2010-04-09 16:22 baeuml at kit dot edu
  2010-04-09 16:22 ` [Bug libgomp/43706] " baeuml at kit dot edu
                   ` (23 more replies)
  0 siblings, 24 replies; 32+ messages in thread
From: baeuml at kit dot edu @ 2010-04-09 16:22 UTC (permalink / raw)
  To: gcc-bugs

The following code results in starvation if at least two OpenMP threads are
assigned to one core.

#include <cstdio>

int main()
{
    while (true) {
        #pragma omp parallel for
        for (int ii = 0; ii < 1000; ++ii) {
            int s = ii;
        }
        printf(".");
    }

    return 0;
}

Compiled with
> g++ -fopenmp -save-temps -o test test.cpp

and run with
> GOMP_CPU_AFFINITY="0 1 2" ./test

on a quad-core (Q9550) system, this results in very slow progress.


However,
> GOMP_CPU_AFFINITY="0 1 2 3" ./test
runs as expected.

This also happens if CPU affinity is not explicitly given but some of the
cores are busy with other processes.  In this case it also helps to
explicitly assign each thread to one core with GOMP_CPU_AFFINITY="0 1 2 3".

> gcc -v
Using built-in specs.
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info
--mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64
--enable-languages=c,c++,objc,fortran,obj-c++,java,ada
--enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.4
--enable-ssp --disable-libssp --with-bugurl=http://bugs.opensuse.org/
--with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap
--with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit
--enable-libstdcxx-allocator=new --disable-libstdcxx-pch
--enable-version-specific-runtime-libs --program-suffix=-4.4
--enable-linux-futex --without-system-libunwind --with-arch-32=i586
--with-tune=generic --build=x86_64-suse-linux
Thread model: posix
gcc version 4.4.1 [gcc-4_4-branch revision 150839] (SUSE Linux)


-- 
           Summary: scheduling two threads on one core leads to starvation
           Product: gcc
           Version: 4.4.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: baeuml at kit dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
@ 2010-04-09 16:22 ` baeuml at kit dot edu
  2010-04-09 18:34 ` pinskia at gcc dot gnu dot org
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: baeuml at kit dot edu @ 2010-04-09 16:22 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #1 from baeuml at kit dot edu  2010-04-09 16:22 -------
Created an attachment (id=20348)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20348&action=view)
output of -save-temps


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
  2010-04-09 16:22 ` [Bug libgomp/43706] " baeuml at kit dot edu
@ 2010-04-09 18:34 ` pinskia at gcc dot gnu dot org
  2010-04-09 20:55 ` baeuml at kit dot edu
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: pinskia at gcc dot gnu dot org @ 2010-04-09 18:34 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #2 from pinskia at gcc dot gnu dot org  2010-04-09 18:33 -------
Have you done a profile (using oprofile) to see why this happens?  Really, I
think GOMP_CPU_AFFINITY should not be used that much, as it will cause
starvation no matter what.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
  2010-04-09 16:22 ` [Bug libgomp/43706] " baeuml at kit dot edu
  2010-04-09 18:34 ` pinskia at gcc dot gnu dot org
@ 2010-04-09 20:55 ` baeuml at kit dot edu
  2010-04-09 22:11 ` mika dot fischer at kit dot edu
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: baeuml at kit dot edu @ 2010-04-09 20:55 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #3 from baeuml at kit dot edu  2010-04-09 20:55 -------
> Have you done a profile (using oprofile) to see why this happens?

I've run oprofile on the original program, of which this is a stripped-down
minimal example.  I did not see anything unusual, but I'm certainly no expert
with oprofile.

GDB, however, shows that all 4 threads are waiting in do_wait()
(config/linux/wait.h).

> Really, I think GOMP_CPU_AFFINITY should not be used that much, as it will
> cause starvation no matter what.

As I said, this also happens without GOMP_CPU_AFFINITY when there is enough
load on the cores.  Using GOMP_CPU_AFFINITY just lets me reproduce the issue
reliably with the given minimal example, without needing to put load on the
cores.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (2 preceding siblings ...)
  2010-04-09 20:55 ` baeuml at kit dot edu
@ 2010-04-09 22:11 ` mika dot fischer at kit dot edu
  2010-04-20 10:23 ` jakub at gcc dot gnu dot org
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: mika dot fischer at kit dot edu @ 2010-04-09 22:11 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #4 from mika dot fischer at kit dot edu  2010-04-09 22:10 -------
I'm Martin's coworker and want to add a few additional points.

Just to be clear, this is not an exotic toy example; it is causing us real
problems with production code.  Martin just stripped it down so it can be
easily reproduced.

The same goes for GOMP_CPU_AFFINITY.  Of course we don't use it; it just
makes it very easy to reproduce the bug.  The real problem we're having is
caused by threads other than the OpenMP threads putting load on some of the
cores.  You can also easily test this by occupying one of the cores with some
computation and running the test program without GOMP_CPU_AFFINITY.
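
For example (a sketch of one way to do this, not the exact commands we used):

> sh -c 'while :; do :; done' &    # busy loop to keep one core occupied
> ./test                           # reproducer, no GOMP_CPU_AFFINITY set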

I also don't quite get why GOMP_CPU_AFFINITY should cause starvation. Maybe you
could elaborate. It's clear that it will cause additional scheduling overhead
and one thread might have to wait for the other to finish, but I would not call
this starvation.

Finally, and most importantly, the workaround we're currently employing is to
use libgomp.so.1 (via LD_LIBRARY_PATH) from OpenSuSE 11.0. With this version of
libgomp, the problem does not occur at all!

Here's the gcc -v output from the GCC release of OpenSuSE 11.0 from which we
took the working libgomp:
gcc -v
Using built-in specs.
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --with-local-prefix=/usr/local
--infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64
--libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada
--enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.3
--enable-ssp --disable-libssp --with-bugurl=http://bugs.opensuse.org/
--with-pkgversion='SUSE Linux' --disable-libgcj --with-slibdir=/lib64
--with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new
--disable-libstdcxx-pch --program-suffix=-4.3
--enable-version-specific-runtime-libs --enable-linux-futex
--without-system-libunwind --with-cpu=generic --build=x86_64-suse-linux
Thread model: posix
gcc version 4.3.1 20080507 (prerelease) [gcc-4_3-branch revision 135036] (SUSE
Linux)


-- 

mika dot fischer at kit dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mika dot fischer at kit dot
                   |                            |edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (3 preceding siblings ...)
  2010-04-09 22:11 ` mika dot fischer at kit dot edu
@ 2010-04-20 10:23 ` jakub at gcc dot gnu dot org
  2010-04-20 10:49 ` jakub at gcc dot gnu dot org
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-04-20 10:23 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #5 from jakub at gcc dot gnu dot org  2010-04-20 10:23 -------
For performance reasons libgomp uses some busy waiting, which of course works
well when there are available CPUs and cycles to burn (decreases latency a
lot), but if you have more threads than CPUs it can make things worse.
You can tweak this through OMP_WAIT_POLICY and GOMP_SPINCOUNT env vars.
Although the implementation recognizes two kinds of spin counts (normal and
throttled, the latter in use when number of threads is bigger than number of
available CPUs), in some cases even that default might be too large (the
default for throttled spin count is 1000 spins for OMP_WAIT_POLICY=active and
100 spins for no OMP_WAIT_POLICY in environment).
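
For example, for the reproducer in this report the defaults can be overridden
like this (usage sketch; the values are just the ones discussed in this
thread):

> OMP_WAIT_POLICY=passive ./test
> GOMP_SPINCOUNT=1000 ./test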


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (4 preceding siblings ...)
  2010-04-20 10:23 ` jakub at gcc dot gnu dot org
@ 2010-04-20 10:49 ` jakub at gcc dot gnu dot org
  2010-04-20 12:23 ` mika dot fischer at kit dot edu
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-04-20 10:49 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #6 from jakub at gcc dot gnu dot org  2010-04-20 10:49 -------
Created an attachment (id=20441)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=20441&action=view)
gcc46-pr43706.patch

For GOMP_CPU_AFFINITY there was an issue: the number of available CPUs used
to decide whether the number of managed threads is bigger than the number of
available CPUs didn't take the GOMP_CPU_AFFINITY restriction into account.
The attached patch fixes that.  That said, for cases where some CPU is
available to the GOMP program yet is constantly busy doing other things (say,
at a higher priority), this can't help, and OMP_WAIT_POLICY=passive or
GOMP_SPINCOUNT=1000 or some similarly small number is your only option.


-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |jakub at gcc dot gnu dot org
                   |dot org                     |
             Status|UNCONFIRMED                 |ASSIGNED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (5 preceding siblings ...)
  2010-04-20 10:49 ` jakub at gcc dot gnu dot org
@ 2010-04-20 12:23 ` mika dot fischer at kit dot edu
  2010-04-20 15:38 ` jakub at gcc dot gnu dot org
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: mika dot fischer at kit dot edu @ 2010-04-20 12:23 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #7 from mika dot fischer at kit dot edu  2010-04-20 12:23 -------
> For performance reasons libgomp uses some busy waiting, which of course works
> well when there are available CPUs and cycles to burn (decreases latency a
> lot), but if you have more threads than CPUs it can make things worse.
> You can tweak this through OMP_WAIT_POLICY and GOMP_SPINCOUNT env vars.

This is definitely the reason for the behavior we're seeing.  When we set
OMP_WAIT_POLICY=passive, the test program runs through normally.  Without it,
it takes very, very long.

Here are some measurements with "while (true)" replaced by
"for (int j=0; j<1000; ++j)"; the modified program is sketched below:
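
A sketch of the modified benchmark (the original reproducer with the outer
loop bounded as just described):

#include <cstdio>

int main()
{
    for (int j = 0; j < 1000; ++j) {
        #pragma omp parallel for
        for (int ii = 0; ii < 1000; ++ii) {
            int s = ii;
            (void)s;  // trivial body; just silences the unused warning
        }
        printf(".");
    }

    return 0;
}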


All cores idle:
===============
$ /usr/bin/time ./openmp-bug
3.21user 0.00system 0:00.81elapsed 391%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+331minor)pagefaults 0swaps

$ OMP_WAIT_POLICY=passive /usr/bin/time ./openmp-bug
2.75user 0.05system 0:01.42elapsed 196%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+335minor)pagefaults 0swaps


1 (out of 4) cores occupied:
============================
$ /usr/bin/time ./openmp-bug
133.65user 0.02system 0:45.30elapsed 295%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+330minor)pagefaults 0swaps

$ OMP_WAIT_POLICY=passive /usr/bin/time ./openmp-bug
2.67user 0.00system 0:02.35elapsed 113%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+335minor)pagefaults 0swaps

$ GOMP_SPINCOUNT=10 /usr/bin/time ./openmp-bug
2.91user 0.03system 0:01.73elapsed 169%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+336minor)pagefaults 0swaps

$ GOMP_SPINCOUNT=100 /usr/bin/time ./openmp-bug
2.77user 0.03system 0:01.90elapsed 147%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+336minor)pagefaults 0swaps

$ GOMP_SPINCOUNT=1000 /usr/bin/time ./openmp-bug
2.87user 0.00system 0:01.70elapsed 168%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+336minor)pagefaults 0swaps

$ GOMP_SPINCOUNT=10000 /usr/bin/time ./openmp-bug
3.05user 0.06system 0:01.85elapsed 167%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+337minor)pagefaults 0swaps

$ GOMP_SPINCOUNT=100000 /usr/bin/time ./openmp-bug
5.25user 0.03system 0:03.10elapsed 170%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+335minor)pagefaults 0swaps

$ GOMP_SPINCOUNT=1000000 /usr/bin/time ./openmp-bug
28.84user 0.00system 0:14.13elapsed 203%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+336minor)pagefaults 0swaps

[I ran all of these several times and took a runtime close to the average]

> Although the implementation recognizes two kinds of spin counts (normal and
> throttled, the latter in use when number of threads is bigger than number of
> available CPUs), in some cases even that default might be too large (the
> default for throttled spin count is 1000 spins for OMP_WAIT_POLICY=active and
> 100 spins for no OMP_WAIT_POLICY in environment).

As the numbers show, a default spin count of 1000 would be fine.  The
problem, however, is that OpenMP assumes it has all the cores of the CPU for
itself.  The throttled spin count is only used if the number of OpenMP
threads is larger than the number of cores in the system (AFAICT).  This will
almost never happen (AFAICT only if you set OMP_NUM_THREADS to something
larger than the number of cores).

Since it seems clear that the spin count should be smaller when the CPU
cores are busy, the throttled spin count must be used when the cores are
actually in use at the moment the thread starts waiting.  That the number of
OpenMP threads running at that time is smaller than the number of cores is
not a sufficient condition.  If it's not possible to determine this, or if it
is too time-consuming, then maybe the non-throttled default spin count could
be reduced to 1000 or so.

So thanks for the workaround! But I still think the default behavior can easily
cause very significant slowdowns and thus should be reconsidered.

Finally, I still don't get why the spinning has these effects on the
runtime.  I would expect even 2,000,000 spin-lock cycles to be over very
quickly, not to cause a 20-fold increase in the total runtime of the program.
Just out of curiosity, maybe you can explain why this happens.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (6 preceding siblings ...)
  2010-04-20 12:23 ` mika dot fischer at kit dot edu
@ 2010-04-20 15:38 ` jakub at gcc dot gnu dot org
  2010-04-21 14:01 ` jakub at gcc dot gnu dot org
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-04-20 15:38 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #8 from jakub at gcc dot gnu dot org  2010-04-20 15:38 -------
Subject: Bug 43706

Author: jakub
Date: Tue Apr 20 15:37:51 2010
New Revision: 158565

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158565
Log:
        PR libgomp/43706
        * config/linux/affinity.c (gomp_init_affinity): Decrease
        gomp_available_cpus if affinity mask confines the process to fewer
        CPUs.
        * config/linux/proc.c (get_num_procs): If gomp_cpu_affinity is
        non-NULL, just return gomp_available_cpus.

Modified:
    trunk/libgomp/ChangeLog
    trunk/libgomp/config/linux/affinity.c
    trunk/libgomp/config/linux/proc.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (7 preceding siblings ...)
  2010-04-20 15:38 ` jakub at gcc dot gnu dot org
@ 2010-04-21 14:01 ` jakub at gcc dot gnu dot org
  2010-04-21 14:01 ` jakub at gcc dot gnu dot org
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-04-21 14:01 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #9 from jakub at gcc dot gnu dot org  2010-04-21 14:00 -------
Subject: Bug 43706

Author: jakub
Date: Wed Apr 21 13:59:39 2010
New Revision: 158600

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158600
Log:
        PR libgomp/43706
        * config/linux/affinity.c (gomp_init_affinity): Decrease
        gomp_available_cpus if affinity mask confines the process to fewer
        CPUs.
        * config/linux/proc.c (get_num_procs): If gomp_cpu_affinity is
        non-NULL, just return gomp_available_cpus.

Modified:
    branches/gcc-4_5-branch/libgomp/ChangeLog
    branches/gcc-4_5-branch/libgomp/config/linux/affinity.c
    branches/gcc-4_5-branch/libgomp/config/linux/proc.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (8 preceding siblings ...)
  2010-04-21 14:01 ` jakub at gcc dot gnu dot org
@ 2010-04-21 14:01 ` jakub at gcc dot gnu dot org
  2010-04-21 14:06 ` jakub at gcc dot gnu dot org
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-04-21 14:01 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #10 from jakub at gcc dot gnu dot org  2010-04-21 14:01 -------
Subject: Bug 43706

Author: jakub
Date: Wed Apr 21 14:00:10 2010
New Revision: 158601

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=158601
Log:
        PR libgomp/43706
        * config/linux/affinity.c (gomp_init_affinity): Decrease
        gomp_available_cpus if affinity mask confines the process to fewer
        CPUs.
        * config/linux/proc.c (get_num_procs): If gomp_cpu_affinity is
        non-NULL, just return gomp_available_cpus.

Modified:
    branches/gcc-4_4-branch/libgomp/ChangeLog
    branches/gcc-4_4-branch/libgomp/config/linux/affinity.c
    branches/gcc-4_4-branch/libgomp/config/linux/proc.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (9 preceding siblings ...)
  2010-04-21 14:01 ` jakub at gcc dot gnu dot org
@ 2010-04-21 14:06 ` jakub at gcc dot gnu dot org
  2010-04-21 14:07 ` jakub at gcc dot gnu dot org
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-04-21 14:06 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #11 from jakub at gcc dot gnu dot org  2010-04-21 14:05 -------
GOMP_CPU_AFFINITY vs. throttling fixed for 4.4.4/4.5.1/4.6.0.


-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (10 preceding siblings ...)
  2010-04-21 14:06 ` jakub at gcc dot gnu dot org
@ 2010-04-21 14:07 ` jakub at gcc dot gnu dot org
  2010-04-21 14:23 ` mika dot fischer at kit dot edu
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-04-21 14:07 UTC (permalink / raw)
  To: gcc-bugs



-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.4.4


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (11 preceding siblings ...)
  2010-04-21 14:07 ` jakub at gcc dot gnu dot org
@ 2010-04-21 14:23 ` mika dot fischer at kit dot edu
  2010-04-23 14:17 ` singler at kit dot edu
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: mika dot fischer at kit dot edu @ 2010-04-21 14:23 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #12 from mika dot fischer at kit dot edu  2010-04-21 14:23 -------
Just to make it clear, this patch most probably does not fix the issue we're
having, since it can be triggered without using GOMP_CPU_AFFINITY.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (12 preceding siblings ...)
  2010-04-21 14:23 ` mika dot fischer at kit dot edu
@ 2010-04-23 14:17 ` singler at kit dot edu
  2010-04-30  8:53 ` jakub at gcc dot gnu dot org
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: singler at kit dot edu @ 2010-04-23 14:17 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #13 from singler at kit dot edu  2010-04-23 14:17 -------
The default spin count is not 2,000,000 cycles, but even 20,000,000.  As
commented in libgomp/env.c, this is supposed to correspond to 200ms.  The
timings we see here are even larger, but the number of cycles is just a rough
estimation.

Throttling the spin count when aware of too many threads is a good idea, but
it is just a heuristic.  If there are other processes, the cores might be
loaded anyway, and libgomp has little chance of figuring that out.  This gets
even more difficult when multiple programs use libgomp at the same time.  So
I would like the non-throttled value to be chosen more conservatively, better
balancing worst-case behavior in difficult situations against best-case
behavior on an unloaded machine.

There are algorithms in libstdc++ parallel mode that show speedups for
sequential running times of less than 1 ms (when taking threads from the
pool), so users will accept a parallelization overhead for such small
computing times.  However, if they are then hit by a 200 ms penalty, this
results in catastrophic slowdowns.  Calling such short-lived parallel regions
several times will make this very noticeable, although it need not be.  So
IMHO, by default, the spinning should take about as long as rescheduling a
thread that was already migrated to another core, thereby making things at
most twice as bad as in the best case.
From my experience, this is a matter of a few milliseconds, so I propose to
lower the default spin count to something like 10,000, at most 100,000.  I
think that spinning for even longer than a usual time slice, as is done now,
is questionable anyway.

Are nested threads taken into account when deciding whether to throttle or
not?


-- 

singler at kit dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|FIXED                       |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (13 preceding siblings ...)
  2010-04-23 14:17 ` singler at kit dot edu
@ 2010-04-30  8:53 ` jakub at gcc dot gnu dot org
  2010-07-02  1:39 ` solar-gcc at openwall dot com
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-04-30  8:53 UTC (permalink / raw)
  To: gcc-bugs



-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.4.4                       |4.4.5


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (14 preceding siblings ...)
  2010-04-30  8:53 ` jakub at gcc dot gnu dot org
@ 2010-07-02  1:39 ` solar-gcc at openwall dot com
  2010-07-30 14:00 ` johnfb at mail dot utexas dot edu
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: solar-gcc at openwall dot com @ 2010-07-02  1:39 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #14 from solar-gcc at openwall dot com  2010-07-02 01:39 -------
We're also seeing this problem on OpenMP-using code built with gcc 4.5.0
release on linux-x86_64.  Here's a user's report (400x slowdown on an 8-core
system when there's a single other process running on a CPU):

http://www.openwall.com/lists/john-users/2010/06/30/3

Here's my confirmation of the problem report (I easily reproduced similar
slowdowns), and workarounds:

http://www.openwall.com/lists/john-users/2010/06/30/6

GOMP_SPINCOUNT=10000 (this specific value) turned out to be nearly optimal in
cases affected by this problem, as well as on idle systems, although I was also
able to identify cases (with server-like unrelated load: short requests to many
processes, which quickly go back to sleep) where this setting lowered the
measured best-case speed by 15% (over multiple benchmark invocations), even
though it might have improved the average speed even in those cases.

All of this is reproducible with John the Ripper 1.7.6 release on Blowfish
hashes ("john --test --format=bf") and with the -omp-des patch (current
revision is 1.7.6-omp-des-4) on DES-based crypt(3) hashes ("john --test
--format=des").  The use of OpenMP needs to be enabled by uncommenting the
OMPFLAGS line in the Makefile.  JtR and the patch can be downloaded from:

http://www.openwall.com/john/
http://openwall.info/wiki/john/patches

To reproduce the problem, it is sufficient to have one other CPU-using process
running when invoking the John benchmark.  I was using a non-OpenMP build of
John itself as that other process.

Overall, besides this specific "bug", OpenMP-using programs are very sensitive
to other system load - e.g., unrelated server-like load of 10% often slows an
OpenMP program down by 50%.  Any improvements in this area would be very
welcome.  However, this specific "bug" is extreme, with its 400x slowdowns, so
perhaps it should be treated with priority.

Jakub - thank you for your work on gcc's OpenMP support.  The ease of use is
great!


-- 

solar-gcc at openwall dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |solar-gcc at openwall dot
                   |                            |com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (15 preceding siblings ...)
  2010-07-02  1:39 ` solar-gcc at openwall dot com
@ 2010-07-30 14:00 ` johnfb at mail dot utexas dot edu
  2010-08-13 15:48 ` singler at kit dot edu
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: johnfb at mail dot utexas dot edu @ 2010-07-30 14:00 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #15 from johnfb at mail dot utexas dot edu  2010-07-30 14:00 -------
We have also had some trouble with this issue.  We found that, in general,
if we were running on a machine with hardware threads (i.e., Intel's
Hyper-Threading), then performance was poor.  Most of our runs were on a
machine with an Intel Xeon L5530 running RHEL 5.  Profiling showed that our
program was spending 50% of its time inside libgomp.  Setting either
GOMP_SPINCOUNT or OMP_WAIT_POLICY as discussed in this thread increased
performance greatly.  Experiments with disabling and enabling cores under the
default OMP settings showed that once the Hyper-Threading cores came on,
performance on some runs dipped below what we got with only one core enabled.

A little thought about how hardware threads are implemented makes it obvious
why spinning for more than a few cycles will cause performance problems.  If
one hardware thread spins, all other threads on that core may be starved,
since execution resources are shared between the hardware threads of a core.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (16 preceding siblings ...)
  2010-07-30 14:00 ` johnfb at mail dot utexas dot edu
@ 2010-08-13 15:48 ` singler at kit dot edu
  2010-08-24 11:07 ` solar-gcc at openwall dot com
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: singler at kit dot edu @ 2010-08-13 15:48 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #16 from singler at kit dot edu  2010-08-13 15:48 -------
I would really like to see this bug tackled.  It has been confirmed two more
times. 

Fixing it is easily done by lowering the spin count as proposed.  Otherwise,
please show cases where a low spin count hurts performance.

In general, for a tuning parameter, a rather good-natured value should be
preferred over a value that gives the best results in one case but very bad
ones in another case.


-- 

singler at kit dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2010-08-13 15:48:18
               date|                            |
   Target Milestone|4.4.5                       |4.5.2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (17 preceding siblings ...)
  2010-08-13 15:48 ` singler at kit dot edu
@ 2010-08-24 11:07 ` solar-gcc at openwall dot com
  2010-08-24 11:41 ` jakub at gcc dot gnu dot org
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: solar-gcc at openwall dot com @ 2010-08-24 11:07 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #17 from solar-gcc at openwall dot com  2010-08-24 11:07 -------
(In reply to comment #16)
> I would really like to see this bug tackled.

I second that.

> Fixing it is easily done by lowering the spin count as proposed.  Otherwise,
> please show cases where a low spin count hurts performance.

Unfortunately, yes, I've since identified real-world test cases where
GOMP_SPINCOUNT=10000 hurts performance significantly (compared to gcc 4.5.0's
default).  Specifically, this was the case when I experimented with my John the
Ripper patches on a dual-X5550 system (16 logical CPUs).  On a few
real-world'ish runs, GOMP_SPINCOUNT=10000 would halve the speed.  On most other
tests I ran, it would slow things down by about 10%.  That's on an otherwise
idle system.  I was surprised as I previously only saw GOMP_SPINCOUNT=10000
hurt performance on systems with server-like unrelated load (and it would help
tremendously with certain other kinds of load).

> In general, for a tuning parameter, a rather good-natured value should be
> preferred over a value that gives the best results in one case but very bad
> ones in another case.

In general, I agree.  Even the 50% worst-case slowdown I observed with
GOMP_SPINCOUNT=10000 is not as bad as the 400x worst-case slowdown observed
without that option.  On the other hand, a 50% slowdown would be fatal when
comparing libgomp against competing implementations.  Also, HPC cluster nodes
may well be allocated such that there's no other load on each individual
node.  So having the defaults tuned for a system with no other load makes
some sense to me, and I am really unsure whether simply changing the defaults
is the proper fix here.

I'd be happy to see this problem fixed differently, such that the unacceptable
slowdowns are avoided in "both" cases.  Maybe the new default could be to
auto-tune the setting while the program is running?

Meanwhile, if it's going to take a long time until we have a code fix, perhaps
the problem and the workaround need to be documented prominently.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (18 preceding siblings ...)
  2010-08-24 11:07 ` solar-gcc at openwall dot com
@ 2010-08-24 11:41 ` jakub at gcc dot gnu dot org
  2010-08-24 12:18 ` solar-gcc at openwall dot com
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-08-24 11:41 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #18 from jakub at gcc dot gnu dot org  2010-08-24 11:40 -------
For the auto-tuning, ideally the kernel would tell the thread when it lost
the CPU; I doubt there is any API for that currently.  E.g., a thread could
register an address with the kernel where the kernel would store some value
(e.g. zero) or do an atomic increment or decrement upon taking the CPU away
from the thread.  Then, at the start of the spinning, libgomp could
initialize that flag and check it from time to time (say every few hundred or
thousand iterations) to see whether it has lost the CPU.  If it has, it would
continue with just a couple of spins, then go to sleep, and ensure the spin
count gets dynamically adjusted next time, or something similar.

Currently I'm afraid the only way to adjust dynamically is to read and parse
/proc/loadavg from time to time and, if it goes above the number of available
CPUs or something similar, start throttling down.
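
Purely as an illustration of that last idea (this is not libgomp code; the
function name and the check are made up for the example), such a poll could
look like:

/* If the 1-minute load average from /proc/loadavg exceeds the number of
   online CPUs, assume the machine is oversubscribed and throttle down.  */
#include <stdio.h>
#include <unistd.h>

static int
machine_looks_oversubscribed (void)
{
  double load1 = 0.0;
  long ncpus = sysconf (_SC_NPROCESSORS_ONLN);
  FILE *f = fopen ("/proc/loadavg", "r");
  if (f == NULL)
    return 0;                       /* can't tell; assume not loaded */
  if (fscanf (f, "%lf", &load1) != 1)
    load1 = 0.0;
  fclose (f);
  return ncpus > 0 && load1 > (double) ncpus;
}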


-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |drepper at redhat dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (19 preceding siblings ...)
  2010-08-24 11:41 ` jakub at gcc dot gnu dot org
@ 2010-08-24 12:18 ` solar-gcc at openwall dot com
  2010-08-30  8:41 ` singler at kit dot edu
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 32+ messages in thread
From: solar-gcc at openwall dot com @ 2010-08-24 12:18 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #19 from solar-gcc at openwall dot com  2010-08-24 12:18 -------
(In reply to comment #18)
> Then, at the start of the spinning, libgomp could initialize that flag and
> check it from time to time (say every few hundred or thousand iterations) to
> see whether it has lost the CPU.

Without a kernel API like that, you can achieve a similar effect by issuing the
rdtsc instruction (or its equivalents for non-x86 archs) and seeing if the
cycle counter changes unexpectedly (say, by 1000 or more for a single loop
iteration), which would indicate that there was a context switch.  For an
arch-independent implementation, you could also use a syscall such as times(2)
or gettimeofday(2), but then you'd need to do it very infrequently (e.g., maybe
just to see if there's a context switch between 10k and 20k spins).
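
To illustrate (a rough, x86-specific sketch only; the helper name is made up,
and the threshold of 1000 is simply taken from the suggestion above):

/* A jump in the time-stamp counter far larger than one spin iteration
   should cost most likely means the thread was scheduled out.  */
#include <stdint.h>
#include <x86intrin.h>   /* __rdtsc() with GCC on x86 */

static inline int
probably_lost_cpu (uint64_t *last_tsc)
{
  uint64_t now = __rdtsc ();
  uint64_t delta = now - *last_tsc;
  *last_tsc = now;
  return delta > 1000;   /* "1000 or more" per the text above */
}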


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (20 preceding siblings ...)
  2010-08-24 12:18 ` solar-gcc at openwall dot com
@ 2010-08-30  8:41 ` singler at kit dot edu
  2010-09-01 16:38 ` jakub at gcc dot gnu dot org
  2010-09-05 11:37 ` solar-gcc at openwall dot com
  23 siblings, 0 replies; 32+ messages in thread
From: singler at kit dot edu @ 2010-08-30  8:41 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #20 from singler at kit dot edu  2010-08-30 08:41 -------
Maybe we could agree on a compromise for a start.  Alexander, what are the
corresponding results for GOMP_SPINCOUNT=100000?


-- 

singler at kit dot edu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |singler at kit dot edu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (21 preceding siblings ...)
  2010-08-30  8:41 ` singler at kit dot edu
@ 2010-09-01 16:38 ` jakub at gcc dot gnu dot org
  2010-09-05 11:37 ` solar-gcc at openwall dot com
  23 siblings, 0 replies; 32+ messages in thread
From: jakub at gcc dot gnu dot org @ 2010-09-01 16:38 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #21 from jakub at gcc dot gnu dot org  2010-09-01 16:38 -------
*** Bug 45485 has been marked as a duplicate of this bug. ***


-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |h dot vogt at gom dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
  2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
                   ` (22 preceding siblings ...)
  2010-09-01 16:38 ` jakub at gcc dot gnu dot org
@ 2010-09-05 11:37 ` solar-gcc at openwall dot com
  23 siblings, 0 replies; 32+ messages in thread
From: solar-gcc at openwall dot com @ 2010-09-05 11:37 UTC (permalink / raw)
  To: gcc-bugs



------- Comment #22 from solar-gcc at openwall dot com  2010-09-05 11:37 -------
(In reply to comment #20)
> Maybe we could agree on a compromise for a start.  Alexander, what are the
> corresponding results for GOMP_SPINCOUNT=100000?

Unfortunately, I no longer have access to the dual-X5550 system, and I did not
try other values for this parameter when I was benchmarking that system.  On
systems that I do currently have access to, the slowdown from
GOMP_SPINCOUNT=10000 was typically no more than 10% (and most of the time there
was either no effect or substantial speedup).  I can try 100000 on those,
although it'd be difficult to tell the difference from 10000 because of the
changing load.  I'll plan on doing this the next time I run this sort of
benchmark.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
       [not found] <bug-43706-4@http.gcc.gnu.org/bugzilla/>
                   ` (5 preceding siblings ...)
  2010-12-16 13:03 ` rguenth at gcc dot gnu.org
@ 2012-01-12 20:34 ` pinskia at gcc dot gnu.org
  6 siblings, 0 replies; 32+ messages in thread
From: pinskia at gcc dot gnu.org @ 2012-01-12 20:34 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
   Target Milestone|4.5.3                       |4.6.0

--- Comment #29 from Andrew Pinski <pinskia at gcc dot gnu.org> 2012-01-12 20:32:38 UTC ---
Fixed.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
       [not found] <bug-43706-4@http.gcc.gnu.org/bugzilla/>
                   ` (4 preceding siblings ...)
  2010-12-02 14:31 ` jakub at gcc dot gnu.org
@ 2010-12-16 13:03 ` rguenth at gcc dot gnu.org
  2012-01-12 20:34 ` pinskia at gcc dot gnu.org
  6 siblings, 0 replies; 32+ messages in thread
From: rguenth at gcc dot gnu.org @ 2010-12-16 13:03 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.5.2                       |4.5.3

--- Comment #28 from Richard Guenther <rguenth at gcc dot gnu.org> 2010-12-16 13:02:46 UTC ---
GCC 4.5.2 is being released, adjusting target milestone.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
       [not found] <bug-43706-4@http.gcc.gnu.org/bugzilla/>
                   ` (3 preceding siblings ...)
  2010-11-15  9:14 ` singler at kit dot edu
@ 2010-12-02 14:31 ` jakub at gcc dot gnu.org
  2010-12-16 13:03 ` rguenth at gcc dot gnu.org
  2012-01-12 20:34 ` pinskia at gcc dot gnu.org
  6 siblings, 0 replies; 32+ messages in thread
From: jakub at gcc dot gnu.org @ 2010-12-02 14:31 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706

--- Comment #27 from Jakub Jelinek <jakub at gcc dot gnu.org> 2010-12-02 14:31:31 UTC ---
Author: jakub
Date: Thu Dec  2 14:31:27 2010
New Revision: 167371

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=167371
Log:
    PR libgomp/43706
    * env.c (initialize_env): Default to spin count 300000
    instead of 20000000 if neither OMP_WAIT_POLICY nor GOMP_SPINCOUNT
    is specified.

Modified:
    trunk/libgomp/ChangeLog
    trunk/libgomp/env.c


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
       [not found] <bug-43706-4@http.gcc.gnu.org/bugzilla/>
                   ` (2 preceding siblings ...)
  2010-11-12 11:44 ` solar-gcc at openwall dot com
@ 2010-11-15  9:14 ` singler at kit dot edu
  2010-12-02 14:31 ` jakub at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 32+ messages in thread
From: singler at kit dot edu @ 2010-11-15  9:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706

--- Comment #26 from Johannes Singler <singler at kit dot edu> 2010-11-15 08:53:12 UTC ---
(In reply to comment #25)
> You might have misread what I wrote.  I did not mention "35 tests"; I 
> mentioned
> that a test became slower by 35%.  The total number of different tests was 4
> (and each was invoked multiple times per spincount setting, indeed).  One out
> of four stayed 35% slower until I increased GOMP_SPINCOUNT to 200000.

Sorry, I got that wrong.  

> This makes some sense, but the job of an optimizing compiler and runtime
> libraries is to deliver the best performance they can even with somewhat
> non-optimal source code.  

I agree with that in principle.  But please be reminded that, as is, the
very simple testcase posted here takes a serious performance hit.  And
repeated parallel loops like the one in the test program certainly appear
very often in real applications.
BTW:  How does the testcase react to this change on your machine?

> There are plenty of real-world cases where spending
> time on application redesign for speed is unreasonable or can only be 
> completed
> at a later time - yet it is desirable to squeeze a little bit of extra
> performance out of the existing code.  There are also cases where more
> efficient parallelization - implemented at a higher level to avoid frequent
> switches between parallel and sequential execution - makes the application
> harder to use.  To me, one of the very reasons to use OpenMP was to
> avoid/postpone that redesign and the user-visible complication for now.  If I
> went for a more efficient higher-level solution, I would not need OpenMP in 
> the
> first place.

OpenMP should not be regarded as "only good for loop parallelization".  With
the new task construct, it is a fully-fledged parallelization substrate.

> > So I would suggest a threshold of 100000 for now.
> 
> My suggestion is 250000.

Well, that's already much better than staying with 20,000,000, so I agree.

> > IMHO, something should really happen to this problem before the 4.6 release.
> 
> Agreed.  It'd be best to have a code fix, though.

IMHO, there is no obvious way to fix this in principle.  There will always be a
compromise between busy waiting and giving back control to the OS.

Jakub, what do you plan to do about this problem?


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
       [not found] <bug-43706-4@http.gcc.gnu.org/bugzilla/>
  2010-11-09 16:33 ` solar-gcc at openwall dot com
  2010-11-12  8:21 ` singler at kit dot edu
@ 2010-11-12 11:44 ` solar-gcc at openwall dot com
  2010-11-15  9:14 ` singler at kit dot edu
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 32+ messages in thread
From: solar-gcc at openwall dot com @ 2010-11-12 11:44 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706

--- Comment #25 from Alexander Peslyak <solar-gcc at openwall dot com> 2010-11-12 11:19:13 UTC ---
(In reply to comment #24)
> If only one out of 35 tests becomes slower,

You might have misread what I wrote.  I did not mention "35 tests"; I mentioned
that a test became slower by 35%.  The total number of different tests was 4
(and each was invoked multiple times per spincount setting, indeed).  One out
of four stayed 35% slower until I increased GOMP_SPINCOUNT to 200000.

> I would rather blame that one (probably badly parallelized) application, not the OpenMP runtime system.

This makes some sense, but the job of an optimizing compiler and runtime
libraries is to deliver the best performance they can even with somewhat
non-optimal source code.  There are plenty of real-world cases where spending
time on application redesign for speed is unreasonable or can only be completed
at a later time - yet it is desirable to squeeze a little bit of extra
performance out of the existing code.  There are also cases where more
efficient parallelization - implemented at a higher level to avoid frequent
switches between parallel and sequential execution - makes the application
harder to use.  To me, one of the very reasons to use OpenMP was to
avoid/postpone that redesign and the user-visible complication for now.  If I
went for a more efficient higher-level solution, I would not need OpenMP in the
first place.

> So I would suggest a threshold of 100000 for now.

My suggestion is 250000.

> IMHO, something should really happen to this problem before the 4.6 release.

Agreed.  It'd be best to have a code fix, though.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
       [not found] <bug-43706-4@http.gcc.gnu.org/bugzilla/>
  2010-11-09 16:33 ` solar-gcc at openwall dot com
@ 2010-11-12  8:21 ` singler at kit dot edu
  2010-11-12 11:44 ` solar-gcc at openwall dot com
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 32+ messages in thread
From: singler at kit dot edu @ 2010-11-12  8:21 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706

--- Comment #24 from Johannes Singler <singler at kit dot edu> 2010-11-12 08:15:56 UTC ---
If only one out of 35 tests becomes slower, I would rather blame that one
(probably badly parallelized) application, not the OpenMP runtime system.  So
I would suggest a threshold of 100000 for now.  IMHO, something should really
happen to this problem before the 4.6 release.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [Bug libgomp/43706] scheduling two threads on one core leads to starvation
       [not found] <bug-43706-4@http.gcc.gnu.org/bugzilla/>
@ 2010-11-09 16:33 ` solar-gcc at openwall dot com
  2010-11-12  8:21 ` singler at kit dot edu
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 32+ messages in thread
From: solar-gcc at openwall dot com @ 2010-11-09 16:33 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43706

--- Comment #23 from Alexander Peslyak <solar-gcc at openwall dot com> 2010-11-09 16:32:53 UTC ---
(In reply to comment #20)
> Maybe we could agree on a compromise for a start.  Alexander, what are the
> corresponding results for GOMP_SPINCOUNT=100000?

I reproduced slowdowns of 5% to 35% (on different pieces of code) on an
otherwise-idle dual-E5520 system (16 logical CPUs) when going from gcc 4.5.0's
defaults to GOMP_SPINCOUNT=10000.  On all but one test, the original full speed
is restored with GOMP_SPINCOUNT=100000.  On the remaining test, the threshold
appears to be between 100000 (still 35% slower than full speed) and 200000
(original full speed).  So if we're not going to have a code fix soon enough
maybe the new default should be slightly higher than 200000.  It won't help as
much as 10000 would for cases where this is needed, but it would be of some
help.


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2012-01-12 20:34 UTC | newest]

Thread overview: 32+ messages
2010-04-09 16:22 [Bug libgomp/43706] New: scheduling two threads on one core leads to starvation baeuml at kit dot edu
2010-04-09 16:22 ` [Bug libgomp/43706] " baeuml at kit dot edu
2010-04-09 18:34 ` pinskia at gcc dot gnu dot org
2010-04-09 20:55 ` baeuml at kit dot edu
2010-04-09 22:11 ` mika dot fischer at kit dot edu
2010-04-20 10:23 ` jakub at gcc dot gnu dot org
2010-04-20 10:49 ` jakub at gcc dot gnu dot org
2010-04-20 12:23 ` mika dot fischer at kit dot edu
2010-04-20 15:38 ` jakub at gcc dot gnu dot org
2010-04-21 14:01 ` jakub at gcc dot gnu dot org
2010-04-21 14:01 ` jakub at gcc dot gnu dot org
2010-04-21 14:06 ` jakub at gcc dot gnu dot org
2010-04-21 14:07 ` jakub at gcc dot gnu dot org
2010-04-21 14:23 ` mika dot fischer at kit dot edu
2010-04-23 14:17 ` singler at kit dot edu
2010-04-30  8:53 ` jakub at gcc dot gnu dot org
2010-07-02  1:39 ` solar-gcc at openwall dot com
2010-07-30 14:00 ` johnfb at mail dot utexas dot edu
2010-08-13 15:48 ` singler at kit dot edu
2010-08-24 11:07 ` solar-gcc at openwall dot com
2010-08-24 11:41 ` jakub at gcc dot gnu dot org
2010-08-24 12:18 ` solar-gcc at openwall dot com
2010-08-30  8:41 ` singler at kit dot edu
2010-09-01 16:38 ` jakub at gcc dot gnu dot org
2010-09-05 11:37 ` solar-gcc at openwall dot com
     [not found] <bug-43706-4@http.gcc.gnu.org/bugzilla/>
2010-11-09 16:33 ` solar-gcc at openwall dot com
2010-11-12  8:21 ` singler at kit dot edu
2010-11-12 11:44 ` solar-gcc at openwall dot com
2010-11-15  9:14 ` singler at kit dot edu
2010-12-02 14:31 ` jakub at gcc dot gnu.org
2010-12-16 13:03 ` rguenth at gcc dot gnu.org
2012-01-12 20:34 ` pinskia at gcc dot gnu.org
