public inbox for gcc-bugs@sourceware.org
* [Bug c/96844] New: OpenMP: two worksharing constructs with different num_threads clauses break thread pooling
@ 2020-08-29 16:00 mority at posteo dot net
  2020-08-29 16:01 ` [Bug c/96844] " mority at posteo dot net
  2023-01-16 17:40 ` [Bug libgomp/96844] " rsandifo at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: mority at posteo dot net @ 2020-08-29 16:00 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96844

            Bug ID: 96844
           Summary: OpenMP: two worksharing constructs with different
                    num_threads clauses break thread pooling
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mority at posteo dot net
  Target Milestone: ---

Created attachment 49154
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49154&action=edit
Code that produces bug

Hi,

if a for loop contains two OpenMP worksharing constructs which specify
different values in their num_threads clauses, thread pooling seems not to be
working correctly. 

E.g., the first worksharing construct has num_threads(2) and the second
num_threads(4). The expected behavior would be that a total of 4 threads is
created. The first worksharing construct uses 2 of these threads and the second
all of them. 

However, this seems not to be the case. While thread pooling seems to work for the
first worksharing construct, it fails for the second. Every time the second
worksharing construct is executed, 2 new threads are created. This causes
significant overhead.

For clarification: There is no nested parallelism.
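
The attachment itself is not reproduced in this message. As a minimal sketch of the pattern described above (not the attached code), the following counts how many distinct threads each region sees. The names record_tid and count_threads are hypothetical, and reading the kernel thread ID with syscall(SYS_gettid) assumes Linux. The pragmas are ignored when the file is built without -fopenmp, in which case both counts are 1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

#define MAX_TIDS 4096

/* Add tid to seen[] if it is not there yet; return the updated count. */
static int record_tid(long *seen, int count, long tid)
{
    assert(count >= 0 && count <= MAX_TIDS);
    for (int i = 0; i < count; i++)
        if (seen[i] == tid)
            return count;
    if (count < MAX_TIDS)
        seen[count++] = tid;
    return count;
}

/* Hypothetical reproducer: each outer iteration runs a 2-thread region
   ("foo") followed by a 4-thread region ("bar"), with no nesting.
   On return, *distinct_foo and *distinct_bar hold the number of distinct
   kernel thread IDs observed in each region. */
void count_threads(int iters, int *distinct_foo, int *distinct_bar)
{
    static long foo_tids[MAX_TIDS], bar_tids[MAX_TIDS];
    int nfoo = 0, nbar = 0;

    for (int it = 0; it < iters; it++) {
        #pragma omp parallel num_threads(2)
        {
            long tid = syscall(SYS_gettid);   /* Linux-specific assumption */
            #pragma omp critical
            nfoo = record_tid(foo_tids, nfoo, tid);
        }

        #pragma omp parallel num_threads(4)
        {
            long tid = syscall(SYS_gettid);
            #pragma omp critical
            nbar = record_tid(bar_tids, nbar, tid);
        }
    }

    *distinct_foo = nfoo;
    *distinct_bar = nbar;
}
```

With a runtime that keeps its pool across both regions, one would expect at most 2 and 4 distinct IDs regardless of the iteration count. The behaviour reported here would show up as a second count that grows with the number of iterations.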

The attached code can be used to reproduce the bug. The code can be compiled
into 4 different versions using conditional compilation:

1. no OpenMP
    gcc -O3 -I. -Wall -g -DPRINT_TID mwe2_woMPI.c -o mwe2_woMPI

2. worksharing construct foo only
    gcc -O3 -I. -Wall -g -DPRINT_TID -DPRAGMA_FOO -fopenmp mwe2_woMPI.c -o mwe2_woMPI_foo

3. worksharing construct bar only
    gcc -O3 -I. -Wall -g -DPRINT_TID -DPRAGMA_BAR -fopenmp mwe2_woMPI.c -o mwe2_woMPI_bar

4. both worksharing constructs
    gcc -O3 -I. -Wall -g -DPRINT_TID -DPRAGMA_FOO -DPRAGMA_BAR -fopenmp mwe2_woMPI.c -o mwe2_woMPI_foobar

I analyzed the output of the different versions, which contains the thread ID
for every iteration. Each worksharing construct in isolation works correctly,
and 2 or 4 threads are created, respectively. However, if both worksharing
constructs are used at the same time, the first worksharing construct uses 2
different threads and the second uses 22 different threads.

GCC versions 8.3, 9.2, and 10.2 all show this behavior. I also compiled the
code with clang 10.1 and icc 19.4, both of which handle the case correctly.


* [Bug c/96844] OpenMP: two worksharing constructs with different num_threads clauses break thread pooling
From: mority at posteo dot net @ 2020-08-29 16:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96844

--- Comment #1 from Moritz Fischer <mority at posteo dot net> ---
Created attachment 49155
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=49155&action=edit
python script to count number of different threads used for each worksharing
construct


* [Bug libgomp/96844] OpenMP: two worksharing constructs with different num_threads clauses break thread pooling
From: rsandifo at gcc dot gnu.org @ 2023-01-16 17:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96844

rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org
   Last reconfirmed|                            |2023-01-16
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44833
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
I got an internal Arm report about the same behaviour.  Like you say, they
don't see the problem with LLVM's and Intel's libraries.

https://github.com/xianyi/OpenBLAS/pull/3546#issuecomment-1153914479 is a
discussion about the same problem in an OpenBLAS context.

Some other places where the question has come up:
- https://stackoverflow.com/a/52821175
- https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71781

I suppose the question is what to do instead.  As
[https://github.com/xianyi/OpenBLAS/pull/3546#issuecomment-1154817082]
says, past behaviour isn't necessarily an indication of future behaviour, so a
simple counter- or time-based reaping heuristic is likely to create unexpected
cliff-edges.

From my reading of the LLVM libomp sources, it looks like it doesn't reap
threads until:
- library shutdown
- omp_pause_resource(omp_pause_hard, ...) or
  omp_pause_resource_all(omp_pause_hard) is called

(__kmp_allocate_team also reaps teams that are too small for the request, but
the comments indicate that that's temporary.)

I could well be wrong: there could well be other situations in which LLVM reaps
threads too.
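
For reference, the explicit reaping point named in the list above can be exercised from user code. This is a minimal sketch, assuming an omp.h that declares the OpenMP 5.0 routine omp_pause_resource_all. The helper name run_then_release is hypothetical, and the body of the parallel region is a stand-in for real work.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: run one parallel region with `nthreads` threads,
   then ask the OpenMP runtime to release its resources.
   Returns 0 on success; always returns 0 when built without -fopenmp. */
int run_then_release(int nthreads)
{
    int sum = 0;

    #pragma omp parallel num_threads(nthreads) reduction(+:sum)
    sum += 1;   /* stand-in for real work */

    (void)sum;

#ifdef _OPENMP
    /* Called from the sequential part, after the region has finished. */
    return omp_pause_resource_all(omp_pause_hard);
#else
    return 0;
#endif
}
```

A caller that knows it will not need its worker threads for a while could call such a helper at that point; whether and when the runtime reaps threads on its own is the open question in this report.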

Still, would it be OK/useful to have an environment variable that selects the
same behaviour in libgomp?

