public inbox for gcc-bugs@sourceware.org
* [Bug libgomp/115367] New: The implementation of OMP_DYNAMIC is not dynamic
@ 2024-06-06  1:59 mail+gcc at nh2 dot me
  2024-06-06  6:11 ` [Bug libgomp/115367] " jakub at gcc dot gnu.org
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: mail+gcc at nh2 dot me @ 2024-06-06  1:59 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115367

            Bug ID: 115367
           Summary: The implementation of OMP_DYNAMIC is not dynamic
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: mail+gcc at nh2 dot me
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

Please see:

"Why does my OpenMP app sometimes use only 1 thread, sometimes 3, sometimes all
cores?"

https://stackoverflow.com/questions/78584145/why-does-my-openmp-app-sometimes-use-only-1-thread-sometimes-3-sometimes-all-c/78584146

OMP_DYNAMIC is implemented like this (on Linux, and likely on other platforms too):

https://github.com/gcc-mirror/gcc/blob/10cb3336ba1ac89b258f627222e668b023a6d3d4/libgomp/config/linux/proc.c#L180-L188

    /* When OMP_DYNAMIC is set, at thread launch determine the number of
       threads we should spawn for this team.  */
    /* ??? I have no idea what best practice for this is.  Surely some
       function of the number of processors that are *still* online and
       the load average.  Here I use the number of processors online
       minus the 15 minute load average.  */

    unsigned
    gomp_dynamic_max_threads (void) {
    // ...
        return n_onln - loadavg;
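
The heuristic in the quoted comment can be sketched as a small standalone C function. This is a simplified illustration, not the libgomp source. The helper names `sketch_dynamic_threads` and `sketch_query_threads` are hypothetical, and the elided part of the function (`// ...`) is not reproduced. Two details are assumptions about how the subtraction is kept in range: the load average is truncated to an integer, and the result is clamped to at least one thread. The sketch relies on `sysconf(_SC_NPROCESSORS_ONLN)` and `getloadavg()`, which are available on Linux/glibc and macOS but are not part of ISO C.

```c
/* Sketch of the "online CPUs minus 15-minute load average" heuristic.
   Not the libgomp implementation; all names are hypothetical. */
#define _DEFAULT_SOURCE            /* exposes getloadavg() in glibc */
#include <assert.h>
#include <stdlib.h>                /* getloadavg() */
#include <unistd.h>                /* sysconf() */

/* Threads to use, given the online CPU count and the 15-minute load
   average. Assumption: the load is truncated to an integer and the
   result never drops below one thread. */
static unsigned sketch_dynamic_threads(unsigned n_onln, double loadavg15)
{
    if (!(loadavg15 > 0.0))                 /* zero, negative or NaN load */
        return n_onln > 0 ? n_onln : 1;
    if (loadavg15 >= (double) n_onln)       /* machine looks fully loaded */
        return 1;
    return n_onln - (unsigned) loadavg15;   /* e.g. 8 CPUs, load 6.5 -> 2 */
}

/* Query the running system: online CPUs and the third getloadavg()
   sample, which is the 15-minute average. */
static unsigned sketch_query_threads(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    double avg[3];
    double load15 = 0.0;

    if (getloadavg(avg, 3) == 3)
        load15 = avg[2];
    unsigned t = sketch_dynamic_threads(n > 0 ? (unsigned) n : 1u, load15);
    assert(t >= 1);                         /* the clamp guarantees this */
    return t;
}
```

With the figures from comment #3 further down this thread (4 cores, 15-minute load average 3.17), `sketch_dynamic_threads(4, 3.17)` yields 1 thread, although `top` reports the machine as over 90% idle.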


### `OMP_DYNAMIC` (of `libgomp`) is really bad

* Because of this logic, your app can end up using only 1 thread even though
  the system is completely idle _now_, just because it was busy 10 minutes ago.
* The dynamic limit is determined _at process start_ (loading time) and fixed
  forever.
  * So programs started, say, 5 minutes after the system was busy stay slow
    _forever_.
* It means a server can never achieve full utilisation when working down a
  queue of jobs.
  * Say you have 8 cores and a queue of N jobs to process, each of which takes
    15 minutes of full CPU.
  * The first job starts at a 15-minute load average of 0, thus using all cores.
  * The next job starts, using only 1 core.
  * The next job starts, using only 7 cores.
  * The next job starts, using only 1 core.
  * The next job starts, using only 7 cores.
  * ...
  * In the long run, the **server uses only half of its cores on average**.
* It makes performance behaviour completely irreproducible.
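
The job-queue argument above can be reproduced with a toy simulation. It follows the report's own simplified model, which is an assumption rather than how the kernel computes the load: the 15-minute load average seen when a job starts equals the number of threads the previous 15-minute job kept busy, and each job's thread count is fixed at its start. A real Linux load average is an exponentially decaying average, so actual numbers will be less extreme. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Cores minus truncated load, never fewer than one thread (assumption). */
static unsigned sketch_threads_for_load(unsigned n_cores, double load)
{
    if (load >= (double) n_cores)
        return 1;
    if (!(load > 0.0))
        return n_cores;
    return n_cores - (unsigned) load;
}

/* Simulate n_jobs back-to-back 15-minute, full-CPU jobs. Under the
   report's model, the load seen at each job start is the thread count
   of the job that just finished. Stores each job's thread count in
   out[0..n_jobs-1] and returns the mean thread count. */
static double sketch_simulate_queue(unsigned n_cores, size_t n_jobs,
                                    unsigned *out)
{
    assert(n_cores > 0 && n_jobs > 0 && out != NULL);
    double load = 0.0;                  /* machine idle before job 0 */
    double total = 0.0;

    for (size_t i = 0; i < n_jobs; i++) {
        unsigned t = sketch_threads_for_load(n_cores, load);
        out[i] = t;
        total += t;
        load = (double) t;              /* job keeps t threads busy */
    }
    return total / (double) n_jobs;
}
```

For 8 cores the per-job thread counts come out as 8, 1, 7, 1, 7, ..., so the long-run mean tends to 4, half of the cores, as the report states.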

None of this behaviour is documented

* in [`libgomp`](https://gcc.gnu.org/onlinedocs/libgomp/OMP_005fDYNAMIC.html)
* or in the [OpenMP spec for
`OMP_DYNAMIC`](https://www.openmp.org/spec-html/5.0/openmpse51.html).

Those docs sound like the behaviour is nice "runtime-dynamic" when in fact it
is fixed across the process's lifetime, and based on ultra-slow rolling
averages.


I argue that libgomp does not implement the OpenMP spec well here.

It says

> OpenMP implementation may adjust the number of threads to use for executing parallel regions in order to optimize the use of system resources


This suggests that the OpenMP implementation may do something sensible to
adjust the number of threads "DYNAMIC"ally.

Nowhere does it say that it should determine this at the start of the process,
and never adjust it again.

That's the opposite of "dynamic"!

And that is combined with a very-much-not-dynamic 15-minute delay.

I read the spec text as "do something sensible, like GNU make, which
periodically checks the (short-term!) loadavg() of the current system and
adjusts its parallelism accordingly".


* [Bug libgomp/115367] The implementation of OMP_DYNAMIC is not dynamic
From: jakub at gcc dot gnu.org @ 2024-06-06  6:11 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115367

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Niklas Hambüchen from comment #0)
> Those docs sound like the behaviour is nice "runtime-dynamic" when in fact
> it is fixed across the process's lifetime, and based on ultra-slow rolling
> averages.

That is not the case.  It really calls getloadavg each time it determines the
number of threads, and uses the 15-minute average.
Using, say, the 1-minute average is IMHO highly undesirable for decisions in a
program that will usually run longer than that.
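
The trade-off between the 1-minute and 15-minute averages can be illustrated with the exponentially damped averages the Linux kernel maintains. Every 5 seconds, each average is multiplied by a decay factor and moved toward the current number of runnable (and uninterruptible) tasks. The sketch uses the kernel's fixed-point decay constants 1884/2048, 2014/2048 and 2037/2048 (roughly exp(-5/60), exp(-5/300) and exp(-5/900)) in floating point. It ignores the kernel's fixed-point rounding and assumes a constant task count over each interval. The function names are hypothetical.

```c
#include <assert.h>

/* Per-5-second decay factors for the 1-, 5- and 15-minute averages,
   from the kernel's fixed-point constants (FIXED_1 = 2048). */
static const double SKETCH_DECAY[3] = {
    1884.0 / 2048.0, 2014.0 / 2048.0, 2037.0 / 2048.0
};

/* One 5-second update: move each average toward the active task count. */
static void sketch_loadavg_tick(double avg[3], unsigned active)
{
    for (int i = 0; i < 3; i++)
        avg[i] = avg[i] * SKETCH_DECAY[i]
               + (double) active * (1.0 - SKETCH_DECAY[i]);
}

/* Keep `active` tasks runnable for `seconds` seconds (whole 5 s ticks). */
static void sketch_loadavg_run(double avg[3], unsigned active,
                               unsigned seconds)
{
    assert(avg != 0);
    for (unsigned t = 0; t < seconds / 5; t++)
        sketch_loadavg_tick(avg, active);
}
```

Starting from an idle machine, 15 minutes with 8 busy tasks bring the 1-minute average to essentially 8 but the 15-minute average only to about 5. After 5 further idle minutes, the 1-minute average is close to 0 while the 15-minute average is still about 3.6. That lag is both the stability argued for in the reply above and the delay the original report objects to.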


* [Bug libgomp/115367] The implementation of OMP_DYNAMIC is not dynamic
From: comptes at ugo235 dot fr @ 2024-06-06  7:13 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115367

--- Comment #2 from PierU <comptes at ugo235 dot fr> ---
I can confirm it's really dynamic, i.e. the number of threads can change each
time a parallel region is started. I tested with this Fortran toy example:
```
program p
use omp_lib
implicit none

real, allocatable :: x(:,:)
integer :: i, n = 1000

call omp_set_dynamic(.true.)

allocate( x(n*n,n) )
x = 0.0   ! allocate leaves the contents undefined; start from cos(0) = 1

do 

   !$OMP PARALLEL 
   !$OMP SINGLE
      write(*,"(I3)",advance='no') omp_get_num_threads()
   !$OMP END SINGLE
   !$OMP DO
   do i = 1, n
      x(:,i) = cos(x(:,i))
   end do
   !$OMP END DO
   !$OMP SINGLE
      write(*,*) "   ", x(n*n/2,n/2)
   !$OMP END SINGLE
   !$OMP END PARALLEL

end do

end
```

And got (the number of threads is the first column):
```
  2       1.00000000    
  2      0.540302277    
  2      0.857553244    
  2      0.654289782    
  1      0.793480337    
  1      0.701368809    
  1      0.763959646    
  1^C
```

However, this implementation looks weird to me:

- the load avg is IMO not a good measure of CPU occupation. On my system
(macOS with 4 cores), the load avg is typically between 2 and 3 while the CPU
occupation is constantly around 10% (100% being the 4 cores fully occupied).
See this output from top:


- When an application runs for a long time, it raises the load avg, but not
because other processes are using the CPU concurrently. And yet the number of
threads is dynamically reduced. With such a method, on average the application
will use only half of the cores...
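
The self-feedback in the second point can be simulated by combining the pieces: an application that is alone on the machine, re-evaluates its thread count at every 5-second load-average update, and keeps exactly that many threads runnable. Everything here is an assumption for illustration: a Linux-style 15-minute average with decay 2037/2048 per 5 s, truncation of the load, a minimum of one thread, and no other load. The function names are hypothetical.

```c
#include <assert.h>

/* Cores minus truncated load, never fewer than one thread (assumption). */
static unsigned sketch_threads_for_load(unsigned n_cores, double load)
{
    if (load >= (double) n_cores)
        return 1;
    if (!(load > 0.0))
        return n_cores;
    return n_cores - (unsigned) load;
}

/* Simulate `ticks` 5-second intervals of an application that is the only
   load on the machine. Returns the mean number of threads it used. */
static double sketch_self_throttling(unsigned n_cores, unsigned ticks)
{
    const double decay15 = 2037.0 / 2048.0;  /* Linux-style 15-minute factor */
    double load15 = 0.0;                     /* machine idle at start */
    double total = 0.0;

    assert(n_cores > 0 && ticks > 0);
    for (unsigned i = 0; i < ticks; i++) {
        unsigned t = sketch_threads_for_load(n_cores, load15);
        total += t;
        /* The application's own threads are what drives the average. */
        load15 = load15 * decay15 + (double) t * (1.0 - decay15);
    }
    return total / ticks;
}
```

In this model the thread count settles at half the cores (2 of 4, 4 of 8) within roughly a quarter of an hour, even though nothing else is running. This matches the "half of the cores" estimate above.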

I definitely don't know how OMP_DYNAMIC should be implemented, but clearly the
analysis should distinguish between CPU occupation by the current application
and by other processes...


* [Bug libgomp/115367] The implementation of OMP_DYNAMIC is not dynamic
From: comptes at ugo235 dot fr @ 2024-06-06  7:43 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115367

PierU <comptes at ugo235 dot fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |comptes at ugo235 dot fr

--- Comment #3 from PierU <comptes at ugo235 dot fr> ---
I forgot the "top" output:

Load Avg: 3.54, 3.21, 3.17  CPU usage: 3.24% user, 4.16% sys, 92.59% idle

The machine has been mostly idle for more than 15 min, with just background
applications open (firefox, thunderbird...); the CPU occupation is generally
around 10% (100% being the 4 cores fully occupied), and yet the load avg is
above 3.

