public inbox for gcc-bugs@sourceware.org
* [Bug libgomp/49490] New: suboptimal load balancing in loops
@ 2011-06-21 16:48 dennis.jespersen at nasa dot gov
  2011-06-22 14:42 ` [Bug libgomp/49490] " jakub at gcc dot gnu.org
  2011-06-22 20:39 ` jakub at gcc dot gnu.org
  0 siblings, 2 replies; 3+ messages in thread
From: dennis.jespersen at nasa dot gov @ 2011-06-21 16:48 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49490

           Summary: suboptimal load balancing in loops
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: minor
          Priority: P3
         Component: libgomp
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: dennis.jespersen@nasa.gov


Created attachment 24573
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24573
test code to show how a compiler/runtime splits an OpenMP loop

The OpenMP runtime library produces a correct but suboptimal load balance
in parallel loops.
For example, a loop of length 33 with 8 OpenMP threads will give the
threads work of lengths 5, 5, 5, 5, 5, 5, 3, 0 respectively.  This is logically
correct, but imagine a dual-socket 4 core + 4 core configuration; then
the "left" socket has 20 units of work while the "right" socket has 13
units of work.  This could put undue pressure on the left cache(s) and/or
memory connection.  It would be better to spread out the work as much
as possible, so in the example in question the threads would get work
of lengths 5, 4, 4, 4, 4, 4, 4, 4.
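
The two distributions above can be reproduced with a short sketch. This is only an illustration of the arithmetic, not the libgomp code; the function names split_ceil and split_balanced are hypothetical, and the "older" scheme is assumed to hand every thread ceil(n / nthreads) iterations until none remain, which matches the numbers in the report.

```python
def split_ceil(n, nthreads):
    """Scheme assumed for the reported behaviour: each thread takes
    ceil(n / nthreads) iterations until the loop is exhausted."""
    chunk = -(-n // nthreads)  # ceiling division
    counts = []
    remaining = n
    for _ in range(nthreads):
        take = min(chunk, remaining)
        counts.append(take)
        remaining -= take
    return counts


def split_balanced(n, nthreads):
    """Requested scheme: the first n % nthreads threads get one extra
    iteration, so no two threads differ by more than one."""
    q, r = divmod(n, nthreads)
    return [q + 1 if t < r else q for t in range(nthreads)]


if __name__ == "__main__":
    print(split_ceil(33, 8))      # [5, 5, 5, 5, 5, 5, 3, 0]
    print(split_balanced(33, 8))  # [5, 4, 4, 4, 4, 4, 4, 4]
```

With the balanced split, the first four threads carry 17 iterations and the last four carry 16, instead of 20 and 13.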

It should be fairly easy to modify libgomp/iter.c to produce the better
load balancing (at least I think that's where the modification would go).
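
As a rough idea of what such a modification might compute, the sketch below derives the half-open iteration range [start, end) for one thread under the even split. It is an assumption-laden illustration, not the contents of libgomp/iter.c; the name balanced_range and its parameters are hypothetical.

```python
def balanced_range(n, nthreads, tid):
    """Return (start, end) for thread `tid`, with iterations numbered
    0..n-1 and the first n % nthreads threads taking one extra."""
    q, r = divmod(n, nthreads)
    if tid < r:
        # Threads before the remainder boundary each own q + 1 iterations.
        start = tid * (q + 1)
        end = start + q + 1
    else:
        # Later threads own q iterations, shifted by the r extras handed out.
        start = tid * q + r
        end = start + q
    return start, end


if __name__ == "__main__":
    for tid in range(8):
        print(tid, balanced_range(33, 8, tid))
```

The ranges are contiguous and cover the whole loop, so each thread can compute its own bounds from n, nthreads and its thread number alone, without any shared state.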

The attached Fortran code will show the load balance; the Portland Group and
Intel products give the desired even balance.



* [Bug libgomp/49490] suboptimal load balancing in loops
From: jakub at gcc dot gnu.org @ 2011-06-22 14:42 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49490

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2011.06.22 14:41:35
         AssignedTo|unassigned at gcc dot       |jakub at gcc dot gnu.org
                   |gnu.org                     |
     Ever Confirmed|0                           |1

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-06-22 14:41:35 UTC ---
Created attachment 24580
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24580
gcc47-pr49490.patch

Untested fix.



* [Bug libgomp/49490] suboptimal load balancing in loops
From: jakub at gcc dot gnu.org @ 2011-06-22 20:39 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49490

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> 2011-06-22 20:39:27 UTC ---
Author: jakub
Date: Wed Jun 22 20:39:25 2011
New Revision: 175315

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=175315
Log:
    PR libgomp/49490
    * omp-low.c (expand_omp_for_static_nochunk): Only
    use n ceil/ nthreads size for the first
    n % nthreads threads in the team instead of
    all threads except for the last few ones which
    get less work or none at all.

    * iter.c (gomp_iter_static_next): For chunk size 0
    only use n ceil/ nthreads size for the first
    n % nthreads threads in the team instead of
    all threads except for the last few ones which
    get less work or none at all.
    * iter_ull.c (gomp_iter_ull_static_next): Likewise.
    * env.c (parse_schedule): If OMP_SCHEDULE doesn't have
    chunk argument, set run_sched_modifier to 0 for static
    resp. 1 for other kinds.  If chunk argument is 0
    and not static, set value to 1.
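
The env.c entry above can be read as the following defaulting rule. This is a simplified model written for illustration, not the parse_schedule function from libgomp: the Python function below, its return shape, and its minimal validation are all assumptions.

```python
def parse_schedule(value):
    """Model of the defaulting rule for an OMP_SCHEDULE-style string
    such as "static" or "dynamic,4". Returns (kind, modifier)."""
    kind, sep, chunk_text = value.partition(",")
    kind = kind.strip().lower()
    if kind not in ("static", "dynamic", "guided", "auto"):
        raise ValueError("unknown schedule kind: " + repr(kind))
    if not sep:
        # No chunk argument: 0 for static, 1 for the other kinds.
        return kind, 0 if kind == "static" else 1
    chunk = int(chunk_text)
    if chunk == 0 and kind != "static":
        # A chunk of 0 is only kept for static; other kinds fall back to 1.
        chunk = 1
    return kind, chunk


if __name__ == "__main__":
    for text in ("static", "dynamic", "guided,0", "static,0", "dynamic,4"):
        print(text, "->", parse_schedule(text))
```

Under this reading, a modifier of 0 is reserved for static scheduling, which is the case the iter.c entry handles with the even split.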

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/omp-low.c
    trunk/libgomp/ChangeLog
    trunk/libgomp/env.c
    trunk/libgomp/iter.c
    trunk/libgomp/iter_ull.c


