public inbox for gdb-patches@sourceware.org
 help / color / mirror / Atom feed
From: Tom de Vries <tdevries@suse.de>
To: Tom Tromey <tom@tromey.com>,
	Tom de Vries via Gdb-patches <gdb-patches@sourceware.org>
Subject: Re: [RFC][gdbsupport] Improve thread scheduling in parallel_for_each
Date: Mon, 18 Jul 2022 09:30:28 +0200	[thread overview]
Message-ID: <b766c96b-c49f-d235-17b3-5b37d47de10d@suse.de> (raw)
In-Reply-To: <87y1wumc1f.fsf@tromey.com>

[-- Attachment #1: Type: text/plain, Size: 1844 bytes --]

On 7/15/22 21:05, Tom Tromey wrote:
>>>>>> "Tom" == Tom de Vries via Gdb-patches <gdb-patches@sourceware.org> writes:
> 
> Tom> This introduces a performance regression on a particular test-case I happened
> Tom> to use:
> Tom> ...
> Tom> $ for n in $(seq 1 10); do \
> Tom>     time gdb -q -batch libxul.so.debug 2>&1 | grep real:; \
> Tom>   done
> Tom> ...
> Tom> so revert to the original schedule by reducing the worker threads:
> Tom> ...
> 
> This seems like making a change and then undoing it somewhere else?
> 
> Tom> Still, the performance experiment yields a slight performance loss.
> 
> Sounds bad.
> 
> Tom>    if (n_threads < 0)
> Tom> -    n_threads = std::thread::hardware_concurrency ();
> Tom> +    {
> Tom> +      n_threads = std::thread::hardware_concurrency ();
> Tom> +      if (n_threads > 0)
> Tom> +	/* Account for main thread.  */
> Tom> +	n_threads--;
> Tom> +    }
> 
> I think it's better if the setting just directly controls how many
> threads there are.  Then elsewhere we can decide what that means -- like
> if it performs better with the defaults to not do any work in the main
> thread, then parallel_for_each can be modified to just send tasks to the
> workers and do nothing in the main thread except wait for results.
> 
> Tom>    size_t elts_per_thread = 0;
> [...]
> Tom> +  elts_per_thread = n_elements / n_threads;
> 
> The initial declaration can be removed and then this latter line can
> declare the variable as well.
> 
> Tom>    for (int i = 0; i < count; ++i)
> Tom>      {
> Tom>        RandomIt end = first + elts_per_thread;
> Tom> +      if (i < left_over)
> Tom> +	end++;
> 
> It may be nice to mention the distribution of leftovers in a comment
> somewhere.

I've ended up committing this patch, which just does the leftovers 
distribution part, with some comments added.

Thanks,
- Tom

[-- Attachment #2: 0001-gdbsupport-Improve-thread-scheduling-in-parallel_for_each.patch --]
[-- Type: text/x-patch, Size: 2749 bytes --]

[gdbsupport] Improve thread scheduling in parallel_for_each

When running a task using parallel_for_each, we get the following
distribution:
...
Parallel for: n_elements: 7271
Parallel for: minimum elements per thread: 10
Parallel for: elts_per_thread: 1817
Parallel for: elements on worker thread 0       : 1817
Parallel for: elements on worker thread 1       : 1817
Parallel for: elements on worker thread 2       : 1817
Parallel for: elements on worker thread 3       : 0
Parallel for: elements on main thread           : 1820
...

Note that there are 4 active threads, and scheduling elts_per_thread on each
of those handles 4 * 1817 == 7268, leaving 3 "left over" elements.

These leftovers are currently handled in the main thread.

That doesn't seem to matter much for this example, but for say 10 threads and
99 elements, you'd have 9 threads handling 9 elements and 1 thread handling 18
elements.

Instead, distribute the left over elements over the worker threads, such that
we have:
...
Parallel for: elements on worker thread 0       : 1818
Parallel for: elements on worker thread 1       : 1818
Parallel for: elements on worker thread 2       : 1818
Parallel for: elements on worker thread 3       : 0
Parallel for: elements on main thread           : 1817
...

Tested on x86_64-linux.

---
 gdbsupport/parallel-for.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/gdbsupport/parallel-for.h b/gdbsupport/parallel-for.h
index cfe8a6e4f09..bf40f125f0f 100644
--- a/gdbsupport/parallel-for.h
+++ b/gdbsupport/parallel-for.h
@@ -147,6 +147,8 @@ parallel_for_each (unsigned n, RandomIt first, RandomIt last,
   size_t n_threads = n_worker_threads;
   size_t n_elements = last - first;
   size_t elts_per_thread = 0;
+  size_t elts_left_over = 0;
+
   if (n_threads > 1)
     {
       /* Require that there should be at least N elements in a
@@ -155,6 +157,8 @@ parallel_for_each (unsigned n, RandomIt first, RandomIt last,
       if (n_elements / n_threads < n)
 	n_threads = std::max (n_elements / n, (size_t) 1);
       elts_per_thread = n_elements / n_threads;
+      elts_left_over = n_elements % n_threads;
+      /* n_elements == n_threads * elts_per_thread + elts_left_over. */
     }
 
   size_t count = n_threads == 0 ? 0 : n_threads - 1;
@@ -170,6 +174,10 @@ parallel_for_each (unsigned n, RandomIt first, RandomIt last,
   for (int i = 0; i < count; ++i)
     {
       RandomIt end = first + elts_per_thread;
+      if (i < elts_left_over)
+	/* Distribute the leftovers over the worker threads, to avoid having
+	   to handle all of them in a single thread.  */
+	end++;
       if (parallel_for_each_debug)
 	debug_printf (_("Parallel for: elements on worker thread %i\t: %zu\n"),
 		      i, (size_t)(end - first));

      reply	other threads:[~2022-07-18  7:30 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-15  9:40 Tom de Vries
2022-07-15 19:05 ` Tom Tromey
2022-07-18  7:30   ` Tom de Vries [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b766c96b-c49f-d235-17b3-5b37d47de10d@suse.de \
    --to=tdevries@suse.de \
    --cc=gdb-patches@sourceware.org \
    --cc=tom@tromey.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).