From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 2205) id CE1833856254; Mon, 18 Jul 2022 06:34:09 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org CE1833856254 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable From: Tom de Vries To: gdb-cvs@sourceware.org Subject: [binutils-gdb] [gdbsupport] Improve thread scheduling in parallel_for_each X-Act-Checkin: binutils-gdb X-Git-Author: Tom de Vries X-Git-Refname: refs/heads/master X-Git-Oldrev: c3d3b64b34bff289f178e2267e6363f71b0c4234 X-Git-Newrev: 4319180c8139e579a6337eb020309c95fa1e00b3 Message-Id: <20220718063409.CE1833856254@sourceware.org> Date: Mon, 18 Jul 2022 06:34:09 +0000 (GMT) X-BeenThere: gdb-cvs@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-cvs mailing list List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 18 Jul 2022 06:34:09 -0000 https://sourceware.org/git/gitweb.cgi?p=3Dbinutils-gdb.git;h=3D4319180c8139= e579a6337eb020309c95fa1e00b3 commit 4319180c8139e579a6337eb020309c95fa1e00b3 Author: Tom de Vries Date: Mon Jul 18 08:34:06 2022 +0200 [gdbsupport] Improve thread scheduling in parallel_for_each =20 When running a task using parallel_for_each, we get the following distribution: ... Parallel for: n_elements: 7271 Parallel for: minimum elements per thread: 10 Parallel for: elts_per_thread: 1817 Parallel for: elements on worker thread 0 : 1817 Parallel for: elements on worker thread 1 : 1817 Parallel for: elements on worker thread 2 : 1817 Parallel for: elements on worker thread 3 : 0 Parallel for: elements on main thread : 1820 ... =20 Note that there are 4 active threads, and scheduling elts_per_thread on= each of those handles 4 * 1817 =3D=3D 7268, leaving 3 "left over" elements. =20 These leftovers are currently handled in the main thread. =20 That doesn't seem to matter much for this example, but for say 10 threa= ds and 99 elements, you'd have 9 threads handling 9 elements and 1 thread hand= ling 18 elements. =20 Instead, distribute the left over elements over the worker threads, suc= h that we have: ... Parallel for: elements on worker thread 0 : 1818 Parallel for: elements on worker thread 1 : 1818 Parallel for: elements on worker thread 2 : 1818 Parallel for: elements on worker thread 3 : 0 Parallel for: elements on main thread : 1817 ... =20 Tested on x86_64-linux. Diff: --- gdbsupport/parallel-for.h | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/gdbsupport/parallel-for.h b/gdbsupport/parallel-for.h index cfe8a6e4f09..bf40f125f0f 100644 --- a/gdbsupport/parallel-for.h +++ b/gdbsupport/parallel-for.h @@ -147,6 +147,8 @@ parallel_for_each (unsigned n, RandomIt first, RandomIt= last, size_t n_threads =3D n_worker_threads; size_t n_elements =3D last - first; size_t elts_per_thread =3D 0; + size_t elts_left_over =3D 0; + if (n_threads > 1) { /* Require that there should be at least N elements in a @@ -155,6 +157,8 @@ parallel_for_each (unsigned n, RandomIt first, RandomIt= last, if (n_elements / n_threads < n) n_threads =3D std::max (n_elements / n, (size_t) 1); elts_per_thread =3D n_elements / n_threads; + elts_left_over =3D n_elements % n_threads; + /* n_elements =3D=3D n_threads * elts_per_thread + elts_left_over. */ } =20 size_t count =3D n_threads =3D=3D 0 ? 0 : n_threads - 1; @@ -170,6 +174,10 @@ parallel_for_each (unsigned n, RandomIt first, RandomI= t last, for (int i =3D 0; i < count; ++i) { RandomIt end =3D first + elts_per_thread; + if (i < elts_left_over) + /* Distribute the leftovers over the worker threads, to avoid having + to handle all of them in a single thread. */ + end++; if (parallel_for_each_debug) debug_printf (_("Parallel for: elements on worker thread %i\t: %zu\n"), i, (size_t)(end - first));