Subject: Re: [PATCH][gdbsupport] Use task size in parallel_for_each
From: Tom de Vries
To: Tom Tromey, Tom de Vries via Gdb-patches
Cc: Pedro Alves
Date: Sat, 23 Jul 2022 07:55:09 +0200

On 7/22/22 21:08, Tom Tromey wrote:
> Pedro> My passerby comment is that I wonder whether we should consider
> Pedro> switching to a work-stealing implementation, so that threads
> Pedro> that are done with their chunk would grab another chunk from
> Pedro> the work pool.  I think this would spare us from having to
> Pedro> worry about these distribution heuristics.
>
> Tom> I also thought about a dynamic solution, but decided to try the
> Tom> simplest solution first.
>
> Tom> Anyway, with a dynamic solution you still might want to decide
> Tom> how big a chunk is, for which you could still need this type of
> Tom> heuristic.
>
> I think the idea of the work-stealing approach is that each worker
> grabs work as it can, and so no sizing is needed at all.  If one
> worker ends up with a very large CU, it will simply end up working on
> fewer CUs.

Right, I understand that.  My concern was that the current parallel-for
solution potentially exploits caching behaviour by accessing a
contiguous sequence of elements, while a pure work-stealing approach is
more likely to process non-adjacent elements and so would lose that
locality.

But thinking about it now, that is reasoning from the point of view of
a single cpu, and the work-stealing approach would also allow caching
behaviour to be exploited across cpus.

Anyway, I wondered whether a hybrid approach would be a good idea:
first schedule a percentage of the workload, say 80%, as slices
determined using the size heuristics, and distribute the remaining 20%
using work-stealing, to counteract the discrepancy between the size
heuristics and the actual execution times.  Roughly like the sketch
below.
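To make that concrete, here is a minimal sketch of the hybrid scheme.
It is not based on gdb's actual thread_pool or parallel_for_each
interfaces; hybrid_for_each, the per-element size vector, the process
callback and the 80/20 split point are all made up for illustration,
and it assumes n_threads > 0.

#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

static void
hybrid_for_each (const std::vector<size_t> &sizes,
                 const std::function<void (size_t)> &process,
                 size_t n_threads)
{
  const size_t n = sizes.size ();

  /* Total estimated work, and the budget (~80% of it) that is
     scheduled statically up front.  */
  size_t total = 0;
  for (size_t s : sizes)
    total += s;
  const size_t static_budget = total * 8 / 10;

  /* Find the element index where the static part ends.  */
  size_t split = 0, acc = 0;
  while (split < n && acc + sizes[split] <= static_budget)
    acc += sizes[split++];

  /* Cut [0, split) into n_threads contiguous slices of roughly equal
     estimated size, so each thread walks adjacent elements.  */
  std::vector<size_t> bounds (n_threads + 1, split);
  bounds[0] = 0;
  size_t per_thread = acc / n_threads + 1, running = 0, t = 1;
  for (size_t i = 0; i < split && t < n_threads; ++i)
    {
      running += sizes[i];
      if (running >= per_thread * t)
        bounds[t++] = i + 1;
    }

  /* Elements in [split, n) form the shared tail, claimed dynamically
     through an atomic counter.  */
  std::atomic<size_t> next (split);

  std::vector<std::thread> workers;
  for (size_t w = 0; w < n_threads; ++w)
    workers.emplace_back ([&, w] ()
      {
        /* Static slice first: contiguous, so sequential access.  */
        for (size_t i = bounds[w]; i < bounds[w + 1]; ++i)
          process (i);
        /* Then steal from the tail until it is exhausted.  */
        for (size_t i = next++; i < n; i = next++)
          process (i);
      });
  for (auto &th : workers)
    th.join ();
}

The static slices keep the sequential access pattern the current
parallel_for_each gets, while the atomic tail absorbs the error in the
size estimates.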
It also occurred to me today that if you'd have a large CU as the last
CU, with pure work-stealing you may well end up with one thread
processing that CU while the other threads sit idle, whereas with size
heuristics you can force a thread to start on it early.

Thanks,
- Tom

> In this situation, work stealing might make the overall implementation
> simpler.  Perhaps the batching parameter patch (commit 82d734f7a) and
> the vector of results patch (commit f4565e4c9) could be removed.
>
> Then, rather than using parallel_for, the DWARF reader could send N
> jobs to the thread pool, and each job would simply take the next
> available CU by incrementing an atomic counter.  When the counter
> reached the number of CUs, a job would stop.
>
> Tom
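The atomic-counter scheme described in the quoted text could look
roughly like the following sketch; plain std::thread stands in here for
gdb's thread pool, and process_cu is a hypothetical placeholder for the
per-CU indexing work.

#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

extern void process_cu (size_t cu_index);  /* Hypothetical per-CU work.  */

static void
index_all_cus (size_t n_cus, size_t n_jobs)
{
  std::atomic<size_t> counter (0);

  /* Send n_jobs jobs to the "pool"; each one repeatedly claims the
     next unprocessed CU by incrementing the shared counter.  */
  std::vector<std::thread> jobs;
  for (size_t j = 0; j < n_jobs; ++j)
    jobs.emplace_back ([&] ()
      {
        for (size_t i = counter++; i < n_cus; i = counter++)
          process_cu (i);
      });

  for (auto &job : jobs)
    job.join ();
}

A job that claims an index at or past n_cus simply exits, so no sizing
heuristics are involved at all.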