Subject: Re: [PATCH][gdbsupport] Use task size in parallel_for_each
From: Tom de Vries
To: Tom Tromey, Tom de Vries via Gdb-patches
Cc: Pedro Alves
Date: Sat, 23 Jul 2022 07:55:09 +0200

On 7/22/22 21:08, Tom Tromey wrote:
> Pedro> My passerby comment is that I wonder whether we should consider
> Pedro> switching to a work-stealing implementation, so that threads
> Pedro> that are done with their chunk would grab another chunk from
> Pedro> the work pool.  I think this would spare us from having to
> Pedro> worry about these distribution heuristics.
>
> Tom> I also thought about a dynamic solution, but decided to try the
> Tom> simplest solution first.
>
> Tom> Anyway, with a dynamic solution you still might want to decide
> Tom> how big a chunk is, for which you could still need this type of
> Tom> heuristic.
>
> I think the idea of the work-stealing approach is that each worker
> grabs work as it can, and so no sizing is needed at all.  If one
> worker ends up with a very large CU, it will simply end up working on
> fewer CUs.

Right, I understand that.  My concern was that the current parallel-for
solution potentially exploits caching behaviour by accessing a
contiguous sequence of elements, while a pure work-stealing approach is
more likely to process non-adjacent elements and so would lose that
locality.

But thinking about it now, that is reasoning from the point of view of
a single cpu, and the work-stealing approach would also allow caching
behaviour to be exploited across cpus.

Anyway, I wondered whether a hybrid approach would be a good idea:
first schedule a percentage of the workload, say 80%, as slices
determined using the size heuristics, and distribute the remaining 20%
using work-stealing, to counteract the discrepancy between the size
heuristics and the actual execution times.  Roughly like the sketch
below.
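To make that concrete, here is a minimal sketch of the hybrid scheme.
It is not based on gdb's actual thread_pool or parallel_for_each
interfaces; hybrid_for_each, the per-element size vector, the process
callback and the 80/20 split point are all made up for illustration,
and it assumes n_threads > 0.

#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

static void
hybrid_for_each (const std::vector<size_t> &sizes,
                 const std::function<void (size_t)> &process,
                 size_t n_threads)
{
  const size_t n = sizes.size ();

  /* Total estimated work, and the budget (~80% of it) that is
     scheduled statically up front.  */
  size_t total = 0;
  for (size_t s : sizes)
    total += s;
  const size_t static_budget = total * 8 / 10;

  /* Find the element index where the static part ends.  */
  size_t split = 0, acc = 0;
  while (split < n && acc + sizes[split] <= static_budget)
    acc += sizes[split++];

  /* Cut [0, split) into n_threads contiguous slices of roughly equal
     estimated size, so each thread walks adjacent elements.  */
  std::vector<size_t> bounds (n_threads + 1, split);
  bounds[0] = 0;
  size_t per_thread = acc / n_threads + 1, running = 0, t = 1;
  for (size_t i = 0; i < split && t < n_threads; ++i)
    {
      running += sizes[i];
      if (running >= per_thread * t)
        bounds[t++] = i + 1;
    }

  /* Elements in [split, n) form the shared tail, claimed dynamically
     through an atomic counter.  */
  std::atomic<size_t> next (split);

  std::vector<std::thread> workers;
  for (size_t w = 0; w < n_threads; ++w)
    workers.emplace_back ([&, w] ()
      {
        /* Static slice first: contiguous, so sequential access.  */
        for (size_t i = bounds[w]; i < bounds[w + 1]; ++i)
          process (i);
        /* Then steal from the tail until it is exhausted.  */
        for (size_t i = next++; i < n; i = next++)
          process (i);
      });
  for (auto &th : workers)
    th.join ();
}

The static slices keep the sequential access pattern the current
parallel_for_each gets, while the atomic tail absorbs the error in the
size estimates.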
It also occurred to me today that if you'd have a large CU as the last
CU, with pure work-stealing you may well end up with one thread
processing that CU while the other threads sit idle, whereas with size
heuristics you can force a thread to start on it early.

Thanks,
- Tom

> In this situation, work stealing might make the overall implementation
> simpler.  Perhaps the batching parameter patch (commit 82d734f7a) and
> the vector of results patch (commit f4565e4c9) could be removed.
>
> Then, rather than using parallel_for, the DWARF reader could send N
> jobs to the thread pool, and each job would simply take the next
> available CU by incrementing an atomic counter.  When the counter
> reached the number of CUs, a job would stop.
>
> Tom
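The atomic-counter scheme described in the quoted text could look
roughly like the following sketch; plain std::thread stands in here for
gdb's thread pool, and process_cu is a hypothetical placeholder for the
per-CU indexing work.

#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

extern void process_cu (size_t cu_index);  /* Hypothetical per-CU work.  */

static void
index_all_cus (size_t n_cus, size_t n_jobs)
{
  std::atomic<size_t> counter (0);

  /* Send n_jobs jobs to the "pool"; each one repeatedly claims the
     next unprocessed CU by incrementing the shared counter.  */
  std::vector<std::thread> jobs;
  for (size_t j = 0; j < n_jobs; ++j)
    jobs.emplace_back ([&] ()
      {
        for (size_t i = counter++; i < n_cus; i = counter++)
          process_cu (i);
      });

  for (auto &job : jobs)
    job.join ();
}

A job that claims an index at or past n_cus simply exits, so no sizing
heuristics are involved at all.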