From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by sourceware.org (Postfix) with ESMTPS id 670013832358 for ; Fri, 16 Dec 2022 14:51:37 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 670013832358 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=suse.de Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=suse.de Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 6064E5D082; Fri, 16 Dec 2022 14:51:36 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_rsa; t=1671202296; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=aF2iKRaccaaxw90xw2sQCU0eQJZNg1M24TnW658l5Og=; b=Q6zqk1+E5wC6ToontlQ7sj/ERbH4GZFfMcff1nqdbNp1uwQ6YOb9l29Bc1qtBwYZ50EWn9 SeqkeQc0tkUUY2mhQ6ENIG6qNgWMQD9kiMTt5oKQLJLcnq9cDRq9y42orLy0s/vjXwylCt shU1ebSiVCU2V9fjWnSwIuQbUKDmRVA= DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=suse.de; s=susede2_ed25519; t=1671202296; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc: mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=aF2iKRaccaaxw90xw2sQCU0eQJZNg1M24TnW658l5Og=; b=RKNOqCgPDaxFMCEwnFYMdEf015EGsHaxmfIyajrF4JsotlASVVPfZpEpCXQJ/YfIWbGg8d hrvuCyPupDRbJLAw== Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 44445138FD; Fri, 16 Dec 2022 14:51:36 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id 1ANtD/iFnGNpdQAAMHmgww (envelope-from ); Fri, 16 Dec 2022 14:51:36 +0000 Message-ID: <041a398e-5d1f-cbfa-5d7c-12f46b330143@suse.de> Date: Fri, 16 Dec 2022 15:51:35 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.5.1 Subject: Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx Content-Language: en-US To: Chung-Lin Tang , gcc-patches , Catherine Moore References: <8b974d21-e288-4596-7500-277a43c92771@gmail.com> From: Tom de Vries In-Reply-To: <8b974d21-e288-4596-7500-277a43c92771@gmail.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-5.1 required=5.0 tests=BAYES_00,BODY_8BITS,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,KAM_SHORT,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On 9/21/22 09:45, Chung-Lin Tang wrote: > Hi Tom, > I had a patch submitted earlier, where I reported that the current way > of implementing > barriers in libgomp on nvptx created a quite significant performance > drop on some SPEChpc2021 > benchmarks: > https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html > > That previous patch wasn't accepted well (admittedly, it was kind of a > hack). > So in this patch, I tried to (mostly) re-implement team-barriers for NVPTX. > Ack. > Basically, instead of trying to have the GPU do CPU-with-OS-like things > that it isn't suited for, > barriers are implemented simplistically with bar.* synchronization > instructions. > Tasks are processed after threads have joined, and only if > team->task_count != 0 > > (arguably, there might be a little bit of performance forfeited where > earlier arriving threads > could've been used to process tasks ahead of other threads. But that > again falls into requiring > implementing complex futex-wait/wake like behavior. Really, that kind of > tasking is not what target > offloading is usually used for) > Please try to add this insight somewhere as a comment in the code, f.i. in the header comment of bar.h. > Implementation highlight notes: > 1. gomp_team_barrier_wake() is now an empty function (threads never > "wake" in the usual manner) > 2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction. > 3. gomp_barrier_wait_last() now is implemented using "bar.arrive" > > 4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end(): >    The main synchronization is done using a 'bar.red' instruction. This > reduces across all threads >    the condition (team->task_count != 0), to enable the task processing > down below if any thread >    created a task. (this bar.red usage required the need of the second > GCC patch in this series) > > This patch has been tested on x86_64/powerpc64le with nvptx offloading, > using libgomp, ovo, omptests, > and sollve_vv testsuites, all without regressions. Also verified that > the SPEChpc 2021 521.miniswp_t > and 534.hpgmgfv_t performance regressions that occurred in the GCC12 > cycle has been restored to > devel/omp/gcc-11 (OG11) branch levels. Is this okay for trunk? > AFAIU the waiters and lock fields are longer used, so they can be removed. Yes, LGTM, please apply (after the other one). Thanks for addressing this. FWIW, tested on NVIDIA RTX A2000 with driver 525.60.11. > (also suggest backporting to GCC12 branch, if performance regression can > be considered a defect) > That's ok, but wait a while after applying on trunk before doing that, say a month. Thanks, - Tom > Thanks, > Chung-Lin > > libgomp/ChangeLog: > > 2022-09-21  Chung-Lin Tang  > >     * config/nvptx/bar.c (generation_to_barrier): Remove. >     (futex_wait,futex_wake,do_spin,do_wait): Remove. >     (GOMP_WAIT_H): Remove. >     (#include "../linux/bar.c"): Remove. >     (gomp_barrier_wait_end): New function. >     (gomp_barrier_wait): Likewise. >     (gomp_barrier_wait_last): Likewise. >     (gomp_team_barrier_wait_end): Likewise. >     (gomp_team_barrier_wait): Likewise. >     (gomp_team_barrier_wait_final): Likewise. >     (gomp_team_barrier_wait_cancel_end): Likewise. >     (gomp_team_barrier_wait_cancel): Likewise. >     (gomp_team_barrier_cancel): Likewise. >     * config/nvptx/bar.h (gomp_team_barrier_wake): Remove >     prototype, add new static inline function.