Date: Mon, 17 Oct 2022 22:29:01 +0800
Subject: Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
To: Chung-Lin Tang, gcc-patches, Tom de Vries, Catherine Moore
From: Chung-Lin Tang
References: <8b974d21-e288-4596-7500-277a43c92771@gmail.com>
In-Reply-To: <8b974d21-e288-4596-7500-277a43c92771@gmail.com>

Ping.

On 2022/9/21 3:45 PM, Chung-Lin Tang via Gcc-patches wrote:
> Hi Tom,
> I had a patch submitted earlier, where I reported that the current way of implementing
> barriers in libgomp on nvptx created a quite significant performance drop on some SPEChpc2021
> benchmarks:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html
>
> That previous patch wasn't accepted well (admittedly, it was kind of a hack).
> So in this patch, I tried to (mostly) re-implement team-barriers for NVPTX.
>
> Basically, instead of trying to have the GPU do CPU-with-OS-like things that it isn't suited for,
> barriers are implemented simplistically with bar.* synchronization instructions.
> Tasks are processed after threads have joined, and only if team->task_count != 0
>
> (arguably, there might be a little bit of performance forfeited where earlier arriving threads
> could've been used to process tasks ahead of other threads. But that again falls into requiring
> implementing complex futex-wait/wake like behavior. Really, that kind of tasking is not what target
> offloading is usually used for)
>
> Implementation highlight notes:
> 1. gomp_team_barrier_wake() is now an empty function (threads never "wake" in the usual manner)
> 2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
> 3. gomp_barrier_wait_last() now is implemented using "bar.arrive"
>
> 4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
>    The main synchronization is done using a 'bar.red' instruction. This reduces across all threads
>    the condition (team->task_count != 0), to enable the task processing down below if any thread
>    created a task. (this bar.red usage required the need of the second GCC patch in this series)
>
> This patch has been tested on x86_64/powerpc64le with nvptx offloading, using libgomp, ovo, omptests,
> and sollve_vv testsuites, all without regressions.
> Also verified that the SPEChpc 2021 521.miniswp_t
> and 534.hpgmgfv_t performance regressions that occurred in the GCC12 cycle have been restored to
> devel/omp/gcc-11 (OG11) branch levels. Is this okay for trunk?
>
> (also suggest backporting to GCC12 branch, if performance regression can be considered a defect)
>
> Thanks,
> Chung-Lin
>
> libgomp/ChangeLog:
>
> 2022-09-21 Chung-Lin Tang
>
> 	* config/nvptx/bar.c (generation_to_barrier): Remove.
> 	(futex_wait,futex_wake,do_spin,do_wait): Remove.
> 	(GOMP_WAIT_H): Remove.
> 	(#include "../linux/bar.c"): Remove.
> 	(gomp_barrier_wait_end): New function.
> 	(gomp_barrier_wait): Likewise.
> 	(gomp_barrier_wait_last): Likewise.
> 	(gomp_team_barrier_wait_end): Likewise.
> 	(gomp_team_barrier_wait): Likewise.
> 	(gomp_team_barrier_wait_final): Likewise.
> 	(gomp_team_barrier_wait_cancel_end): Likewise.
> 	(gomp_team_barrier_wait_cancel): Likewise.
> 	(gomp_team_barrier_cancel): Likewise.
> 	* config/nvptx/bar.h (gomp_team_barrier_wake): Remove
> 	prototype, add new static inline function.
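
For readers following point 4 above, here is a rough host-side sketch of the
"barrier plus OR-reduction" idea, written with pthreads so it can be run on a
CPU. It is only an analogy and not the code in the patch: the names
red_barrier, red_barrier_wait_or, worker_arg, run_team and MAX_TEAM are all
hypothetical, and the per-thread has_task flag merely stands in for the
(team->task_count != 0) condition that the patch reduces with 'bar.red'.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAX_TEAM 64  /* arbitrary cap for this sketch */

/* Hypothetical host-side model of a barrier that also OR-reduces a
   predicate, loosely mirroring PTX "bar.red.or.pred".  Not libgomp code.  */
struct red_barrier
{
  pthread_mutex_t lock;
  pthread_cond_t cond;
  unsigned total;      /* threads in the team */
  unsigned arrived;    /* threads that reached the current round */
  unsigned generation; /* bumped each time a round completes */
  bool accum;          /* OR of the predicates seen in this round */
  bool result;         /* reduced value of the last completed round */
};

static void
red_barrier_init (struct red_barrier *b, unsigned total)
{
  pthread_mutex_init (&b->lock, NULL);
  pthread_cond_init (&b->cond, NULL);
  b->total = total;
  b->arrived = 0;
  b->generation = 0;
  b->accum = false;
  b->result = false;
}

/* Block until all threads have arrived, then return the OR of every
   thread's predicate to each of them.  */
static bool
red_barrier_wait_or (struct red_barrier *b, bool pred)
{
  pthread_mutex_lock (&b->lock);
  unsigned gen = b->generation;
  b->accum |= pred;
  if (++b->arrived == b->total)
    {
      /* Last arriver publishes the result and releases the others.  */
      b->result = b->accum;
      b->accum = false;
      b->arrived = 0;
      b->generation++;
      pthread_cond_broadcast (&b->cond);
    }
  else
    while (gen == b->generation)
      pthread_cond_wait (&b->cond, &b->lock);
  bool r = b->result;
  pthread_mutex_unlock (&b->lock);
  return r;
}

struct worker_arg
{
  struct red_barrier *bar;
  bool has_task;      /* stand-in for (team->task_count != 0) */
  bool process_tasks; /* outcome of the reduction for this thread */
};

static void *
worker (void *p)
{
  struct worker_arg *a = p;
  a->process_tasks = red_barrier_wait_or (a->bar, a->has_task);
  return NULL;
}

/* Run NTHREADS workers; thread CREATOR (none if negative) reports a
   pending task.  Returns how many threads would go on to process tasks.  */
static int
run_team (int nthreads, int creator)
{
  assert (nthreads > 0 && nthreads <= MAX_TEAM);
  struct red_barrier bar;
  pthread_t tid[MAX_TEAM];
  struct worker_arg args[MAX_TEAM];
  int count = 0;

  red_barrier_init (&bar, (unsigned) nthreads);
  for (int i = 0; i < nthreads; i++)
    {
      args[i] = (struct worker_arg) { &bar, i == creator, false };
      pthread_create (&tid[i], NULL, worker, &args[i]);
    }
  for (int i = 0; i < nthreads; i++)
    {
      pthread_join (tid[i], NULL);
      count += args[i].process_tasks;
    }
  pthread_mutex_destroy (&bar.lock);
  pthread_cond_destroy (&bar.cond);
  return count;
}
```

In this model, run_team (4, 2) returns 4 (one thread reported a task, so every
thread sees the reduced condition as true), while run_team (4, -1) returns 0
and all threads would skip task processing. Depending on the toolchain, it may
need to be built with -pthread.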