From: Chung-Lin Tang
To: Chung-Lin Tang, gcc-patches, Tom de Vries, Catherine Moore
Date: Mon, 31 Oct 2022 22:18:28 +0800
Message-ID: <32ba851f-ad70-155e-c321-b9bfb610f353@codesourcery.com>
Subject: [Ping x2] Re: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx

Ping x2.
On 2022/10/17 10:29 PM, Chung-Lin Tang wrote:
> Ping.
>
> On 2022/9/21 3:45 PM, Chung-Lin Tang via Gcc-patches wrote:
>> Hi Tom,
>> I submitted a patch earlier in which I reported that the current way of implementing
>> barriers in libgomp on nvptx caused a significant performance drop on some SPEChpc2021
>> benchmarks:
>> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html
>>
>> That previous patch wasn't well received (admittedly, it was something of a hack).
>> So in this patch, I have (mostly) re-implemented team barriers for NVPTX.
>>
>> Basically, instead of trying to make the GPU do CPU-with-OS-like things that it isn't
>> suited for, barriers are implemented simply, with bar.* synchronization instructions.
>> Tasks are processed after threads have joined, and only if team->task_count != 0.
>>
>> (Arguably, a little performance is forfeited, since threads that arrive early could have
>> been used to process tasks ahead of the other threads. But that would again require
>> implementing complex futex-wait/wake-like behavior, and that kind of tasking is not
>> what target offloading is usually used for.)
>>
>> Implementation highlight notes:
>> 1. gomp_team_barrier_wake() is now an empty function (threads never "wake" in the usual manner).
>> 2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
>> 3. gomp_barrier_wait_last() is now implemented using "bar.arrive".
>>
>> 4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
>> The main synchronization is done using a 'bar.red' instruction. This reduces the
>> condition (team->task_count != 0) across all threads, so that the task processing
>> that follows is enabled if any thread created a task. (This use of bar.red requires
>> the second GCC patch in this series.)
>>
>> This patch has been tested on x86_64/powerpc64le with nvptx offloading, using the
>> libgomp, ovo, omptests, and sollve_vv testsuites, all without regressions.
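Note 4 is easiest to follow in a small model. The following is a hypothetical host-side sketch using POSIX threads, not code from the patch: red_barrier_wait_or() mimics the semantics of the PTX bar.red.or.pred instruction (every thread passes a predicate and every thread gets back the OR of all of them), and team_barrier_wait() follows the flow described above, where threads join first and then process tasks only if some thread saw a nonzero task count. All names (red_barrier, task_count, run_team_demo and so on) are invented for illustration, the CAS loop is only a stand-in for libgomp's real task handling, and the nvptx code relies on the hardware barrier rather than a mutex and condition variable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of PTX bar.red.or.pred semantics: every thread
   passes a predicate and all threads leave with the OR of all of them.  */
typedef struct
{
  pthread_mutex_t lock;
  pthread_cond_t cond;
  unsigned nthreads, arrived; /* Team size; arrivals this generation.  */
  unsigned generation;        /* Bumped each time the barrier opens.  */
  bool accum, result;         /* Running OR; OR of the last generation.  */
} red_barrier;

/* Block until all threads arrive; return the OR of every PRED.  */
static bool
red_barrier_wait_or (red_barrier *b, bool pred)
{
  pthread_mutex_lock (&b->lock);
  assert (b->arrived < b->nthreads);
  unsigned gen = b->generation;
  b->accum = b->accum || pred;
  if (++b->arrived == b->nthreads)
    {
      /* Last arrival publishes the result and opens the barrier.  */
      b->result = b->accum;
      b->accum = false;
      b->arrived = 0;
      b->generation++;
      pthread_cond_broadcast (&b->cond);
    }
  else
    while (gen == b->generation)
      pthread_cond_wait (&b->cond, &b->lock);
  /* RESULT cannot be overwritten until this thread arrives again.  */
  bool any = b->result;
  pthread_mutex_unlock (&b->lock);
  return any;
}

#define DEMO_THREADS 4
static red_barrier team_bar = { .lock = PTHREAD_MUTEX_INITIALIZER,
                                .cond = PTHREAD_COND_INITIALIZER,
                                .nthreads = DEMO_THREADS };
/* Stand-in for team->task_count, plus a count of tasks "run".  */
static atomic_uint task_count, executed;

/* Join first; process tasks only if some thread saw task_count != 0.  */
static void
team_barrier_wait (void)
{
  bool pending = atomic_load (&task_count) != 0;
  if (!red_barrier_wait_or (&team_bar, pending))
    return; /* Common case: no thread saw a task, so leave at once.  */
  /* All threads claim tasks one at a time; a failed CAS reloads N.  */
  unsigned n = atomic_load (&task_count);
  while (n != 0)
    if (atomic_compare_exchange_weak (&task_count, &n, n - 1))
      {
        atomic_fetch_add (&executed, 1); /* "Run" one task.  */
        n = atomic_load (&task_count);
      }
  /* Join again so no thread leaves while others still run tasks.  */
  red_barrier_wait_or (&team_bar, false);
}

static void *
demo_worker (void *arg)
{
  if ((uintptr_t) arg % 2 == 0) /* Even-numbered threads make a task.  */
    atomic_fetch_add (&task_count, 1);
  team_barrier_wait ();
  return NULL;
}

/* Run the demo team through one barrier; return the tasks executed.  */
unsigned
run_team_demo (void)
{
  pthread_t th[DEMO_THREADS];
  atomic_store (&task_count, 0);
  atomic_store (&executed, 0);
  for (uintptr_t i = 0; i < DEMO_THREADS; i++)
    pthread_create (&th[i], NULL, demo_worker, (void *) i);
  for (int i = 0; i < DEMO_THREADS; i++)
    pthread_join (th[i], NULL);
  return atomic_load (&executed);
}
```

In this model, run_team_demo() starts four threads, two of which (the even-numbered ones) create a task before the barrier, so each call returns 2; build with -pthread. Because every thread receives the same reduced value, all threads take the same branch, which is what makes the early return safe: no thread is ever left waiting alone at the second barrier.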
>>
>> I also verified that the SPEChpc 2021 521.miniswp_t and 534.hpgmgfv_t performance
>> regressions that occurred during the GCC12 cycle are fixed, with performance restored
>> to devel/omp/gcc-11 (OG11) branch levels. Is this okay for trunk?
>>
>> (I also suggest backporting to the GCC12 branch, if a performance regression can be
>> considered a defect.)
>>
>> Thanks,
>> Chung-Lin
>>
>> libgomp/ChangeLog:
>>
>> 2022-09-21  Chung-Lin Tang
>>
>> 	* config/nvptx/bar.c (generation_to_barrier): Remove.
>> 	(futex_wait,futex_wake,do_spin,do_wait): Remove.
>> 	(GOMP_WAIT_H): Remove.
>> 	(#include "../linux/bar.c"): Remove.
>> 	(gomp_barrier_wait_end): New function.
>> 	(gomp_barrier_wait): Likewise.
>> 	(gomp_barrier_wait_last): Likewise.
>> 	(gomp_team_barrier_wait_end): Likewise.
>> 	(gomp_team_barrier_wait): Likewise.
>> 	(gomp_team_barrier_wait_final): Likewise.
>> 	(gomp_team_barrier_wait_cancel_end): Likewise.
>> 	(gomp_team_barrier_wait_cancel): Likewise.
>> 	(gomp_team_barrier_cancel): Likewise.
>> 	* config/nvptx/bar.h (gomp_team_barrier_wake): Remove
>> 	prototype, add new static inline function.