From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from esa3.mentor.iphmx.com (esa3.mentor.iphmx.com [68.232.137.180]) by sourceware.org (Postfix) with ESMTPS id 9F6173839C70 for ; Tue, 27 Jul 2021 09:33:20 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9F6173839C70 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com IronPort-SDR: G/whDVijj95XNqDZA8hALvRrFlzyLWpNba2Tlpp8DpJqoPO1IgdJta15NdFyEhtjpuWVU+VPIe 5Z1kk3ic9bBXax1aCNlVK6xx6hmy5QbxcD8NOIeXeo5UHaR0PvdztktyCPHUiF3u/998dYdfyz Lk7FcgvT9Zfy/12/85sEVcG/4pPMLZ/3lY1cv1ob0d5tq79YMl3U75lIS8EDjQXA3jrDA43Ro9 j1Os91e8yBrBOrzoRfCRqsDO6gRxyRKIehv25f1nggQpJVGoXELzJpq/wKi7/aKJ2G7yZqVUOf HkDqIeeTyEg2eoUuwYy7l3po X-IronPort-AV: E=Sophos;i="5.84,272,1620720000"; d="scan'208,223";a="63949829" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa3.mentor.iphmx.com with ESMTP; 27 Jul 2021 01:33:19 -0800 IronPort-SDR: iyl4T5UbieQeC3Seuu8VuOYspcZyg5PLSI+B2DPonOWLhHIip/j6CV1dtfkIWjBLJqlkXa4I7e WKLzuTRXmD7XFklUSBcTCHnoQSyfQ/kLbKC9py8SILdLd03IWwHlTFzOyMjsn5SO5rcisARKq6 FhA84cY3v/wa9RYYTmdTNCs88AkvM1liD4Pg7KS7SRAXjR6N1cbo6bvp7RUAQjFA+7koTl9voh 6/DPRISXOxvAkoQqIz0w9vw7DPFX7rL0//QuEgOGMv80YJyeV3ff+x3/pUo8ErdBE7TkNXbMeo o/A= From: Thomas Schwinge To: CC: Julian Brown Subject: Re: [committed] OpenACC Profiling Interface (incomplete) In-Reply-To: References: <87k28acit3.fsf@hertz.schwinge.homeip.net> <20190516155423.GA19695@tucnak> User-Agent: Notmuch/0.29.3+94~g74c3f1b (https://notmuchmail.org) Emacs/27.1 (x86_64-pc-linux-gnu) Date: Tue, 27 Jul 2021 11:33:09 +0200 Message-ID: <87k0lc85ka.fsf@euler.schwinge.homeip.net> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-07.mgc.mentorg.com (139.181.222.7) To svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 27 Jul 2021 09:33:22 -0000 --=-=-= Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable Hi! On 2019-05-17T21:19:07+0200, I wrote: > Committed to trunk in r271346 "OpenACC Profiling Interface (incomplete)" There in an 'async' issue in two of the test cases added here, as figured out during review/testing with GCN offloading Julian's patch to 'Fix OpenACC "ephemeral" asynchronous host-to-device copies'. I've pushed "[OpenACC] Clarify sequencing of 'async' data copying vs. profiling events in 'libgomp.oacc-c-c++-common/acc_prof-{init,parallel}-1.c'" to master branch in commit 29ddaf43f70e19fd1110b539e8b3d0436c757e34, see attached. Gr=C3=BC=C3=9Fe Thomas ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstra=C3=9Fe 201= , 80634 M=C3=BCnchen; Gesellschaft mit beschr=C3=A4nkter Haftung; Gesch=C3= =A4ftsf=C3=BChrer: Thomas Heurung, Frank Th=C3=BCrauf; Sitz der Gesellschaf= t: M=C3=BCnchen; Registergericht M=C3=BCnchen, HRB 106955 --=-=-= Content-Type: text/x-diff Content-Disposition: inline; filename="0001-OpenACC-Clarify-sequencing-of-async-data-copying-vs..patch" >From 29ddaf43f70e19fd1110b539e8b3d0436c757e34 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Fri, 23 Jul 2021 15:07:34 +0200 Subject: [PATCH] [OpenACC] Clarify sequencing of 'async' data copying vs. profiling events in 'libgomp.oacc-c-c++-common/acc_prof-{init,parallel}-1.c' ... as noticed with GCN offloading. Fix-up for r271346 (commit 5fae049dc272144f8e61af94ee0ba42b270915e5) "OpenACC Profiling Interface (incomplete)". libgomp/ * testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c: Clarify sequencing of 'async' data copying vs. profiling events. * testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c: Likewise. --- .../acc_prof-init-1.c | 49 ++++++++++++------ .../acc_prof-parallel-1.c | 51 +++++++++++++------ 2 files changed, 68 insertions(+), 32 deletions(-) diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c index 7d05f482f46..b5e77155460 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c @@ -19,6 +19,19 @@ #define DEBUG_printf(...) //__builtin_printf (__VA_ARGS__) +static acc_prof_reg reg; +static acc_prof_reg unreg; +static acc_prof_lookup_func lookup; +void acc_register_library (acc_prof_reg reg_, acc_prof_reg unreg_, acc_prof_lookup_func lookup_) +{ + DEBUG_printf ("%s\n", __FUNCTION__); + + reg = reg_; + unreg = unreg_; + lookup = lookup_; +} + + static int state = -1; #define STATE_OP(state, op) \ @@ -34,7 +47,7 @@ static int state = -1; static acc_device_t acc_device_type; static int acc_device_num; -static int acc_async; +static int acc_async = acc_async_sync; struct tool_info @@ -192,6 +205,21 @@ static void cb_compute_construct_end (acc_prof_info *prof_info, acc_event_info * assert (state == 11 || state == 111); +#if defined COPYIN + /* In an 'async' setting, this event may be triggered before actual 'async' + data copying has completed. Given that 'state' appears in 'COPYIN', we + first have to synchronize (that is, let the 'async' 'COPYIN' read the + current 'state' value)... */ + if (acc_async != acc_async_sync) + { + /* "We're not yet accounting for the fact that _OpenACC events may occur + during event processing_"; temporarily disable to avoid deadlock. */ + unreg (acc_ev_none, NULL, acc_toggle_per_thread); + acc_wait (acc_async); + reg (acc_ev_none, NULL, acc_toggle_per_thread); + } + /* ... before modifying it in the following. */ +#endif STATE_OP (state, ++); assert (tool_info != NULL); @@ -240,19 +268,6 @@ static void cb_compute_construct_end (acc_prof_info *prof_info, acc_event_info * } -static acc_prof_reg reg; -static acc_prof_reg unreg; -static acc_prof_lookup_func lookup; -void acc_register_library (acc_prof_reg reg_, acc_prof_reg unreg_, acc_prof_lookup_func lookup_) -{ - DEBUG_printf ("%s\n", __FUNCTION__); - - reg = reg_; - unreg = unreg_; - lookup = lookup_; -} - - int main() { acc_register_library (acc_prof_register, acc_prof_unregister, acc_prof_lookup); @@ -271,14 +286,15 @@ int main() acc_device_type = acc_get_device_type (); acc_device_num = acc_get_device_num (acc_device_type); - acc_async = 12; { int state_init; + acc_async = 12; #pragma acc parallel async(acc_async) COPYIN(state) copyout(state_init) { state_init = state; } + acc_async = acc_async_sync; #pragma acc wait assert (state_init == 11); } @@ -297,14 +313,15 @@ int main() acc_device_type = acc_get_device_type (); acc_device_num = acc_get_device_num (acc_device_type); - acc_async = 12; { int state_init; + acc_async = 12; #pragma acc parallel async(acc_async) COPYIN(state) copyout(state_init) { state_init = state; } + acc_async = acc_async_sync; #pragma acc wait assert (state_init == 111); } diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c index a5e9ab3f936..1f503861cb6 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c @@ -29,6 +29,19 @@ #define DEBUG_printf(...) //__builtin_printf (__VA_ARGS__) +static acc_prof_reg reg; +static acc_prof_reg unreg; +static acc_prof_lookup_func lookup; +void acc_register_library (acc_prof_reg reg_, acc_prof_reg unreg_, acc_prof_lookup_func lookup_) +{ + DEBUG_printf ("%s\n", __FUNCTION__); + + reg = reg_; + unreg = unreg_; + lookup = lookup_; +} + + static int state = -1; #define STATE_OP(state, op) \ @@ -44,7 +57,7 @@ static int state = -1; static acc_device_t acc_device_type; static int acc_device_num; -static int acc_async; +static int acc_async = acc_async_sync; struct tool_info @@ -235,6 +248,25 @@ static void cb_enter_data_end (acc_prof_info *prof_info, acc_event_info *event_i assert (state == 4 || state == 104); +#if defined COPYIN + /* Conceptually, 'acc_ev_enter_data_end' marks the end of data copying, + before 'acc_ev_enqueue_launch_start' marks invoking the compute region. + That's the 'state_init = state;' intended to be captured in the compute + regions. */ + /* In an 'async' setting, this event may be triggered before actual 'async' + data copying has completed. Given that 'state' appears in 'COPYIN', we + first have to synchronize (that is, let the 'async' 'COPYIN' read the + current 'state' value)... */ + if (acc_async != acc_async_sync) + { + /* "We're not yet accounting for the fact that _OpenACC events may occur + during event processing_"; temporarily disable to avoid deadlock. */ + unreg (acc_ev_none, NULL, acc_toggle_per_thread); + acc_wait (acc_async); + reg (acc_ev_none, NULL, acc_toggle_per_thread); + } + /* ... before modifying it in the following. */ +#endif STATE_OP (state, ++); assert (tool_info != NULL); @@ -664,19 +696,6 @@ static void cb_enqueue_launch_end (acc_prof_info *prof_info, acc_event_info *eve } -static acc_prof_reg reg; -static acc_prof_reg unreg; -static acc_prof_lookup_func lookup; -void acc_register_library (acc_prof_reg reg_, acc_prof_reg unreg_, acc_prof_lookup_func lookup_) -{ - DEBUG_printf ("%s\n", __FUNCTION__); - - reg = reg_; - unreg = unreg_; - lookup = lookup_; -} - - int main() { acc_register_library (acc_prof_register, acc_prof_unregister, acc_prof_lookup); @@ -696,7 +715,6 @@ int main() acc_device_type = acc_get_device_type (); acc_device_num = acc_get_device_num (acc_device_type); - acc_async = acc_async_sync; assert (state == 0); { @@ -713,15 +731,16 @@ int main() STATE_OP (state, = 100); - acc_async = 12; { int state_init; + acc_async = 12; #pragma acc parallel async(acc_async) COPYIN(state) copyout(state_init) { asm volatile ("" : : : "memory"); // TODO PR90488 state_init = state; } + acc_async = acc_async_sync; #pragma acc wait assert (state_init == 104); } -- 2.30.2 --=-=-=--