From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thomas Schwinge
To: Kwok Cheung Yeung
CC: , Jakub Jelinek
Subject: Re: [PATCH] libgomp: Fix hang when profiling OpenACC programs with CUDA 9.0 nvprof
In-Reply-To: <7f849f69-0733-c72e-8bce-0b3999c86168@codesourcery.com>
References: <7f849f69-0733-c72e-8bce-0b3999c86168@codesourcery.com>
User-Agent: Notmuch/0.29.1+93~g67ed7df (https://notmuchmail.org) Emacs/26.3 (x86_64-pc-linux-gnu)
Date: Tue, 14 Jul 2020 13:00:55 +0200
Message-ID: <87a702tl7c.fsf@euler.schwinge.homeip.net>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
X-BeenThere: gcc-patches@gcc.gnu.org
List-Id: Gcc-patches mailing list

--=-=-=
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit

Hi Kwok!

On 2020-07-13T16:29:14+0100, Kwok Cheung Yeung wrote:
> When the version of nvprof in CUDA 9.0 is run on an OpenACC program, [...] the
> program deadlocks.

> I have added a testcase that sets up the situation presented by nvprof.

Thanks. I have extended this one a little bit, to add some state
tracking to verify that we get the expected callbacks invoked, test what
we expect returned from 'acc_get_device_type', and in addition to your
'acc_ev_device_init_start' also verify the corresponding
'acc_ev_device_init_end'. I've also updated the documentation.

> This
> testcase hangs without the patch (hence the short dg-timeout), and passes with
> it.

(Thus, 'dg-timeout' not really necessary anymore, but OK to leave in if
you'd like.)

> Okay for master, GCC 10 branch and OG10?

Thanks, OK, with the incremental patch merged in, unless there's anything
to discuss further.

>     libgomp: Fix hang when profiling OpenACC programs with CUDA 9.0 nvprof
>
>     The version of nvprof in CUDA 9.0 causes a hang when used to profile an
>     OpenACC program. This is because it calls acc_get_device_type from
>     a callback called during device initialization, which then attempts
>     to acquire acc_device_lock while it is already taken, resulting in
>     deadlock. This works around the issue by returning acc_device_none
>     from acc_get_device_type without attempting to acquire the lock when
>     initialization has not completed yet.
>
>     2020-07-13  Tom de Vries

Should use Tom's CodeSourcery address, given that's when this work was
done.

>                 Cesar Philippidis
>                 Thomas Schwinge
>                 Kwok Cheung Yeung


Grüße
 Thomas


>     libgomp/
>     * oacc-init.c (acc_init_state_lock, acc_init_state, acc_init_thread):
>     New variable.
>     (acc_init_1): Set acc_init_thread to pthread_self (). Set
>     acc_init_state to initializing at the start, and to initialized at the
>     end.
>     (self_initializing_p): New function.
>     (acc_get_device_type): Return acc_device_none if called by thread that
>     is currently executing acc_init_1.
>     * testsuite/libgomp.oacc-c-c++-common/acc_prof-cb-call.c: New.
>
> diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
> index 5d786a5..1e7f934 100644
> --- a/libgomp/oacc-init.c
> +++ b/libgomp/oacc-init.c
> @@ -40,6 +40,11 @@
>
>  static gomp_mutex_t acc_device_lock;
>
> +static gomp_mutex_t acc_init_state_lock;
> +static enum { uninitialized, initializing, initialized } acc_init_state
> +  = uninitialized;
> +static pthread_t acc_init_thread;
> +
>  /* A cached version of the dispatcher for the global "current" accelerator type,
>     e.g. used as the default when creating new host threads. This is the
>     device-type equivalent of goacc_device_num (which specifies which device to
> @@ -228,6 +233,11 @@ acc_dev_num_out_of_range (acc_device_t d, int ord, int ndevs)
>  static struct gomp_device_descr *
>  acc_init_1 (acc_device_t d, acc_construct_t parent_construct, int implicit)
>  {
> +  gomp_mutex_lock (&acc_init_state_lock);
> +  acc_init_state = initializing;
> +  acc_init_thread = pthread_self ();
> +  gomp_mutex_unlock (&acc_init_state_lock);
> +
>    bool check_not_nested_p;
>    if (implicit)
>      {
> @@ -317,6 +327,14 @@ acc_init_1 (acc_device_t d, acc_construct_t parent_construct, int implicit)
>                                  &api_info);
>      }
>
> +  /* We're setting 'initialized' *after* 'goacc_profiling_dispatch', so that a
> +     nested 'acc_get_device_type' called from a profiling callback still sees
> +     'initializing', so that we don't deadlock when it then again tries to lock
> +     'goacc_prof_lock'. See also the discussion in 'acc_get_device_type'. */
> +  gomp_mutex_lock (&acc_init_state_lock);
> +  acc_init_state = initialized;
> +  gomp_mutex_unlock (&acc_init_state_lock);
> +
>    return base_dev;
>  }
>
> @@ -643,6 +661,17 @@ acc_set_device_type (acc_device_t d)
>
>  ialias (acc_set_device_type)
>
> +static bool
> +self_initializing_p (void)
> +{
> +  bool res;
> +  gomp_mutex_lock (&acc_init_state_lock);
> +  res = (acc_init_state == initializing
> +         && pthread_equal (acc_init_thread, pthread_self ()));
> +  gomp_mutex_unlock (&acc_init_state_lock);
> +  return res;
> +}
> +
>  acc_device_t
>  acc_get_device_type (void)
>  {
> @@ -652,6 +681,15 @@ acc_get_device_type (void)
>
>    if (thr && thr->base_dev)
>      res = acc_device_type (thr->base_dev->type);
> +  else if (self_initializing_p ())
> +    /* The Cuda libaccinj64.so version 9.0+ calls acc_get_device_type during the
> +       acc_ev_device_init_start event callback, which is dispatched during
> +       acc_init_1. Trying to lock acc_device_lock during such a call (as we do
> +       in the else clause below), will result in deadlock, since the lock has
> +       already been taken by the acc_init_1 caller. We work around this problem
> +       by using the acc_get_device_type property "If the device type has not yet
> +       been selected, the value acc_device_none may be returned". */
> +    ;
>    else
>      {
>        acc_prof_info prof_info;
> diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-cb-call.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-cb-call.c
> new file mode 100644
> index 0000000..6114164
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-cb-call.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run } */
> +/* { dg-timeout 10 } */
> +
> +/* Test the calling of acc_get_device_type() from within the device_init_start
> +   callback. This occurs when the CUDA 9.0 nvprof tool is used, and can
> +   deadlock if not handled properly. */
> +
> +#include <acc_prof.h>
> +
> +static acc_prof_reg reg;
> +static acc_prof_reg unreg;
> +static acc_prof_lookup_func lookup;
> +
> +void acc_register_library (acc_prof_reg reg_, acc_prof_reg unreg_, acc_prof_lookup_func lookup_)
> +{
> +  reg = reg_;
> +  unreg = unreg_;
> +  lookup = lookup_;
> +}
> +
> +static acc_device_t acc_device_type;
> +
> +static void cb_device_init_start (acc_prof_info *prof_info, acc_event_info *event_info, acc_api_info *api_info)
> +{
> +  acc_device_type = acc_get_device_type ();
> +}
> +
> +int main(void)
> +{
> +  acc_register_library (acc_prof_register, acc_prof_unregister, acc_prof_lookup);
> +
> +  reg (acc_ev_device_init_start, cb_device_init_start, acc_reg);
> +
> +  acc_init (acc_device_host);
> +  acc_shutdown (acc_device_host);
> +
> +  acc_init (acc_device_default);
> +  acc_shutdown (acc_device_default);
> +}
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

--=-=-=
Content-Type: text/x-diff
Content-Disposition: inline; filename="0001-into-libgomp-Fix-hang-when-profiling-OpenACC-program.patch"

>From 82e43a3263068e006dc96f9bf0ace033e45038ef Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Tue, 14 Jul 2020 12:43:53 +0200
Subject: [PATCH] into "libgomp: Fix hang when profiling OpenACC programs with CUDA 9.0 nvprof"

---
 libgomp/libgomp.texi                          | 11 +++
 .../acc_prof-cb-call.c                        | 39 ---------
 .../acc_prof-init-2.c                         | 80 +++++++++++++++++++
 3 files changed, 91 insertions(+), 39 deletions(-)
 delete mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-cb-call.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-init-2.c

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index b946743f9b1..5331230c207 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -1967,6 +1967,12 @@ in @var{devicetype}, to use when executing a parallel or kernels region.
 This function returns what device type will be used when executing a
 parallel or kernels region.
 
+This function returns @code{acc_device_none} if
+@code{acc_get_device_type} is called from
+@code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
+callbacks of the OpenACC Profiling Interface (@ref{OpenACC Profiling
+Interface}), that is, if the device is currently being initialized.
+
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
 @item @emph{Prototype}: @tab @code{acc_device_t acc_get_device_type(void);}
@@ -3382,6 +3388,11 @@ every event that has been registered.
 
 We're not yet accounting for the fact that @cite{OpenACC events may
 occur during event processing}.
+We just handle one case specially, as required by CUDA 9.0
+@command{nvprof}, that @code{acc_get_device_type}
+(@ref{acc_get_device_type}) may be called from
+@code{acc_ev_device_init_start}, @code{acc_ev_device_init_end}
+callbacks.
 
 We're not yet implementing initialization via a
 @code{acc_register_library} function that is either statically linked
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-cb-call.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-cb-call.c
deleted file mode 100644
index 6114164aa24..00000000000
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-cb-call.c
+++ /dev/null
@@ -1,39 +0,0 @@
-/* { dg-do run } */
-/* { dg-timeout 10 } */
-
-/* Test the calling of acc_get_device_type() from within the device_init_start
-   callback. This occurs when the CUDA 9.0 nvprof tool is used, and can
-   deadlock if not handled properly. */
-
-#include <acc_prof.h>
-
-static acc_prof_reg reg;
-static acc_prof_reg unreg;
-static acc_prof_lookup_func lookup;
-
-void acc_register_library (acc_prof_reg reg_, acc_prof_reg unreg_, acc_prof_lookup_func lookup_)
-{
-  reg = reg_;
-  unreg = unreg_;
-  lookup = lookup_;
-}
-
-static acc_device_t acc_device_type;
-
-static void cb_device_init_start (acc_prof_info *prof_info, acc_event_info *event_info, acc_api_info *api_info)
-{
-  acc_device_type = acc_get_device_type ();
-}
-
-int main(void)
-{
-  acc_register_library (acc_prof_register, acc_prof_unregister, acc_prof_lookup);
-
-  reg (acc_ev_device_init_start, cb_device_init_start, acc_reg);
-
-  acc_init (acc_device_host);
-  acc_shutdown (acc_device_host);
-
-  acc_init (acc_device_default);
-  acc_shutdown (acc_device_default);
-}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-init-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-init-2.c
new file mode 100644
index 00000000000..fab595cd463
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-init-2.c
@@ -0,0 +1,80 @@
+/* { dg-do run } */
+/* { dg-timeout 10 } */
+
+/* Test the calling of 'acc_get_device_type' from within
+   'cb_device_init_start', 'cb_device_init_end' callbacks. This occurs when
+   the CUDA 9.0 'nvprof' tool is used, and did deadlock. */
+
+#include <assert.h>
+#include <stdbool.h>
+#include <acc_prof.h>
+
+static acc_prof_reg reg;
+static acc_prof_reg unreg;
+static acc_prof_lookup_func lookup;
+
+void acc_register_library (acc_prof_reg reg_, acc_prof_reg unreg_, acc_prof_lookup_func lookup_)
+{
+  reg = reg_;
+  unreg = unreg_;
+  lookup = lookup_;
+}
+
+static bool expect_cb_device_init_start;
+static bool expect_cb_device_init_end;
+
+static void cb_device_init_start (acc_prof_info *prof_info, acc_event_info *event_info, acc_api_info *api_info)
+{
+  assert (expect_cb_device_init_start);
+  expect_cb_device_init_start = false;
+
+  acc_device_t acc_device_type;
+  acc_device_type = acc_get_device_type ();
+  assert (acc_device_type == acc_device_none);
+
+  expect_cb_device_init_end = true;
+}
+
+static void cb_device_init_end (acc_prof_info *prof_info, acc_event_info *event_info, acc_api_info *api_info)
+{
+  assert (expect_cb_device_init_end);
+  expect_cb_device_init_end = false;
+
+  acc_device_t acc_device_type;
+  acc_device_type = acc_get_device_type ();
+  assert (acc_device_type == acc_device_none);
+}
+
+int main(void)
+{
+  acc_register_library (acc_prof_register, acc_prof_unregister, acc_prof_lookup);
+
+  reg (acc_ev_device_init_start, cb_device_init_start, acc_reg);
+  reg (acc_ev_device_init_end, cb_device_init_end, acc_reg);
+
+  expect_cb_device_init_start = true;
+  expect_cb_device_init_end = false;
+  acc_init (acc_device_host);
+  assert (!expect_cb_device_init_start);
+  assert (!expect_cb_device_init_end);
+  {
+    acc_device_t acc_device_type;
+    acc_device_type = acc_get_device_type ();
+    assert (acc_device_type == acc_device_host);
+  }
+  acc_shutdown (acc_device_host);
+
+  expect_cb_device_init_start = true;
+  expect_cb_device_init_end = false;
+  acc_init (acc_device_default);
+  assert (!expect_cb_device_init_start);
+  assert (!expect_cb_device_init_end);
+  {
+    acc_device_t acc_device_type;
+    acc_device_type = acc_get_device_type ();
+    assert (acc_device_type != acc_device_none);
+  }
+  acc_shutdown (acc_device_default);
+
+  return 0;
+}
-- 
2.17.1


--=-=-=--