On Wed, 30 Jun 2021 11:40:33 +0100
Julian Brown wrote:

> On Wed, 30 Jun 2021 10:28:00 +0200
> Thomas Schwinge wrote:
>
> > > - The OpenACC profiling-interface implementation did not measure
> > >   asynchronous operations properly.
> >
> > We'll need to be careful: (possibly, an older version of) that one
> > we internally had identified to be causing some issues; see the
> > "acc_prof-parallel-1.c intermittent failure on og10 branch" emails,
> > 2020-07.
>
> Hmm, I'll check those.

The problem here is that the async callbacks now execute in a different
thread to the main program, so the direct sharing of the 'state'
variable isn't safe. (I verified that by observing the result of
"pthread_self ()" calls from the main thread and from the callback.)

The attached patch appears to make the test run reliably on mainline
(which still exhibits the failure with the parent patch series, very
intermittently). A better solution might be to use the memory-model
builtins for all 'state' variable accesses though.

I think the async profiling callbacks *have to* run in a different
thread to the main program, which would make this a testcase bug (the
spec doesn't explicitly say this as of 3.0 though). However there might
be an argument for making "acc_wait" and friends thread barriers with
respect to the host (i.e. calling __atomic_thread_fence in the
appropriate place in libgomp) -- otherwise you have to "break out of
the abstraction" provided by OpenACC and rely on a non-OpenACC API in
order to observe any results measured in the async profiling callbacks.
OTOH the memory-model stuff is part of C now, so maybe that's fine (and
also, I'm doubtful that just adding the barrier and using regular
global variable accesses is sufficient to ensure thread safety anyway).

Thoughts?

Thanks,

Julian
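
Illustrative sketch (this is not the attached patch): the "memory-model
builtins for all 'state' variable accesses" idea might look roughly like
the following. Only the GCC __atomic builtins and the pthread calls are
real APIs; the type and values of 'state' and the helper names
state_store, state_load, callback_thread and run_demo are hypothetical
stand-ins, and an ordinary pthread takes the place of whatever thread
libgomp runs the async profiling callback on.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the testcase's shared 'state' variable.  */
static int state = 0;

/* Every access to 'state' goes through the memory-model builtins.  */
static void
state_store (int v)
{
  __atomic_store_n (&state, v, __ATOMIC_RELEASE);
}

static int
state_load (void)
{
  return __atomic_load_n (&state, __ATOMIC_ACQUIRE);
}

/* Stand-in for an async profiling callback running on another thread:
   it advances 'state' from 0 to 1 to record that an event was seen.  */
static void *
callback_thread (void *arg)
{
  (void) arg;
  int expected = 0;
  __atomic_compare_exchange_n (&state, &expected, 1, 0,
			       __ATOMIC_ACQ_REL, __ATOMIC_ACQUIRE);
  return NULL;
}

/* Start the "callback" on a separate thread, wait until the main thread
   observes its update, and return the value that was observed.  */
static int
run_demo (void)
{
  pthread_t t;
  int err;

  state_store (0);
  err = pthread_create (&t, NULL, callback_thread, NULL);
  assert (err == 0);

  /* Busy-wait on the atomic load until the update becomes visible.  */
  while (state_load () != 1)
    ;

  pthread_join (t, NULL);
  return state_load ();
}
```

Built with "gcc -pthread", a call to run_demo () from the main thread
should return once the update made on the other thread is visible. The
release/acquire pairing is one possible choice of memory orders; a real
testcase might prefer __ATOMIC_SEQ_CST throughout for simplicity.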