Hi Kwok! On 2019-06-24T20:37:12+0100, Kwok Cheung Yeung wrote: > On 17/06/2019 6:24 pm, Thomas Schwinge wrote: >>> Okay to push to openacc-gcc-9-branch? >> I think what would be best, is the following approach: [...] > I have now ported over the mainline patch over to OG9, plus an > additional patch on top of that to bring in the bits from OG8 that did > not make it upstream. > > I have dropped the differences in comments, TODOs, documentation etc. in > favour of the upstream patch. There are also various places where the > OG8 patch sets up profiling, then gotos the end of the function to tear > it down again, whereas the mainline version just aborts early without > setting up profiling in the first place - these I have also resolved in > favour of the mainline version. ACK, thanks! > I have rerun the libgomp testsuite with no regressions noted. (I didn't actually test, but just reviewed the code changes.) > Okay to push to openacc-gcc-9-branch? Yes, please push after re-testing with minor modifications in the second commit, as follows: > From 341285e282f5b7ed73daaeb9fd2f820dc1fe91f9 Mon Sep 17 00:00:00 2001 > From: Kwok Cheung Yeung > Date: Fri, 21 Jun 2019 10:40:38 -0700 > Subject: [PATCH 2/2] Add changes to profiling interface from OG8 branch > > This bundles up the parts of the profiling code from the OG8 branch that were > not included in the upstream patch. > +2017-02-28 Thomas Schwinge > + > + [...] > + * oacc-parallel.c (GOACC_parallel_keyed_internal): Set device_api for > + profiling. > --- a/libgomp/oacc-parallel.c > +++ b/libgomp/oacc-parallel.c > @@ -275,6 +275,8 @@ GOACC_parallel_keyed_internal (int flags_m, int params, void (*fn) (void *), > goacc_call_host_fn (fn, mapnum, hostaddrs, params); > goto out_prof; > } > + else if (profiling_p) > + api_info.device_api = acc_device_api_cuda; That change is not quite right, and I'm pretty sure it wasn't me who introduced that code ;-P -- but that can be resolved later. > --- a/libgomp/Makefile.in > +++ b/libgomp/Makefile.in > @@ -524,7 +525,7 @@ search_path = $(addprefix $(top_srcdir)/config/, $(config_path)) $(top_srcdir) \ > > fincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)$(MULTISUBDIR)/finclude > libsubincludedir = $(libdir)/gcc/$(target_alias)/$(gcc_version)/include > -libgomp_la_LIBADD = $(LIBFFI) > +@USE_LIBFFI_TRUE@libgomp_la_LIBADD = $(LIBFFI) That must've been an earlier commit not properly regenerating the file, but we shall not bother with that now. > diff --git a/libgomp/config/nvptx/oacc-profiling-acc_register_library.c b/libgomp/config/nvptx/oacc-profiling-acc_register_library.c > new file mode 100644 > index 0000000..e69de29 > diff --git a/libgomp/config/nvptx/oacc-profiling.c b/libgomp/config/nvptx/oacc-profiling.c > new file mode 100644 > index 0000000..e69de29 Not sure if these empty files are actually needed, but no harm done, so that can be resolved later. > --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c > +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c > @@ -41,6 +41,7 @@ static int state = -1; > static acc_device_t acc_device_type; > static int acc_device_num; > static int num_gangs, num_workers, vector_length; > +static int async; All these 'async' changes in this file logically belong into the respective commit of the OpenACC 'kernels' changes. It's not a problem to have them included here; we shall just try to remember to include them in the OpenACC 'kernels' trunk changes. (I've made a note; no need for you to re-test/re-post.) > @@ -165,6 +166,15 @@ int main() > for (int i = 0; i < N; ++i) > x[i] = i * i; > } > +#ifdef __OPTIMIZE__ > + /* TODO. With -O2 optimizations enabled, the compiler believes that here > + "state == 0" still holds. It's not yet clear what's going on. > + Mis-optimization across the GOMP function call boundary? Per its > + gcc/omp-builtins.def definition, BUILT_IN_GOACC_PARALLEL > + "GOACC_parallel_keyed" doesn't have a "leaf" attribute, so the compiler > + must expect calls back into this compilation unit? */ > + asm volatile ("" : : : "memory"); > +#endif That workaround is no longer needed given the more specific workaround that I've added, marked with "TODO PR90488". > @@ -184,12 +196,22 @@ int main() > for (int i = 0; i < N; ++i) > x[i] = i * i; > } > +#ifdef __OPTIMIZE__ > + /* TODO. With -O2 optimizations enabled, the compiler believes that here > + "state == 0" still holds. It's not yet clear what's going on. > + Mis-optimization across the GOMP function call boundary? Per its > + gcc/omp-builtins.def definition, BUILT_IN_GOACC_PARALLEL > + "GOACC_parallel_keyed" doesn't have a "leaf" attribute, so the compiler > + must expect calls back into this compilation unit? */ > + asm volatile ("" : : : "memory"); > +#endif Likewise. > @@ -209,12 +233,22 @@ int main() > for (int i = 0; i < N; ++i) > x[i] = i * i; > } > +#ifdef __OPTIMIZE__ > + /* TODO. With -O2 optimizations enabled, the compiler believes that here > + "state == 0" still holds. It's not yet clear what's going on. > + Mis-optimization across the GOMP function call boundary? Per its > + gcc/omp-builtins.def definition, BUILT_IN_GOACC_PARALLEL > + "GOACC_parallel_keyed" doesn't have a "leaf" attribute, so the compiler > + must expect calls back into this compilation unit? */ > + asm volatile ("" : : : "memory"); > +#endif Likewise. > --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c > +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c > @@ -697,6 +695,15 @@ int main() > } > assert (state_init == 4); > } > +#ifdef __OPTIMIZE__ > + /* TODO. With -O2 optimizations enabled, the compiler believes that here > + "state == 0" still holds. It's not yet clear what's going on. > + Mis-optimization across the GOMP function call boundary? Per its > + gcc/omp-builtins.def definition, BUILT_IN_GOACC_PARALLEL > + "GOACC_parallel_keyed" doesn't have a "leaf" attribute, so the compiler > + must expect calls back into this compilation unit? */ > + asm volatile ("" : : : "memory"); > +#endif Likewise. In 'libgomp.oacc-c-c++-common/acc_prof-version-1.c:main', we should also remove the explicit call to 'acc_register_library'. Grüße Thomas