* Re: [PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717] [not found] <YmA9iaVvWhwVGFmT@tucnak> @ 2022-04-25 21:19 ` Thomas Schwinge 2022-04-26 17:25 ` Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation [PR104717] (was: [PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717]) Thomas Schwinge 0 siblings, 1 reply; 3+ messages in thread From: Thomas Schwinge @ 2022-04-25 21:19 UTC (permalink / raw) To: Jakub Jelinek, gcc-patches; +Cc: Tobias Burnus, fortran [-- Attachment #1: Type: text/plain, Size: 4575 bytes --] Hi! On 2022-04-20T19:06:17+0200, Jakub Jelinek <jakub@redhat.com> wrote: > So that move_sese_region_to_fn works properly, OpenMP/OpenACC constructs > for which that function is invoked need an extra artificial BIND_EXPR > around their body so that we move all variables of the bodies. > > The C/C++ FEs do that both for OpenMP constructs like OMP_PARALLEL, OMP_TASK > or OMP_TARGET and for OpenACC constructs that behave similarly to > OMP_TARGET, but the Fortran FE only does that for OpenMP constructs. > > The following patch does that for OpenACC constructs too. > This fixes ICE on the attached testcase. ACK, thanks. I do see that over the past years, there have been a number of similar fixes been applied for other OMP constructs -- maybe that interface should generally be made more obvious and fool-proof? > Unfortunately, it also regresses > FAIL: gfortran.dg/goacc/privatization-1-compute-loop.f90 -O (test for excess errors) > FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O0 (test for excess errors) > FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O1 (test for excess errors) > FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) > FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) > FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O3 -g (test for excess errors) > FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -Os (test for excess errors) > Those emits emit tons of various messages and now there are some extra ones, I've fixed these up. > the previous as well as new ones are mostly on artificial variables created > by the compiler, so I wonder if we should emit those at all. Yes, I did wonder about that, too. (This affects a lot of test cases.) > Anyway, here it is the patch, appart from those regressions passed > bootstrap/regtested on powerpc64le-linux. > > 2022-04-20 Jakub Jelinek <jakub@redhat.com> > > PR fortran/104717 > * trans-openmp.cc (gfc_trans_oacc_construct): Wrap construct body > in an extra BIND_EXPR. > > * gfortran.dg/goacc/pr104717.f90: New test. > > --- gcc/fortran/trans-openmp.cc.jj 2022-04-06 09:59:32.729654664 +0200 > +++ gcc/fortran/trans-openmp.cc 2022-04-20 12:48:19.773402677 +0200 > @@ -4444,7 +4444,9 @@ gfc_trans_oacc_construct (gfc_code *code > gfc_start_block (&block); > oacc_clauses = gfc_trans_omp_clauses (&block, code->ext.omp_clauses, > code->loc, false, true); > + pushlevel (); > stmt = gfc_trans_omp_code (code->block->next, true); > + stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0)); > stmt = build2_loc (gfc_get_location (&code->loc), construct_code, > void_type_node, stmt, oacc_clauses); > gfc_add_expr_to_block (&block, stmt); > --- gcc/testsuite/gfortran.dg/goacc/pr104717.f90.jj 2022-04-20 12:53:54.002748265 +0200 > +++ gcc/testsuite/gfortran.dg/goacc/pr104717.f90 2022-04-20 12:53:12.811321862 +0200 > @@ -0,0 +1,22 @@ > +! PR fortran/104717 > +! { dg-do compile } > +! { dg-options "-O1 -fopenacc -fstack-arrays" } As the ICE was "during IPA pass: pta", I've added an explicit '-fipa-pta' here. > + > +program main > + implicit none (type, external) > + integer :: j > + integer, allocatable :: A(:) > + > + A = [(3*j, j=1, 10)] > + call foo (A, size(A)) > + deallocate (A) > +contains > + subroutine foo (array, nn) > + integer :: i, nn > + integer :: array(nn) > + > + !$acc parallel copyout(array) > + array = [(-i, i = 1, nn)] > + !$acc end parallel > + end subroutine foo > +end Pushed to master branch commit b2202431910e30d8505c94d1cb9341cac7080d10 "fortran: Fix up gfc_trans_oacc_construct [PR104717]", see attached. Grüße Thomas ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-fortran-Fix-up-gfc_trans_oacc_construct-PR104717.patch --] [-- Type: text/x-diff, Size: 7504 bytes --] From b2202431910e30d8505c94d1cb9341cac7080d10 Mon Sep 17 00:00:00 2001 From: Jakub Jelinek <jakub@redhat.com> Date: Wed, 20 Apr 2022 19:06:17 +0200 Subject: [PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717] So that move_sese_region_to_fn works properly, OpenMP/OpenACC constructs for which that function is invoked need an extra artificial BIND_EXPR around their body so that we move all variables of the bodies. The C/C++ FEs do that both for OpenMP constructs like OMP_PARALLEL, OMP_TASK or OMP_TARGET and for OpenACC constructs that behave similarly to OMP_TARGET, but the Fortran FE only does that for OpenMP constructs. The following patch does that for OpenACC constructs too. PR fortran/104717 gcc/fortran/ * trans-openmp.cc (gfc_trans_oacc_construct): Wrap construct body in an extra BIND_EXPR. gcc/testsuite/ * gfortran.dg/goacc/pr104717.f90: New test. * gfortran.dg/goacc/privatization-1-compute-loop.f90: Adjust. libgomp/ * testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: Adjust. Co-authored-by: Thomas Schwinge <thomas@codesourcery.com> --- gcc/fortran/trans-openmp.cc | 2 ++ gcc/testsuite/gfortran.dg/goacc/pr104717.f90 | 22 +++++++++++++++++++ .../goacc/privatization-1-compute-loop.f90 | 7 +++--- .../libgomp.oacc-fortran/privatized-ref-2.f90 | 7 ++++++ 4 files changed, 35 insertions(+), 3 deletions(-) create mode 100644 gcc/testsuite/gfortran.dg/goacc/pr104717.f90 diff --git a/gcc/fortran/trans-openmp.cc b/gcc/fortran/trans-openmp.cc index 25dde826146..43d59abe9e0 100644 --- a/gcc/fortran/trans-openmp.cc +++ b/gcc/fortran/trans-openmp.cc @@ -4444,7 +4444,9 @@ gfc_trans_oacc_construct (gfc_code *code) gfc_start_block (&block); oacc_clauses = gfc_trans_omp_clauses (&block, code->ext.omp_clauses, code->loc, false, true); + pushlevel (); stmt = gfc_trans_omp_code (code->block->next, true); + stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0)); stmt = build2_loc (gfc_get_location (&code->loc), construct_code, void_type_node, stmt, oacc_clauses); gfc_add_expr_to_block (&block, stmt); diff --git a/gcc/testsuite/gfortran.dg/goacc/pr104717.f90 b/gcc/testsuite/gfortran.dg/goacc/pr104717.f90 new file mode 100644 index 00000000000..4ef16187c84 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/goacc/pr104717.f90 @@ -0,0 +1,22 @@ +! Extracted from 'libgomp.oacc-fortran/privatized-ref-2.f90'. + +! { dg-additional-options "-O1 -fstack-arrays -fipa-pta" } + +program main + implicit none (type, external) + integer :: j + integer, allocatable :: A(:) + + A = [(3*j, j=1, 10)] + call foo (A, size(A)) + deallocate (A) +contains + subroutine foo (array, nn) + integer :: i, nn + integer :: array(nn) + + !$acc parallel copyout(array) + array = [(-i, i = 1, nn)] + !$acc end parallel + end subroutine foo +end diff --git a/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute-loop.f90 b/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute-loop.f90 index 4dfeb7e07a2..13772c185ce 100644 --- a/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute-loop.f90 +++ b/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute-loop.f90 @@ -13,7 +13,7 @@ ! passed to 'incr' may be unset, and in that case, it will be set to [...]", ! so to maintain compatibility with earlier Tcl releases, we manually ! initialize counter variables: -! { dg-line l_dummy[variable c_loop 0] } +! { dg-line l_dummy[variable c_compute 0 c_loop 0] } ! { dg-message "dummy" "" { target iN-VAl-Id } l_dummy } to avoid ! "WARNING: dg-line var l_dummy defined, but not used". @@ -26,7 +26,7 @@ contains integer, parameter :: c = 3 integer, external :: g - !$acc parallel + !$acc parallel ! { dg-line l_compute[incr c_compute] } !$acc loop collapse(2) private(a) private(x, y) ! { dg-line l_loop[incr c_loop] } do i = 1, 20 do j = 1, 25 @@ -46,6 +46,8 @@ contains y = a end do end do + !$acc end parallel + ! { dg-note {variable 'count\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } ! { dg-note {variable 'count\.[0-9]+' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop } ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop } ! { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop$c_loop } @@ -54,6 +56,5 @@ contains ! { dg-note {variable 'y' in 'private' clause is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_loop$c_loop } ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } l_loop$c_loop } ! { dg-note {variable 'y' ought to be adjusted for OpenACC privatization level: 'vector'} "" { target *-*-* } l_loop$c_loop } - !$acc end parallel end subroutine f end module m diff --git a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90 index 1d91e115d9f..b31f4066152 100644 --- a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90 +++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90 @@ -53,6 +53,9 @@ contains integer :: array(nn) !$acc parallel copyout(array) ! { dg-line l_compute[incr c_compute] } + ! { dg-note {variable 'atmp\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } + ! { dg-note {variable 'shadow_loopvar\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } + ! { dg-note {variable 'offset\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } ! { dg-note {variable 'S\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } array = [(-i, i = 1, nn)] !$acc end parallel @@ -81,6 +84,10 @@ contains integer :: array(:) !$acc parallel copyout(array) ! { dg-line l_compute[incr c_compute] } + ! { dg-note {variable 'atmp\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } + ! { dg-note {variable 'parm\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } + ! { dg-note {variable 'shadow_loopvar\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } + ! { dg-note {variable 'offset\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } ! { dg-note {variable 'S\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } array = [(-2*i, i = 1, size(array))] !$acc end parallel -- 2.25.1 ^ permalink raw reply [flat|nested] 3+ messages in thread
* Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation [PR104717] (was: [PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717]) 2022-04-25 21:19 ` [PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717] Thomas Schwinge @ 2022-04-26 17:25 ` Thomas Schwinge 2022-04-26 17:51 ` Thomas Schwinge 0 siblings, 1 reply; 3+ messages in thread From: Thomas Schwinge @ 2022-04-26 17:25 UTC (permalink / raw) To: gcc-patches, Julian Brown, Andrew Stubbs Cc: Jakub Jelinek, Tobias Burnus, fortran [-- Attachment #1: Type: text/plain, Size: 5305 bytes --] Hi! On 2022-04-25T23:19:26+0200, I wrote: > On 2022-04-20T19:06:17+0200, Jakub Jelinek <jakub@redhat.com> wrote: >> So that move_sese_region_to_fn works properly, OpenMP/OpenACC constructs >> for which that function is invoked need an extra artificial BIND_EXPR >> around their body so that we move all variables of the bodies. >> >> The C/C++ FEs do that both for OpenMP constructs like OMP_PARALLEL, OMP_TASK >> or OMP_TARGET and for OpenACC constructs that behave similarly to >> OMP_TARGET, but the Fortran FE only does that for OpenMP constructs. >> >> The following patch does that for OpenACC constructs too. >> This fixes ICE on the attached testcase. > > ACK, thanks. >> Unfortunately, it also regresses >> FAIL: gfortran.dg/goacc/privatization-1-compute-loop.f90 -O (test for excess errors) >> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O0 (test for excess errors) >> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O1 (test for excess errors) >> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) >> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) >> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O3 -g (test for excess errors) >> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -Os (test for excess errors) >> Those emits emit tons of various messages and now there are some extra ones, > > I've fixed these up. One more issue became apparent, where the code changes pushed actually do lead to a GCN offloading compilation failure: [...]/libgomp.oacc-fortran/print-1.f90: In function ‘MAIN__._omp_fn.0’: [...]/libgomp.oacc-fortran/print-1.f90:13:14: error: 512 bytes of gang-private data-share memory exhausted (increase with ‘-mgang-private-size=560’, for example) 13 | !$acc parallel | ^ In my configuration, I may indeed fix GCN offloading compilation with '-foffload-options=amdgcn-amdhsa=-mgang-private-size=560', but I don't think that's generally correct/sufficient, so in the the attached "Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation [PR104717]", I instead "raise '-mgang-private-size' to an arbitrary high value". This avoids having to route the actual 'sizeof' from GCC build down to the test suite harness (which ought to be doable, but non-trivial). OK to push that: +! For GCN offloading compilation, when gang-privatizing 'dt_parm.N' +! (see below), we run into an 'gang-private data-share memory exhausted' +! error: the default '-mgang-private-size' is too small. Per +! 'gcc/fortran/trans-io.cc'/'libgfortran/io/io.h', that one is +! 'struct st_parameter_dt', which indeed is rather big. Instead of +! working out its exact size (which may vary per GCC configuration), +! raise '-mgang-private-size' to an arbitrary high value. +! { dg-additional-options "-foffload-options=amdgcn-amdhsa=-mgang-private-size=13579" { target openacc_radeon_accel_selected } } ... to master branch? (This doubles the use/testing of the '-mgang-private-size' option!) ;-) We've currently not been doing OpenACC privatization scanning in 'libgomp.oacc-fortran/print-1.f90', which I've now added, to help document the issue; no need to review that. Of course, the issue could alternatively be fixed by adding more logic to the GCN back end to auto-scale the allocation, or be fixed by adding more logic to the compiler to avoid gang-privatizing varibales such as 'dt_parm.N' in such cases, but that's not something I'm going to look into at this point. Or, of course, be avoided by re-writing the test case to not require gang-privatizing 'dt_parm.N', but the test case is correct as it is. Grüße Thomas > PR fortran/104717 > gcc/fortran/ > * trans-openmp.cc (gfc_trans_oacc_construct): Wrap construct body > in an extra BIND_EXPR. > --- a/gcc/fortran/trans-openmp.cc > +++ b/gcc/fortran/trans-openmp.cc > @@ -4444,7 +4444,9 @@ gfc_trans_oacc_construct (gfc_code *code) > gfc_start_block (&block); > oacc_clauses = gfc_trans_omp_clauses (&block, code->ext.omp_clauses, > code->loc, false, true); > + pushlevel (); > stmt = gfc_trans_omp_code (code->block->next, true); > + stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0)); > stmt = build2_loc (gfc_get_location (&code->loc), construct_code, > void_type_node, stmt, oacc_clauses); > gfc_add_expr_to_block (&block, stmt); ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: 0001-Fix-up-libgomp.oacc-fortran-print-1.f90-GCN-offloadi.patch --] [-- Type: text/x-diff, Size: 3300 bytes --] From 3dfea06371aa9bcc84ad75a2bc821a45e131dca6 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge <thomas@codesourcery.com> Date: Tue, 26 Apr 2022 18:41:23 +0200 Subject: [PATCH] Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation [PR104717] That got broken by recent commit b2202431910e30d8505c94d1cb9341cac7080d10 "fortran: Fix up gfc_trans_oacc_construct [PR104717]". PR fortran/104717 libgomp/ * testsuite/libgomp.oacc-fortran/print-1.f90: Add OpenACC privatization scanning. For GCN offloading compilation, raise '-mgang-private-size'. --- .../libgomp.oacc-fortran/print-1.f90 | 30 ++++++++++++++++++- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/libgomp/testsuite/libgomp.oacc-fortran/print-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/print-1.f90 index 7b7f73741fe..42a8538e1fb 100644 --- a/libgomp/testsuite/libgomp.oacc-fortran/print-1.f90 +++ b/libgomp/testsuite/libgomp.oacc-fortran/print-1.f90 @@ -6,11 +6,39 @@ ! Separate file 'print-1-nvptx.f90' for nvptx offloading. ! { dg-skip-if "separate file" { offload_target_nvptx } } +! For GCN offloading compilation, when gang-privatizing 'dt_parm.N' +! (see below), we run into an 'gang-private data-share memory exhausted' +! error: the default '-mgang-private-size' is too small. Per +! 'gcc/fortran/trans-io.cc'/'libgfortran/io/io.h', that one is +! 'struct st_parameter_dt', which indeed is rather big. Instead of +! working out its exact size (which may vary per GCC configuration), +! raise '-mgang-private-size' to an arbitrary high value. +! { dg-additional-options "-foffload-options=amdgcn-amdhsa=-mgang-private-size=13579" { target openacc_radeon_accel_selected } } + +! { dg-additional-options "-fopt-info-note-omp" } +! { dg-additional-options "-foffload=-fopt-info-note-omp" } + +! { dg-additional-options "--param=openacc-privatization=noisy" } +! { dg-additional-options "-foffload=--param=openacc-privatization=noisy" } +! Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types): +! { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */ + +! It's only with Tcl 8.5 (released in 2007) that "the variable 'varName' +! passed to 'incr' may be unset, and in that case, it will be set to [...]", +! so to maintain compatibility with earlier Tcl releases, we manually +! initialize counter variables: +! { dg-line l_dummy[variable c_compute 0] } +! { dg-message dummy {} { target iN-VAl-Id } l_dummy } to avoid +! "WARNING: dg-line var l_dummy defined, but not used". + program main implicit none integer :: var = 42 -!$acc parallel +!$acc parallel ! { dg-line l_compute[incr c_compute] } + ! { dg-note {variable 'dt_parm\.[0-9]+' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute$c_compute } + ! { dg-note {variable 'dt_parm\.[0-9]+' ought to be adjusted for OpenACC privatization level: 'gang'} {} { target *-*-* } l_compute$c_compute } + ! { dg-note {variable 'dt_parm\.[0-9]+' adjusted for OpenACC privatization level: 'gang'} {} { target { ! openacc_host_selected } } l_compute$c_compute } write (0, '("The answer is ", I2)') var !$acc end parallel -- 2.25.1 ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation [PR104717] (was: [PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717]) 2022-04-26 17:25 ` Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation [PR104717] (was: [PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717]) Thomas Schwinge @ 2022-04-26 17:51 ` Thomas Schwinge 0 siblings, 0 replies; 3+ messages in thread From: Thomas Schwinge @ 2022-04-26 17:51 UTC (permalink / raw) To: gcc-patches, Julian Brown, Andrew Stubbs Cc: Jakub Jelinek, Tobias Burnus, fortran Hi! On 2022-04-26T19:25:31+0200, I wrote: > On 2022-04-25T23:19:26+0200, I wrote: >> On 2022-04-20T19:06:17+0200, Jakub Jelinek <jakub@redhat.com> wrote: >>> So that move_sese_region_to_fn works properly, OpenMP/OpenACC constructs >>> for which that function is invoked need an extra artificial BIND_EXPR >>> around their body so that we move all variables of the bodies. >>> >>> The C/C++ FEs do that both for OpenMP constructs like OMP_PARALLEL, OMP_TASK >>> or OMP_TARGET and for OpenACC constructs that behave similarly to >>> OMP_TARGET, but the Fortran FE only does that for OpenMP constructs. >>> >>> The following patch does that for OpenACC constructs too. >>> This fixes ICE on the attached testcase. >> >> ACK, thanks. > >>> Unfortunately, it also regresses >>> FAIL: gfortran.dg/goacc/privatization-1-compute-loop.f90 -O (test for excess errors) >>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O0 (test for excess errors) >>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O1 (test for excess errors) >>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors) >>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors) >>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O3 -g (test for excess errors) >>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -Os (test for excess errors) >>> Those emits emit tons of various messages and now there are some extra ones, >> >> I've fixed these up. > > One more issue became apparent, where the code changes pushed actually do > lead to a GCN offloading compilation failure: > > [...]/libgomp.oacc-fortran/print-1.f90: In function ‘MAIN__._omp_fn.0’: > [...]/libgomp.oacc-fortran/print-1.f90:13:14: error: 512 bytes of gang-private data-share memory exhausted (increase with ‘-mgang-private-size=560’, for example) > 13 | !$acc parallel > | ^ > > In my configuration, I may indeed fix GCN offloading compilation with > '-foffload-options=amdgcn-amdhsa=-mgang-private-size=560', but I don't > think that's generally correct/sufficient, so in the the attached > "Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation > [PR104717]", I instead "raise '-mgang-private-size' to an arbitrary high > value". This avoids having to route the actual 'sizeof' from GCC build > down to the test suite harness (which ought to be doable, but > non-trivial). OK to push that: > > +! For GCN offloading compilation, when gang-privatizing 'dt_parm.N' > +! (see below), we run into an 'gang-private data-share memory exhausted' > +! error: the default '-mgang-private-size' is too small. Per > +! 'gcc/fortran/trans-io.cc'/'libgfortran/io/io.h', that one is > +! 'struct st_parameter_dt', which indeed is rather big. Instead of > +! working out its exact size (which may vary per GCC configuration), > +! raise '-mgang-private-size' to an arbitrary high value. > +! { dg-additional-options "-foffload-options=amdgcn-amdhsa=-mgang-private-size=13579" { target openacc_radeon_accel_selected } } > > ... to master branch? (This doubles the use/testing of the > '-mgang-private-size' option!) ;-) Eh. That only works with the default GCN multilib '-march=fiji', testing on gfx803 amdfury2 system. For all of '-march=gfx900' (amdnano2), '-march=gfx906' (amd_ryzen3), '-march=gfx908' (amd-instinct1), I get: libgomp: GCN fatal error: Asynchronous queue error Runtime message: HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address. ..., and I still get that if lowering the allocation to the minimum, '-foffload-options=amdgcn-amdhsa=-mgang-private-size=560'. This is a really simple OpenACC 'parallel' construct: !$acc parallel write (0, '("The answer is ", I2)') var !$acc end parallel ..., which ought to launch a 1-gang x 1-worker x 1-vector GPU kernel, so I'd assume '-mgang-private-size=560' (or '-mgang-private-size=13579' in fact) is not a problem? Help? Grüße Thomas > We've currently not been doing OpenACC privatization scanning in > 'libgomp.oacc-fortran/print-1.f90', which I've now added, to help > document the issue; no need to review that. > > Of course, the issue could alternatively be fixed by adding more logic to > the GCN back end to auto-scale the allocation, or be fixed by adding more > logic to the compiler to avoid gang-privatizing varibales such as > 'dt_parm.N' in such cases, but that's not something I'm going to look > into at this point. > > Or, of course, be avoided by re-writing the test case to not require > gang-privatizing 'dt_parm.N', but the test case is correct as it is. > > > Grüße > Thomas > > >> PR fortran/104717 >> gcc/fortran/ >> * trans-openmp.cc (gfc_trans_oacc_construct): Wrap construct body >> in an extra BIND_EXPR. > >> --- a/gcc/fortran/trans-openmp.cc >> +++ b/gcc/fortran/trans-openmp.cc >> @@ -4444,7 +4444,9 @@ gfc_trans_oacc_construct (gfc_code *code) >> gfc_start_block (&block); >> oacc_clauses = gfc_trans_omp_clauses (&block, code->ext.omp_clauses, >> code->loc, false, true); >> + pushlevel (); >> stmt = gfc_trans_omp_code (code->block->next, true); >> + stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0)); >> stmt = build2_loc (gfc_get_location (&code->loc), construct_code, >> void_type_node, stmt, oacc_clauses); >> gfc_add_expr_to_block (&block, stmt); ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2022-04-26 17:51 UTC | newest] Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <YmA9iaVvWhwVGFmT@tucnak> 2022-04-25 21:19 ` [PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717] Thomas Schwinge 2022-04-26 17:25 ` Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation [PR104717] (was: [PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717]) Thomas Schwinge 2022-04-26 17:51 ` Thomas Schwinge
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).