From: Thomas Schwinge
To: Julian Brown, Andrew Stubbs
CC: Jakub Jelinek, Tobias Burnus
Subject: Re: Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation [PR104717] (was: [PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717])
In-Reply-To: <87levrsqno.fsf@dem-tschwing-1.ger.mentorg.com>
References: <87tuagsvxd.fsf@dem-tschwing-1.ger.mentorg.com> <87levrsqno.fsf@dem-tschwing-1.ger.mentorg.com>
Date: Tue, 26 Apr 2022 19:51:19 +0200
Message-ID: <87fslzspgo.fsf@dem-tschwing-1.ger.mentorg.com>
List-Id: Fortran mailing list

Hi!

On 2022-04-26T19:25:31+0200, I wrote:
> On 2022-04-25T23:19:26+0200, I wrote:
>> On 2022-04-20T19:06:17+0200, Jakub Jelinek wrote:
>>> So that move_sese_region_to_fn works properly, OpenMP/OpenACC constructs
>>> for which that function is invoked need an extra artificial BIND_EXPR
>>> around their body so that we move all variables of the bodies.
>>>
>>> The C/C++ FEs do that both for OpenMP constructs like OMP_PARALLEL, OMP_TASK
>>> or OMP_TARGET and for OpenACC constructs that behave similarly to
>>> OMP_TARGET, but the Fortran FE only does that for OpenMP constructs.
>>>
>>> The following patch does that for OpenACC constructs too.
>>> This fixes ICE on the attached testcase.
>>
>> ACK, thanks.
>
>>> Unfortunately, it also regresses
>>> FAIL: gfortran.dg/goacc/privatization-1-compute-loop.f90 -O (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O0 (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O1 (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O3 -g (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -Os (test for excess errors)
>>> Those emits emit tons of various messages and now there are some extra ones,
>>
>> I've fixed these up.
>
> One more issue became apparent, where the code changes pushed actually do
> lead to a GCN offloading compilation failure:
>
>     [...]/libgomp.oacc-fortran/print-1.f90: In function ‘MAIN__._omp_fn.0’:
>     [...]/libgomp.oacc-fortran/print-1.f90:13:14: error: 512 bytes of gang-private data-share memory exhausted (increase with ‘-mgang-private-size=560’, for example)
>        13 | !$acc parallel
>           |              ^
>
> In my configuration, I may indeed fix GCN offloading compilation with
> '-foffload-options=amdgcn-amdhsa=-mgang-private-size=560', but I don't
> think that's generally correct/sufficient, so in the attached
> "Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation
> [PR104717]", I instead "raise '-mgang-private-size' to an arbitrary high
> value".  This avoids having to route the actual 'sizeof' from GCC build
> down to the test suite harness (which ought to be doable, but
> non-trivial).  OK to push that:
>
>     +! For GCN offloading compilation, when gang-privatizing 'dt_parm.N'
>     +! (see below), we run into an 'gang-private data-share memory exhausted'
>     +! error: the default '-mgang-private-size' is too small.  Per
>     +! 'gcc/fortran/trans-io.cc'/'libgfortran/io/io.h', that one is
>     +! 'struct st_parameter_dt', which indeed is rather big.  Instead of
>     +! working out its exact size (which may vary per GCC configuration),
>     +! raise '-mgang-private-size' to an arbitrary high value.
>     +! { dg-additional-options "-foffload-options=amdgcn-amdhsa=-mgang-private-size=13579" { target openacc_radeon_accel_selected } }
>
> ... to master branch?  (This doubles the use/testing of the
> '-mgang-private-size' option!)  ;-)

Eh.  That only works with the default GCN multilib '-march=fiji', testing
on gfx803 amdfury2 system.
For all of '-march=gfx900' (amdnano2), '-march=gfx906' (amd_ryzen3),
'-march=gfx908' (amd-instinct1), I get:

    libgomp: GCN fatal error: Asynchronous queue error
    Runtime message: HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address.

..., and I still get that if lowering the allocation to the minimum,
'-foffload-options=amdgcn-amdhsa=-mgang-private-size=560'.

This is a really simple OpenACC 'parallel' construct:

    !$acc parallel
      write (0, '("The answer is ", I2)') var
    !$acc end parallel

..., which ought to launch a 1-gang x 1-worker x 1-vector GPU kernel, so
I'd assume '-mgang-private-size=560' (or '-mgang-private-size=13579' in
fact) is not a problem?

Help?


Grüße
 Thomas


> We've currently not been doing OpenACC privatization scanning in
> 'libgomp.oacc-fortran/print-1.f90', which I've now added, to help
> document the issue; no need to review that.
>
> Of course, the issue could alternatively be fixed by adding more logic to
> the GCN back end to auto-scale the allocation, or be fixed by adding more
> logic to the compiler to avoid gang-privatizing variables such as
> 'dt_parm.N' in such cases, but that's not something I'm going to look
> into at this point.
>
> Or, of course, be avoided by re-writing the test case to not require
> gang-privatizing 'dt_parm.N', but the test case is correct as it is.
>
>
> Grüße
>  Thomas
>
>
>>     PR fortran/104717
>> gcc/fortran/
>>     * trans-openmp.cc (gfc_trans_oacc_construct): Wrap construct body
>>     in an extra BIND_EXPR.
>
>> --- a/gcc/fortran/trans-openmp.cc
>> +++ b/gcc/fortran/trans-openmp.cc
>> @@ -4444,7 +4444,9 @@ gfc_trans_oacc_construct (gfc_code *code)
>>    gfc_start_block (&block);
>>    oacc_clauses = gfc_trans_omp_clauses (&block, code->ext.omp_clauses,
>>                                          code->loc, false, true);
>> +  pushlevel ();
>>    stmt = gfc_trans_omp_code (code->block->next, true);
>> +  stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
>>    stmt = build2_loc (gfc_get_location (&code->loc), construct_code,
>>                       void_type_node, stmt, oacc_clauses);
>>    gfc_add_expr_to_block (&block, stmt);
-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; Limited liability company; Managing directors: Thomas Heurung, Frank Thürauf; Registered office: München; Commercial register: München, HRB 106955