From: Thomas Schwinge <thomas@codesourcery.com>
To: <gcc-patches@gcc.gnu.org>, Julian Brown <julian@codesourcery.com>,
Andrew Stubbs <ams@codesourcery.com>
Cc: Jakub Jelinek <jakub@redhat.com>,
Tobias Burnus <tobias@codesourcery.com>, <fortran@gcc.gnu.org>
Subject: Re: Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation [PR104717] (was: [PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717])
Date: Tue, 26 Apr 2022 19:51:19 +0200 [thread overview]
Message-ID: <87fslzspgo.fsf@dem-tschwing-1.ger.mentorg.com> (raw)
In-Reply-To: <87levrsqno.fsf@dem-tschwing-1.ger.mentorg.com>
Hi!
On 2022-04-26T19:25:31+0200, I wrote:
> On 2022-04-25T23:19:26+0200, I wrote:
>> On 2022-04-20T19:06:17+0200, Jakub Jelinek <jakub@redhat.com> wrote:
>>> So that move_sese_region_to_fn works properly, OpenMP/OpenACC constructs
>>> for which that function is invoked need an extra artificial BIND_EXPR
>>> around their body so that we move all variables of the bodies.
>>>
>>> The C/C++ FEs do that both for OpenMP constructs like OMP_PARALLEL, OMP_TASK
>>> or OMP_TARGET and for OpenACC constructs that behave similarly to
>>> OMP_TARGET, but the Fortran FE only does that for OpenMP constructs.
>>>
>>> The following patch does that for OpenACC constructs too.
>>> This fixes ICE on the attached testcase.
>>
>> ACK, thanks.
>
>>> Unfortunately, it also regresses
>>> FAIL: gfortran.dg/goacc/privatization-1-compute-loop.f90 -O (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O0 (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O1 (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O2 (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -O3 -g (test for excess errors)
>>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -Os (test for excess errors)
>>> Those emits emit tons of various messages and now there are some extra ones,
>>
>> I've fixed these up.
>
> One more issue became apparent, where the code changes pushed actually do
> lead to a GCN offloading compilation failure:
>
> [...]/libgomp.oacc-fortran/print-1.f90: In function ‘MAIN__._omp_fn.0’:
> [...]/libgomp.oacc-fortran/print-1.f90:13:14: error: 512 bytes of gang-private data-share memory exhausted (increase with ‘-mgang-private-size=560’, for example)
> 13 | !$acc parallel
> | ^
>
> In my configuration, I may indeed fix GCN offloading compilation with
> '-foffload-options=amdgcn-amdhsa=-mgang-private-size=560', but I don't
> think that's generally correct/sufficient, so in the the attached
> "Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation
> [PR104717]", I instead "raise '-mgang-private-size' to an arbitrary high
> value". This avoids having to route the actual 'sizeof' from GCC build
> down to the test suite harness (which ought to be doable, but
> non-trivial). OK to push that:
>
> +! For GCN offloading compilation, when gang-privatizing 'dt_parm.N'
> +! (see below), we run into an 'gang-private data-share memory exhausted'
> +! error: the default '-mgang-private-size' is too small. Per
> +! 'gcc/fortran/trans-io.cc'/'libgfortran/io/io.h', that one is
> +! 'struct st_parameter_dt', which indeed is rather big. Instead of
> +! working out its exact size (which may vary per GCC configuration),
> +! raise '-mgang-private-size' to an arbitrary high value.
> +! { dg-additional-options "-foffload-options=amdgcn-amdhsa=-mgang-private-size=13579" { target openacc_radeon_accel_selected } }
>
> ... to master branch? (This doubles the use/testing of the
> '-mgang-private-size' option!) ;-)
Eh. That only works with the default GCN multilib '-march=fiji', testing
on gfx803 amdfury2 system. For all of '-march=gfx900' (amdnano2),
'-march=gfx906' (amd_ryzen3), '-march=gfx908' (amd-instinct1), I get:
libgomp: GCN fatal error: Asynchronous queue error
Runtime message: HSA_STATUS_ERROR_MEMORY_APERTURE_VIOLATION: The agent attempted to access memory beyond the largest legal address.
..., and I still get that if lowering the allocation to the minimum,
'-foffload-options=amdgcn-amdhsa=-mgang-private-size=560'.
This is a really simple OpenACC 'parallel' construct:
!$acc parallel
write (0, '("The answer is ", I2)') var
!$acc end parallel
..., which ought to launch a 1-gang x 1-worker x 1-vector GPU kernel, so
I'd assume '-mgang-private-size=560' (or '-mgang-private-size=13579' in
fact) is not a problem?
Help?
Grüße
Thomas
> We've currently not been doing OpenACC privatization scanning in
> 'libgomp.oacc-fortran/print-1.f90', which I've now added, to help
> document the issue; no need to review that.
>
> Of course, the issue could alternatively be fixed by adding more logic to
> the GCN back end to auto-scale the allocation, or be fixed by adding more
> logic to the compiler to avoid gang-privatizing varibales such as
> 'dt_parm.N' in such cases, but that's not something I'm going to look
> into at this point.
>
> Or, of course, be avoided by re-writing the test case to not require
> gang-privatizing 'dt_parm.N', but the test case is correct as it is.
>
>
> Grüße
> Thomas
>
>
>> PR fortran/104717
>> gcc/fortran/
>> * trans-openmp.cc (gfc_trans_oacc_construct): Wrap construct body
>> in an extra BIND_EXPR.
>
>> --- a/gcc/fortran/trans-openmp.cc
>> +++ b/gcc/fortran/trans-openmp.cc
>> @@ -4444,7 +4444,9 @@ gfc_trans_oacc_construct (gfc_code *code)
>> gfc_start_block (&block);
>> oacc_clauses = gfc_trans_omp_clauses (&block, code->ext.omp_clauses,
>> code->loc, false, true);
>> + pushlevel ();
>> stmt = gfc_trans_omp_code (code->block->next, true);
>> + stmt = build3_v (BIND_EXPR, NULL, stmt, poplevel (1, 0));
>> stmt = build2_loc (gfc_get_location (&code->loc), construct_code,
>> void_type_node, stmt, oacc_clauses);
>> gfc_add_expr_to_block (&block, stmt);
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
prev parent reply other threads:[~2022-04-26 17:51 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <YmA9iaVvWhwVGFmT@tucnak>
2022-04-25 21:19 ` [PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717] Thomas Schwinge
2022-04-26 17:25 ` Fix up 'libgomp.oacc-fortran/print-1.f90' GCN offloading compilation [PR104717] (was: [PATCH] fortran: Fix up gfc_trans_oacc_construct [PR104717]) Thomas Schwinge
2022-04-26 17:51 ` Thomas Schwinge [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87fslzspgo.fsf@dem-tschwing-1.ger.mentorg.com \
--to=thomas@codesourcery.com \
--cc=ams@codesourcery.com \
--cc=fortran@gcc.gnu.org \
--cc=gcc-patches@gcc.gnu.org \
--cc=jakub@redhat.com \
--cc=julian@codesourcery.com \
--cc=tobias@codesourcery.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).