From: Jerry D <jvdelisle2@gmail.com>
To: Thomas Schwinge <thomas@codesourcery.com>,
Richard Biener <richard.guenther@gmail.com>,
Tom de Vries <tdevries@suse.de>,
gcc-patches@gcc.gnu.org
Cc: Janne Blomqvist <blomqvist.janne@gmail.com>,
fortran@gcc.gnu.org, Alexander Monakov <amonakov@ispras.ru>
Subject: Re: nvptx: '-mframe-malloc-threshold', '-Wframe-malloc-threshold' (was: Handling of large stack objects in GPU code generation -- maybe transform into heap allocation?)
Date: Fri, 23 Dec 2022 13:23:40 -0800 [thread overview]
Message-ID: <758e50d1-5d28-cac3-ff4e-dc632cda1455@gmail.com> (raw)
In-Reply-To: <87ili2p60p.fsf@euler.schwinge.homeip.net>
On 12/23/22 6:08 AM, Thomas Schwinge wrote:
> Hi!
>
> On 2022-11-11T15:35:44+0100, Richard Biener via Fortran <fortran@gcc.gnu.org> wrote:
>> On Fri, Nov 11, 2022 at 3:13 PM Thomas Schwinge <thomas@codesourcery.com> wrote:
>>> For example, for Fortran code like:
>>>
>>> write (*,*) "Hello world"
>>>
>>> ..., 'gfortran' creates:
>>>
>>> struct __st_parameter_dt dt_parm.0;
>>>
>>> try
>>> {
>>> dt_parm.0.common.filename = &"source-gcc/libgomp/testsuite/libgomp.oacc-fortran/print-1_.f90"[1]{lb: 1 sz: 1};
>>> dt_parm.0.common.line = 29;
>>> dt_parm.0.common.flags = 128;
>>> dt_parm.0.common.unit = 6;
>>> _gfortran_st_write (&dt_parm.0);
>>> _gfortran_transfer_character_write (&dt_parm.0, &"Hello world"[1]{lb: 1 sz: 1}, 11);
>>> _gfortran_st_write_done (&dt_parm.0);
>>> }
>>> finally
>>> {
>>> dt_parm.0 = {CLOBBER(eol)};
>>> }
>>>
>>> The issue: the stack object 'dt_parm.0' is a half-KiB in size (yes,
>>> really! -- there's a lot of state in Fortran I/O apparently). That's a
>>> problem for GPU execution -- here: OpenACC/nvptx -- where typically you
>>> have small stacks. (For example, GCC/OpenACC/nvptx: 1 KiB per thread;
>>> GCC/OpenMP/nvptx is an exception, because of its use of '-msoft-stack'
>>> "Use custom stacks instead of local memory for automatic storage".)
>>>
>>> Now, the Nvidia Driver tries to accommodate such largish stack usage,
>>> and dynamically increases the per-thread stack as necessary (thereby
>>> potentially reducing parallelism) -- if it manages to understand the call
>>> graph. In the case of libgfortran I/O, it evidently doesn't. Not being
>>> able to disprove the existence of recursion is the common problem, as I've read.
>>> At run time, via 'CU_JIT_INFO_LOG_BUFFER' you then get, for example:
>>>
>>> warning : Stack size for entry function 'MAIN__$_omp_fn$0' cannot be statically determined
>>>
>>> That's still not an actual problem: if the GPU kernel's stack usage still
>>> fits into 1 KiB. Very often it does, but if, as happens in libgfortran
>>> I/O handling, there is another such 'dt_parm' put onto the stack, the
>>> stack then overflows; device-side SIGSEGV.
>>>
>>> (There is, by the way, some similar analysis by Tom de Vries in
>>> <https://gcc.gnu.org/PR85519> "[nvptx, openacc, openmp, testsuite]
>>> Recursive tests may fail due to thread stack limit".)
>>>
>>> Of course, you shouldn't really be doing I/O in GPU kernels, but people
>>> do like their occasional "'printf' debugging", so we ought to make that
>>> work (... without pessimizing any "normal" code).
>>>
>>> I assume that generally reducing the size of 'dt_parm' etc. is out of
>>> scope.
There are so many wiggles, turns, and nightmarish corner cases in I/O
that I would advise against trying to reduce dt_parm. It could probably
be done, though.
For GPU debugging, would it not be better to have a way to signal back
to a main thread and do the print from there, like some sort of callback
in the user's code under test?
Putting this another way: recommend that users debug with a different
method than embedded print statements, rather than do a ton of work to
enable something that is not really a legitimate use case.
FWIW,
Jerry
Thread overview: 11+ messages
2022-11-11 14:12 ` Handling of large stack objects in GPU code generation -- maybe transform into heap allocation? Thomas Schwinge
2022-11-11 14:35 ` Richard Biener
2022-12-23 14:08 ` nvptx: '-mframe-malloc-threshold', '-Wframe-malloc-threshold' (was: Handling of large stack objects in GPU code generation -- maybe transform into heap allocation?) Thomas Schwinge
2022-12-23 21:23 ` Jerry D [this message]
2023-01-11 12:06 ` [PING] " Thomas Schwinge
2023-01-12 2:46 ` Jerry D
2022-11-11 14:38 ` Handling of large stack objects in GPU code generation -- maybe transform into heap allocation? Janne Blomqvist
2023-01-20 21:04 ` nvptx, libgcc: Stub unwinding implementation Thomas Schwinge
2023-01-20 21:16 ` nvptx, libgfortran: Switch out of "minimal" mode Thomas Schwinge
2023-01-20 22:10 ` Thomas Koenig
2023-01-24 9:37 ` Update 'libgomp/libgomp.texi' for 'nvptx, libgfortran: Switch out of "minimal" mode' (was: nvptx, libgfortran: Switch out of "minimal" mode) Thomas Schwinge