public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug fortran/94358] New: [OMP] Privatize internal array variables introduced by the Fortran FE
@ 2020-03-27 13:40 tschwinge at gcc dot gnu.org
  2020-03-27 13:41 ` [Bug fortran/94358] " tschwinge at gcc dot gnu.org
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2020-03-27 13:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94358

            Bug ID: 94358
           Summary: [OMP] Privatize internal array variables introduced by
                    the Fortran FE
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Keywords: openacc, openmp
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tschwinge at gcc dot gnu.org
                CC: burnus at gcc dot gnu.org, jakub at gcc dot gnu.org
  Target Milestone: ---

This is discussed in context of OpenACC, but I suppose also applies to OpenMP?


Just "a while ago", Cesar submitted for gomp-4_0-branch in
<http://mid.mail-archive.com/561D65A9.3060903@codesourcery.com> "[gomp4]
privatize internal array variables introduced by the fortran FE", and
committed:

    gomp-4_0-branch, openacc-gcc-7-branch commit
fb7036c540273b8da507dac254f5fcf2033a8fb1
    Author:     cesar <cesar@138bc75d-0d04-0410-961f-82ee72b054a4>
    AuthorDate: Tue Oct 13 20:17:47 2015 +0000
    Commit:     cesar <cesar@138bc75d-0d04-0410-961f-82ee72b054a4>
    CommitDate: Tue Oct 13 20:17:47 2015 +0000

                gcc/fortran/
                * trans-array.c (gfc_trans_array_bounds): Add an INIT_VLA
argument
                to control whether VLAs should be initialized.  Don't mark this
                function as static.
                (gfc_trans_auto_array_allocation): Update call to
                gfc_trans_array_bounds.
                (gfc_trans_g77_array): Likewise.
                * trans-array.h: Declare gfc_trans_array_bounds.
                * trans-openmp.c (gfc_scan_nodesc_arrays): New function.
                (gfc_privatize_nodesc_arrays_1): New function.
                (gfc_privatize_nodesc_arrays): New function.
                (gfc_init_nodesc_arrays): New function.
                (gfc_trans_oacc_construct): Initialize any internal variables
for
                arrays without array descriptors inside the offloaded parallel
and
                kernels region.
                (gfc_trans_oacc_combined_directive): Likewise.

                gcc/testsuite/
                * gfortran.dg/goacc/default_none.f95: New test.

        git-svn-id:
svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@228782
138bc75d-0d04-0410-961f-82ee72b054a4

     gcc/fortran/ChangeLog.gomp                       |  18 ++++++++++++++++++
     gcc/fortran/trans-array.c                        |  12 +++++++-----
     gcc/fortran/trans-array.h                        |   2 ++
     gcc/fortran/trans-openmp.c                       | 179
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
     gcc/testsuite/ChangeLog.gomp                     |   4 ++++
     gcc/testsuite/gfortran.dg/goacc/default_none.f95 |  59
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
     6 files changed, 266 insertions(+), 8 deletions(-)

Later, a bug fix got submitted for gomp-4_0-branch in
<http://mid.mail-archive.com/5695AD6B.5060802@codesourcery.com> "[gomp4] arrays
inside modules" (but wrong patch file attached), and committed:

    gomp-4_0-branch, openacc-gcc-7-branch commit
b9a99e0c7b39ab4b82f5dd148923f03c0dba7aa4
    commit b9a99e0c7b39ab4b82f5dd148923f03c0dba7aa4
    Author:     cesar <cesar@138bc75d-0d04-0410-961f-82ee72b054a4>
    AuthorDate: Wed Jan 13 01:43:16 2016 +0000
    Commit:     cesar <cesar@138bc75d-0d04-0410-961f-82ee72b054a4>
    CommitDate: Wed Jan 13 01:43:16 2016 +0000

                gcc/fortran/
                * trans-openmp.c (gfc_privatize_nodesc_arrays): Use
                gfc_get_symbol_decl instead of accessing backend_decl directly.

                gcc/testsuite/
                * gfortran.dg/goacc/mod-array.f90: New test.

        git-svn-id:
svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@232307
138bc75d-0d04-0410-961f-82ee72b054a4

     gcc/fortran/ChangeLog.gomp                    |  5 +++++
     gcc/fortran/trans-openmp.c                    |  2 +-
     gcc/testsuite/ChangeLog.gomp                  |  4 ++++
     gcc/testsuite/gfortran.dg/goacc/mod-array.f90 | 23 +++++++++++++++++++++++
     4 files changed, 33 insertions(+), 1 deletion(-)

These two got merged in og8:

    openacc-gcc-8-branch commit be4fecc76b4166e4af6bd1d9646393830ba28df4
    commit be4fecc76b4166e4af6bd1d9646393830ba28df4
    Author:     Cesar Philippidis <cesar@codesourcery.com>
    AuthorDate: Tue Oct 13 20:17:47 2015 +0000
    Commit:     Thomas Schwinge <thomas@codesourcery.com>
    CommitDate: Wed May 23 08:36:03 2018 +0200

        Privatize internal array variables introduced by the fortran FE

                gcc/fortran/
                * trans-array.c (gfc_trans_array_bounds): Add an INIT_VLA
argument
                to control whether VLAs should be initialized.  Don't mark this
                function as static.
                (gfc_trans_auto_array_allocation): Update call to
                gfc_trans_array_bounds.
                (gfc_trans_g77_array): Likewise.
                * trans-array.h: Declare gfc_trans_array_bounds.
                * trans-openmp.c (gfc_scan_nodesc_arrays): New function.
                (gfc_privatize_nodesc_arrays_1): New function.
                (gfc_privatize_nodesc_arrays): New function.
                (gfc_init_nodesc_arrays): New function.
                (gfc_trans_oacc_construct): Initialize any internal variables
for
                arrays without array descriptors inside the offloaded parallel
and
                kernels region.
                (gfc_trans_oacc_combined_directive): Likewise.
                gcc/testsuite/
                * gfortran.dg/goacc/mod-array.f90: New test.

        (cherry picked from gomp-4_0-branch r228782 and r232307)

     gcc/fortran/ChangeLog.openacc                 |  18 ++++++++++++++++++
     gcc/fortran/trans-array.c                     |  12 +++++++-----
     gcc/fortran/trans-array.h                     |   2 ++
     gcc/fortran/trans-openmp.c                    | 179
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
     gcc/testsuite/ChangeLog.openacc               |   4 ++++
     gcc/testsuite/gfortran.dg/goacc/mod-array.f90 |  23
+++++++++++++++++++++++
     6 files changed, 230 insertions(+), 8 deletions(-)

This patch did not make it into og9.

The original test case, 'gcc/testsuite/gfortran.dg/goacc/default_none.f95', had
already gotten committed (unmodified) to trunk in r229832 (commit
ef014f95d542ae7b18cb4150cbb1f1ef68160f7b, 2015-11-06) -- so presumably didn't
exhibit the original 'default(none)' problem any more.

That's presumably due to <http://mid.mail-archive.com/564378BB.8050400@acm.org>
"open acc default data attribute", where per the "fortran FE" comment, the new
'DECL_ARTIFICIAL' handling might indeed be the reason why we're no longer
seeing the original 'default(none)' correctness issue.

But, even under the assumtion that this now no longer is a 'default(none)'
correctness issue, that still leaves the patch's aspect of performance
optimization.  Maybe that one is no longer as important as it once was (because
of other performance optimizations being applied, especially on og8 etc., not
yet on master branch), but it conceptually still seems "the right thing to do"
to me?

Given og8 sources, let's consider the '!$acc parallel loop' in
'gcc/testsuite/gfortran.dg/goacc/default_none.f95:foobar'.

Comparing the "baseline" (both og8 commit
7c2f3f5c50e029bec704605b065d63273c2425dd (as a prerequisite, see comment to
follow), and og8 commit be4fecc76b4166e4af6bd1d9646393830ba28df4 reverted) to
"Privatize internal array variables introduced by the fortran FE" (only og8
commit 7c2f3f5c50e029bec704605b065d63273c2425dd reverted):

In the 'original' dump, we see a number of 'private' clauses being added:
'private(stride.4) private(ubound.1) private(size.6) private(ubound.3)
private(lbound.0) private(offset.5) private(lbound.2)', and 'offset.5' as well
as 'D.3818' being re-computed inside the offloaded region, via from 'lbound',
'stride', etc., based on data that is available anyway: 'n'.

In the 'gimple' dump, we then see "baseline": 'firstprivate(D.3818)
firstprivate(offset.5)' (data transfers) vs. "Privatize internal array
variables introduced by the fortran FE": 'private(offset.5)' (no data
transfer), and 'D.3818' has been made local to the offloaded region (no data
transfer).

I have not looked at further optimizations happening, but the theory -- I
suppose -- is that the "Privatize internal array variables introduced by the
fortran FE" variant can then be optimized better, compared to the "baseline"
variant, where we've got "un-optimizable" (or, at least hard-to-optimize) data
transfers due to the two 'firstprivate's.

Even though there may be other ways to achieve this optimization (here: have
the compiler move the 'offset.5' etc. calculations *into* the offloaded region,
and then apply something as discussed in PR90573 "Avoid unnecessary data
transfer into OMP construct" to re-write 'firstprivate' into 'private'), it
seems beneficial to me, to have the front ends produce proper (conceptually
simple, explicit) code as much as possible.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug fortran/94358] [OMP] Privatize internal array variables introduced by the Fortran FE
  2020-03-27 13:40 [Bug fortran/94358] New: [OMP] Privatize internal array variables introduced by the Fortran FE tschwinge at gcc dot gnu.org
@ 2020-03-27 13:41 ` tschwinge at gcc dot gnu.org
  2020-03-27 13:41 ` tschwinge at gcc dot gnu.org
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2020-03-27 13:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94358

Thomas Schwinge <tschwinge at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |burnus at gcc dot gnu.org
   Last reconfirmed|                            |2020-03-27
             Status|UNCONFIRMED                 |ASSIGNED
     Ever confirmed|0                           |1

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug fortran/94358] [OMP] Privatize internal array variables introduced by the Fortran FE
  2020-03-27 13:40 [Bug fortran/94358] New: [OMP] Privatize internal array variables introduced by the Fortran FE tschwinge at gcc dot gnu.org
  2020-03-27 13:41 ` [Bug fortran/94358] " tschwinge at gcc dot gnu.org
@ 2020-03-27 13:41 ` tschwinge at gcc dot gnu.org
  2020-04-30 11:30 ` tschwinge at gcc dot gnu.org
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2020-03-27 13:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94358

--- Comment #1 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
To help with OpenACC 'kernels' support, Gergő later submitted for og8 an
additional change in
<http://mid.mail-archive.com/31e3dfc5-1bbe-0d11-c7c8-16a0cefa0b45@mentor.com>
"Rework OpenACC Fortran DO loop initialization", and committed:

    openacc-gcc-8-branch commit 7c2f3f5c50e029bec704605b065d63273c2425dd
    Author:     Gergö Barany <gergo@codesourcery.com>
    AuthorDate: Mon Jan 21 03:08:57 2019 -0800
    Commit:     Gergö Barany <gergo@codesourcery.com>
    CommitDate: Tue Jan 29 05:21:14 2019 -0800

        Rework OpenACC Fortran DO loop initialization

        Fortran DO loops on arrays with non-constant bounds (like a(lo:hi))
need
        special setup code to compute the bounds and offsets for the iteration.
In
        an OpenACC region containing multiple loops, this used to be done in a
block
        of code at the start of the region for all of the loops. But the
upcoming
        kernels conversion expects this kind of setup code to immediately
precede
        the corresponding loop, and variables are not mapped correctly
otherwise.
        This patch separates out the initialization part for each loop and
places it
        immediately before the loop.

            gcc/fortran/
            * trans-openmp.c (gfc_privatize_nodesc_array_clauses): Renamed from
            gfc_privatize_nodesc_arrays, initialization part factored out to...
            (gfc_reinitialize_privatized_arrays): ... this new function,
called...
            (gfc_trans_omp_do): ... from here for OpenACC loops.

            libgomp/
            * testsuite/libgomp.oacc-fortran/initialize_kernels_loops.f90: New
test.

        Reviewed-by: Thomas Schwinge <thomas@codesourcery.com>

     gcc/fortran/ChangeLog.openacc                                       |  7
+++++++
     gcc/fortran/trans-openmp.c                                          | 86
++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------------
     libgomp/ChangeLog.openacc                                           |  4
++++
     libgomp/testsuite/libgomp.oacc-fortran/initialize_kernels_loops.f90 | 36
++++++++++++++++++++++++++++++++++++
     4 files changed, 97 insertions(+), 36 deletions(-)

(I did approve that for the og8 development branch, but had not actually
reviewed.)

This patch did not make it into og9 either (obviously, as it depends on the
other one).

Comparing the previous dumps to this "Rework OpenACC Fortran DO loop
initialization" variant (no og8 commits reverted):

In the 'original' dump, we now again see the 'private' clauses disappear (so,
same as in the "baseline" variant) -- yet the re-computation of 'offset.5' as
well as 'D.3818' remaining inside the region, "unprivatized".

In the 'gimple' dump, we then see 'firstprivate(size.6) firstprivate(offset.5)
firstprivate(stride.4) firstprivate(ubound.3) firstprivate(lbound.2)
firstprivate(ubound.1) firstprivate(lbound.0)' appear (supposedly due to
implicit 'firstprivate' inside OpenACC 'parallel'?) -- huh, that doesn't seem
right, given that the point was to avoid all these 'firstprivate' (data
transfers)!  (..., and for OpenACC 'kernels', these would all be implicit
'copy', thus global device data, slow, and prone to race-conditions -- so, much
worse?)

That doesn't seem to make sense, and will need further investigation.  But
let's first concentrate on the "Privatize internal array variables introduced
by the fortran FE" changes alone.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug fortran/94358] [OMP] Privatize internal array variables introduced by the Fortran FE
  2020-03-27 13:40 [Bug fortran/94358] New: [OMP] Privatize internal array variables introduced by the Fortran FE tschwinge at gcc dot gnu.org
  2020-03-27 13:41 ` [Bug fortran/94358] " tschwinge at gcc dot gnu.org
  2020-03-27 13:41 ` tschwinge at gcc dot gnu.org
@ 2020-04-30 11:30 ` tschwinge at gcc dot gnu.org
  2020-05-07  9:10 ` burnus at gcc dot gnu.org
  2020-05-07  9:12 ` burnus at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2020-04-30 11:30 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94358

--- Comment #2 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
(In reply to Thomas Schwinge from comment #0)
> The original test case, 'gcc/testsuite/gfortran.dg/goacc/default_none.f95',
> had already gotten committed (unmodified) to trunk in r229832 (commit
> ef014f95d542ae7b18cb4150cbb1f1ef68160f7b, 2015-11-06) -- so presumably
> didn't exhibit the original 'default(none)' problem any more.
> 
> That's presumably due to
> <http://mid.mail-archive.com/564378BB.8050400@acm.org> "open acc default
> data attribute", where per the "fortran FE" comment, the new
> 'DECL_ARTIFICIAL' handling might indeed be the reason why we're no longer
> seeing the original 'default(none)' correctness issue.

So the first thing (please, separately from other changes/patches mentioned
here), is to make sure that these (supposedly) 'DECL_ARTIFICIAL' objects
generated by the Fortran front end do play well regarding PR94876 "[OpenACC]
Unhelpful mapping for 'DECL_ARTIFICIAL' inside OpenACC 'kernels' regions".

Assuming that the Fortran front end does generate 'DECL_ARTIFICIAL' objects
which instead of 'firstprivate' clauses will then get 'copy' clauses generated
for OpenACC 'kernels' constructs, and the Fortran front end supposedly
generates code to initialize these 'DECL_ARTIFICIAL' objects inside the
offloaded region, we'd then have race-conditions, with all device threads
writing into the same global memory concurrently?


Tobias, as a first step, can you please provide an exemplary Fortran test case
that shows some cases of what code the Fortran front end generates to calculare
array lower and upper bounds etc., for a multi-level OpenACC gang, worker,
vector loop?

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug fortran/94358] [OMP] Privatize internal array variables introduced by the Fortran FE
  2020-03-27 13:40 [Bug fortran/94358] New: [OMP] Privatize internal array variables introduced by the Fortran FE tschwinge at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2020-04-30 11:30 ` tschwinge at gcc dot gnu.org
@ 2020-05-07  9:10 ` burnus at gcc dot gnu.org
  2020-05-07  9:12 ` burnus at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: burnus at gcc dot gnu.org @ 2020-05-07  9:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94358

--- Comment #3 from Tobias Burnus <burnus at gcc dot gnu.org> ---
(In reply to Thomas Schwinge from comment #2)
> Tobias, as a first step, can you please provide an exemplary Fortran test
> case that shows some cases of what code the Fortran front end generates to
> calculare array lower and upper bounds etc., for a multi-level OpenACC gang,
> worker, vector loop?

Well, in principle there are local aux variables generated for:

* Explicit-shape ("array([<lb>:] ub)") and assumed-size ("array([<lb>:] *)"
  arrays, which pass a contiguous pointer to the actual data.

* Assumed-shape arrays ("array([<lb>] : )" which can be noncontiguous
  (unless "contiguous") and have an array descriptor – the lower bound of the
  actual argument is reset to '1' (or the specified lb).

Due to the bound changes, the latter uses "y.0 = y->data;" for the later access
to the variable, the former uses the argument directly. 

And with optional there are additional issues – especially for
assumed-shape arrays as one has to check 'y == NULL' (argument absent) and
'y->data == NULL' (unallocated or nonassociated actual argument) (→ PR93464). 


Currently, 'y' has to be mapped as well for assumed-shape arrays for the
'present' check and also all generated variables (y.0, lower.0, upper.0,
stride.0, …) have to be mapped – and there the issue with the data-sharing type
(omp default() etc.) and warning plays a role.


Not a full test case, but all generated variables are exposed by:

subroutine sub_explicit(x, n1, n2)
  implicit none
  integer, value :: n1, n2
  integer :: x(n1:n2) ! Contiguous memory, same: x(n1:*) = assumed size
  ! optional :: x   ! uncomment for 'optional' arguments
  integer :: j, i, ub, sz

  ! uses 'x' directly, sets lbound based on expr.
  !   j = (*x)[(integer(kind=8)) i + offset.2];
  j = x(i)
  ub = ubound(x, dim=1)
  sz = size(x)
end

subroutine sub_assumed_shape2(y, n1, n2)
  implicit none
  integer, value :: n1, n2
  integer, optional :: y(n1:,n2:) ! Can be strided, reset lower bound (dft:
'1')
  ! optional :: y   ! uncomment for 'optional' arguments
  integer :: j, i1, i2, ub, sz

  ! uses y.0 instead of y (→ optional issue), sets
  ! lbound.0, stride.0 etc.
  !   j = (*y.0)[(integer(kind=8)) i * stride.6 + offset.7]
  j = y(i1,i2)
  ub = ubound(y, dim=1) + ubound(y, dim=2)
  sz = size(y)
end

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Bug fortran/94358] [OMP] Privatize internal array variables introduced by the Fortran FE
  2020-03-27 13:40 [Bug fortran/94358] New: [OMP] Privatize internal array variables introduced by the Fortran FE tschwinge at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2020-05-07  9:10 ` burnus at gcc dot gnu.org
@ 2020-05-07  9:12 ` burnus at gcc dot gnu.org
  4 siblings, 0 replies; 6+ messages in thread
From: burnus at gcc dot gnu.org @ 2020-05-07  9:12 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94358

--- Comment #4 from Tobias Burnus <burnus at gcc dot gnu.org> ---
(In reply to Tobias Burnus from comment #3)
> And with optional there are additional issues – especially for
> assumed-shape arrays as one has to check 'y == NULL' (argument absent) and
> 'y->data == NULL' (unallocated or nonassociated actual argument) (→
> PR93464).

Correction: PR fortran/94672

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2020-05-07  9:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-27 13:40 [Bug fortran/94358] New: [OMP] Privatize internal array variables introduced by the Fortran FE tschwinge at gcc dot gnu.org
2020-03-27 13:41 ` [Bug fortran/94358] " tschwinge at gcc dot gnu.org
2020-03-27 13:41 ` tschwinge at gcc dot gnu.org
2020-04-30 11:30 ` tschwinge at gcc dot gnu.org
2020-05-07  9:10 ` burnus at gcc dot gnu.org
2020-05-07  9:12 ` burnus at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).