[RFC] Fortran: OpenMP (Coarray?) – handling transfer/mapping of allocatable components, esp. polymorphic ones


From: Tobias Burnus @ 2021-03-18 12:28 UTC
  To: Jakub Jelinek, gcc-patches, fortran, Andre Vehreschild,
	Paul Richard Thomas, Catherine Moore

Fortran itself: The suggestion is to add a new entry to the vtable
(a breaking change). Thus, please also comment even if you are not
interested in OpenMP (or coarrays).

For OpenMP: When mapping a derived type to a non-shared-memory
(accelerator/GPU) device, it gets complicated with (polymorphic)
allocatable components, as OpenMP requires a deep copy of
_allocatable_ components.
[Side note: 'virtual calls' on the device are also permitted,
i.e. the vtable also has to be mapped properly.]

For coarrays: I thought the same issue exists with CO_REDUCE
(arbitrary type w/ user-defined reduction function), but I now think
that either I missed a constraint or J3/WG5 forgot to add one.
See the thread starting with the email I just sent to J3 (no reply so far):
https://mailman.j3-fortran.org/pipermail/j3/2021-March/012965.html

[C++: Side note – OpenMP 5.1 now also permits virtual calls;
but the deep copy problem does not seem to exist (except next item?).]

For OpenMP, I think there is a relation between this issue and
how MAPPER might be implemented. However, I have not looked at
mappers; hence, it may or may not turn out to be a completely
separate implementation.

  * * *

(A) EXAMPLES AS PREREMARK

   type recursive_t
     type(recursive_t), allocatable :: A   ! recursive types; OpenMP: valid since 5.1
   end type

   type t
   end type t
   type, extends(t) :: t2
     integer, allocatable :: A(:)  ! allocatable component
   end type t2
   type t3
     class(t), allocatable :: C  ! allocatable polymorphic component
   end type t3

   type(recursive_t) :: rt, rtc[*]
   class(t), allocatable :: B
   type(t3) :: C[*]
...
   !$omp target enter data map(to:B, rt, C)
...
   call CO_REDUCE (rtc, my_reduct_proc, result_image=1)
   call CO_REDUCE (C, my_reduct_proc3, result_image=1)

And for OpenMP also the following (virtual call on device):

   class(*), intent(in) :: dummy_class
   !$omp target map(to:dummy_class)
     select type (dummy_class)
       class is (my_cmplx_class)
         call dummy_class%type_bound_proc(5) ! TBP / virtual call
     end select
   !$omp end target

(B) DESCRIPTION OF THE PROBLEM

Coarrays: While there are some restrictions regarding the use of coarrays,
data has to be accessed on the remote image, especially with user-defined
reductions, while only limited information about the details on the
remote image is available on this_image().

OpenMP: While OpenMP 4.5 mostly avoided all pitfalls, 5.0 permitted a lot
more and 5.1 removed additional restrictions. For unified shared memory
or when not using 'target' constructs, there is no issue beyond the normal
Fortran issues (e.g. data-sharing firstprivate with polymorphic variables).
However, when the memory is not shared it becomes harder.

In any case, the information is distributed over several places:

* Run-time library
   libgomp: knows how to transfer the data between the host and the
   device and update pointers
   libcaf: knows how to access remote memory. I think pointer mapping
   (like remote vs. local vtable) is not required, but it looks as
   if the vtab->hash value has to be obtainable for
   same_type_as(var[i], var[j])

* Type and associated vtable: At the location of the type declaration and
   vtable generation, all details about the type are known (except for
   array bounds and the depth of the recursive types, which are both
   only known at run time).

* Code location which calls into the library (OpenMP construct,
   CO_REDUCE call etc.):
   Here, both the need for the data transfer and the declared type is known
   – including which parts have to be handled in a loop form
   (for A%B(:)%recursive, the compiler can generate an outer loop over A%B(i)
   and then an inner loop over A%B(i)%recursive%recursive%...%recursive).

   For the used data ref itself, the compiler can also add code to handle
   the dynamic type of the last partref - that is both vtable and
   obtaining the vtab->size or similar.
   But if the last partref is a polymorphic type, neither
   allocatable nonpolymorphic nor allocatable polymorphic components
   are known at the code location.

(B) CURRENT LIB SUPPORT

(1) For OpenMP

The current code generation does not permit run-time dependent
mapping as everything is folded into a single libgomp mapping call:
    map(A)
may become something like:
    map(to:a.p [len: 64]) map(to:*a.p.data [len: D.3953 * 4]) map(always_pointer: a.p.data [pointer assign, bias: 0]) ...
which then calls
    __builtin_GOMP_target_enter_exit_data (-1, 1, &.omp_data_arr.4, &.omp_data_sizes.5, &.omp_data_kinds.6, 0, 0B);

Taking the example of the recursive type (valid since OpenMP 5.1), we would need something like:
    __builtin_GOMP_target_enter_exit_data_begin
     → map 'A'
    prev = &A;
    for (ptr = A.rt; ptr != NULL; prev = ptr, ptr = ptr->rt)
      map (ptr) map(always_pointer: ptr → prev->rt)
    __builtin_GOMP_target_enter_exit_data_end
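
To make the intended begin/loop/end sequence concrete, here is a minimal
host-only C sketch of the walk over the recursive chain. Everything in it is
an assumption for illustration: struct recursive_t, map_object and
attach_pointer are hypothetical stand-ins that only count calls; they are not
libgomp functions, and no data is actually transferred.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C view of "type recursive_t" with an allocatable
   component of its own type (NULL means "not allocated").  */
struct recursive_t {
  int payload;
  struct recursive_t *rt;
};

/* Hypothetical bookkeeping that replaces the real run-time library.  */
struct map_log { size_t n_mapped; size_t n_attached; size_t bytes; };

/* Stand-in for "map this contiguous chunk".  */
static void map_object (struct map_log *log, const void *host_ptr, size_t size)
{
  assert (log != NULL && host_ptr != NULL);
  log->n_mapped++;
  log->bytes += size;
}

/* Stand-in for "make the device copy of *field point to the device
   copy of target" (the always_pointer step).  */
static void attach_pointer (struct map_log *log, const void *field,
                            const void *target)
{
  assert (field != NULL && target != NULL);
  log->n_attached++;
}

/* Map 'a' itself, then every element reachable through a->rt.  */
static void map_recursive (struct map_log *log, struct recursive_t *a)
{
  struct recursive_t *prev = a;
  map_object (log, a, sizeof *a);
  for (struct recursive_t *ptr = a->rt; ptr != NULL; prev = ptr, ptr = ptr->rt)
    {
      map_object (log, ptr, sizeof *ptr);
      attach_pointer (log, &prev->rt, ptr);
    }
}

/* Driver: a chain of three elements, a -> b -> c.  */
struct map_log demo_map_chain (void)
{
  struct recursive_t c = { 3, NULL };
  struct recursive_t b = { 2, &c };
  struct recursive_t a = { 1, &b };
  struct map_log log = { 0, 0, 0 };
  map_recursive (&log, &a);
  return log;
}
```

The point of the sketch is only that the number of map/attach operations
depends on the run-time depth of the chain, which a single folded mapping
call cannot express.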

(2) Likewise for coarrays: the library call only gets as arguments
   _gfortran_caf_co_reduce (gfc_descriptor_t *a, ..., int a_len)
which also does not help with allocatable components.

While for simple cases, a loop would do (cf. above), in the
general case it would not. Hence:

(C) PROPOSED SOLUTION

I think we need callbacks:

If the user code contains 'call co_reduce(A, ...)', we would call:
    caf_co_reduce (A, ..., A->_vptr->callbackftn)

and in caf_co_reduce, we call
    callbackftn (A, mytoken, transfer_fn)

and in the compiler-generated part of the user code (vtable),
i.e. in A->_vptr->callbackftn:
       // 'rt' is the recursive allocatable component
       for (p = A; p->rt != NULL; p = p->rt) {
           transfer_fn (mytoken, p->rt, size, NULL, NULL);
           attach_fn (mytoken, p->rt, &p->rt);
         }

Or another example case:
       // B%a%{_vptr, _data} already handled as it is inside 'B', size() = arraysize
       transfer_fn (mytoken, B%a%_data%data, B%a%_vptr->size * size(B%a),
                    B%a%_data, B%a%_vptr->callbackftn)
       attach_fn (mytoken, &B%a%_data%data, B%a%_data%data);
       attach_fn (mytoken, &B%a%_vptr, B%a%_vptr);

Or some implementation like that.
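
As a rough illustration of the callback flow, the following C sketch mimics a
derived type with a polymorphic allocatable component whose dynamic type has
an allocatable array. All names are hypothetical (xfer_ops, copy_cb_t,
class_t, the vtab layout, lib_transfer, lib_attach); the "library" only
counts bytes and attach requests instead of moving data, and the transfer
function is reached through the token rather than passed as a separate
argument, purely to keep the C types simple.

```c
#include <assert.h>
#include <stddef.h>

struct xfer_ops;
/* Per-type callback as it might sit in the vtable (hypothetical).  */
typedef void (*copy_cb_t) (void *token, void *self, const struct xfer_ops *ops);

/* Functions the library hands to the compiler-generated callback.  */
struct xfer_ops {
  void (*transfer) (void *token, void *data, size_t size,
                    void *cb_self, copy_cb_t cb);
  void (*attach) (void *token, void *field_addr, const void *target);
};

struct vtab { size_t size; copy_cb_t copy_cb; };          /* simplified */
struct class_t { void *data; const struct vtab *vptr; };  /* class(t), allocatable */
struct t2 { int *a; size_t a_len; };                      /* integer, allocatable :: A(:) */
struct t3 { struct class_t c; };                          /* class(t), allocatable :: C */

/* "Library" state: which ops to use plus what was requested so far.  */
struct token { const struct xfer_ops *ops; size_t bytes; size_t attaches; };

/* Callback for t2: transfer the array payload, then attach it.  */
static void t2_copy_cb (void *token, void *self, const struct xfer_ops *ops)
{
  struct t2 *x = self;
  if (x->a == NULL)
    return;
  ops->transfer (token, x->a, x->a_len * sizeof (int), NULL, NULL);
  ops->attach (token, &x->a, x->a);
}

/* Callback for t3: size and nested callback come from the dynamic type.  */
static void t3_copy_cb (void *token, void *self, const struct xfer_ops *ops)
{
  struct t3 *x = self;
  if (x->c.data == NULL)
    return;
  ops->transfer (token, x->c.data, x->c.vptr->size,
                 x->c.data, x->c.vptr->copy_cb);
  ops->attach (token, &x->c.data, x->c.data);
  ops->attach (token, &x->c.vptr, x->c.vptr);
}

/* Library side: account for the chunk, then descend if asked to.  */
static void lib_transfer (void *token, void *data, size_t size,
                          void *cb_self, copy_cb_t cb)
{
  struct token *t = token;
  assert (data != NULL);
  t->bytes += size;
  if (cb != NULL)
    cb (token, cb_self, t->ops);
}

static void lib_attach (void *token, void *field_addr, const void *target)
{
  struct token *t = token;
  assert (field_addr != NULL && target != NULL);
  t->attaches++;
}

static const struct vtab vtab_t2 = { sizeof (struct t2), t2_copy_cb };

/* Driver: a t3 object whose component C has dynamic type t2.  */
struct token demo_deep_copy (void)
{
  static const struct xfer_ops ops = { lib_transfer, lib_attach };
  int payload[4] = { 1, 2, 3, 4 };
  struct t2 dyn = { payload, 4 };
  struct t3 obj = { { &dyn, &vtab_t2 } };
  struct token tok = { &ops, 0, 0 };
  lib_transfer (&tok, &obj, sizeof obj, &obj, t3_copy_cb);
  return tok;
}
```

Note that t3_copy_cb never needs to know that the dynamic type is t2: both
the size and the nested callback are read through the vptr, which is the
reason for wanting the entry in the vtable.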

BACKWARD COMPATIBILITY ISSUE:
For this to work, we need a new entry in the VTABLE,
which is a breaking change. (Unless we want to restrict it to
-fcoarray=lib / -fopenmp but that is bound to break when mixing
code compiled with different flags.)
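
Why a new entry is a breaking change can be seen from a small C sketch. The
two structs below are simplified stand-ins, not gfortran's actual vtable
layout: the field names, their order, and the position of the hypothetical
deep_copy entry (placed before the type-bound procedure slots) are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*proc_t) (void);

/* Simplified "old" vtable: fixed entries followed by the slots for
   type-bound procedures (TBP).  */
struct vtab_old {
  int hash;
  size_t size;
  const void *extends;
  const void *def_init;
  proc_t copy;
  proc_t final;
  proc_t deallocate;
  proc_t tbp[2];
};

/* Same layout with one additional fixed entry (hypothetical name).  */
struct vtab_new {
  int hash;
  size_t size;
  const void *extends;
  const void *def_init;
  proc_t copy;
  proc_t final;
  proc_t deallocate;
  proc_t deep_copy;   /* the proposed callback */
  proc_t tbp[2];
};

/* Number of bytes by which the first TBP slot moves.  Code compiled
   against the old layout would use the old offset and therefore pick
   up the wrong entry from a new-style vtable.  */
size_t tbp_slot_shift (void)
{
  size_t old_off = offsetof (struct vtab_old, tbp);
  size_t new_off = offsetof (struct vtab_new, tbp);
  assert (new_off > old_off);
  return new_off - old_off;
}
```

Object files built before and after such a change disagree about where the
TBP slots start, so they cannot be mixed safely; this holds whether or not
-fopenmp or -fcoarray=lib was used for a particular file.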

SIDE REMARK:
When breaking code, we could also do the following change:
* Instead of always generating the VTABLE when a derived type is
   declared, we should only generate it (as weak symbol) when
   actually using it in a polymorphic context, e.g.
     class(t)
   or
     class(*) ... :: c
     allocate(t :: c)
     c = t()
   or when needed for nonpolymorphic types for the purposes
   discussed in this text.

   If a vtable has been generated, this can be noted in the
   MODULE file (and in gsym for the translation unit) to
   avoid streaming out the vtable multiple times.

* Bumping the module version is probably sufficient to mark
   the incompatibility.

SIDE REMARK 2: If we break backward compatibility, we could
consider doing some other cleanup – some random thoughts:
* I think we do mishandle the decl with dimension(..) especially
   when there is a coarray token at the end.
* Removing functions from libgfortran that are unused now or will be by then?
* Some fixes for the array descriptor/C-class array descriptor/
   its convert aux functions (e.g. moving to the FE [can be done
   w/o breaking] + removing it from libgfortran)
* Other fixes (which?)

(D) WHAT NEEDS TO BE SUPPORTED

* Association of the vptr to the virtual table for polymorphic
   components
* Association of 'pointers' (C/internal impl sense), i.e.
   allocated/copied actual data → 'data' comp of the array descriptor
* Recursive allocatable types
* Array-valued components (by construction: contiguous), which
   again have components, which might be allocatable (and polymorphic)

RFC:

(a) Future extension?
   'maybe_attach' for data/procedure pointer components to
   update the pointer address, if available?
   (Note: pointer might be 'undefined'
   and point to 0xDEADBEEF besides NULL or a proper value.)
If we think it might become useful, we should reserve space
for a flag or a function pointer.

(b) Other flags? For instance to walk backwards when freeing
     memory? Probably not needed (the library, which does the tracking,
     could do some reverse handling itself)?

(c) Current way of the implementation (example above or description below)
     assumes that first all data is transferred and then the pointers
     attached; alternatively, it could be combined in the transfer
     call (extra argument), albeit vptr probably would remain a
     different call. Could be before or after the transfer.

(d) Should vtabs be tagged in a special way in the call to the library?

(F) IMPLEMENTATION

(See also the RFC items above.)

Compiler side:

* New copy-callback function in the vtable of type 't'
   Arguments:
   - 'void* token' to be used by the caller
   - 'type(t) this' pointer
   - Function to be called for the data transfer
   - Function to be called for pointer assignment
     (vtab or just transferred allocatable or also data pointer/proc pointer?)

   * Will transfer the data for all allocatable components; and either
     NULL/NULL or a new callback function if the component is a CLASS or
     a derived type with allocatable components
   * For recursive types, it could be either handled as in previous item
     (call callback recursively) or in a loop (as in the example above).

Library side:

   * Provide a function to handle data transfers. Arguments:
     - token (provided by the library, e.g. storing whether a specific
       allocator should be used to distinguish 'map(to:' from 'firstprivate')
     - pointer to the data to be handled
     - its size (contiguous memory chunk)
     If required by the data type:
     - another callback function
     - the data argument for that one
     If not required: the last two arguments are NULL
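
A possible shape of the library-side transfer function, as a host-only C
sketch: the "device" is simulated with malloc, and the token keeps a small
host-to-copy table so that the attach step can redirect pointers inside the
copy. All names (lib_token, lib_alloc, lib_transfer, lib_attach, lib_lookup,
alloc_kind) are hypothetical, and the callback calls the library functions
directly instead of receiving them as arguments, to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct lib_token;
typedef void (*nested_cb_t) (struct lib_token *token, void *data);

enum alloc_kind { ALLOC_MAP_TO, ALLOC_FIRSTPRIVATE };
struct mapping { void *host; char *copy; size_t size; };

struct lib_token {
  enum alloc_kind kind;     /* which allocator the library should use */
  struct mapping tab[8];    /* host chunk -> copied chunk */
  size_t n;
};

/* A real library would choose device memory or private storage here.  */
static void *lib_alloc (enum alloc_kind kind, size_t size)
{
  (void) kind;
  return malloc (size);
}

/* Translate a host address inside a transferred chunk to its copy.  */
static void *lib_lookup (const struct lib_token *t, const void *host_addr)
{
  uintptr_t p = (uintptr_t) host_addr;
  for (size_t i = 0; i < t->n; i++)
    {
      uintptr_t base = (uintptr_t) t->tab[i].host;
      if (p >= base && p < base + t->tab[i].size)
        return t->tab[i].copy + (p - base);
    }
  return NULL;
}

/* Arguments: token, data, size, optional callback and its data.  */
static void lib_transfer (struct lib_token *t, void *data, size_t size,
                          nested_cb_t cb, void *cb_data)
{
  assert (t->n < 8);
  char *copy = lib_alloc (t->kind, size);
  assert (copy != NULL);
  memcpy (copy, data, size);
  t->tab[t->n++] = (struct mapping) { data, copy, size };
  if (cb != NULL)
    cb (t, cb_data);
}

/* Let the copy of *field_addr point to the copy of target.  */
static void lib_attach (struct lib_token *t, void *field_addr, void *target)
{
  void *copy_field = lib_lookup (t, field_addr);
  void *copy_target = lib_lookup (t, target);
  assert (copy_field != NULL && copy_target != NULL);
  memcpy (copy_field, &copy_target, sizeof copy_target);
}

struct t2 { int *a; size_t a_len; };   /* integer, allocatable :: A(:) */

/* What the compiler-generated callback for t2 might do.  */
static void t2_cb (struct lib_token *t, void *data)
{
  struct t2 *x = data;
  if (x->a == NULL)
    return;
  lib_transfer (t, x->a, x->a_len * sizeof (int), NULL, NULL);
  lib_attach (t, &x->a, x->a);
}

/* Driver: returns 1 if the copy is deep and its pointer was redirected.  */
int demo_device_copy (void)
{
  int payload[3] = { 10, 20, 30 };
  struct t2 host = { payload, 3 };
  struct lib_token tok = { .kind = ALLOC_MAP_TO };
  lib_transfer (&tok, &host, sizeof host, t2_cb, &host);
  struct t2 *copy = lib_lookup (&tok, &host);
  int ok = copy != NULL && copy != &host && copy->a != payload
           && copy->a_len == 3 && copy->a[2] == 30;
  for (size_t i = 0; i < tok.n; i++)
    free (tok.tab[i].copy);
  return ok;
}
```

For the simple case (no allocatable components) the caller passes NULL for
the last two arguments and the function degenerates to a plain copy of one
contiguous chunk.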

  * * *

Thoughts? Remarks?

Tobias

