Hi Jakub,

On 11.10.22 14:24, Jakub Jelinek wrote:
> There is another issue besides what I wrote in my last review,
> and I'm afraid I don't know what to do about it, hoping Tobias
> has some ideas.
> The problem is that without the allocate-stmt associated allocate
> directive, Fortran allocatables are easily always allocated with
> malloc and freed with free.  The deallocation can be implicit
> through reallocation, or explicit deallocate statement etc.
...
> But when some allocatables are now allocated with a different
> allocator (when allocate-stmt associated allocate directive is used),
> some allocatables are allocated with malloc and others with
> GOMP_alloc but we need to free them with the corresponding allocator
> based on how they were allocated, what has been allocated with malloc
> should be deallocated with free, what has been allocated with
> GOMP_alloc should be deallocated with GOMP_free.

I think the most common case is:

   integer, allocatable :: var(:)
   !$omp allocators allocator(my_alloc) ! must be in same scope as decl of 'var'
   ... ! optionally: deallocate(var)
   end ! of scope: block/subroutine/... - automatic deallocation

Those can be easily handled. It gets more complicated with control flow:

   if (...) then
     !$omp allocators allocator(...)
     allocate(...)
   else
     allocate (...)
   endif

However, the real problem is that there is no mandatory
'!$omp deallocators', combined with wording like:

"If any operation of the base language causes a reallocation of an
array that is allocated with a memory allocator then that memory
allocator will be used to release the current memory and to allocate
the new memory." (OpenMP 5.0 wording)

There has been some attempt to relax the rules a bit, e.g. by adding
the wording:

"For allocated allocatable components of such variables, the allocator
that will be used for the deallocation and allocation is unspecified."

And some wording change (→ issue 3189) to clarify related component
issues.
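
To make the OpenMP 5.0 reallocation rule quoted above concrete, here is
a minimal C sketch. It assumes a hypothetical toy allocator handle
(toy_allocator, toy_alloc, toy_free, toy_realloc_same); none of these
names exist in libgomp or libgfortran, and the counters are only there
so that the behaviour can be observed. The point is merely that a
reallocation has to know which allocator owns the old block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator handle, a stand-in for an OpenMP allocator.
   The counters exist only so the calls can be observed.  */
typedef struct toy_allocator {
  const char *name;
  int n_alloc;
  int n_free;
} toy_allocator;

void *toy_alloc(toy_allocator *a, size_t size)
{
  assert(a != NULL);
  a->n_alloc++;
  return malloc(size);  /* a real allocator would use its own memory space */
}

void toy_free(toy_allocator *a, void *p)
{
  assert(a != NULL);
  a->n_free++;
  free(p);
}

/* Reallocation following the quoted rule: the allocator that owns 'old'
   both provides the new block and releases the old one.  */
void *toy_realloc_same(toy_allocator *owner, void *old,
                       size_t old_size, size_t new_size)
{
  void *fresh = toy_alloc(owner, new_size);
  if (fresh == NULL)
    return NULL;  /* the old block is left untouched on failure */
  if (old != NULL) {
    memcpy(fresh, old, old_size < new_size ? old_size : new_size);
    toy_free(owner, old);
  }
  return fresh;
}
```

Note that toy_realloc_same only works because the caller passes the
owning allocator explicitly; the information has to come from
somewhere at the point of reallocation.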
But nonetheless, there is still the issue of:

(a) explicit DEALLOCATE in some other translation unit
(b) some intrinsic operations which reallocate the memory, either
    via libgomp or in the source code:

   a = [1,2,3]      ! possibly reallocates
   str = trim(str)  ! possibly reallocates

where the first one calls 'realloc' directly in the code and the
second one calls 'libgomp' for that.

* * *

I don't see a good solution – and there is in principle the same issue
with unified-shared memory (USM) on hardware that does not support
transparently accessing all host memory on the device.

Compilers support this case by allocating memory in some special
memory, which is either accessible from both sides ('pinned') or
migrates on the first access from the device side - but remains there
until the accessing device kernel ends ('managed memory').

Newer hardware (+ associated Linux kernel support) permits accessing
all memory in a somewhat fast way, avoiding this issue (and special
handling is then left to the user). For AMDGCN, my understanding is
that all hardware supported by GCC supports this - but at glacial
speed until the most recent hardware architectures. For Nvidia, this
is supported since Pascal (I think for Titan X, P100, i.e.
sm_5.2/sm_60) - but I believe not for all Pascal/Kepler hardware.

I mention this because the USM implementation at
https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597976.html
suffers from this. And
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601059.html
tries to solve the 'trim' example issue above - i.e. the case where
libgomp reallocates pinned/managed (pseudo-)USM memory.

* * *

> The deallocation can be done in a completely different TU from where
> it has been allocated, in theory it could be also not compiled with
> -fopenmp, etc.
> So, I'm afraid we need to store somewhere whether we used malloc or
> GOMP_alloc for the allocation (say somewhere in the array descriptor
> and for other stuff somewhere on the side?) and slow down all code
> that needs deallocation to check that bit (or say we don't support
> deallocation/reallocation of OpenMP allocated allocatables without
> -fopenmp on the deallocation TU and only slow down -fopenmp compiled
> code)?

The problem with storing is that gfortran inserts the
malloc/realloc/free calls directly; i.e. without library preloading
that intercepts those libcalls, I do not see how it can work at all.

I also do not know how to handle the pinned-memory case above
correctly.

One form of partial support would be to require that code using such
allocatables cannot do any reallocation/deallocation, by only
permitting calls to procedures which do not accept allocatables (such
that no reallocation can happen) – and to print a 'sorry' for the rest.

Other implementations seem to have a Fortran library call for
(re)allocations, which permits swapping the allocator from the generic
one to the omp_default_mem_alloc.

* * *

In terms of the array descriptor, we have inside 'struct dtype_type'
the 'signed short attribute', which currently only holds
CFI_attribute_pointer/CFI_attribute_allocatable/CFI_attribute_other
(=0,1,2). And this is only used together with ISO C binding, which
permits using the other bits for other purposes (in the non-ISO-C
case). Still, the question is *how* to use it in that case.

Thoughts on the generic issue or on those thoughts?

Tobias
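
As a rough illustration of the descriptor-bit idea above, the following
C sketch records the allocator choice in a spare bit of 'attribute' and
checks that bit on deallocation. It is a sketch under stated
assumptions, not gfortran's implementation: toy_dtype and
toy_descriptor are simplified stand-ins for the real descriptor,
TOY_ATTR_OMP_ALLOC is a hypothetical flag value, and
toy_gomp_alloc/toy_gomp_free are stand-ins for GOMP_alloc/GOMP_free so
that the code runs without libgomp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for gfortran's dtype_type; only the
   'signed short attribute' field matches the discussion above.  */
typedef struct toy_dtype {
  size_t elem_len;
  signed short attribute;  /* low values: CFI_attribute_* (0, 1, 2) */
} toy_dtype;

typedef struct toy_descriptor {
  void *base_addr;
  toy_dtype dtype;
} toy_descriptor;

/* Hypothetical flag in an otherwise unused bit of 'attribute'.  */
#define TOY_ATTR_OMP_ALLOC 0x100

/* Stand-ins for GOMP_alloc/GOMP_free (assumption: not the real API).  */
int toy_gomp_frees = 0;
void *toy_gomp_alloc(size_t size) { return malloc(size); }
void toy_gomp_free(void *p) { toy_gomp_frees++; free(p); }

/* Allocate and record in the descriptor which allocator was used.  */
int toy_allocate(toy_descriptor *d, size_t nelem, int use_omp)
{
  assert(d != NULL && d->base_addr == NULL);
  size_t bytes = nelem * d->dtype.elem_len;
  d->base_addr = use_omp ? toy_gomp_alloc(bytes) : malloc(bytes);
  if (d->base_addr == NULL)
    return -1;
  if (use_omp)
    d->dtype.attribute |= TOY_ATTR_OMP_ALLOC;
  return 0;
}

/* Deallocate by checking the recorded bit; this test is the extra
   cost every deallocation would have to pay.  */
void toy_deallocate(toy_descriptor *d)
{
  assert(d != NULL);
  if (d->dtype.attribute & TOY_ATTR_OMP_ALLOC)
    toy_gomp_free(d->base_addr);
  else
    free(d->base_addr);
  d->base_addr = NULL;
  d->dtype.attribute &= ~TOY_ATTR_OMP_ALLOC;
}
```

This only helps where the deallocating code actually goes through a
check like toy_deallocate; code that calls free directly, e.g. in a
translation unit compiled without -fopenmp, would still use the wrong
allocator.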