Date: Tue, 11 Oct 2022 16:15:24 +0200
From: Jakub Jelinek
To: Tobias Burnus
Cc: Hafiz Abid Qadeer, gcc-patches@gcc.gnu.org, fortran@gcc.gnu.org
Subject: Re: [PATCH 2/5] [gfortran] Translate allocate directive (OpenMP 5.0).
References: <20220113145320.3153087-1-abidh@codesourcery.com> <20220113145320.3153087-3-abidh@codesourcery.com> <3683274e-33d7-d2a1-ffd8-d678cecba5d8@codesourcery.com>
In-Reply-To: <3683274e-33d7-d2a1-ffd8-d678cecba5d8@codesourcery.com>

On Tue, Oct 11, 2022 at 03:22:02PM +0200, Tobias Burnus wrote:
> Hi Jakub,
>
> On 11.10.22 14:24, Jakub Jelinek wrote:
>
> There is another issue besides what I wrote in my last review,
> and I'm afraid I don't know what to do about it, hoping Tobias
> has some ideas.
> The problem is that without the allocate-stmt associated allocate directive,
> Fortran allocatables are easily always allocated with malloc and freed with
> free.  The deallocation can be implicit through reallocation, or explicit
> deallocate statement etc.
> ...
> But when some allocatables are now allocated with a different
> allocator (when allocate-stmt associated allocate directive is used),
> some allocatables are allocated with malloc and others with GOMP_alloc
> but we need to free them with the corresponding allocator based on how
> they were allocated, what has been allocated with malloc should be
> deallocated with free, what has been allocated with GOMP_alloc should be
> deallocated with GOMP_free.
>
> I think the most common case is:
>
>   integer, allocatable :: var(:)
>   !$omp allocators allocator(my_alloc) ! must be in same scope as decl of 'var'
>   ...
>   ! optionally: deallocate(var)
>   end ! of scope: block/subroutine/... - automatic deallocation

So are you talking here about the declarative directive, which the patch
does sorry on, or about the executable one above an allocate stmt?

Anyway, even this simple case has the problem that one can have
  subroutine foo (var)
    integer, allocatable:: var(:)
    var = [1, 2, 3] ! reallocate
  end subroutine
and
  call foo (var)
above.

> Those can be easily handled. It gets more complicated with control flow:
>
>   if (...) then
>     !$omp allocators allocator(...)
>     allocate(...)
>   else
>     allocate (...)
>   endif
>
> However, the problem is really that there is no mandatory
> '!$omp deallocators' and also the wording like:
>
> "If any operation of the base language causes a reallocation of
> an array that is allocated with a memory allocator then that
> memory allocator will be used to release the current memory
> and to allocate the new memory." (OpenMP 5.0 wording)
>
> There has been some attempt to relax the rules a bit, e.g. by
> adding the wording:
> "For allocated allocatable components of such variables, the allocator that
> will be used for the deallocation and allocation is unspecified."
>
> And some wording change (→issues 3189) to clarify related component issues.
>
> But nonetheless, there is still the issue of:
>
> (a) explicit DEALLOCATE in some other translation unit
> (b) some intrinsic operation which reallocates the memory, either via libgomp
>     or in the source code:
>       a = [1,2,3] ! possibly reallocates
>       str = trim(str) ! possibly reallocates
>     where the first one calls 'realloc' directly in the code and the second one
>     calls 'libgomp' for that.
>
> * * *
>
> I don't see a good solution – and there is in principle the same issue with
> unified-shared memory (USM) on hardware that does not support transparently
> accessing all host memory on the device.
>
> Compilers support this case by allocating memory in some special memory,
> which is either accessible from both sides ('pinned') or migrates on the
> first access from the device side - but remains there until the accessing
> device kernel ends ('managed memory').
>
> Newer hardware (+ associated Linux kernel support) permits accessing all
> memory in a somewhat fast way, avoiding this issue (and special handling
> is then left to the user.) For AMDGCN, my understanding is that all hardware
> supported by GCC supports this - but glacial speed until the last hardware
> architectures. For Nvidia, this is supported since Pascal (I think for Titan X,
> P100, i.e. sm_5.2/sm_60) - but I believe not for all Pascal/Kepler hardware.
>
> I mention this because the USM implementation at
> https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597976.html
> suffers from this.
> And https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601059.html
> tries to solve the 'trim' example issue above - i.e. the case where
> libgomp reallocates pinned/managed (pseudo-)USM memory.
>
> * * *
>
> The deallocation can be done in a completely different TU from where it has
> been allocated, in theory it could be also not compiled with -fopenmp, etc.
> So, I'm afraid we need to store somewhere whether we used malloc or
> GOMP_alloc for the allocation (say somewhere in the array descriptor and for
> other stuff somewhere on the side?) and slow down all code that needs
> deallocation to check that bit (or say we don't support
> deallocation/reallocation of OpenMP allocated allocatables without -fopenmp
> on the deallocation TU and only slow down -fopenmp compiled code)?
>
> The problem with storing is that gfortran inserts the malloc/realloc/free
> calls directly, i.e. without library preloading, intercepting those
> libcalls, I do not see how it can work at all.

Well, it can use a weak symbol; if not linked against libgomp, the bit
saying the memory is OpenMP allocated shouldn't be set and so realloc/free
will be used.  So do
  if (arrdescr.gomp_alloced_bit)
    GOMP_free (arrdescr.data, 0);
  else
    free (arrdescr.data);
and similar.  And I think we can just document that we do this only for
-fopenmp compiled code.
But do we have a place to store that bit?  I presume in array descriptors
there could be some bit for it, but what to do about scalar allocatables,
or allocatable components etc.?
In theory we could use ugly stuff like: if all the allocations are
guaranteed to have at least 2 byte alignment, use the LSB of the pointer
to mark GOMP_alloc allocated memory for the scalar allocatables etc., but
then -fopenmp compiled code would need to strip it away.

As for pinned memory, if it is allocated through libgomp allocators, that
should just work if GOMP_free/GOMP_realloc is used, that is why we have
those extra data in front of the allocations where we store everything we
need.  But those also make the OpenMP allocations incompatible with
malloc/free allocations.

	Jakub
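To make the dispatch idea above a bit more concrete, here is a minimal C sketch. It is an illustration under assumptions, not what gfortran generates: struct sketch_descriptor, its gomp_alloced_bit field and the sketch_* helpers are hypothetical names, and the real array descriptor currently has no such bit. GOMP_free is declared weak so that the code still links without libgomp, and the pointer-tagging helpers assume every allocation is at least 2-byte aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Weak reference to libgomp's GOMP_free; its address is NULL when the
   program is not linked against libgomp.  */
extern void GOMP_free (void *ptr, uintptr_t allocator) __attribute__ ((weak));

/* Hypothetical, heavily simplified descriptor (assumption: one spare bit
   is available to record which allocator family was used).  */
struct sketch_descriptor
{
  void *data;
  bool gomp_alloced_bit;
};

/* Release the payload with the allocator family that created it.  */
void
sketch_deallocate (struct sketch_descriptor *d)
{
  if (d->gomp_alloced_bit)
    {
      /* The bit is only ever set at a GOMP_alloc site, which implies
         that libgomp is linked in.  */
      assert (GOMP_free != NULL);
      GOMP_free (d->data, 0);
    }
  else
    free (d->data);
  d->data = NULL;
  d->gomp_alloced_bit = false;
}

/* Variant for scalar allocatables, which have no descriptor: keep the
   mark in the least significant bit of the pointer itself.  */
#define SKETCH_GOMP_TAG ((uintptr_t) 1)

static inline void *
sketch_tag (void *p)
{
  return (void *) ((uintptr_t) p | SKETCH_GOMP_TAG);
}

static inline bool
sketch_is_tagged (const void *p)
{
  return ((uintptr_t) p & SKETCH_GOMP_TAG) != 0;
}

/* Must be applied before every dereference in -fopenmp compiled code.  */
static inline void *
sketch_untag (void *p)
{
  return (void *) ((uintptr_t) p & ~SKETCH_GOMP_TAG);
}
```

With this shape, code that never calls GOMP_alloc never sets the bit or the tag and always ends up in plain free, which matches the idea of only paying the extra check in -fopenmp compiled code.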