From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 61020385BF81 for ; Fri, 5 Jun 2020 22:03:50 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 61020385BF81 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=Julian_Brown@mentor.com IronPort-SDR: KDpMIF6u+vT1veZ+wYTEU0HM12hkIsNAwkrSfaEV/GkP3V0bXN/3d+j5WgSRMp325PCinRKyhh Bs81xNBktwJnKUxM0zfy4udhc8IOyOl3IkVcobjrSJfpWp8yjhVfZ2ApQG21OpqIwsoxVFPiZC PbLxPgxR1YeR7eNVs7u+m2I1GJTHJxZM7PRSKhPr2Hnvuhdmi5a3cKX9ijfrn0PIzxQqH2Eioe MCv5hJnzhPdiu9LBeKs7LYnvHeBGSOY6A0Ugxcd5xmI0B22HtWcv5vDTpqImkES4gVCPnoAEkl AZU= X-IronPort-AV: E=Sophos;i="5.73,477,1583222400"; d="diff'?f90'?scan'208";a="51639419" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa1.mentor.iphmx.com with ESMTP; 05 Jun 2020 14:03:49 -0800 IronPort-SDR: YsSPRiokjQUw3oFzGuiMl5Ps6VskzywlghbPHNND+hypxfqlr+7e/SesoXfoXsZzhAtZ0WU1+C gnuPot1LQ3MO+WG3aTpSZN1IIllHBVYPc4jmsZJQ84tEMQxZZRtEd9l7BtUWtstJBVf+NL+4AA 81r5LhTRh3QdtYHmWdSH2vV+E0oSKowe5Cm7gdFfr3DNzDLms8hvyGOp87Hzi7EdysnP4JrhDn 1c2aXeROW1TcxY2yuWpbSrGKXr3UHuJmWetmeYZz2gEIRnl1aPoJPzPcxlH5J+QBavTqbSfNuE YZM= Date: Fri, 5 Jun 2020 23:03:41 +0100 From: Julian Brown To: Thomas Schwinge CC: , , Tobias Burnus Subject: Re: [PATCH 2/7] [OpenACC] Adjust dynamic reference count semantics Message-ID: <20200605230341.1fb4b257@squid.athome> In-Reply-To: <87eeqw6uss.fsf@euler.schwinge.homeip.net> References: <87k10o72dd.fsf@euler.schwinge.homeip.net> <87eeqw6uss.fsf@euler.schwinge.homeip.net> Organization: Mentor Graphics X-Mailer: Claws Mail 3.17.5 (GTK+ 2.24.32; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="MP_/u/DTNq7x5sYxcc9p4ZOXGNq" X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 05 Jun 2020 22:03:52 -0000 --MP_/u/DTNq7x5sYxcc9p4ZOXGNq Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Content-Disposition: inline On Wed, 3 Jun 2020 17:19:47 +0200 Thomas Schwinge wrote: > Hi Julian! > > On 2020-06-03T14:36:14+0200, I wrote: > > On 2020-05-22T15:16:05-0700, Julian Brown > > wrote: > >> This patch adjusts the semantics of dynamic reference counts, as > >> described in the parent email. > > > > Thanks! > > > > A few questions, but no need to send an updated patch. > > > >> --- a/libgomp/oacc-mem.c > >> +++ b/libgomp/oacc-mem.c > > > >> @@ -1018,13 +1036,102 @@ goacc_enter_data_internal (struct > >> gomp_device_descr *acc_dev, size_t mapnum, { > >> for (size_t i = 0; i < mapnum; i++) > >> { > >> - int group_last = find_group_last (i, mapnum, sizes, kinds); > >> + splay_tree_key n; > >> + size_t group_last = find_group_last (i, mapnum, sizes, > >> kinds); > >> + bool struct_p = false; > >> + size_t size, groupnum = (group_last - i) + 1; > >> > >> - gomp_map_vars_async (acc_dev, aq, > >> - (group_last - i) + 1, > >> - &hostaddrs[i], NULL, > >> - &sizes[i], &kinds[i], true, > >> - GOMP_MAP_VARS_OPENACC_ENTER_DATA); > >> + switch (kinds[i] & 0xff) > >> + { > >> + case GOMP_MAP_STRUCT: > >> + { > >> + int last = i + sizes[i]; > > > > The 'last' calculated here must always equal the 'group_last' > > calculated above. ;-) (... so we might just use 'group_last' > > instead of 'last' in the following.) > > > >> + size = (uintptr_t) hostaddrs[last] + sizes[last] > >> + - (uintptr_t) hostaddrs[i]; > >> + struct_p = true; > >> + } > >> + break; > >> + > >> + case GOMP_MAP_ATTACH: > >> + size = sizeof (void *); > >> + break; > >> + > >> + default: > >> + size = sizes[i]; > >> + } > >> + > >> + n = lookup_host (acc_dev, hostaddrs[i], size); > >> + > > > >> + if (n && struct_p) > >> + { > >> + if (n->refcount != REFCOUNT_INFINITY) > >> + n->refcount += groupnum - 1; > >> + n->dynamic_refcount += groupnum - 1; > >> + gomp_mutex_unlock (&acc_dev->lock); > >> + } > > > > Is the 'GOMP_MAP_STRUCT' handling here specifically necessary, or > > is that just an optimization of the 'n && groupnum > 1' case below? > > > > Eh, OK, I think I see where this is going; the 'n && groupnum > 1' > case below might not necessarily take care of the 'groupnum - 1' > refcounts that we're filing here? Right. GOMP_MAP_STRUCT is a little special in this case. > >> + else if (n && groupnum == 1) > >> + { > >> + void *h = hostaddrs[i]; > >> + size_t s = sizes[i]; > >> + > >> + /* A standalone attach clause. */ > >> + if ((kinds[i] & 0xff) == GOMP_MAP_ATTACH) > >> + gomp_attach_pointer (acc_dev, aq, &acc_dev->mem_map, > >> n, > >> + (uintptr_t) h, s, NULL); > >> + else if (h + s > (void *) n->host_end) > >> + { > >> + gomp_mutex_unlock (&acc_dev->lock); > >> + gomp_fatal ("[%p,+%d] not mapped", (void *)h, > >> (int)s); > >> + } > >> + > >> + assert (n->refcount != REFCOUNT_LINK); > >> + if (n->refcount != REFCOUNT_INFINITY) > >> + n->refcount++; > >> + n->dynamic_refcount++; > >> + > >> + gomp_mutex_unlock (&acc_dev->lock); > >> + } > > > >> + else if (n && groupnum > 1) > >> + { > >> + assert (n->refcount != REFCOUNT_INFINITY > >> + && n->refcount != REFCOUNT_LINK); > >> + > >> + bool processed = false; > >> + > >> + struct target_mem_desc *tgt = n->tgt; > >> + for (size_t j = 0; j < tgt->list_count; j++) > >> + if (tgt->list[j].key == n) > >> + { > >> + for (size_t k = 0; k < groupnum; k++) > >> + if (j + k < tgt->list_count && tgt->list[j + > >> k].key) > >> + { > >> + tgt->list[j + k].key->refcount++; > >> + tgt->list[j + k].key->dynamic_refcount++; > >> + } > >> + processed = true; > >> + } > >> + > >> + gomp_mutex_unlock (&acc_dev->lock); > >> + if (!processed) > >> + gomp_fatal ("dynamic refcount incrementing failed for > >> " > >> + "pointer/pset"); > >> + } > > > > Please add some text to explain the nested 'j', 'k' loops and their > > 'if' conditionals, and the 'groupnum' usage in the 'k' loop > > boundary. Should the 'k' loop maybe run 'for (size_t k = j; k < > > tgt->list_count; ++k)' (..., or is 'groupnum' relevant?), and in > > the loop body then use 'k' instead of 'j + k'? (Maybe I've now > > confused myself, staring at this for a while...) > > Audacious as I am sometimes, I did put a '__builtin_abort' right after > 'tgt->list[j].key == n' -- and it doesn't trigger one single time for > the current libgomp test cases, meaning this is all dead code? I'm > confused. Huh, I didn't expect that! Indeed that stanza appears to be dead code (at least with mapping clauses generated from current GCC). The reason is a late bug-fix to the manual deep copy code that strips GOMP_MAP_TO_PSET and GOMP_MAP_POINTER from OpenACC enter/exit mappings altogether. (In https://gcc.gnu.org/legacy-ml/gcc-patches/2019-12/msg01253.html). That means "grouped" mappings are actually only now used for GOMP_MAP_STRUCT, so actually even more of the find_group_last function is probably dead now too, modulo backward compatibility issues. Rewinding a bit, here is an explanation of the problem that the removal of those clauses fixes, in case we want to revisit that. With the attached patch (reverting the fix), the attached test case fails (e.g. compiled at -O0). The problem is that with a dynamic data lifetime, it's possible for an array descriptor on the stack to go out of scope before the array data it is associated with does. This might well be violating either Fortran rules or OpenACC semantics -- if that's the case, then we had no problem here. (I did see a similar problem "in the wild", but hadn't come up with a standalone test case until now.) The attached test case starts out with a explicit-shape array local. It passes this to a subroutine "enterdata_wrapper". This subroutine fabricates an assumed-shape array pointer to its argument (creating an array descriptor), and passes it to another subroutine "enterdata". The "enterdata" subroutine then performs an OpenACC "enter data" operation with the array -- whose data comes from the original explicit-shape array in the main program, but whose descriptor comes from the stack frame of the caller (i.e. "enterdata_wrapper"). This descriptor then goes out of scope before returning to the main program. The test case tries to fiddle with the stack layout by adding arbitrary other arrays, and does the same dance again with nested subroutines to perform an "exit data" operation. But now the address of the (new) descriptor is different, and the unmapping operation fails. In short -- OpenACC "enter data" operations can (could) create hidden dangling references to array descriptors, in some circumstances. So, the fix was to strip out GOMP_MAP_TO_PSET (and GOMP_MAP_POINTER, which I don't think has any meaning on these directives) from OpenACC "enter data" and "exit data" directives altogether. If an array has a descriptor when we get to a compute kernel, that descriptor is copied to the target anyway, *even for present clauses*, so passing the array descriptor to "enter data" descriptor doesn't appear to be necessary, even in cases where it stays in scope before unmapping from the target. So, questions: 1. Does the attached program violate Fortran semantics in some way? 2. Or OpenACC semantics? 3. Are there unintended side-effects of removing GOMP_MAP_TO_PSET and GOMP_MAP_POINTER from OpenACC enter/exit data directives? 4. Should the clauses be stripped from the equivalent OpenMP directives too? (FAOD, I'm not asking for review on the attached patch at this time.) HTH, Julian --MP_/u/DTNq7x5sYxcc9p4ZOXGNq Content-Type: text/x-fortran Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="enter-data-pset.f90" program myprog implicit none integer :: a(16) integer :: i call enterdata_wrapper(a, 16) call exitdata_wrapper(a, 16) contains subroutine enterdata_wrapper(a, n) implicit none integer :: n integer, target :: a(n) integer :: aa(16) integer :: bb(16) integer, pointer :: ap(:) integer :: cc(16) integer :: dd(16) ! An array descriptor appears somewhere around here... ap => a !$acc enter data copyin(aa,bb,cc,dd) call enterdata(ap) !$acc exit data copyout(aa,bb,cc,dd) ! ...and goes out of scope. end subroutine enterdata_wrapper subroutine enterdata(a) implicit none integer, pointer :: a(:) ! Map "to(a.data) to_pset(a) pointer(a.data)" !$acc enter data copyin(a) end subroutine enterdata subroutine exitdata_wrapper(a, n) implicit none integer :: n integer, target :: a(n) integer :: aa(32) integer :: bb(32) integer, pointer :: ap(:) integer :: cc(32) integer :: dd(32) ! A different array descriptor appears... ap => a !$acc enter data copyin(aa,bb,cc,dd) call exitdata(ap) !$acc exit data copyout(aa,bb,cc,dd) ! ...and goes out of scope. end subroutine exitdata_wrapper subroutine exitdata(a) implicit none integer, pointer :: a(:) ! Try to unmap the fresh array descriptor: FAILS. !$acc exit data copyout(a) end subroutine exitdata end program myprog --MP_/u/DTNq7x5sYxcc9p4ZOXGNq Content-Type: text/x-patch Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="remove-pset-dangling-ref-bugfix-1.diff" commit 7a4d9939a7c5f770f3d2fcd02be01bfd146589ce Author: Julian Brown Date: Fri Jun 5 14:46:41 2020 -0700 Remove GOMP_MAP_TO_PSET dangling reference bugfix diff --git a/gcc/gimplify.c b/gcc/gimplify.c index e14932fafaf..79120e53129 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -8748,6 +8748,8 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, case OMP_TARGET_DATA: case OMP_TARGET_ENTER_DATA: case OMP_TARGET_EXIT_DATA: + case OACC_ENTER_DATA: + case OACC_EXIT_DATA: case OACC_HOST_DATA: if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER || (OMP_CLAUSE_MAP_KIND (c) @@ -8756,15 +8758,6 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p, mapped, but not the pointer to it. */ remove = true; break; - case OACC_ENTER_DATA: - case OACC_EXIT_DATA: - if (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER - || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_TO_PSET - || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_POINTER - || (OMP_CLAUSE_MAP_KIND (c) - == GOMP_MAP_FIRSTPRIVATE_REFERENCE)) - remove = true; - break; default: break; } diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c index bc25527c616..c462cbb1007 100644 --- a/libgomp/oacc-mem.c +++ b/libgomp/oacc-mem.c @@ -1015,9 +1015,12 @@ find_group_last (int pos, size_t mapnum, size_t *sizes, unsigned short *kinds) switch (kind0) { case GOMP_MAP_TO_PSET: - while (pos + 1 < mapnum && (kinds[pos + 1] & 0xff) == GOMP_MAP_POINTER) + while (pos + 1 < mapnum + && ((kinds[pos + 1] & 0xff) == GOMP_MAP_POINTER + || (kinds[pos + 1] & 0xff) == GOMP_MAP_ATTACH)) pos++; - /* We expect at least one GOMP_MAP_POINTER after a GOMP_MAP_TO_PSET. */ + /* We expect at least one GOMP_MAP_POINTER (or GOMP_MAP_ATTACH) + after a GOMP_MAP_TO_PSET. */ assert (pos > first_pos); break; @@ -1044,7 +1047,9 @@ find_group_last (int pos, size_t mapnum, size_t *sizes, unsigned short *kinds) /* We can have zero or more GOMP_MAP_POINTER mappings after a to/from (etc.) mapping. */ - while (pos + 1 < mapnum && (kinds[pos + 1] & 0xff) == GOMP_MAP_POINTER) + while (pos + 1 < mapnum + && ((kinds[pos + 1] & 0xff) == GOMP_MAP_POINTER + || (kinds[pos + 1] & 0xff) == GOMP_MAP_ATTACH)) pos++; } @@ -1122,6 +1127,15 @@ goacc_enter_data_internal (struct gomp_device_descr *acc_dev, size_t mapnum, assert (n->refcount != REFCOUNT_INFINITY && n->refcount != REFCOUNT_LINK); + for (size_t j = i + 1; j <= group_last; j++) + if ((kinds[j] & 0xff) == GOMP_MAP_ATTACH) + { + splay_tree_key m + = lookup_host (acc_dev, hostaddrs[j], sizeof (void *)); + gomp_attach_pointer (acc_dev, aq, &acc_dev->mem_map, m, + (uintptr_t) hostaddrs[j], sizes[j], NULL); + } + bool processed = false; struct target_mem_desc *tgt = n->tgt; --MP_/u/DTNq7x5sYxcc9p4ZOXGNq--