From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id 56C4B3857C41 for ; Fri, 24 Jul 2020 22:36:21 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 56C4B3857C41 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=Julian_Brown@mentor.com IronPort-SDR: Q09ExWNk33mKzqxwr5jtyZFmP0+TKMPWha141jJkHUuZjp6jat4JZcyl8jGqyuh9hd5wPaHVR0 ucbrJ+C37OTOxeCFFxZrlxjU9BQnhSW2QjeeWt7beZMoeiMwKFJPLqkUh4U9Yi8R3oAFBeFDiF SKgEhV/laWnL3RIXMDKzkiX+SUhjqjJO+6rXyvoQR1hbdOQSC+hbYUWUPKPxYP2bpYo1JxcEy8 hqwV+cHSwUkDoxYtwDRi17o1ojeXRxC2pLaYXcbREQf/DjAOb2afkh6il79rcyPKFmRnl6Ep3m 6vg= X-IronPort-AV: E=Sophos;i="5.75,392,1589270400"; d="scan'208";a="51288639" Received: from orw-gwy-02-in.mentorg.com ([192.94.38.167]) by esa2.mentor.iphmx.com with ESMTP; 24 Jul 2020 14:36:11 -0800 IronPort-SDR: t1afxxittps8dHNGqFmTQn9SAmmLXU+0vdIiVXlc4llia/zPdgvx5WPhJbq1hV71haC2XUW3kp NW2GqRkQNwr5xyzeZYM57Q63mIapEtX0JjSCaX8Fjogrd/kKCGLXxUb/z7n7ddKKwHAMg9byHk pYxalespL//XRoF3EBL8HbgOchgfYASbNu6QIVhbcBW7K8l+xRWirQRCQHUM9Gb2P41PvpylEO wbalmYvpHlAzdSwkxiPG/s82EglZCJxqbsiG3wQoRzf8nQx2q/YhWcmwr34bOHHf091PGnS/OV SGw= Date: Fri, 24 Jul 2020 23:36:00 +0100 From: Julian Brown To: Thomas Schwinge CC: Jakub Jelinek , Subject: Re: [PATCH 2/2] [OpenACC] Detect pointer updates for attach operations (PR95590) Message-ID: <20200724233600.37e1bbcb@squid.athome> In-Reply-To: <87zh7p57sd.fsf@euler.schwinge.homeip.net> References: <87zh7p57sd.fsf@euler.schwinge.homeip.net> Organization: Mentor Graphics X-Mailer: Claws Mail 3.17.5 (GTK+ 2.24.32; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) To SVR-IES-MBX-04.mgc.mentorg.com (139.181.222.4) X-Spam-Status: No, score=-6.9 required=5.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, KAM_SHORT, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 24 Jul 2020 22:36:23 -0000 On Fri, 24 Jul 2020 16:04:02 +0200 Thomas Schwinge wrote: > Hi Julian! > > On 2020-06-22T05:14:44-0700, Julian Brown > wrote: > > As mentioned in the parent email, this is a fix for PR95590 that > > detects updates of attached pointers in blocks, and rewrites the > > attached pointer and resets its attachment counter appropriately. I > > am however not entirely sure this is desirable or required by the > > spec: points against are: > > > > - To avoid expensive copies from the device to the host and/or > > "wrong way" device-to-host splay tree lookups, it requires keeping > > an extra shadow copy of mapped blocks on the host in order to > > detect if a host pointer with attachments in the block has been > > changed between attach operations. > > I haven't spent too much time trying, but I too have not yet seen a > way to avoid keeping this state ("shadow copy"), or looking it up on > demand ("expensive copies from the device to the host"). > > I suppose we cannot get the necessary information/state from the > host-side pointer (value) alone, and/or other state kept in the > 'splay_tree_key n' etc.? I don't think so. A different implementation might keep the attachment counters associated with the target_mem_desc (on the "target side"), rather than the splay tree key (the "host side"), in which case the reset-on-host-pointer-modification might sort-of happen for free. But I think that would be quite problematic for other reasons with our current implementation. (Purely speculating, but maybe it "works" somewhat accidentally for PGI because of the way its host-to-device pointer mapping is implemented?) > > We incur this overhead unconditionally if > > attach/detach are in use for what's not likely to be a common > > use case > > Is the overhead so bad, though? As soon as there's an 'attach', we > have to 'malloc' anyway (can combine the two, as you've done), and the > checking overhead doesn't seem so bad either? > > Should we reach out to other OpenACC compiler implementors, and ask > for their understanding/approach to this aspect? I haven't measured the performance impact (it's probably negligible). It may be worth trying to get clarification from OpenACC upstream, though. > > (it's slightly tricky to write a test case to exercise the > > behaviour, even -- Thomas's unmodified original for the PR raises > > an error after the previous patch in this series). > > Challange accepted! ;-P (..., but not right now.) There was a test case attached to the parent email, too :-). > > - From a user perspective, I think it's going to be quite easy to > > get confused wrt. the hidden attachment counter state > > (Indeed that "hidden" aspect is a bit confusing. I've even thought > whether we should add some 'gomp_get_attach_count' function just for > our own testing purposes.) Yeah, maybe. > > with this kind of > > reset-on-host-pointer-modification behaviour. Mind you, > > silently *not* doing the update is likewise going to be confusing > > (the stale device pointer would be updated at present). Maybe this > > should be detected as an error instead? > > I don't understand that, I'm afraid, because as I have quoted in > "OpenACC 'attach' behavior if already > attached to different data", OpenACC explicitly mandates the > "reset-on-host-pointer-modification" behavior, so I don't see a way to > avoid implementing that? > > > - The text in "2.6.8. Attachment Counter" *might* contribute to the > > argument that this kind of pointer-update detection is not > > required. > > Do you think these texts are in conflict in some way (that's not > obvious to me)? I'm still not sure that the intended meaning (in OpenACC 2.6, 2.7.2. "Data Clause Actions", "Attach Action") is what you are reading into it. See also "2.7.1. Data Specification in Data Clauses", under Restrictions: "* In C and C++, modifying pointers in pointer arrays during the data lifetime, either on the host or on the device, may result in undefined behavior." That isn't explicitly about pointers within structs (as we're talking about here), but is of a similar flavour, I think -- in that recognizing host pointer modifications in arrays of pointers would require similar housekeeping in the runtime, but OpenACC 2.6 makes such modifications undefined behaviour instead. The text in "2.6.7. Attachment Counter" (in OpenACC 2.6) is specifically about update operations (acc_update API routines or equivalent directives), but again, detecting pointer modifications (on the host side) between successive "attach" operations seems like a departure from *not* needing to do the same for update operations. Should we also support modifications of attached pointers (e.g. in mapped structs) in device-side code? Why or why not? (That wouldn't be impossible, but the details of how it could work would be ugly indeed...) Here's a quick example of "weird" behaviour that would arise with the pointer-modification detection patch: #include #include struct mystr { int *ptr; }; #define N 1024 int main (int argc, char *argv[]) { int *arr1 = malloc (sizeof (int) * N); int *arr2 = malloc (sizeof (int) * N); struct mystr s; for (int i = 0; i < N; i++) { arr1[i] = i; arr2[i] = i * 2; } s.ptr = arr1; #pragma acc enter data copyin(s) #pragma acc data copy(s.ptr[0:N]) { s.ptr = arr2; #pragma acc parallel loop copy(s.ptr[0:N]) for (int i = 0; i < N; i++) s.ptr[i] = i * 3; } for (int i = 0; i < N; i++) { assert (arr1[i] == i); assert (arr2[i] == i * 3); } free (arr1); free (arr2); } With the patch, this gives: libgomp: attach count underflow Though of course it doesn't work properly without the pointer-modification detection patch either. This example could be made to work, but it would mean *not* resetting the attachment counter to one on detecting a modified host pointer -- the pointer mapping would be modified but the attachment counter would be incremented as usual (at the start of the "acc parallel"). That's arguably the right thing to do perhaps, but it's clearly not what the spec says, even with your reading. HTH, Julian