From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from esa1.mentor.iphmx.com (esa1.mentor.iphmx.com [68.232.129.153]) by sourceware.org (Postfix) with ESMTPS id 03004383440F; Tue, 9 Feb 2021 15:37:38 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 03004383440F Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=Thomas_Schwinge@mentor.com IronPort-SDR: 2CZR+nXgZ42sHEBfHTCY1wi2W0A0WXebwG9sw9hNENrM4el6CXSKxMDgdef1YAKpj3Pyr7c8w7 GU0YpRt9/wI+tLq8gg2jHLemAP0gi361uKtAXbAdob6F9eb+c8UKGsA5h0prlY1JBO6rPYvRlT mv3njc9/BY/0nL8XEiFL4vIYrNbduPpsO+jnGp8PeT06wy1YFA4u4J80OsxkJLhAq9S719z5YO vmVL5e9/uqkfzOdYXYgarFhTW/RiAbaWjQgu5ToSgI5B5+gDQ1VMXzkYlf4AsrsW0vgYojtj7R 2gI= X-IronPort-AV: E=Sophos;i="5.81,165,1610438400"; d="scan'208";a="60227933" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa1.mentor.iphmx.com with ESMTP; 09 Feb 2021 07:37:37 -0800 IronPort-SDR: lJO8SR8Sg58xvb9xTC70M/Lz+pb1fn5ondmB7IizRwogxfdDOCLgXHQEo0zWN3+fYzdIFs1DCP YdP606SF9DR3nx4jHvXtcPERzjFClS82NASekSstfi0hA1qlc8/EZyduwhZbbxmekFG4nNgV6G UqXUdhHUMs59wh1U6cfsLIX+E69VUAY6wvTi8nyl4s3XQ+MvbsTn4xMWOwS5mr1pYl2EHeFKqj C0b5mMEoC+ogZFkyISx2Z4KdpaTTo7dLwZXQiGm0EEaDmog5LzbII4DuiVY/9lbBZeeQS+WHaE fcw= From: Thomas Schwinge To: Julian Brown , Tobias Burnus CC: Jakub Jelinek , , Subject: Re: [Patch] Fortran: %re/%im fixes for OpenMP/OpenACC + gfc_is_simplify_contiguous In-Reply-To: <20210209130522.280554b9@squid.athome> References: <87wnvhfuci.fsf@euler.schwinge.homeip.net> <87zh0dv4e4.fsf@dem-tschwing-1.ger.mentorg.com> <82276ca1-9b8c-61cf-9613-18636ce7fbe7@codesourcery.com> <20210209130522.280554b9@squid.athome> User-Agent: Notmuch/0.29.1+93~g67ed7df (https://notmuchmail.org) Emacs/25.2.2 (x86_64-pc-linux-gnu) Date: Tue, 9 Feb 2021 16:37:31 +0100 Message-ID: <87sg65uu9w.fsf@dem-tschwing-1.ger.mentorg.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: SVR-IES-MBX-07.mgc.mentorg.com (139.181.222.7) To svr-ies-mbx-01.mgc.mentorg.com (139.181.222.1) X-Spam-Status: No, score=-6.8 required=5.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=no autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on server2.sourceware.org X-BeenThere: fortran@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Fortran mailing list List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 09 Feb 2021 15:37:41 -0000 Hi! On 2021-02-09T13:05:22+0000, Julian Brown wrote: > On Tue, 9 Feb 2021 13:45:36 +0100 > Tobias Burnus wrote: > >> On 09.02.21 12:58, Thomas Schwinge wrote: >> >> Granted. The array(:)%re access might update too much, but that's >> >> not different to array with strides or with contiguous arrays >> >> sections which contain component reference (and more than one >> >> component). >> > (Is that indeed allowed to "update too much"?) >> >> Yes - that's the general problem with strides or bit sets; >> copying only a subset =E2=80=93 and doing so atomically =E2=80=93 is not >> always possible or feasible. >> >> *OpenACC* 3.1 has for "2.14.4 Update Directive" the restriction: >> >> "Noncontiguous subarrays may appear. It is implementation-specific >> whether noncontiguous regions are updated by using one transfer >> for each contiguous subregion, or whether the noncontiguous data >> is packed, transferred once, and unpacked, or whether one or more >> larger subarrays (no larger than the smallest contiguous region >> that contains the specified subarray) are updated." >> >> For map, I saw that that's the case =E2=80=93 but I think Julian's >> patch does not handle this correctly for: >> >> type t >> integer :: i, j, k >> end type t >> type(t) :: A(100) >> ... host(A(:)%j) So I understand the variants in the quoted OpenACC part as follows. However I don't claim that I necessarily got all the fine detail right at the language level (English as well as OpenACC)! So, please verify. "Using one transfer for each contiguous subregion": for n in 1..100: transfer A(n)%j individually "Noncontiguous data is packed, transferred once, and unpacked": device: integer :: tmp(100) for n in 1..100: tmp(n) =3D A(n)%j transfer tmp host: for n in 1..100: A(n)%j =3D tmp(n) In this example here, I understand "subarrays (no larger than the smallest contiguous region that contains the specified subarray)" to again resolve to 'A(n)%j', so doesn't add other another variant. This -- per my reading! -- would be different here: type t integer :: i, j1, j2, k end type t type(t) :: A(100) ... host(A(:)%j1, A(:)%j2) ... where I understand this to mean that each 'A(n)%j1' plus 'A(:)%j2' may be transferred together: either "using one transfer for each contiguous subregion": for n in 1..100: transfer memory region of A(n)%j1..A(n)%j2 individuall= y ..., or "packed, transferred once, and unpacked": device: integer :: tmp(2 * 100) for n in 1..100: tmp(2 * n) =3D A(n)%j1 tmp(2 * n + 1) =3D A(n)%j2 transfer tmp host: for n in 1..100: A(n)%j1 =3D tmp(2 * n) A(n)%j2 =3D tmp(2 * n + 1) I do however not read into this that the following would be valid: >> I think instead of transferring A(1)%j to A(100)%j, it transfers >> all of A(:), i.e. also A(1)%i and A(100)%k :-( I don't think it's appropriate for an 'update' to alter anything else than the exact 'var's as specified. > Yes it will -- but given that A(2)%i and A(99)%k (and all the in-between > values) can legitimately be transferred according to the spec I don't read it that way, I'm afraid. :-O > how much > of a problem is that? In particular, are there situations where this > "over-updating" can lead to incorrect results in a conforming program? In your reading indeed it wouldn't, because the user couldn't expect the following: > Perhaps the question is, can a user legitimately expect the host and > offloaded versions of some given memory block to hold different data, > like maintaining different data in a cache than the storage backing > that cache? One use-case for that might be double buffering a "single > array" (i.e. the host and device versions of that array). I don't think > that's something we'd want to encourage, though. I find the wording in the spec rather explicitly, for example: OpenACC 3.1, 2.7 "Data Clauses": | In all cases, the compiler will allocate and manage a copy of the 'var' | in the memory of the current device, creating a *visible device copy* | of that 'var', for data not in shared memory. Emphasis mine; and I indeed understand this to mean that the user can "legitimately expect the host and offloaded versions of some given memory block to hold different data, [...]" (your words from above). > I think, rather, that partial updates are an optimisation the user can > use when they know that only part of an array has been updated, so > slight over-copying is harmless. Interesting -- and, again: that makes sense in your reading. So. Should everybody work through this again, trying to reach consensus? Do we need to clarify that with OpenACC? As for OpenMP, Tobias stated: |On 2021-02-09T13:45:36+0100, Tobias Burnus wrote= : |> For OpenMP and map, I recall encountering a code which did do |> this for OpenMP (i.e. contiguous subsection). I think it was |> related to derived-type 'map', but I do not recall anymore. |> |> |> Looking at the *OpenMP* 5.1 spec, I see that 'target update' also |> allows: "The list items that appear in the to or from clauses |> may include array sections with stride expressions." |> While for the map clause, there is: |> 'If a list item is an array section, it must specify contiguous |> storage.' (I suppose we all agree on that.) |> But I did not see a more explicit description how that should be |> handled, contrary to the rather explicit description for OpenACC. Surprising. Can some relevant wording (like OpenACC's "visible device copy") be found elsewhere (in the hundreds of pages...)? By default, I would've assumed "defensive", so again: "I don't think it's appropriate for an 'update' to alter anything else than the exact 'var's as specified." (my words from above)? Gr=C3=BC=C3=9Fe Thomas ----------------- Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 M=C3=BCnchen R= egistergericht M=C3=BCnchen HRB 106955, Gesch=C3=A4ftsf=C3=BChrer: Thomas H= eurung, Frank Th=C3=BCrauf