public inbox for fortran@gcc.gnu.org
 help / color / mirror / Atom feed
From: Julian Brown <julian@codesourcery.com>
To: Thomas Schwinge <thomas@codesourcery.com>
Cc: Tobias Burnus <tobias@codesourcery.com>,
	Jakub Jelinek <jakub@redhat.com>,  <gcc-patches@gcc.gnu.org>,
	<fortran@gcc.gnu.org>
Subject: Re: [Patch] Fortran: %re/%im fixes for OpenMP/OpenACC + gfc_is_simplify_contiguous
Date: Tue, 9 Feb 2021 16:08:57 +0000	[thread overview]
Message-ID: <20210209160857.021c5608@squid.athome> (raw)
In-Reply-To: <87sg65uu9w.fsf@dem-tschwing-1.ger.mentorg.com>

On Tue, 9 Feb 2021 16:37:31 +0100
Thomas Schwinge <thomas@codesourcery.com> wrote:

> Hi!
> 
> On 2021-02-09T13:05:22+0000, Julian Brown <julian@codesourcery.com>
> wrote:
> > On Tue, 9 Feb 2021 13:45:36 +0100
> > Tobias Burnus <tobias@codesourcery.com> wrote:
> >  
> >> On 09.02.21 12:58, Thomas Schwinge wrote:  
> >> >> Granted. The array(:)%re access might update too much, but
> >> >> that's not different to array with strides or with contiguous
> >> >> arrays sections which contain component reference (and more
> >> >> than one component).    
> >> > (Is that indeed allowed to "update too much"?)    
> >> 
> >> Yes - that's the general problem with strides or bit sets;
> >> copying only a subset – and doing so atomically – is not
> >> always possible or feasible.
> >> 
> >> *OpenACC* 3.1 has for "2.14.4  Update Directive" the restriction:
> >> 
> >> "Noncontiguous subarrays may appear. It is implementation-specific
> >>   whether noncontiguous regions are updated by using one transfer
> >>   for each contiguous subregion, or whether the noncontiguous data
> >>   is packed, transferred once, and unpacked, or whether one or more
> >>   larger subarrays (no larger than the smallest contiguous region
> >>   that contains the specified subarray) are updated."
> >> 
> >> For map, I saw that that's the case – but I think Julian's
> >> patch does not handle this correctly for:
> >> 
> >> type t
> >>    integer :: i, j, k
> >> end type t
> >> type(t) :: A(100)
> >>    ... host(A(:)%j)  
> 
> So I understand the variants in the quoted OpenACC part as follows.
> However I don't claim that I necessarily got all the fine detail
> right at the language level (English as well as OpenACC)!  So, please
> verify.
> 
> "Using one transfer for each contiguous subregion":
> 
>     for n in 1..100: transfer A(n)%j individually
> 
> "Noncontiguous data is packed, transferred once, and unpacked":
> 
>     device:
>       integer :: tmp(100)
>       for n in 1..100: tmp(n) = A(n)%j
>     transfer tmp
>     host:
>       for n in 1..100: A(n)%j = tmp(n)
> 
> In this example here, I understand "subarrays (no larger than the
> smallest contiguous region that contains the specified subarray)" to
> again resolve to 'A(n)%j', so doesn't add other another variant.
> 
> This -- per my reading! -- would be different here:
> 
>     type t
>        integer :: i, j1, j2, k
>     end type t
>     type(t) :: A(100)
>        ... host(A(:)%j1, A(:)%j2)
> 
> ... where I understand this to mean that each 'A(n)%j1' plus 'A(:)%j2'
> may be transferred together: either "using one transfer for each
> contiguous subregion":

I think you're overthinking it. My reading is that "each contiguous
subregion" just means, e.g. each "j" in:

  !$acc update host(A(:)%j)

although your case with multiple updates to adjacent fields would be
possible too, I guess.

>     for n in 1..100: transfer memory region of A(n)%j1..A(n)%j2
> individually
> 
> ..., or "packed, transferred once, and unpacked":
> 
>     device:
>       integer :: tmp(2 * 100)
>       for n in 1..100:
>         tmp(2 * n) = A(n)%j1
>         tmp(2 * n + 1) = A(n)%j2
>     transfer tmp
>     host:
>       for n in 1..100:
>         A(n)%j1 = tmp(2 * n)
>         A(n)%j2 = tmp(2 * n + 1)

Yes, fine, but...

> I do however not read into this that the following would be valid:
> 
> >> I think instead of transferring A(1)%j to A(100)%j, it transfers
> >> all of A(:), i.e. also A(1)%i and A(100)%k :-(  
> 
> I don't think it's appropriate for an 'update' to alter anything else
> than the exact 'var's as specified.

I disagree here. I think the "one or more larger subarrays" clause is
meant to indicate that a single transfer covering all the data (with
gaps) is acceptable. In terms of implementation -- much of the time
that will likely be more efficient than either transferring lots of
tiny fragments, or copying via an intermediate contiguous buffer, so
your reading of the spec might be shooting ourselves in the foot
somewhat.

One could imagine a heuristic that chooses between one large ("gappy")
transfer and, say, copying via a temporary buffer though, using the
latter if say the density of data transferred is less than some
percentage of the full amount.

> > Yes it will -- but given that A(2)%i and A(99)%k (and all the
> > in-between values) can legitimately be transferred according to the
> > spec  
> 
> I don't read it that way, I'm afraid.  :-O
> 
> > how much
> > of a problem is that? In particular, are there situations where this
> > "over-updating" can lead to incorrect results in a conforming
> > program?  
> 
> In your reading indeed it wouldn't, because the user couldn't expect
> the following:
> 
> > Perhaps the question is, can a user legitimately expect the host and
> > offloaded versions of some given memory block to hold different
> > data, like maintaining different data in a cache than the storage
> > backing that cache? One use-case for that might be double buffering
> > a "single array" (i.e. the host and device versions of that array).
> > I don't think that's something we'd want to encourage, though.  
> 
> I find the wording in the spec rather explicitly, for example: OpenACC
> 3.1, 2.7 "Data Clauses":
> 
> | In all cases, the compiler will allocate and manage a copy of the
> 'var' | in the memory of the current device, creating a *visible
> device copy* | of that 'var', for data not in shared memory.
> 
> Emphasis mine; and I indeed understand this to mean that the user can
> "legitimately expect the host and offloaded versions of some given
> memory block to hold different data, [...]" (your words from above).

I think that's just describing that the copy of the data on the device
is distinct (for pragmatic reasons), not that the user should rely on
it being a separate copy of the data for algorithmic reasons. To do so
would mean a difference in semantics between a program with OpenACC
directives enabled and without, which is I think against the spirit of
the API.

Thanks,

Julian

  reply	other threads:[~2021-02-09 16:09 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-08 17:50 Tobias Burnus
2021-02-09  9:45 ` Thomas Schwinge
2021-02-09 11:41   ` Tobias Burnus
2021-02-09 11:58     ` Thomas Schwinge
2021-02-09 12:45       ` Tobias Burnus
2021-02-09 13:05         ` Julian Brown
2021-02-09 13:05           ` Julian Brown
2021-02-09 15:37           ` Thomas Schwinge
2021-02-09 16:08             ` Julian Brown [this message]
2021-02-16 14:37     ` *ping* – " Tobias Burnus
2021-02-16 15:57 ` Paul Richard Thomas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210209160857.021c5608@squid.athome \
    --to=julian@codesourcery.com \
    --cc=fortran@gcc.gnu.org \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=jakub@redhat.com \
    --cc=thomas@codesourcery.com \
    --cc=tobias@codesourcery.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).