* [Bug fortran/111661] New: [OpenACC] Detach+Attach of DT component gives libgomp: [0x405140,96] is not mapped
@ 2023-10-02 10:40 burnus at gcc dot gnu.org
  2023-10-04 13:53 ` [Bug fortran/111661] [OpenACC] Detach+Attach of DT component gives libgomp: [0x405140,96] is not mapped when running 'acc update' on DT var itself patrick.begou@univ-grenoble-alpes.fr
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: burnus at gcc dot gnu.org @ 2023-10-02 10:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111661

            Bug ID: 111661
           Summary: [OpenACC] Detach+Attach of DT component gives libgomp:
                    [0x405140,96] is not mapped
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: openacc
          Severity: normal
          Priority: P3
         Component: fortran
          Assignee: unassigned at gcc dot gnu.org
          Reporter: burnus at gcc dot gnu.org
                CC: patrick.begou@univ-grenoble-alpes.fr,
                    tschwinge at gcc dot gnu.org
  Target Milestone: ---

Reported by Patrick. The following snippet gives:

libgomp: [0x406180,96] is not mapped

with GCC (mainline) and reportedly also with GCC 13.2, but it works with the
Nvidia and Cray compilers (see comments in the snippet). [Same result with an
older OG13 build: 13.1.1 20230710 [OG13].]


The problem seems to be the 'acc update' directive. Here,
  0x406180
is the host address of 'tab':
          406180- tab - host
         1847B20- tab%val - host
      1304140000- tab - device
      1304140200- tab%val - device
          406180- tab - host
         18FF8C0- tab%val - host

libgomp: [0x406180,96] is not mapped


In other words, the DETACH seems to not only DETACH 'tab%val' but also somehow
unmap 'tab'?

#pragma omp target oacc_exit_data map(delete:tab.val.data [len: 88])
map(delete:MEM <real(kind=8)[0:]> [(real(kind=8)[0:] *)_34] [len: _33])
map(to:tab.val [pointer set, len: 88]) map(force_detach:tab.val.data [bias: 0])
finalize

...
          #pragma omp target oacc_update map(force_to:tab [len: 96])



!----------cut-------------------
use iso_c_binding
    implicit none

    type r2tab
             double precision, dimension(:,:), allocatable :: val
             integer :: dim1
             integer :: dim2
     end type r2tab
       type(r2tab) :: tab
       integer :: i,j
       integer(c_intptr_t) :: iloc(2)

       !$acc enter data copyin(tab)
       tab%dim1 = 10
       tab%dim2 = 20
       allocate (tab%val(tab%dim1,tab%dim2))
print '(z16,a)', loc(tab), "- tab - host"
print '(z16,a)', loc(tab%val), "- tab%val - host"
       !$acc enter data copyin(tab%val)
!$acc serial copyout(iloc)
iloc(1) = loc(tab)
iloc(2) = loc(tab%val)
!$acc end serial
print '(z16,a)', iloc(1), "- tab - device"
print '(z16,a)', iloc(2), "- tab%val - device"


!...   
       !$acc exit data delete(tab%val) finalize
! Works as is with nvfortran and CCE ftn but gfortran 13 requires
! additionally:
!      !$acc exit data delete(tab) finalize

       tab%dim1=11
       tab%dim2=30
       deallocate(tab%val)
       allocate (tab%val(tab%dim1,tab%dim2))
print '(z16,a)', loc(tab), "- tab - host"
print '(z16,a)', loc(tab%val), "- tab%val - host"
! For nvfortran and ftn:
       !$acc update device(tab)
! gfortran with the change above requires instead:
!         !$acc enter data copyin(tab)

       !$acc enter data create(tab%val)
!$acc parallel loop
  do j = 1, tab%dim2
    do i = 1, tab%dim1
      tab%val(i,j) = j * 100 + i
    end do
  end do
!$acc end  parallel loop
!$acc exit data copyout(tab%val) finalize
!$acc exit data delete(tab)

  do j = 1, tab%dim2
    do i = 1, tab%dim1
      if (tab%val(i,j) /= j * 100 + i) error stop
    end do
  end do
end


* [Bug fortran/111661] [OpenACC] Detach+Attach of DT component gives libgomp: [0x405140,96] is not mapped when running 'acc update' on DT var itself
  2023-10-02 10:40 [Bug fortran/111661] New: [OpenACC] Detach+Attach of DT component gives libgomp: [0x405140,96] is not mapped burnus at gcc dot gnu.org
@ 2023-10-04 13:53 ` patrick.begou@univ-grenoble-alpes.fr
  2023-10-13  8:45 ` burnus at gcc dot gnu.org
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: patrick.begou@univ-grenoble-alpes.fr @ 2023-10-04 13:53 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111661

--- Comment #1 from Patrick Bégou <patrick.begou@univ-grenoble-alpes.fr> ---
With tab being an instance of the r2tab type described above by Tobias Burnus,
this is what I am doing to resize the allocatable component tab%val to [n,m]
elements with GCC 13.2:
       !$acc exit data delete(tab%val) finalize
       !$acc exit data delete(tab) finalize
       deallocate(tab%val)
       allocate(tab%val(n,m))
       tab%dim1=n
       tab%dim2=m
       !$acc enter data copyin(tab)
       !$acc enter data create(tab%val)

Previously (with nvfortran or ftn) I was using:
       !$acc exit data delete(tab%val) finalize
       deallocate(tab%val)
       allocate(tab%val(n,m))
       tab%dim1=n
       tab%dim2=m
       !$acc update device(tab)
       !$acc enter data create(tab%val)

Unfortunately, as soon as the user-defined type contains more than one
allocatable component and only one of them should be resized, the workaround is
no longer usable. The same limitation occurs with linked lists uploaded to the
GPU if one of the elements contains an allocatable component that must be
resized: it is no longer possible to remove the tab component from the GPU and
then offload it again.


* [Bug fortran/111661] [OpenACC] Detach+Attach of DT component gives libgomp: [0x405140,96] is not mapped when running 'acc update' on DT var itself
  2023-10-02 10:40 [Bug fortran/111661] New: [OpenACC] Detach+Attach of DT component gives libgomp: [0x405140,96] is not mapped burnus at gcc dot gnu.org
  2023-10-04 13:53 ` [Bug fortran/111661] [OpenACC] Detach+Attach of DT component gives libgomp: [0x405140,96] is not mapped when running 'acc update' on DT var itself patrick.begou@univ-grenoble-alpes.fr
@ 2023-10-13  8:45 ` burnus at gcc dot gnu.org
  2023-10-13 14:46 ` patrick.begou@univ-grenoble-alpes.fr
  2023-11-16 20:59 ` tschwinge at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: burnus at gcc dot gnu.org @ 2023-10-13  8:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111661

--- Comment #2 from Tobias Burnus <burnus at gcc dot gnu.org> ---
@Patrick: It seems to work fine without "finalize".

Can you check whether the big program then works for you?
Usually, one should be able to do proper reference counting, such that
'finalize' (which forces the reference counts to zero) shouldn't be needed.
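
To illustrate the reference counting mentioned above, here is a minimal C
sketch. It is a toy model and not libgomp's implementation: the names
toy_mapping, toy_enter_data, toy_exit_data and toy_present_after_one_exit are
hypothetical, and it assumes one dynamic reference counter per mapping, where
'enter data' increments it, 'exit data' decrements it, and 'finalize' forces it
to zero.

```c
#include <assert.h>   /* only needed for the checks in the note below */
#include <stdbool.h>

/* Toy model of a single device mapping (hypothetical, not libgomp's type). */
struct toy_mapping {
    int dyn_refcount;   /* dynamic reference counter */
    bool present;       /* true while the device copy exists */
};

/* Model of 'enter data': create the mapping if needed, count one reference. */
void toy_enter_data(struct toy_mapping *m)
{
    m->present = true;
    m->dyn_refcount++;
}

/* Model of 'exit data': drop one reference, or all of them with 'finalize'.
   The device copy goes away once the counter reaches zero. */
void toy_exit_data(struct toy_mapping *m, bool finalize)
{
    if (!m->present)
        return;
    if (finalize)
        m->dyn_refcount = 0;
    else if (m->dyn_refcount > 0)
        m->dyn_refcount--;
    if (m->dyn_refcount == 0)
        m->present = false;
}

/* Two 'enter data' followed by one 'exit data'; reports whether the data is
   still present afterwards. */
bool toy_present_after_one_exit(bool finalize)
{
    struct toy_mapping m = { 0, false };
    toy_enter_data(&m);
    toy_enter_data(&m);
    toy_exit_data(&m, finalize);
    return m.present;
}
```

In this model, toy_present_after_one_exit(false) is true because one reference
remains, while toy_present_after_one_exit(true) is false because 'finalize'
discards both references at once.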

* * *

Looking at the code more closely, the problem is:

  #pragma omp target oacc_exit_data map(delete:tab.val.data [len: 88])

this tries to 'delete' the array descriptor - but as tab.val.data is part of
'tab', this deletes all of "tab".
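
One way to picture this, as a hedged sketch rather than a description of
libgomp's actual lookup: assume the runtime finds mappings by host address
range, and assume the 88-byte descriptor starts at the same host address as the
96-byte 'tab' (plausible because 'val' is the first component). A delete of the
descriptor range then hits the entry for 'tab' itself. All names below
(toy_range, toy_lookup, toy_delete, toy_tab_survives_component_delete) are
hypothetical; the address and lengths are taken from the output above.

```c
#include <assert.h>   /* only needed for the check in the note below */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one mapped host range [start, start + len). */
struct toy_range {
    uintptr_t start;
    size_t len;
    bool mapped;
};

/* Return the first mapped entry that fully contains [addr, addr + len),
   or NULL if there is none. */
struct toy_range *toy_lookup(struct toy_range *tab, size_t n,
                             uintptr_t addr, size_t len)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i].mapped && addr >= tab[i].start
            && addr - tab[i].start + len <= tab[i].len)
            return &tab[i];
    return NULL;
}

/* Naive delete: unmap whichever entry contains the requested range. */
bool toy_delete(struct toy_range *tab, size_t n, uintptr_t addr, size_t len)
{
    struct toy_range *r = toy_lookup(tab, n, addr, len);
    if (r == NULL)
        return false;
    r->mapped = false;
    return true;
}

/* Replay the reported sequence: 'tab' is mapped as 96 bytes at 0x406180, the
   88-byte component range with the same start is deleted, then 'tab' is
   looked up as 'acc update' would. Returns true if 'tab' is still mapped. */
bool toy_tab_survives_component_delete(void)
{
    struct toy_range table[1] = { { 0x406180u, 96, true } };
    toy_delete(table, 1, 0x406180u, 88);
    return toy_lookup(table, 1, 0x406180u, 96) != NULL;
}
```

In this toy model toy_tab_survives_component_delete() returns false, which
corresponds to the "[0x406180,96] is not mapped" message.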


Compare the C example:

struct t { int *a; int n; };
void f() {
  struct t s;
  #pragma acc enter data copyin(s.a[:s.n])
  #pragma acc exit data delete(s.a[:s.n])
  // for completeness, not relevant here:
  #pragma acc exit data detach(s.a)
  #pragma acc exit data delete(s.a)
}


GCC does:

 #pragma omp target oacc_enter_data map(struct:s [len: 1]) \
     map(alloc:s.a [len: 8]) map(to:*_4 [len: _3]) map(attach:s.a [bias: 0])

 #pragma omp target oacc_exit_data map(release:s.a [len: 8]) \
     map(release:*_8 [len: _7]) map(detach:s.a [bias: 0])

 #pragma omp target oacc_exit_data map(detach:s.a [bias: 0])
 #pragma omp target oacc_exit_data map(release:s.a [len: 8])

which seems to be at least consistent. Again, here a 'finalize' would force the
reference counts to zero and, hence, also delete 's' and not only the
pointee/pointer target *s.a / s.a[0:s.n] but also the pointer 's.a' itself.

(BTW: Same result since GCC 10; GCC 9 rejects that code.)

 * * *

QUESTION: Is the current code for C (and Fortran) correct according to the
OpenACC specification or not?

FOLLOW UP QUESTION: If GCC's result is incorrect, what should the compiler do
instead?
And if it is correct, the question is: why do both ftn and nvfortran work in
the same way?


* [Bug fortran/111661] [OpenACC] Detach+Attach of DT component gives libgomp: [0x405140,96] is not mapped when running 'acc update' on DT var itself
  2023-10-02 10:40 [Bug fortran/111661] New: [OpenACC] Detach+Attach of DT component gives libgomp: [0x405140,96] is not mapped burnus at gcc dot gnu.org
  2023-10-04 13:53 ` [Bug fortran/111661] [OpenACC] Detach+Attach of DT component gives libgomp: [0x405140,96] is not mapped when running 'acc update' on DT var itself patrick.begou@univ-grenoble-alpes.fr
  2023-10-13  8:45 ` burnus at gcc dot gnu.org
@ 2023-10-13 14:46 ` patrick.begou@univ-grenoble-alpes.fr
  2023-11-16 20:59 ` tschwinge at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: patrick.begou@univ-grenoble-alpes.fr @ 2023-10-13 14:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111661

--- Comment #3 from Patrick Bégou <patrick.begou@univ-grenoble-alpes.fr> ---
Hi Tobias,

thanks for this information.
- Yes, removing the "finalize" makes this small test case work. In my
mind, it should only remove the allocated component from the GPU by
setting the count to zero. Is it because the component is an allocatable
and not a pointer? Is the behaviour the same with a pointer component?

- Unfortunately, this modification doesn't bring significant progress
in porting my large code (things are much more complex), but with the GNU
compilers GDB is working, so it is a big step for investigating. I have
isolated the low-level data management to test this module
independently. It works with ftn and Nvidia but not with GNU at this
time. I have to investigate.

Patrick


* [Bug fortran/111661] [OpenACC] Detach+Attach of DT component gives libgomp: [0x405140,96] is not mapped when running 'acc update' on DT var itself
  2023-10-02 10:40 [Bug fortran/111661] New: [OpenACC] Detach+Attach of DT component gives libgomp: [0x405140,96] is not mapped burnus at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2023-10-13 14:46 ` patrick.begou@univ-grenoble-alpes.fr
@ 2023-11-16 20:59 ` tschwinge at gcc dot gnu.org
  3 siblings, 0 replies; 5+ messages in thread
From: tschwinge at gcc dot gnu.org @ 2023-11-16 20:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111661

--- Comment #4 from Thomas Schwinge <tschwinge at gcc dot gnu.org> ---
Created attachment 56608
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56608&action=edit
'pr111661.c'

Before getting the Fortran case to work, let's indeed first look into some
conceptually corresponding C code:

(In reply to Tobias Burnus from comment #2)
> Looking at the code more closely, the problem is:
> 
>   #pragma omp target oacc_exit_data map(delete:tab.val.data [len: 88])
> 
> this tries to 'delete' the array descriptor - but as tab.val.data is part of
> 'tab', this deletes all of "tab".

..., and indeed the same appears to happen in C:

> Compare the C example:

I completed this into a working program, as follows (and attached).

> struct t { int *a; int n; };
> void f() {
>   struct t s;

Here, first initialize 's':

    s.n = 10;
    s.a = __builtin_malloc(s.n * sizeof *s.a);

Now, before 's.a', we first need to establish 's' itself:

    #pragma acc enter data copyin(s)

>   #pragma acc enter data copyin(s.a[:s.n])

Then, let's do something observable, for example:

    #pragma acc serial present(s)
      {
        for (int i = 0; i < s.n; ++i)
          s.a[i] = i * i;
      }

To be able to observe the computations, instead of:

>   #pragma acc exit data delete(s.a[:s.n])

..., do:

    #pragma acc exit data copyout(s.a[:s.n]) //finalize

After this, we expect 's' still to be alive:

    if (!acc_is_present(&s, sizeof s))
      __builtin_abort();

>   // for completeness, not relevant here:
>   #pragma acc exit data detach(s.a)
>   #pragma acc exit data delete(s.a)

I don't understand what you're doing here; I commented out these two.

Instead, now get rid of 's':

    #pragma acc exit data delete(s)
    if (acc_is_present(&s, sizeof s))
      __builtin_abort();

Verify results, and clean up:

    for (int i = 0; i < s.n; ++i)
      if (s.a[i] != i * i)
        __builtin_abort();

    __builtin_free(s.a);

> }

This works fine with 'finalize' commented out.  However, with 'finalize'
enabled, we see:

> Again, here a 'finalize' would force
> the reference counts to zero and, hence, also delete 's' and not only the
> pointee/pointer target *s.a / s.a[0:.n] but also the pointer 's.a' itself.

... this behavior.

I've never looked into the 'struct' mapping stuff in detail -- I suppose the
problem here is not "simply" that '&s == &s.a', and that's confusing the
runtime?

> QUESTION: Is the current code for C (and Fortran) correct according to the
> OpenACC specification or not?

Per my -- quick, not in-depth -- first look, I'd say the code is correct, and
thus GCC's behavior is wrong.

> FOLLOW UP QUESTION: If GCC's result is incorrect, what should the compiler
> do instead?

It has to treat the outer 's' separately from the inner 's.a'.  (..., even if
they happen to have the same address -- in case that's relevant here).
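
For reference, a small C sketch of the "same address" situation. It only shows
that 's' and its first member 's.a' start at the same host address while
differing in size; whether this is what actually confuses the runtime is not
established here. The names toy_key, toy_key_equal, toy_same_start and
toy_keys_distinct are hypothetical and not part of any libgomp interface.

```c
#include <assert.h>   /* only needed for the checks in the note below */
#include <stdbool.h>
#include <stddef.h>

struct t { int *a; int n; };

/* Hypothetical mapping key: start address plus length in bytes. */
struct toy_key {
    const void *start;
    size_t len;
};

/* Two keys name the same object only if both start and length agree. */
bool toy_key_equal(struct toy_key x, struct toy_key y)
{
    return x.start == y.start && x.len == y.len;
}

/* True if 's' and its first member 's.a' begin at the same host address. */
bool toy_same_start(void)
{
    struct t s = { NULL, 0 };
    return (const void *) &s == (const void *) &s.a;
}

/* True if (start, length) keys still tell 's' and 's.a' apart. */
bool toy_keys_distinct(void)
{
    struct t s = { NULL, 0 };
    struct toy_key outer = { &s, sizeof s };
    struct toy_key inner = { &s.a, sizeof s.a };
    return !toy_key_equal(outer, inner);
}
```

Both toy_same_start() and toy_keys_distinct() return true: the start addresses
coincide, but a key that also includes the length keeps the outer 's' and the
inner 's.a' apart.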

How does corresponding OpenMP code behave?

