public inbox for fortran@gcc.gnu.org
* OpenMP target (offloading) question
@ 2021-03-31 19:50 Harald Anlauf
  2021-04-06  8:41 ` Tobias Burnus
From: Harald Anlauf @ 2021-03-31 19:50 UTC (permalink / raw)
  To: fortran


Dear experts,

sorry if this is a stupid question, but I was experimenting with offloading
to the nvptx-none target and found that gfortran-10 on openSUSE and the
Nvidia compiler (nvfortran) behave differently for the attached code.

With "nvfortran -mp=multicore offload-test.f90" the code prints:

    2.000000        2000.000
 s1:    1001000.
 s2:    1001000.

With "/usr/bin/gfortran-10 -fopenmp -foffload=nvptx-none offload-test.f90":

   2.00000000       2000.00000
 s1:   1001000.00
 s2:   0.00000000

The core difference between the evaluations s1 and s2 is:

s1:

!$omp target data map(a,s)
!$omp target teams reduction(+:s) map(s)
    do i = 1, n
       s = s + a(i)
    end do
!$omp end target teams
!$omp end target data

s2:

!$omp target data map(a,s)
!$omp target teams reduction(+:s)
    do i = 1, n
       s = s + a(i)
    end do
!$omp end target teams
!$omp end target data

I was assuming that the explicit map(s) clause on the target teams construct
carrying the reduction should not be necessary, but the result tells me that
either I am wrong (and gfortran is right), or nvfortran is wrong.

With OpenACC this seems to be different; at least a simple example I tried
with the reduction within an !$acc data ... !$acc end data did not show
unexpected behavior.

Can anybody tell me that I am wrong (and point me to the right place in the
OpenMP standard), or should I open a PR?

Thanks
Harald

[-- Attachment #2: offload-test.f90 --]

program p5
  implicit none
  integer           :: i, n = 1000
  real              :: s
  real, allocatable :: a(:)
  allocate (a(n))
  do i = 1, n
     a(i) = 2*i
  end do
  print *, a(1),a(n)
  call s1 ()
  print *, "s1:", s
  call s2 ()
  print *, "s2:", s
contains
  subroutine s1 ()
    integer :: i
    s = 0.
!$omp target data map(a,s)
!$omp target teams reduction(+:s) map(s)
    do i = 1, n
       s = s + a(i)
    end do
!$omp end target teams
!$omp end target data
  end subroutine s1
  !----------------
  subroutine s2 ()
    integer :: i
    s = 0.
!$omp target data map(a,s)
!$omp target teams reduction(+:s)
    do i = 1, n
       s = s + a(i)
    end do
!$omp end target teams
!$omp end target data
  end subroutine s2
end program


* Re: OpenMP target (offloading) question
From: Tobias Burnus @ 2021-04-06  8:41 UTC (permalink / raw)
  To: Harald Anlauf, fortran

Hi Harald,

interesting code. In any case, for
   !$omp target
     s = 5
's' is a scalar, which by default is treated as 'firstprivate' on the
target construct, i.e. it is not copied back to the host. However,
OpenMP 5.1 states:

"If a list item appears in a reduction, lastprivate or linear clause
  on a combined target construct then it is treated as if it also appears
  in a map clause with a map-type of tofrom." (2.21.7)
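
The default for scalars described above can be seen in isolation with a
small test. This is only a sketch and not part of the original thread;
the program name scalar_default and the values are made up for
illustration. Under the default rule, the first region should leave the
host value of 's' unchanged, while the second one copies the result back.

```fortran
program scalar_default
  implicit none
  real :: s

  s = 1.0
  ! No map clause: the scalar is implicitly firstprivate on the target
  ! construct, so the assignment only changes the private copy.
  !$omp target
  s = 5.0
  !$omp end target
  print *, "no map clause:  ", s

  s = 1.0
  ! Explicit map(tofrom: s): the value is copied back at the end of the region.
  !$omp target map(tofrom: s)
  s = 5.0
  !$omp end target
  print *, "map(tofrom: s): ", s
end program scalar_default
```

Note that if OpenMP is not enabled at compile time, the directives are
plain comments and both assignments act directly on the host variable.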

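Your code uses 'omp target teams reduction(+:s)', which is a combined
target construct, so by the rule above the explicit map(s) should not
be needed. See:

→ https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99928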

Tobias
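
Independent of how a given compiler handles the rule quoted above, the
mapping can be spelled out explicitly, as s1 in the original message
does. The following is a hedged sketch, not code from the thread: the
program name reduction_workaround is made up, and it swaps in the
combined 'target teams distribute parallel do' construct (the original
uses plain 'target teams') so that the loop iterations are shared among
teams and threads rather than run by every team.

```fortran
program reduction_workaround
  implicit none
  integer, parameter :: n = 1000
  integer            :: i
  real               :: s
  real, allocatable  :: a(:)

  allocate (a(n))
  do i = 1, n
     a(i) = 2*i
  end do

  s = 0.
  ! Explicit map(tofrom: s), so the result does not rely on the implicit
  ! mapping of the reduction variable.
  !$omp target teams distribute parallel do reduction(+:s) map(to: a) map(tofrom: s)
  do i = 1, n
     s = s + a(i)
  end do
  !$omp end target teams distribute parallel do

  print *, "s:", s
end program reduction_workaround
```

The sum of a(i) = 2*i for i = 1..1000 is 1001000, which is the value
reported for s1 in the original message.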

