From: Vikram Singh
Date: Wed, 03 Aug 2016 15:53:00 -0000
Subject: Re: OpenACC-Library-Interoperability
To: Thomas Schwinge
Cc: Salvatore Filippone, Vladimír Fuka, James Norris, Chung-Lin Tang, Fortran List

Sorry, I realized I have to send it as plain text.

I finally found some time to repeat my previous attempts, but with GCC 6.1.
I'll try to be a little detailed so that others don't have to redo the
work I did.

I used the same script as before:

https://github.com/olcf/OLCFHack15/blob/master/GCC5OffloadTest/auto-gcc5-offload-openacc-build-install.sh

and modified the paths in it for GCC 6.1. For some reason the

    x86_64-pc-linux-gnu-accel-nvptx-none-gcc

driver that is built into install/bin does not search the directories in
$LD_LIBRARY_PATH. I found this using the -print-search-dirs option:

    x86_64-pc-linux-gnu-accel-nvptx-none-gcc -print-search-dirs

So I got "library not found" errors for libraries that were in fact
present in install/lib64. I simply copied those libraries to
install/nvptx-none/lib/. With that, I can now use GCC 6.

On to the next task. I made a simple test case; I'll paste the whole
thing here.
Let's call this test.f90:

    program example_dgemm

      use iso_c_binding
      implicit none

      integer :: N = 8
      real(c_double), allocatable :: A(:,:), B(:,:), C(:,:)
      integer :: size_of_real, i, j
      integer*8 :: devPtrA, devPtrB, devPtrC

      size_of_real = 16

      allocate(A(N, N))
      allocate(B(N, N))
      allocate(C(N, N))

      !$ACC PARALLEL COPY(A)
      do i = 1, N
         do j = 1, N
            A(i, j) = i + j
         end do
      end do
      !$ACC END PARALLEL
      !$ACC PARALLEL COPY(B)
      do i = 1, N
         do j = 1, N
            B(i, j) = j
         end do
      end do
      !$ACC END PARALLEL

      call cublas_Alloc(N*N, size_of_real, devPtrA)
      call cublas_Alloc(N*N, size_of_real, devPtrB)
      call cublas_Alloc(N*N, size_of_real, devPtrC)

      ! Copy fixed data to the GPU
      call cublas_Set_Matrix(N, N, size_of_real, A, N, devPtrA, N)
      call cublas_Set_Matrix(N, N, size_of_real, C, N, devPtrC, N)

      ! Copy data to the GPU
      call cublas_Set_Matrix(N, N, size_of_real, B, N, devPtrB, N)

      ! Do DGEMM on the GPU
      call cublas_DGEMM('N', 'N', N, N, N, &
           1.0_c_double, devPtrA, N, devPtrB, N, 0.0_c_double, devPtrC, N)

      ! Copy data from the GPU
      call cublas_Get_Matrix(N, N, size_of_real, devPtrC, N, C, N)

      call cublas_Free(devPtrA)
      call cublas_Free(devPtrB)
      call cublas_Free(devPtrC)

      deallocate(A)
      deallocate(B)
      deallocate(C)

    end program example_dgemm

Basically, I can test both cuBLAS and OpenACC using this simple code.
But using cuBLAS needs some extra pieces, so I copy fortran.h,
fortran_common.h, and fortran.c from /usr/local/cuda/src.
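As an aside: the explicit cublas_Alloc / cublas_Set_Matrix /
cublas_Get_Matrix traffic above is exactly what the OpenACC host_data
construct should eventually make unnecessary. Below is a hypothetical,
untested sketch of that variant; the bind(c) interface to the legacy
cuBLAS cublasDgemm entry point is an assumption written from the
cublas.h prototypes, since gfortran ships no cublas module:

```fortran
program example_dgemm_hostdata
  use iso_c_binding
  implicit none

  ! Assumed interface to the legacy cuBLAS entry point
  !   void cublasDgemm(char, char, int, int, int, double,
  !                    const double*, int, const double*, int,
  !                    double, double*, int)
  ! Scalars are passed by value; the matrices are device pointers.
  interface
     subroutine cublasDgemm(transa, transb, m, n, k, alpha, A, lda, &
                            B, ldb, beta, C, ldc) bind(c, name="cublasDgemm")
       use iso_c_binding
       character(kind=c_char), value :: transa, transb
       integer(c_int), value :: m, n, k, lda, ldb, ldc
       real(c_double), value :: alpha, beta
       real(c_double) :: A(*), B(*), C(*)
     end subroutine cublasDgemm
  end interface

  integer, parameter :: N = 8
  real(c_double), allocatable :: A(:,:), B(:,:), C(:,:)
  integer :: i, j

  allocate(A(N, N), B(N, N), C(N, N))

  !$acc data create(A, B) copyout(C)

  ! Initialize A and B directly on the device.
  !$acc parallel loop collapse(2)
  do j = 1, N
     do i = 1, N
        A(i, j) = i + j
        B(i, j) = j
     end do
  end do

  ! Inside host_data, A, B, and C name the device copies, so no explicit
  ! cublas_Alloc/cublas_Set_Matrix/cublas_Get_Matrix calls are needed.
  !$acc host_data use_device(A, B, C)
  call cublasDgemm('N', 'N', N, N, N, &
                   1.0_c_double, A, N, B, N, 0.0_c_double, C, N)
  !$acc end host_data

  !$acc end data

  deallocate(A, B, C)
end program example_dgemm_hostdata
```

The copyout(C) on the data region brings the result back to the host
when the region ends, replacing the explicit cublas_Get_Matrix call.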
Now I do:

    ./rungcc6.sh gcc -Wall -g -I/usr/local/cuda/include -I/usr/local/cuda/src \
        -DCUBLAS_GFORTRAN -c fortran.c

Then:

    ./rungcc6.sh gfortran -Wall -g test.f90 fortran.o -fopenacc \
        -foffload=nvptx-none -foffload=-O3 -O3 -o gpu.x \
        -L/usr/local/cuda/lib64 -lcublas -lcudart

And finally, nvprof ./gpu.x gives this output:

    ==9155== Profiling application: ./gpu.x
    ==9155== Profiling result:
    Time(%)      Time     Calls       Avg       Min       Max  Name
     23.60%  13.952us         1  13.952us  13.952us  13.952us  MAIN__$_omp_fn$0
     23.55%  13.920us         1  13.920us  13.920us  13.920us  MAIN__$_omp_fn$1
     18.57%  10.976us        16     686ns     576ns  1.0880us  [CUDA memcpy HtoD]
     13.72%  8.1080us         2  4.0540us  2.0430us  6.0650us  [CUDA memcpy HtoH]
     12.07%  7.1360us         1  7.1360us  7.1360us  7.1360us  void gemm_kernel2x2_core(double*, double const *, double const *, int, int, int, int, int, int, double*, double*, double, double, int)
      8.50%  5.0240us         3  1.6740us  1.5680us  1.8880us  [CUDA memcpy DtoH]

Clearly the OpenACC loops run on the GPU, and the DGEMM kernel also runs
on the GPU.

So the only thing left now is to use host_data, so that the whole
cublas_Alloc / cublas_Set_Matrix / cublas_Get_Matrix process need not be
done.

If you can tell me how the patch is applied, I can test it with this
simple example.

Regards,
Vikram

On Mon, Aug 1, 2016 at 6:58 PM, Vikram Singh wrote:
> [...]
> On Thu, May 12, 2016 at 6:34 PM, Thomas Schwinge wrote:
>>
>> Hi!
>>
>> On Fri, 15 Apr 2016 13:59:53 +0300, Vikram Singh wrote:
>> > I checked libgomp.oacc-c-c++-common/context-*.c, and they seem to be
>> > exactly what I needed to start with.
>> >
>> > If I understand it correctly, I'll anyway be trying to implement
>> > something similar with the PGI compilers, so I'll have something as a
>> > starting point.
>>
>> Any progress on that already? I had a very quick look myself, but it's
>> not as easy as I thought...
>> A "courageous" use of "use cublas" in
>> Fortran code compiled with gfortran results in: "Fatal Error: Can't open
>> module file 'cublas.mod' for reading". The problem is: Fortran
>> interfacing to C libraries (which cuBLAS is).
>>
>> has some instructions on how to do it. Anyone got that to work already?
>>
>> > But again, it will need the OpenACC host_data construct to be set up
>> > for gfortran to test.
>>
>> In
>> ,
>> Chung-Lin has now posted a patch (pending review) that should make the
>> OpenACC host_data construct usable in GCC Fortran. (Problem discussed in
>>
>> before.)
>>
>> For reference:
>>
>> > On Fri, Apr 15, 2016 at 11:57 AM, Thomas Schwinge wrote:
>> > > On Fri, 15 Apr 2016 11:35:06 +0300, Vikram Singh wrote:
>> > >> Yes, I came to the conclusion that host_data
>> > >> would be the only way to do it in Fortran.
>> > >>
>> > >> On the other hand, I thought there were no plans to implement it in
>> > >> gfortran 6 either.
>> > >
>> > > I still hope we'll get this (that is, )
>> > > fixed in time for the GCC 6.1 release. I'll keep you posted.
>> > >
>> > >> > I'm copying Jim, who is the author of this chapter in the
>> > >> > documentation as well as the
>> > >> > libgomp/testsuite/libgomp.oacc-c-c++-common/context-*.c test
>> > >> > cases, and much of the relevant libgomp code, too, and who should
>> > >> > please correct me if I'm wrong. I'll make a note for later that
>> > >> > we should translate the libgomp.oacc-c-c++-common/context-*.c
>> > >> > test cases to Fortran, and also replicate them using the OpenACC
>> > >> > host_data construct (like in
>> > >> > libgomp/testsuite/libgomp.oacc-c-c++-common/host_data-1.c), and
>> > >> > the same for the documentation you referenced. (Vikram, please
>> > >> > tell if you're interested to work on these items.)
>> > >>
>> > >> I am not sure I understand what you want me to work on exactly.
>> > >> I am not really much of a C programmer, so I wouldn't be good at
>> > >> it. I would still like to help.
>> > >
>> > > Sorry for being unclear. My idea/question has been whether you're
>> > > interested in helping by translating the documentation as well as the
>> > > libgomp.oacc-c-c++-common/context-*.c test cases from C to Fortran
>> > > (using the OpenACC host_data construct instead of the acc_*
>> > > functions). If yes, then that's great; if not, then one of us will
>> > > do it at some point.
>>
>> Grüße
>>  Thomas
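Regarding the missing cublas.mod Thomas mentions: until a vendor module
exists, the usual workaround is to declare the needed interfaces yourself
with iso_c_binding. A minimal sketch, assuming the legacy cuBLAS entry
points cublasInit and cublasShutdown (the exact signatures should be
checked against cublas.h):

```fortran
module cublas_iface
  use iso_c_binding
  implicit none
  interface
     ! cublasStatus cublasInit(void) -- legacy cuBLAS API
     integer(c_int) function cublasInit() bind(c, name="cublasInit")
       use iso_c_binding
     end function cublasInit

     ! cublasStatus cublasShutdown(void)
     integer(c_int) function cublasShutdown() bind(c, name="cublasShutdown")
       use iso_c_binding
     end function cublasShutdown
  end interface
end module cublas_iface
```

A program then says "use cublas_iface" instead of "use cublas"; no
vendor-supplied .mod file is needed, only -lcublas at link time.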