From: Thomas Schwinge
To: Jakub Jelinek
CC: Bernd Schmidt, Cesar Philippidis, Chung-Lin Tang, James Norris, Joseph Myers, Julian Brown, Tom de Vries
Subject: Next set of OpenACC changes: Testsuite
Date: Tue, 05 May 2015 09:00:00 -0000
Message-ID: <87egmvpfgd.fsf@schwinge.name>
In-Reply-To: <87sibbpfpx.fsf@schwinge.name>
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm

Hi!

On Tue, 05 May 2015 10:54:02 +0200, I wrote:
> In follow-up messages, I'll be posting the separated parts (for easier
> review) of a next set of OpenACC changes that we'd like to commit.
> ChangeLog updates not yet written; will do that before commit, obviously.

 gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c | 46 +
 .../c-c++-common/goacc-gomp/nesting-fail-1.c | 25 -
 gcc/testsuite/c-c++-common/goacc/asyncwait-1.c | 4 +-
 gcc/testsuite/c-c++-common/goacc/data-2.c | 12 +-
 gcc/testsuite/c-c++-common/goacc/declare-1.c | 84 +
 gcc/testsuite/c-c++-common/goacc/declare-2.c | 67 +
 gcc/testsuite/c-c++-common/goacc/dtype-1.c | 113 ++
 gcc/testsuite/c-c++-common/goacc/dtype-2.c | 31 +
 gcc/testsuite/c-c++-common/goacc/host_data-1.c | 14 +
 gcc/testsuite/c-c++-common/goacc/host_data-2.c | 14 +
 gcc/testsuite/c-c++-common/goacc/host_data-3.c | 16 +
 gcc/testsuite/c-c++-common/goacc/host_data-4.c | 15 +
 gcc/testsuite/c-c++-common/goacc/kernels-1.c | 6 -
 gcc/testsuite/c-c++-common/goacc/kernels-empty.c | 6 +
 gcc/testsuite/c-c++-common/goacc/kernels-eternal.c | 11 +
 .../c-c++-common/goacc/kernels-noreturn.c | 12 +
 gcc/testsuite/c-c++-common/goacc/loop-1.c | 2 -
 gcc/testsuite/c-c++-common/goacc/parallel-1.c | 6 -
 gcc/testsuite/c-c++-common/goacc/parallel-empty.c | 6 +
 .../c-c++-common/goacc/parallel-eternal.c | 11 +
 .../c-c++-common/goacc/parallel-noreturn.c | 12 +
 gcc/testsuite/c-c++-common/goacc/reduction-1.c | 25 +-
 gcc/testsuite/c-c++-common/goacc/reduction-2.c | 22 +-
 gcc/testsuite/c-c++-common/goacc/reduction-3.c | 22 +-
 gcc/testsuite/c-c++-common/goacc/reduction-4.c | 40 +-
 gcc/testsuite/c-c++-common/goacc/routine-1.c | 35 +
 gcc/testsuite/c-c++-common/goacc/routine-2.c | 36 +
 gcc/testsuite/c-c++-common/goacc/routine-3.c | 52 +
 gcc/testsuite/c-c++-common/goacc/routine-4.c | 87 ++
 gcc/testsuite/c-c++-common/goacc/tile.c | 26 +
 gcc/testsuite/g++.dg/goacc/template-reduction.C | 100 ++
 gcc/testsuite/g++.dg/goacc/template.C | 131 ++
 gcc/testsuite/gfortran.dg/goacc/cache-1.f95 | 1 -
 gcc/testsuite/gfortran.dg/goacc/coarray.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/coarray_2.f90 | 1 +
 gcc/testsuite/gfortran.dg/goacc/combined_loop.f90 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/cray.f95 | 1 -
 gcc/testsuite/gfortran.dg/goacc/declare-1.f95 | 3 +-
 gcc/testsuite/gfortran.dg/goacc/declare-2.f95 | 44 +
 gcc/testsuite/gfortran.dg/goacc/default.f95 | 17 +
 gcc/testsuite/gfortran.dg/goacc/dtype-1.f95 | 161 ++
 gcc/testsuite/gfortran.dg/goacc/dtype-2.f95 | 39 +
 gcc/testsuite/gfortran.dg/goacc/host_data-tree.f95 | 2 +-
 gcc/testsuite/gfortran.dg/goacc/loop-1.f95 | 1 -
 gcc/testsuite/gfortran.dg/goacc/loop-2.f95 | 26 +-
 gcc/testsuite/gfortran.dg/goacc/modules.f95 | 55 +
 gcc/testsuite/gfortran.dg/goacc/parameter.f95 | 1 -
 gcc/testsuite/gfortran.dg/goacc/update.f95 | 5 +
 libgomp/testsuite/
 .../libgomp.oacc-c++/template-reduction.C | 102 ++
 .../libgomp.oacc-c-c++-common/atomic_capture-1.c | 866 +++++++++++
 .../libgomp.oacc-c-c++-common/atomic_capture-2.c | 1626 ++++++++++++++++++++
 .../libgomp.oacc-c-c++-common/atomic_update-1.c | 760 +++++++++
 .../libgomp.oacc-c-c++-common/clauses-1.c | 26 +
 .../testsuite/libgomp.oacc-c-c++-common/data-2.c | 44 +-
 .../testsuite/libgomp.oacc-c-c++-common/data-3.c | 18 +-
 .../libgomp.oacc-c-c++-common/data-clauses.h | 202 +++
 .../libgomp.oacc-c-c++-common/kernels-1.c | 182 +--
 .../testsuite/libgomp.oacc-c-c++-common/lib-69.c | 70 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-70.c | 79 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-71.c | 55 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-72.c | 60 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-73.c | 64 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-74.c | 91 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-75.c | 89 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-76.c | 88 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-77.c | 91 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-78.c | 91 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-79.c | 91 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-80.c | 95 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-81.c | 106 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-82.c | 43 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-83.c | 22 +-
 .../libgomp.oacc-c-c++-common/parallel-1.c | 204 +--
 .../libgomp.oacc-c-c++-common/routine-1.c | 40 +
 .../libgomp.oacc-c-c++-common/routine-2.c | 41 +
 libgomp/testsuite/libgomp.oacc-c-c++-common/subr.h | 44 +-
 .../testsuite/libgomp.oacc-c-c++-common/subr.ptx | 222 +--
 .../testsuite/libgomp.oacc-c-c++-common/timer.h | 103 --
 .../libgomp.oacc-fortran/atomic_capture-1.f90 | 784 ++++++++++
 .../libgomp.oacc-fortran/atomic_update-1.f90 | 338 ++++
 libgomp/testsuite/libgomp.oacc-fortran/cache-1.f90 | 26 +
 .../testsuite/libgomp.oacc-fortran/clauses-1.f90 | 290 ++++
 libgomp/testsuite/libgomp.oacc-fortran/data-1.f90 | 231 ++-
 libgomp/testsuite/libgomp.oacc-fortran/data-2.f90 | 50 +
 libgomp/testsuite/libgomp.oacc-fortran/data-3.f90 | 34 +-
 .../testsuite/libgomp.oacc-fortran/data-4-2.f90 | 19 +-
 libgomp/testsuite/libgomp.oacc-fortran/data-4.f90 | 19 +-
 .../testsuite/libgomp.oacc-fortran/declare-1.f90 | 229 +++
 libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90 | 24 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90 | 28 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-14.f90 | 79 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-15.f90 | 52 +
 .../testsuite/libgomp.oacc-fortran/routine-5.f90 | 27 +

diff --git gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
index df45bcf..b38e181 100644
--- gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
+++ gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
@@ -1,4 +1,50 @@
 void
+f_acc_data (void)
+{
+#pragma acc data
+  {
+    int i;
+#pragma omp atomic write
+    i = 0;
+  }
+}
+
+void
+f_acc_kernels (void)
+{
+#pragma acc kernels
+  {
+    int i;
+#pragma omp atomic write
+    i = 0;
+  }
+}
+
+void
+f_acc_loop (void)
+{
+  int i;
+
+#pragma acc loop
+  for (i = 0; i < 2; ++i)
+    {
+#pragma omp atomic write
+      i = 0;
+    }
+}
+
+void
+f_acc_parallel (void)
+{
+#pragma acc parallel
+  {
+    int i;
+#pragma omp atomic write
+    i = 0;
+  }
+}
+
+void
 f_omp_parallel (void)
 {
 #pragma omp parallel
diff --git gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
index 411fb5f..14c6aa6 100644
--- gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
+++ gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
@@ -216,12 +216,6 @@ f_acc_parallel (void)
 
 #pragma acc parallel
   {
-#pragma omp atomic write
-    i = 0; /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
-  }
-
-#pragma acc parallel
-  {
 #pragma omp ordered /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
     ;
   }
@@ -286,12 +280,6 @@ f_acc_kernels (void)
 
 #pragma acc kernels
   {
-#pragma omp atomic write
-    i = 0; /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
-  }
-
-#pragma acc kernels
-  {
 #pragma omp ordered /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
     ;
   }
@@ -356,12 +344,6 @@ f_acc_data (void)
 
 #pragma acc data
   {
-#pragma omp atomic write
-    i = 0; /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
-  }
-
-#pragma acc data
-  {
 #pragma omp ordered /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
     ;
   }
@@ -434,13 +416,6 @@ f_acc_loop (void)
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
-#pragma omp atomic write
-      i = 0; /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
-    }
-
-#pragma acc loop
-  for (i = 0; i < 2; ++i)
-    {
 #pragma omp ordered /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
       ;
     }
diff --git gcc/testsuite/c-c++-common/goacc/asyncwait-1.c gcc/testsuite/c-c++-common/goacc/asyncwait-1.c
index ccc0106..c6b81b1 100644
--- gcc/testsuite/c-c++-common/goacc/asyncwait-1.c
+++ gcc/testsuite/c-c++-common/goacc/asyncwait-1.c
@@ -116,7 +116,7 @@ f (int N, float *a, float
*b)
     }
 
 #pragma acc parallel copyin (a[0:N]) copy (b[0:N]) wait (1 /* { dg-error "expected '\\\)' before end of line" } */
-    /* { dg-error "expected integer expression before '\\\)'" "" { target c++ } 118 } */
+    /* { dg-error "expected integer expression list before" "" { target c++ } 118 } */
     {
         for (ii = 0; ii < N; ii++)
             b[ii] = a[ii];
@@ -171,7 +171,7 @@ f (int N, float *a, float *b)
 #pragma acc wait (1,2,,) /* { dg-error "expected (primary-|)expression before" } */
 
 #pragma acc wait (1 /* { dg-error "expected '\\\)' before end of line" } */
-    /* { dg-error "expected integer expression before '\\\)'" "" { target c++ } 173 } */
+    /* { dg-error "expected integer expression list before" "" { target c++ } 173 } */
 
 #pragma acc wait (1,*) /* { dg-error "expected (primary-|)expression before" } */
 
diff --git gcc/testsuite/c-c++-common/goacc/data-2.c gcc/testsuite/c-c++-common/goacc/data-2.c
index a67d8a4..1043bf8a 100644
--- gcc/testsuite/c-c++-common/goacc/data-2.c
+++ gcc/testsuite/c-c++-common/goacc/data-2.c
@@ -10,12 +10,14 @@ foo (void)
 #pragma acc exit data delete (a) if (0)
 #pragma acc exit data copyout (b) if (a)
 #pragma acc exit data delete (b)
-#pragma acc enter /* { dg-error "expected 'data' in" } */
-#pragma acc exit /* { dg-error "expected 'data' in" } */
+#pragma acc enter /* { dg-error "expected 'data' after" } */
+#pragma acc exit /* { dg-error "expected 'data' after" } */
 #pragma acc enter data /* { dg-error "has no data movement clause" } */
-#pragma acc exit data /* { dg-error "has no data movement clause" } */
-#pragma acc enter Data /* { dg-error "invalid pragma before" } */
-#pragma acc exit copyout (b) /* { dg-error "invalid pragma before" } */
+#pragma acc exit data /* { dg-error "no data movement clause" } */
+#pragma acc enter Data /* { dg-error "expected 'data' after" } */
+#pragma acc exit copyout (b) /* { dg-error "expected 'data' after" } */
+#pragma acc enter for /* { dg-error "expected 'data' after" } */
+#pragma acc enter data2 /* { dg-error "expected 'data' after" } */
 }
 
 /* { dg-error "has no data movement clause" "" { target *-*-* } 8 } */
diff --git gcc/testsuite/c-c++-common/goacc/declare-1.c gcc/testsuite/c-c++-common/goacc/declare-1.c
new file mode 100644
index 0000000..cf50f02
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/declare-1.c
@@ -0,0 +1,84 @@
+/* Test valid uses of declare directive. */
+/* { dg-do compile } */
+/* { dg-skip-if "not yet" { c++ } } */
+
+int v0;
+#pragma acc declare create(v0)
+
+int v1;
+#pragma acc declare copyin(v1)
+
+int *v2;
+#pragma acc declare deviceptr(v2)
+
+int v3;
+#pragma acc declare device_resident(v3)
+
+int v4;
+#pragma acc declare link(v4)
+
+int v5, v6, v7, v8;
+#pragma acc declare create(v5, v6) copyin(v7, v8)
+
+void
+f (void)
+{
+  int va0;
+#pragma acc declare create(va0)
+
+  int va1;
+#pragma acc declare copyin(va1)
+
+  int *va2;
+#pragma acc declare deviceptr(va2)
+
+  int va3;
+#pragma acc declare device_resident(va3)
+
+  extern int ve0;
+#pragma acc declare create(ve0)
+
+  extern int ve1;
+#pragma acc declare copyin(ve1)
+
+  extern int *ve2;
+#pragma acc declare deviceptr(ve2)
+
+  extern int ve3;
+#pragma acc declare device_resident(ve3)
+
+  extern int ve4;
+#pragma acc declare link(ve4)
+
+  int va5;
+#pragma acc declare copy(va5)
+
+  int va6;
+#pragma acc declare copyout(va6)
+
+  int va7;
+#pragma acc declare present(va7)
+
+  int va8;
+#pragma acc declare present_or_copy(va8)
+
+  int va9;
+#pragma acc declare present_or_copyin(va9)
+
+  int va10;
+#pragma acc declare present_or_copyout(va10)
+
+  int va11;
+#pragma acc declare present_or_create(va11)
+
+ a:
+  {
+    int va0;
+#pragma acc declare create(va0)
+    if (v1)
+      goto a;
+    else
+      goto b;
+  }
+ b:;
+}
diff --git gcc/testsuite/c-c++-common/goacc/declare-2.c gcc/testsuite/c-c++-common/goacc/declare-2.c
new file mode 100644
index 0000000..a2b5d6f
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/declare-2.c
@@ -0,0 +1,67 @@
+/* Test invalid uses of
declare directive. */
+/* { dg-do compile } */
+/* { dg-skip-if "not yet" { c++ } } */
+
+#pragma acc declare /* { dg-error "no valid clauses" } */
+
+#pragma acc declare create(undeclared) /* { dg-error "undeclared" } */
+/* { dg-error "no valid clauses" "second error" { target *-*-* } 7 } */
+
+int v0[10];
+#pragma acc declare create(v0[1:3]) /* { dg-error "subarray" } */
+
+int v1;
+#pragma acc declare create(v1, v1) /* { dg-error "more than once" } */
+
+int v2;
+#pragma acc declare create(v2) /* { dg-message "previous directive" } */
+#pragma acc declare copyin(v2) /* { dg-error "more than once" } */
+
+int v3;
+#pragma acc declare copy(v3) /* { dg-error "at file scope" } */
+
+int v4;
+#pragma acc declare copyout(v4) /* { dg-error "at file scope" } */
+
+int v5;
+#pragma acc declare present(v5) /* { dg-error "at file scope" } */
+
+int v6;
+#pragma acc declare present_or_copy(v6) /* { dg-error "at file scope" } */
+
+int v7;
+#pragma acc declare present_or_copyin(v7) /* { dg-error "at file scope" } */
+
+int v8;
+#pragma acc declare present_or_copyout(v8) /* { dg-error "at file scope" } */
+
+int v9;
+#pragma acc declare present_or_create(v9) /* { dg-error "at file scope" } */
+
+void
+f (void)
+{
+  int va0;
+#pragma acc declare link(va0) /* { dg-error "invalid variable" } */
+
+  extern int ve0;
+#pragma acc declare copy(ve0) /* { dg-error "invalid use of" } */
+
+  extern int ve1;
+#pragma acc declare copyout(ve1) /* { dg-error "invalid use of" } */
+
+  extern int ve2;
+#pragma acc declare present(ve2) /* { dg-error "invalid use of" } */
+
+  extern int ve3;
+#pragma acc declare present_or_copy(ve3) /* { dg-error "invalid use of" } */
+
+  extern int ve4;
+#pragma acc declare present_or_copyin(ve4) /* { dg-error "invalid use of" } */
+
+  extern int ve5;
+#pragma acc declare present_or_copyout(ve5) /* { dg-error "invalid use of" } */
+
+  extern int ve6;
+#pragma acc declare present_or_create(ve6) /* { dg-error "invalid use of" } */
+}
diff --git
gcc/testsuite/c-c++-common/goacc/dtype-1.c gcc/testsuite/c-c++-common/goacc/dtype-1.c
new file mode 100644
index 0000000..2b4569e
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/dtype-1.c
@@ -0,0 +1,113 @@
+/* { dg-do compile } */
+/* { dg-options "-fopenacc -fdump-tree-omplower" } */
+
+void
+test ()
+{
+  int i1;
+
+  /* ACC PARALLEL DEVICE_TYPE: */
+
+#pragma acc parallel device_type (nVidia) async (1) num_gangs (100) num_workers (100) vector_length (32) wait (1)
+  {
+  }
+
+#pragma acc parallel async (1) num_gangs (1) num_workers (1) vector_length (1) wait (1) dtype (nvidia) async (2) num_gangs (200) num_workers (200) vector_length (64) wait (2)
+  {
+  }
+
+#pragma acc parallel async (1) num_gangs (1) num_workers (1) vector_length (1) wait (1) dtype (nvidia) async (3) num_gangs (300) num_workers (300) vector_length (128) wait (3) device_type (*) async (10) num_gangs (10) num_workers (10) vector_length (10) wait (10)
+  {
+  }
+
+#pragma acc parallel async (1) num_gangs (1) num_workers (1) vector_length (1) wait (1) device_type (nvidia_ptx) async (3) num_gangs (300) num_workers (300) vector_length (128) wait (3) dtype (*) async (10) num_gangs (10) num_workers (10) vector_length (10) wait (10)
+  {
+  }
+
+  /* ACC KERNELS DEVICE_TYPE: */
+
+#pragma acc kernels device_type (nvidia) async wait
+  {
+  }
+
+#pragma acc kernels async wait dtype (nvidia) async (1) wait (1)
+  {
+  }
+
+#pragma acc kernels async wait dtype (nvidia) async (2) wait (2) device_type (*) async (0) wait (0)
+  {
+  }
+
+#pragma acc kernels async wait device_type (nvidia_ptx) async (1) wait (1) dtype (*) async (0) wait (0)
+  {
+  }
+
+  /* ACC LOOP DEVICE_TYPE: */
+
+#pragma acc parallel
+#pragma acc loop dtype (nVidia) gang
+  for (i1 = 1; i1 < 10; i1++)
+    {
+    }
+
+#pragma acc parallel
+#pragma acc loop device_type (nVidia) gang dtype (*) worker
+  for (i1 = 1; i1 < 10; i1++)
+    {
+    }
+
+#pragma acc parallel
+#pragma acc loop dtype (nVidiaGPU) gang device_type (*) vector
+  for (i1
= 1; i1 < 10; i1++)
+    {
+    }
+
+  /* ACC UPDATE DEVICE_TYPE: */
+
+#pragma acc update host(i1) async(1) wait (1)
+
+#pragma acc update host(i1) device_type(nvidia) async(2) wait (2)
+
+#pragma acc update host(i1) async(1) wait (1) device_type(nvidia) async(3) wait (3)
+
+#pragma acc update host(i1) async(4) wait (4) device_type(nvidia) async(5) wait (5) dtype (*) async (6) wait (6)
+
+#pragma acc update host(i1) async(4) wait (4) dtype(nvidia1) async(5) wait (5) dtype (*) async (6) wait (6)
+}
+
+/* ACC ROUTINE DEVICE_TYPE: */
+
+#pragma acc routine (foo1) device_type (nvidia) gang
+#pragma acc routine (foo2) device_type (nvidia) worker
+#pragma acc routine (foo3) dtype (nvidia) vector
+#pragma acc routine (foo5) device_type (nvidia) bind (foo)
+#pragma acc routine (foo6) device_type (nvidia) gang device_type (*) worker
+#pragma acc routine (foo7) dtype (nvidia) worker dtype (*) vector
+#pragma acc routine (foo8) dtype (nvidia) vector device_type (*) gang
+#pragma acc routine (foo9) device_type (nvidia) vector device_type (*) worker
+#pragma acc routine (foo10) device_type (nvidia) bind (foo) dtype (*) gang
+#pragma acc routine (foo11) device_type (gpu) gang device_type (*) worker
+#pragma acc routine (foo12) device_type (gpu) worker dtype (*) worker
+#pragma acc routine (foo13) device_type (gpu) vector device_type (*) worker
+#pragma acc routine (foo14) dtype (gpu) worker dtype (*) worker
+#pragma acc routine (foo15) dtype (gpu) bind (foo) dtype (*) gang
+
+/* { dg-final { scan-tree-dump-times "oacc_parallel wait\\(1\\) vector_length\\(32\\) num_workers\\(100\\) num_gangs\\(100\\) async\\(1\\)" 1 "omplower" } } */
+
+/* { dg-final { scan-tree-dump-times "oacc_parallel wait\\(1\\) vector_length\\(1\\) num_workers\\(1\\) num_gangs\\(1\\) async\\(1\\) wait\\(2\\) vector_length\\(64\\) num_workers\\(200\\) num_gangs\\(200\\) async\\(2\\)" 1 "omplower" } } */
+
+/* { dg-final { scan-tree-dump-times "acc_parallel wait\\(1\\) vector_length\\(1\\)
num_workers\\(1\\) num_gangs\\(1\\) async\\(1\\) wait\\(3\\) vector_length\\(128\\) num_workers\\(300\\) num_gangs\\(300\\) async\\(3" 1 "omplower" } } */
+
+/* { dg-final { scan-tree-dump-times "oacc_kernels async\\(-1\\)" 4 "omplower" } } */
+
+/* { dg-final { scan-tree-dump-times "oacc_kernels async\\(-1\\) wait\\(2\\) async\\(2\\)" 1 "omplower" } } */
+
+/* { dg-final { scan-tree-dump-times "oacc_kernels async\\(-1\\) wait\\(0\\) async\\(0\\)" 1 "omplower" } } */
+
+/* { dg-final { scan-tree-dump-times "acc loop gang private\\(i1.0\\) private\\(i1\\)" 1 "omplower" } } */
+
+/* { dg-final { scan-tree-dump-times "acc loop gang private\\(i1.1\\) private\\(i1\\)" 1 "omplower" } } */
+
+/* { dg-final { scan-tree-dump-times "acc loop vector private\\(i1.2\\) private\\(i1\\)" 1 "omplower" } } */
+
+/* { dg-final { cleanup-tree-dump "omplower" } } */
diff --git gcc/testsuite/c-c++-common/goacc/dtype-2.c gcc/testsuite/c-c++-common/goacc/dtype-2.c
new file mode 100644
index 0000000..b0bd247
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/dtype-2.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+
+void
+test ()
+{
+  int i1, i2;
+
+  /* ACC PARALLEL DEVICE_TYPE: */
+
+#pragma acc parallel dtype (nVidia) async (1) num_gangs (100) num_workers (100) vector_length (32) wait (1) copy (i1) /* { dg-error "not valid" } */
+  {
+  }
+
+  /* ACC KERNELS DEVICE_TYPE: */
+
+#pragma acc kernels device_type (nvidia) async wait copy (i1) /* { dg-error "not valid" } */
+  {
+  }
+
+  /* ACC LOOP DEVICE_TYPE: */
+
+#pragma acc parallel
+#pragma acc loop device_type (nVidia) gang private (i2) /* { dg-error "not valid" } */
+  for (i1 = 1; i1 < 10; i1++)
+    {
+    }
+
+  /* ACC UPDATE DEVICE_TYPE: */
+
+#pragma acc update host(i1) dtype (nvidia) async(1) wait (1) self (i2) /* { dg-error "not valid" } */
+}
diff --git gcc/testsuite/c-c++-common/goacc/host_data-1.c gcc/testsuite/c-c++-common/goacc/host_data-1.c
new file mode 100644
index 0000000..5e8240f
--- /dev/null
+++
gcc/testsuite/c-c++-common/goacc/host_data-1.c
@@ -0,0 +1,14 @@
+/* Test valid use of host_data directive. */
+/* { dg-do compile } */
+
+int v0;
+int v1[3][3];
+
+void
+f (void)
+{
+  int v2 = 3;
+#pragma acc host_data use_device(v2, v0, v1)
+  ;
+}
+/* { dg-bogus "sorry, unimplemented: directive not yet implemented" "host_data" { xfail *-*-* } 11 } */
diff --git gcc/testsuite/c-c++-common/goacc/host_data-2.c gcc/testsuite/c-c++-common/goacc/host_data-2.c
new file mode 100644
index 0000000..92fa97b
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/host_data-2.c
@@ -0,0 +1,14 @@
+/* Test invalid use of host_data directive. */
+/* { dg-do compile } */
+
+int v0;
+#pragma acc host_data use_device(v0) /* { dg-error "expected" } */
+
+void
+f (void)
+{
+  int v2 = 3;
+#pragma acc host_data copy(v2) /* { dg-error "not valid for" } */
+  ;
+}
+/* { dg-bogus "sorry, unimplemented: directive not yet implemented" "host_data" { xfail *-*-* } 11 } */
diff --git gcc/testsuite/c-c++-common/goacc/host_data-3.c gcc/testsuite/c-c++-common/goacc/host_data-3.c
new file mode 100644
index 0000000..580f566
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/host_data-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+
+int main (int argc, char* argv[])
+{
+  int x = 5, y;
+
+  #pragma acc enter data copyin (x)
+  #pragma acc host_data use_device (x)
+  {
+    y = x;
+  }
+  #pragma acc exit data delete (x)
+
+  return y - 5;
+}
+/* { dg-bogus "sorry, unimplemented: directive not yet implemented" "host_data" { xfail *-*-* } 8 } */
diff --git gcc/testsuite/c-c++-common/goacc/host_data-4.c gcc/testsuite/c-c++-common/goacc/host_data-4.c
new file mode 100644
index 0000000..61b1c5b
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/host_data-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+int main (int argc, char* argv[])
+{
+  int x[100];
+
+  #pragma acc enter data copyin (x)
+  /* Specifying an array index is not valid for host_data/use_device.
*/
+  #pragma acc host_data use_device (x[4]) /* { dg-error "expected \\\')' before '\\\[' token" } */
+    ;
+  #pragma acc exit data delete (x)
+
+  return 0;
+}
+/* { dg-bogus "sorry, unimplemented: directive not yet implemented" "host_data" { xfail *-*-* } 9 } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-1.c gcc/testsuite/c-c++-common/goacc/kernels-1.c
deleted file mode 100644
index e91b81c..0000000
--- gcc/testsuite/c-c++-common/goacc/kernels-1.c
+++ /dev/null
@@ -1,6 +0,0 @@
-void
-foo (void)
-{
-#pragma acc kernels
-  ;
-}
diff --git gcc/testsuite/c-c++-common/goacc/kernels-empty.c gcc/testsuite/c-c++-common/goacc/kernels-empty.c
new file mode 100644
index 0000000..e91b81c
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-empty.c
@@ -0,0 +1,6 @@
+void
+foo (void)
+{
+#pragma acc kernels
+  ;
+}
diff --git gcc/testsuite/c-c++-common/goacc/kernels-eternal.c gcc/testsuite/c-c++-common/goacc/kernels-eternal.c
new file mode 100644
index 0000000..edc17d2
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-eternal.c
@@ -0,0 +1,11 @@
+int
+main (void)
+{
+#pragma acc kernels
+  {
+    while (1)
+      ;
+  }
+
+  return 0;
+}
diff --git gcc/testsuite/c-c++-common/goacc/kernels-noreturn.c gcc/testsuite/c-c++-common/goacc/kernels-noreturn.c
new file mode 100644
index 0000000..1a8cc67
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-noreturn.c
@@ -0,0 +1,12 @@
+int
+main (void)
+{
+
+#pragma acc kernels
+  {
+    __builtin_abort ();
+  }
+
+  return 0;
+}
+
diff --git gcc/testsuite/c-c++-common/goacc/loop-1.c gcc/testsuite/c-c++-common/goacc/loop-1.c
index fea40e0..5e1a248 100644
--- gcc/testsuite/c-c++-common/goacc/loop-1.c
+++ gcc/testsuite/c-c++-common/goacc/loop-1.c
@@ -1,5 +1,3 @@
-/* { dg-skip-if "not yet" { c++ } } */
-
 int test1()
 {
   int i, j, k, b[10];
diff --git gcc/testsuite/c-c++-common/goacc/parallel-1.c gcc/testsuite/c-c++-common/goacc/parallel-1.c
deleted file mode 100644
index a860526..0000000
---
gcc/testsuite/c-c++-common/goacc/parallel-1.c
+++ /dev/null
@@ -1,6 +0,0 @@
-void
-foo (void)
-{
-#pragma acc parallel
-  ;
-}
diff --git gcc/testsuite/c-c++-common/goacc/parallel-empty.c gcc/testsuite/c-c++-common/goacc/parallel-empty.c
new file mode 100644
index 0000000..a860526
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/parallel-empty.c
@@ -0,0 +1,6 @@
+void
+foo (void)
+{
+#pragma acc parallel
+  ;
+}
diff --git gcc/testsuite/c-c++-common/goacc/parallel-eternal.c gcc/testsuite/c-c++-common/goacc/parallel-eternal.c
new file mode 100644
index 0000000..51eac76
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/parallel-eternal.c
@@ -0,0 +1,11 @@
+int
+main (void)
+{
+#pragma acc parallel
+  {
+    while (1)
+      ;
+  }
+
+  return 0;
+}
diff --git gcc/testsuite/c-c++-common/goacc/parallel-noreturn.c gcc/testsuite/c-c++-common/goacc/parallel-noreturn.c
new file mode 100644
index 0000000..ec840bd
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/parallel-noreturn.c
@@ -0,0 +1,12 @@
+int
+main (void)
+{
+
+#pragma acc parallel
+  {
+    __builtin_abort ();
+  }
+
+  return 0;
+}
+
diff --git gcc/testsuite/c-c++-common/goacc/reduction-1.c gcc/testsuite/c-c++-common/goacc/reduction-1.c
index 0f50082..8f7c70d 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-1.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-1.c
@@ -22,20 +22,17 @@ main(void)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-// result = 0;
-// vresult = 0;
-//
-// /* 'max' reductions. */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-// for (i = 0; i < n; i++)
-// result = result > array[i] ? result : array[i];
-//
-// /* 'min' reductions. */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-// for (i = 0; i < n; i++)
-// result = result < array[i] ? result : array[i];
+  /* 'max' reductions.
*/
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions. */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&' reductions. */
 #pragma acc parallel vector_length (vl)
diff --git gcc/testsuite/c-c++-common/goacc/reduction-2.c gcc/testsuite/c-c++-common/goacc/reduction-2.c
index 1f95138..7ff125f 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-2.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-2.c
@@ -22,17 +22,17 @@ main(void)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-// /* 'max' reductions. */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-// for (i = 0; i < n; i++)
-// result = result > array[i] ? result : array[i];
-//
-// /* 'min' reductions. */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-// for (i = 0; i < n; i++)
-// result = result < array[i] ? result : array[i];
+  /* 'max' reductions. */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions. */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&&' reductions. */
 #pragma acc parallel vector_length (vl)
diff --git gcc/testsuite/c-c++-common/goacc/reduction-3.c gcc/testsuite/c-c++-common/goacc/reduction-3.c
index 476e375..cd44559 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-3.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-3.c
@@ -22,17 +22,17 @@ main(void)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-// /* 'max' reductions.
*/
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-// for (i = 0; i < n; i++)
-// result = result > array[i] ? result : array[i];
-//
-// /* 'min' reductions. */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-// for (i = 0; i < n; i++)
-// result = result < array[i] ? result : array[i];
+  /* 'max' reductions. */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions. */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&&' reductions. */
 #pragma acc parallel vector_length (vl)
diff --git gcc/testsuite/c-c++-common/goacc/reduction-4.c gcc/testsuite/c-c++-common/goacc/reduction-4.c
index 73dde86..ec3a9c9 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-4.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-4.c
@@ -16,25 +16,29 @@ main(void)
   for (i = 0; i < n; i++)
     result += array[i];
 
-  /* Needs support for complex multiplication. */
+  /* '*' reductions. */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (*:result)
+  for (i = 0; i < n; i++)
+    result *= array[i];
 
-// /* '*' reductions. */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (*:result)
-// for (i = 0; i < n; i++)
-// result *= array[i];
-//
-// /* 'max' reductions. */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-// for (i = 0; i < n; i++)
-// result = result > array[i] ? result : array[i];
-//
-// /* 'min' reductions. */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-// for (i = 0; i < n; i++)
-// result = result < array[i] ? result : array[i];
+  /* 'max' reductions.
*/
+#if 0
+  // error: 'result' has invalid type for 'reduction(max)'
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+#endif
+
+  /* 'min' reductions. */
+#if 0
+  // error: 'result' has invalid type for 'reduction(min)'
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
+#endif
 
   /* '&&' reductions. */
 #pragma acc parallel vector_length (vl)
diff --git gcc/testsuite/c-c++-common/goacc/routine-1.c gcc/testsuite/c-c++-common/goacc/routine-1.c
new file mode 100644
index 0000000..1f89fdb
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/routine-1.c
@@ -0,0 +1,35 @@
+void *malloc (__SIZE_TYPE__);
+void free (void *);
+
+#pragma acc routine
+int
+fact (int n)
+{
+  if (n == 0 || n == 1)
+    return 1;
+
+  return n * fact (n - 1);
+}
+
+int
+main(int argc, char **argv)
+{
+  int *a, i, n = 10;
+
+  a = (int *)malloc (sizeof (int) * n);
+
+#pragma acc parallel copy (a[0:n]) vector_length (5)
+  {
+#pragma acc loop
+    for (i = 0; i < n; i++)
+      a[i] = fact (i);
+  }
+
+  for (i = 0; i < n; i++)
+    if (fact (i) != a[i])
+      return -1;
+
+  free (a);
+
+  return 0;
+}
diff --git gcc/testsuite/c-c++-common/goacc/routine-2.c gcc/testsuite/c-c++-common/goacc/routine-2.c
new file mode 100644
index 0000000..fe2e7f7
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/routine-2.c
@@ -0,0 +1,36 @@
+void *malloc (__SIZE_TYPE__);
+void free (void *);
+
+#pragma acc routine (fact)
+
+int
+fact (int n)
+{
+  if (n == 0 || n == 1)
+    return 1;
+
+  return n * fact (n - 1);
+}
+
+int
+main(int argc, char **argv)
+{
+  int *a, i, n = 10;
+
+  a = (int *)malloc (sizeof (int) * n);
+
+#pragma acc parallel copy (a[0:n]) vector_length (5)
+  {
+#pragma acc loop
+    for (i = 0; i < n; i++)
+      a[i] = fact (i);
+  }
+
+  for (i = 0; i < n; i++)
+    if (fact (i)
!=3D a[i]) + return -1; + + free (a); + + return 0; +} diff --git gcc/testsuite/c-c++-common/goacc/routine-3.c gcc/testsuite/c-c++= -common/goacc/routine-3.c new file mode 100644 index 0000000..e35dfc1 --- /dev/null +++ gcc/testsuite/c-c++-common/goacc/routine-3.c @@ -0,0 +1,52 @@ +/* Test valid use of clauses with routine. */ +/* { dg-do compile } */ + +#pragma acc routine gang +void +f1 (void) +{ +} + +#pragma acc routine worker +void +f2 (void) +{ +} + +#pragma acc routine vector +void +f3 (void) +{ +} + +#pragma acc routine seq +void +f4 (void) +{ +} + +#pragma acc routine bind (f4a) +void +f5 (void) +{ +} + +typedef int T; + +#pragma acc routine bind (T) +void +f6 (void) +{ +} + +#pragma acc routine bind ("f7a") +void +f7 (void) +{ +} + +#pragma acc routine nohost +void +f8 (void) +{ +} diff --git gcc/testsuite/c-c++-common/goacc/routine-4.c gcc/testsuite/c-c++= -common/goacc/routine-4.c new file mode 100644 index 0000000..682d901 --- /dev/null +++ gcc/testsuite/c-c++-common/goacc/routine-4.c @@ -0,0 +1,87 @@ +/* Test invalid use of clauses with routine. 
*/ +/* { dg-do compile } */ + +#pragma acc routine gang worker /* { dg-error "invalid combination" } */ +void +f1 (void) +{ +} + +#pragma acc routine worker gang /* { dg-error "invalid combination" } */ +void +f1a (void) +{ +} + +#pragma acc routine gang vector /* { dg-error "invalid combination" } */ +void +f2 (void) +{ +} + +#pragma acc routine vector gang /* { dg-error "invalid combination" } */ +void +f2a (void) +{ +} + +#pragma acc routine gang seq /* { dg-error "invalid combination" } */ +void +f3 (void) +{ +} + +#pragma acc routine seq gang /* { dg-error "invalid combination" } */ +void +f3a (void) +{ +} + +#pragma acc routine worker vector /* { dg-error "invalid combination" } */ +void +f4 (void) +{ +} + +#pragma acc routine vector worker /* { dg-error "invalid combination" } */ +void +f4a (void) +{ +} + +#pragma acc routine worker seq /* { dg-error "invalid combination" } */ +void +f5 (void) +{ +} + +#pragma acc routine seq worker /* { dg-error "invalid combination" } */ +void +f5a (void) +{ +} + +#pragma acc routine vector seq /* { dg-error "invalid combination" } */ +void +f6 (void) +{ +} + +#pragma acc routine seq vector /* { dg-error "invalid combination" } */ +void +f6a (void) +{ +} + +#pragma acc routine (g1) gang worker /* { dg-error "invalid combination" }= */ +#pragma acc routine (g2) worker gang /* { dg-error "invalid combination" }= */ +#pragma acc routine (g3) gang vector /* { dg-error "invalid combination" }= */ +#pragma acc routine (g4) vector gang /* { dg-error "invalid combination" }= */ +#pragma acc routine (g5) gang seq /* { dg-error "invalid combination" } */ +#pragma acc routine (g6) seq gang /* { dg-error "invalid combination" } */ +#pragma acc routine (g7) worker vector /* { dg-error "invalid combination"= } */ +#pragma acc routine (g8) vector worker /* { dg-error "invalid combination"= } */ +#pragma acc routine (g9) worker seq /* { dg-error "invalid combination" } = */ +#pragma acc routine (g10) seq worker /* { dg-error "invalid 
combination" }= */ +#pragma acc routine (g11) vector seq /* { dg-error "invalid combination" }= */ +#pragma acc routine (g12) seq vector /* { dg-error "invalid combination" }= */ diff --git gcc/testsuite/c-c++-common/goacc/tile.c gcc/testsuite/c-c++-comm= on/goacc/tile.c new file mode 100644 index 0000000..e127955 --- /dev/null +++ gcc/testsuite/c-c++-common/goacc/tile.c @@ -0,0 +1,26 @@ +int +main () +{ + int i; + +#pragma acc parallel loop tile (10) + for (i =3D 0; i < 100; i++) + ; + +#pragma acc parallel loop tile (*) + for (i =3D 0; i < 100; i++) + ; + +#pragma acc parallel loop tile (10, *) + for (i =3D 0; i < 100; i++) + ; + +#pragma acc parallel loop tile (10, *, i) /* { dg-error "positive constant= integer expression" } */ + for (i =3D 0; i < 100; i++) + ; + + return 0; +} +/* { dg-bogus "sorry, unimplemented: Clause not supported yet" "tile" { xf= ail *-*-* } 6 } */ +/* { dg-bogus "sorry, unimplemented: Clause not supported yet" "tile" { xf= ail *-*-* } 10 } */ +/* { dg-bogus "sorry, unimplemented: Clause not supported yet" "tile" { xf= ail *-*-* } 14 } */ diff --git gcc/testsuite/g++.dg/goacc/template-reduction.C gcc/testsuite/g+= +.dg/goacc/template-reduction.C new file mode 100644 index 0000000..3618c02 --- /dev/null +++ gcc/testsuite/g++.dg/goacc/template-reduction.C @@ -0,0 +1,100 @@ +extern void abort (); + +const int n =3D 100; + +// Check explicit template copy map + +template<typename T> T +sum (T array[]) +{ + T s =3D 0; + +#pragma acc parallel loop num_gangs (10) gang reduction (+:s) copy (s, arr= ay[0:n]) + for (int i =3D 0; i < n; i++) + s +=3D array[i]; + + return s; +} + +// Check implicit template copy map + +template<typename T> T +sum () +{ + T s =3D 0; + T array[n]; + + for (int i =3D 0; i < n; i++) + array[i] =3D i+1; + +#pragma acc parallel loop num_gangs (10) gang reduction (+:s) copy (s) + for (int i =3D 0; i < n; i++) + s +=3D array[i]; + + return s; +} + +// Check present and async + +template<typename T> T +async_sum (T array[]) +{ + T s =3D 0; + +#pragma acc
parallel loop num_gangs (10) gang async (1) present (array[0:n= ]) + for (int i =3D 0; i < n; i++) + array[i] =3D i+1; + +#pragma acc parallel loop num_gangs (10) gang reduction (+:s) present (arr= ay[0:n]) copy (s) async wait (1) + for (int i =3D 0; i < n; i++) + s +=3D array[i]; + +#pragma acc wait + + return s; +} + +// Check present and async + +template<typename T> T +async_sum (int c) +{ + T s =3D 0; + +#pragma acc parallel loop num_gangs (10) gang reduction (+:s) copy(s) asyn= c wait (1) + for (int i =3D 0; i < n; i++) + s +=3D i; + +#pragma acc wait + + return s; +} + +int +main() +{ + int a[n]; + int result =3D 0; + + for (int i =3D 0; i < n; i++) + { + a[i] =3D i+1; + result +=3D i+1; + } + + if (sum (a) !=3D result) + abort (); + + if (sum<int> () !=3D result) + abort (); + +#pragma acc enter data copyin (a) + if (async_sum (a) !=3D result) + abort (); + + if (async_sum<int> (1) !=3D result) + abort (); +#pragma acc exit data delete (a) + + return 0; +} diff --git gcc/testsuite/g++.dg/goacc/template.C gcc/testsuite/g++.dg/goacc= /template.C new file mode 100644 index 0000000..497c004 --- /dev/null +++ gcc/testsuite/g++.dg/goacc/template.C @@ -0,0 +1,131 @@ +#include <cstdio> + +#pragma acc routine +template<typename T> T +accDouble(int val) +{ + return val * 2; +} + +template<typename T> T +oacc_parallel_copy (T a) +{ + T b =3D 0; + char w =3D 1; + int x =3D 2; + float y =3D 3; + double z =3D 4; + +#pragma acc parallel num_gangs (a) num_workers (a) vector_length (a) defau= lt (none) copyout (b) copyin (a) + { + b =3D a; + } + +#pragma acc parallel num_gangs (a) copy (w, x, y, z) + { + w =3D accDouble<char>(w); + x =3D accDouble<int>(x); + y =3D accDouble<float>(y); + z =3D accDouble<double>(z); + } + +#pragma acc parallel num_gangs (a) if (1) + { +#pragma acc loop independent collapse (2) device_type (nvidia) gang + for (int i =3D 0; i < a; i++) + for (int j =3D 0; j < 5; j++) + b =3D a; + } + + T c; + +#pragma acc parallel num_workers (10) + { +#pragma acc atomic capture + c =3D b++; + +#pragma atomic update + c++; + +#pragma acc
atomic read + b =3D a; + +#pragma acc atomic write + b =3D a; + } + +#pragma acc parallel reduction (+:c) + { + c =3D 1; + } + +#pragma acc data if (1) copy (b) + { + #pragma acc parallel + { + b =3D a; + } + } + +#pragma acc enter data copyin (b) +#pragma acc parallel present (b) + { + b =3D a; + } + +#pragma acc update host (b) +#pragma acc update self (b) +#pragma acc update device (b) +#pragma acc exit data delete (b) + + return b; +} + +template<typename T> T +oacc_kernels_copy (T a) +{ + T b =3D 0; + T c =3D 0; + char w =3D 1; + int x =3D 2; + float y =3D 3; + double z =3D 4; + +#pragma acc kernels copy (w, x, y, z) + { + w =3D accDouble<char>(w); + x =3D accDouble<int>(x); + y =3D accDouble<float>(y); + z =3D accDouble<double>(z); + } + +#pragma acc kernels copyout (b) copyin (a) + b =3D a; + +#pragma acc data if (1) copy (b) + { + #pragma acc kernels + { + b =3D a; + } + } + +#pragma acc enter data copyin (b) +#pragma acc kernels present (b) + { + b =3D a; + } + return b; +} + +int +main () +{ + int b =3D oacc_parallel_copy (5); + int c =3D oacc_kernels_copy (5); + + printf ("b =3D %d\n", b); + printf ("c =3D %d\n", c); + + return 0; +} diff --git gcc/testsuite/gfortran.dg/goacc/cache-1.f95 gcc/testsuite/gfortr= an.dg/goacc/cache-1.f95 index 746cf02..74ab332 100644 --- gcc/testsuite/gfortran.dg/goacc/cache-1.f95 +++ gcc/testsuite/gfortran.dg/goacc/cache-1.f95 @@ -9,4 +9,3 @@ program test !$acc cache (d) enddo end -! { dg-prune-output "unimplemented" } diff --git gcc/testsuite/gfortran.dg/goacc/coarray.f95 gcc/testsuite/gfortr= an.dg/goacc/coarray.f95 index 4f1224e..08e4004 100644 --- gcc/testsuite/gfortran.dg/goacc/coarray.f95 +++ gcc/testsuite/gfortran.dg/goacc/coarray.f95 @@ -32,4 +32,4 @@ contains !$acc update self (a) end subroutine oacc1 end module test -! { dg-prune-output "ACC cache unimplemented" } +!
{ dg-bogus "sorry, unimplemented: directive not yet implemented" "host_d= ata" { xfail *-*-* } 19 } diff --git gcc/testsuite/gfortran.dg/goacc/coarray_2.f90 gcc/testsuite/gfor= tran.dg/goacc/coarray_2.f90 index f35d4b9..06a2bed 100644 --- gcc/testsuite/gfortran.dg/goacc/coarray_2.f90 +++ gcc/testsuite/gfortran.dg/goacc/coarray_2.f90 @@ -2,6 +2,7 @@ ! { dg-additional-options "-fcoarray=3Dlib" } ! ! PR fortran/63861 +! { dg-xfail-if "" { *-*-* } } */ =20 module test contains diff --git gcc/testsuite/gfortran.dg/goacc/combined_loop.f90 gcc/testsuite/= gfortran.dg/goacc/combined_loop.f90 index b8be649..58aaa4f 100644 --- gcc/testsuite/gfortran.dg/goacc/combined_loop.f90 +++ gcc/testsuite/gfortran.dg/goacc/combined_loop.f90 @@ -6,7 +6,7 @@ subroutine oacc1() implicit none integer :: i integer :: a - !$acc parallel loop reduction(+:a) ! { dg-excess-errors "sorry, unimplem= ented: directive not yet implemented" } + !$acc parallel loop reduction(+:a) do i =3D 1,5 enddo end subroutine oacc1 diff --git gcc/testsuite/gfortran.dg/goacc/cray.f95 gcc/testsuite/gfortran.= dg/goacc/cray.f95 index 8f2c077..28294ee 100644 --- gcc/testsuite/gfortran.dg/goacc/cray.f95 +++ gcc/testsuite/gfortran.dg/goacc/cray.f95 @@ -53,4 +53,3 @@ contains !$acc update self (ptr) end subroutine oacc1 end module test -! { dg-prune-output "unimplemented" } diff --git gcc/testsuite/gfortran.dg/goacc/declare-1.f95 gcc/testsuite/gfor= tran.dg/goacc/declare-1.f95 index 03540f1..14190a7 100644 --- gcc/testsuite/gfortran.dg/goacc/declare-1.f95 +++ gcc/testsuite/gfortran.dg/goacc/declare-1.f95 @@ -15,6 +15,5 @@ contains END BLOCK end function foo end program test -! { dg-prune-output "unimplemented" } -! { dg-final { scan-tree-dump-times "pragma acc declare map\\(force_tofrom= :i\\)" 2 "original" } }=20 +! { dg-final { scan-tree-dump-times "pragma acc data map\\(force_tofrom:i\= \)" 2 "original" } } ! 
{ dg-final { cleanup-tree-dump "original" } }=20 diff --git gcc/testsuite/gfortran.dg/goacc/declare-2.f95 gcc/testsuite/gfor= tran.dg/goacc/declare-2.f95 new file mode 100644 index 0000000..afdbe2e --- /dev/null +++ gcc/testsuite/gfortran.dg/goacc/declare-2.f95 @@ -0,0 +1,44 @@ + +module amod + +contains + +subroutine asubr (b) + implicit none + integer :: b(8) + + !$acc declare copy (b) ! { dg-error "Invalid clause in module" } + !$acc declare copyout (b) ! { dg-error "Invalid clause in module" } + !$acc declare present (b) ! { dg-error "Invalid clause in module" } + !$acc declare present_or_copy (b) ! { dg-error "Invalid clause in module= " } + !$acc declare present_or_copyin (b) ! { dg-error "Invalid clause in modu= le" } + !$acc declare present_or_copyout (b) ! { dg-error "Invalid clause in mod= ule" } + !$acc declare present_or_create (b) ! { dg-error "Invalid clause in modu= le" } + !$acc declare deviceptr (b) ! { dg-error "Invalid clause in module" } + !$acc declare create (b) copyin (b) ! { dg-error "present on multiple cl= auses" } + +end subroutine + +end module + +subroutine bsubr (foo) + implicit none + + integer, dimension (:) :: foo + + !$acc declare copy (foo) ! { dg-error "assumed-size dummy array" } + !$acc declare copy (foo(1:2)) ! { dg-error "assumed-size dummy array" } + +end subroutine + +program test + integer :: a(8) + integer :: b(8) + integer :: c(8) + + !$acc declare create (a) copyin (a) ! { dg-error "present on multiple cl= auses" } + !$acc declare copyin (b) + !$acc declare copyin (b) ! { dg-error "present on multiple clauses" } + !$acc declare copy (c(1:2)) ! { dg-error "Subarray: 'c' not allowed" } + +end program diff --git gcc/testsuite/gfortran.dg/goacc/default.f95 gcc/testsuite/gfortr= an.dg/goacc/default.f95 new file mode 100644 index 0000000..c1fc52e --- /dev/null +++ gcc/testsuite/gfortran.dg/goacc/default.f95 @@ -0,0 +1,17 @@ +! { dg-do compile } + +program tile + integer i, j, a + + !$acc parallel default (shared) ! 
{ dg-error "Unclassifiable OpenACC dir= ective" } + !$acc end parallel ! { dg-error "Unexpected" } + + !$acc parallel default (private) ! { dg-error "Unclassifiable OpenACC di= rective" } + !$acc end parallel ! { dg-error "Unexpected" } + + !$acc parallel default (none) + !$acc end parallel + + !$acc parallel default (firstprivate) ! { dg-error "Unclassifiable OpenA= CC directive" } + !$acc end parallel ! { dg-error "Unexpected" } +end program tile diff --git gcc/testsuite/gfortran.dg/goacc/dtype-1.f95 gcc/testsuite/gfortr= an.dg/goacc/dtype-1.f95 new file mode 100644 index 0000000..350e443 --- /dev/null +++ gcc/testsuite/gfortran.dg/goacc/dtype-1.f95 @@ -0,0 +1,161 @@ +! { dg-do compile } +! { dg-options "-fopenacc -fdump-tree-omplower" } + +program dtype + integer i1 + +!! ACC PARALLEL DEVICE_TYPE: + +!$acc parallel dtype (nVidia) async (1) num_gangs (100) & +!$acc& num_workers (100) vector_length (32) wait (1) +!$acc end parallel + +!$acc parallel async (1) num_gangs (1) num_workers (1) vector_length (1) & +!$acc& wait (1) device_type (nvidia) async (2) num_gangs (200) & +!$acc& num_workers (200) vector_length (64) wait (2) +!$acc end parallel + +!$acc parallel async (1) num_gangs (1) num_workers (1) vector_length (1) & +!$acc& wait (1) device_type (nvidia) async (3) num_gangs (300) & +!$acc& num_workers (300) vector_length (128) wait (3) dtype (*) & +!$acc& async (10) num_gangs (10) num_workers (10) vector_length (10) wait = (10) +!$acc end parallel + +!$acc parallel async (1) num_gangs (1) num_workers (1) vector_length (1) & +!$acc& wait (1) dtype (nvidia_ptx) async (3) num_gangs (300) & +!$acc& num_workers (300) vector_length (128) wait (3) device_type (*) & +!$acc& async (10) num_gangs (10) num_workers (10) vector_length (10) wait = (10) +!$acc end parallel + +!! 
ACC KERNELS DEVICE_TYPE: + +!$acc kernels device_type (nvidia) async wait +!$acc end kernels + +!$acc kernels async wait dtype (nvidia) async (1) wait (1) +!$acc end kernels + +!$acc kernels async wait dtype (nvidia) async (2) wait (2) & +!$acc& device_type (*) async (0) wait (0) +!$acc end kernels + +!$acc kernels async wait device_type (nvidia_ptx) async (1) wait (1) & +!$acc& dtype (*) async (0) wait (0) +!$acc end kernels + +!! ACC LOOP DEVICE_TYPE: + +!$acc parallel +!$acc loop device_type (nVidia) gang + do i1 =3D 1, 10 + end do +!$acc end parallel + +!$acc parallel +!$acc loop dtype (nVidia) gang dtype (*) worker + do i1 =3D 1, 10 + end do +!$acc end parallel + +!$acc parallel +!$acc loop dtype (nVidiaGPU) gang dtype (*) vector + do i1 =3D 1, 10 + end do +!$acc end parallel + +!! ACC UPDATE: + +!$acc update host(i1) async(1) wait (1) + +!$acc update host(i1) device_type(nvidia) async(2) wait (2) + +!$acc update host(i1) async(1) wait (1) dtype(nvidia) async(3) wait (3) + +!$acc update host(i1) async(4) wait (4) device_type(nvidia) async(5) wait = (5) & +!$acc& dtype (*) async (6) wait (6) + +!$acc update host(i1) async(4) wait (4) dtype(nvidia1) async(5) & +!$acc& wait (5) device_type (*) async (6) wait (6) +end program dtype + +!! 
ACC ROUTINE: + +subroutine sr1 () + !$acc routine device_type (nvidia) gang +end subroutine sr1 + +subroutine sr2 () + !$acc routine dtype (nvidia) worker +end subroutine sr2 + +subroutine sr3 () + !$acc routine device_type (nvidia) vector +end subroutine sr3 + +subroutine sr5 () + !$acc routine dtype (nvidia) bind (foo) +end subroutine sr5 + +subroutine sr1a () + !$acc routine device_type (nvidia) gang device_type (*) worker +end subroutine sr1a + +subroutine sr2a () + !$acc routine dtype (nvidia) worker dtype (*) vector +end subroutine sr2a + +subroutine sr3a () + !$acc routine dtype (nvidia) vector device_type (*) gang +end subroutine sr3a + +subroutine sr4a () + !$acc routine device_type (nvidia) vector device_type (*) worker +end subroutine sr4a + +subroutine sr5a () + !$acc routine device_type (nvidia) bind (foo) dtype (*) gang +end subroutine sr5a + +subroutine sr1b () + !$acc routine dtype (gpu) gang dtype (*) worker +end subroutine sr1b + +subroutine sr2b () + !$acc routine dtype (gpu) worker device_type (*) worker +end subroutine sr2b + +subroutine sr3b () + !$acc routine device_type (gpu) vector device_type (*) worker +end subroutine sr3b + +subroutine sr4b () + !$acc routine device_type (gpu) worker device_type (*) worker +end subroutine sr4b + +subroutine sr5b () + !$acc routine dtype (gpu) bind (foo) device_type (*) gang +end subroutine sr5b + +! { dg-final { scan-tree-dump-times "oacc_parallel async\\(1\\) wait\\(1\\= ) num_gangs\\(100\\) num_workers\\(100\\) vector_length\\(32\\)" 1 "omplowe= r" } } + +! { dg-final { scan-tree-dump-times "oacc_parallel async\\(2\\) wait\\(2\\= ) num_gangs\\(200\\) num_workers\\(200\\) vector_length\\(64\\)" 1 "omplowe= r" } } + +! { dg-final { scan-tree-dump-times "oacc_parallel async\\(3\\) wait\\(3\\= ) num_gangs\\(300\\) num_workers\\(300\\) vector_length\\(128\\)" 1 "omplow= er" } } + +! 
{ dg-final { scan-tree-dump-times "oacc_parallel async\\(10\\) wait\\(10= \\) num_gangs\\(10\\) num_workers\\(10\\) vector_length\\(10\\)" 1 "omplowe= r" } } + +! { dg-final { scan-tree-dump-times "oacc_kernels async\\(-1\\)" 1 "omplow= er" } } + +! { dg-final { scan-tree-dump-times "oacc_kernels async\\(1\\) wait\\(1\\)= " 1 "omplower" } } + +! { dg-final { scan-tree-dump-times "oacc_kernels async\\(2\\) wait\\(2\\)= " 1 "omplower" } } + +! { dg-final { scan-tree-dump-times "oacc_kernels async\\(0\\) wait\\(0\\)= " 1 "omplower" } } + +! { dg-final { scan-tree-dump-times "acc loop private\\(i1\\) gang private= \\(i1\\.1\\)" 1 "omplower" } } + +! { dg-final { scan-tree-dump-times "acc loop private\\(i1\\) gang private= \\(i1\\.2\\)" 1 "omplower" } } + +! { dg-final { scan-tree-dump-times "acc loop private\\(i1\\) vector priva= te\\(i1\\.3\\)" 1 "omplower" } } + +! { dg-final { cleanup-tree-dump "omplower" } } diff --git gcc/testsuite/gfortran.dg/goacc/dtype-2.f95 gcc/testsuite/gfortr= an.dg/goacc/dtype-2.f95 new file mode 100644 index 0000000..a4573e9 --- /dev/null +++ gcc/testsuite/gfortran.dg/goacc/dtype-2.f95 @@ -0,0 +1,39 @@ +! { dg-do compile } + +program dtype + integer i1, i2, i3, i4, i5, i6 + +!! ACC PARALLEL DEVICE_TYPE: + +!$acc parallel device_type (nVidia) async (1) num_gangs (100) & +!$acc& num_workers (100) vector_length (32) wait (1) copy (i1) +!$acc end parallel + +!! ACC KERNELS DEVICE_TYPE: + +!$acc kernels dtype (nvidia) async wait copy (i1) +!$acc end kernels + +!! ACC LOOP DEVICE_TYPE: + +!$acc parallel +!$acc loop dtype (nVidia) gang tile (1) private (i1) + do i1 =3D 1, 10 + end do +!$acc end parallel + +!! ACC UPDATE: + +!$acc update host(i1) device_type(nvidia) async(2) wait (2) self(i2) + +end program dtype + +! { dg-error "Invalid character" "" { target *-*-* } 8 } +! { dg-error "Unexpected" "" { target *-*-* } 10 } + +! { dg-error "Invalid character" "" { target *-*-* } 14 } +! { dg-error "Unexpected" "" { target *-*-* } 15 } + +! 
{ dg-error "Invalid character" "" { target *-*-* } 20 } + +! { dg-error "Invalid character" "" { target *-*-* } 27 } diff --git gcc/testsuite/gfortran.dg/goacc/host_data-tree.f95 gcc/testsuite= /gfortran.dg/goacc/host_data-tree.f95 index 19e7411..8a25829 100644 --- gcc/testsuite/gfortran.dg/goacc/host_data-tree.f95 +++ gcc/testsuite/gfortran.dg/goacc/host_data-tree.f95 @@ -8,6 +8,6 @@ program test !$acc host_data use_device(i) !$acc end host_data end program test -! { dg-prune-output "unimplemented" } +! { dg-bogus "sorry, unimplemented: directive not yet implemented" "host_d= ata" { xfail *-*-* } 8 } ! { dg-final { scan-tree-dump-times "pragma acc host_data use_device\\(i\\= )" 1 "original" } }=20 ! { dg-final { cleanup-tree-dump "original" } }=20 diff --git gcc/testsuite/gfortran.dg/goacc/loop-1.f95 gcc/testsuite/gfortra= n.dg/goacc/loop-1.f95 index e1b2dfd..817039f 100644 --- gcc/testsuite/gfortran.dg/goacc/loop-1.f95 +++ gcc/testsuite/gfortran.dg/goacc/loop-1.f95 @@ -168,4 +168,3 @@ subroutine test1 end subroutine test1 end module test ! { dg-prune-output "Deleted" } -! { dg-prune-output "ACC cache unimplemented" } diff --git gcc/testsuite/gfortran.dg/goacc/loop-2.f95 gcc/testsuite/gfortra= n.dg/goacc/loop-2.f95 index f85691e..b5e6368 100644 --- gcc/testsuite/gfortran.dg/goacc/loop-2.f95 +++ gcc/testsuite/gfortran.dg/goacc/loop-2.f95 @@ -66,7 +66,7 @@ program test !$acc loop seq worker ! { dg-error "conflicts with" } DO i =3D 1,10 ENDDO - !$acc loop gang worker ! { dg-error "conflicts with" } + !$acc loop gang worker DO i =3D 1,10 ENDDO =20 @@ -94,10 +94,10 @@ program test !$acc loop seq vector ! { dg-error "conflicts with" } DO i =3D 1,10 ENDDO - !$acc loop gang vector ! { dg-error "conflicts with" } + !$acc loop gang vector DO i =3D 1,10 ENDDO - !$acc loop worker vector ! { dg-error "conflicts with" } + !$acc loop worker vector DO i =3D 1,10 ENDDO =20 @@ -239,7 +239,7 @@ program test !$acc loop seq worker ! 
{ dg-error "conflicts with" } DO i =3D 1,10 ENDDO - !$acc loop gang worker ! { dg-error "conflicts with" } + !$acc loop gang worker DO i =3D 1,10 ENDDO =20 @@ -267,10 +267,10 @@ program test !$acc loop seq vector ! { dg-error "conflicts with" } DO i =3D 1,10 ENDDO - !$acc loop gang vector ! { dg-error "conflicts with" } + !$acc loop gang vector DO i =3D 1,10 ENDDO - !$acc loop worker vector ! { dg-error "conflicts with" } + !$acc loop worker vector DO i =3D 1,10 ENDDO =20 @@ -392,7 +392,7 @@ program test !$acc kernels loop seq worker ! { dg-error "conflicts with" } DO i =3D 1,10 ENDDO - !$acc kernels loop gang worker ! { dg-error "conflicts with" } + !$acc kernels loop gang worker DO i =3D 1,10 ENDDO =20 @@ -420,10 +420,10 @@ program test !$acc kernels loop seq vector ! { dg-error "conflicts with" } DO i =3D 1,10 ENDDO - !$acc kernels loop gang vector ! { dg-error "conflicts with" } + !$acc kernels loop gang vector DO i =3D 1,10 ENDDO - !$acc kernels loop worker vector ! { dg-error "conflicts with" } + !$acc kernels loop worker vector DO i =3D 1,10 ENDDO =20 @@ -544,7 +544,7 @@ program test !$acc parallel loop seq worker ! { dg-error "conflicts with" } DO i =3D 1,10 ENDDO - !$acc parallel loop gang worker ! { dg-error "conflicts with" } + !$acc parallel loop gang worker DO i =3D 1,10 ENDDO =20 @@ -572,10 +572,10 @@ program test !$acc parallel loop seq vector ! { dg-error "conflicts with" } DO i =3D 1,10 ENDDO - !$acc parallel loop gang vector ! { dg-error "conflicts with" } + !$acc parallel loop gang vector DO i =3D 1,10 ENDDO - !$acc parallel loop worker vector ! 
{ dg-error "conflicts with" } + !$acc parallel loop worker vector DO i =3D 1,10 ENDDO =20 @@ -646,4 +646,4 @@ program test !$acc parallel loop gang worker tile(*)=20 DO i =3D 1,10 ENDDO -end \ No newline at end of file +end diff --git gcc/testsuite/gfortran.dg/goacc/modules.f95 gcc/testsuite/gfortr= an.dg/goacc/modules.f95 new file mode 100644 index 0000000..19a2abe --- /dev/null +++ gcc/testsuite/gfortran.dg/goacc/modules.f95 @@ -0,0 +1,55 @@ +! { dg-do compile }=20 + +MODULE reduction_test + +CONTAINS + +SUBROUTINE reduction_kernel(x_min,x_max,y_min,y_max,arr,sum) + + IMPLICIT NONE + + INTEGER :: x_min,x_max,y_min,y_max + REAL(KIND=3D8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+2) :: arr + REAL(KIND=3D8) :: sum + + INTEGER :: j,k + + sum=3D0.0 + +!$ACC DATA PRESENT(arr) COPY(sum) +!$ACC PARALLEL LOOP REDUCTION(+ : sum) + DO k=3Dy_min,y_max + DO j=3Dx_min,x_max + sum=3Dsum*arr(j,k) + ENDDO + ENDDO +!$ACC END PARALLEL LOOP +!$ACC END DATA + +END SUBROUTINE reduction_kernel + +END MODULE reduction_test + +program main + use reduction_test + + integer :: x_min,x_max,y_min,y_max + real(kind=3D8), dimension(1:10,1:10) :: arr + real(kind=3D8) :: sum + + x_min =3D 5 + x_max =3D 6 + y_min =3D 5 + y_max =3D 6 + + arr(:,:) =3D 1.0 + + sum =3D 1.0 + + !$acc data copy(arr) + + call field_summary_kernel(x_min,x_max,y_min,y_max,arr,sum) + + !$acc end data + +end program diff --git gcc/testsuite/gfortran.dg/goacc/parameter.f95 gcc/testsuite/gfor= tran.dg/goacc/parameter.f95 index 1364181..82c25ba 100644 --- gcc/testsuite/gfortran.dg/goacc/parameter.f95 +++ gcc/testsuite/gfortran.dg/goacc/parameter.f95 @@ -29,4 +29,3 @@ contains !$acc update self (a) ! { dg-error "not a variable" } end subroutine oacc1 end module test -! { dg-prune-output "unimplemented" } diff --git gcc/testsuite/gfortran.dg/goacc/update.f95 gcc/testsuite/gfortra= n.dg/goacc/update.f95 new file mode 100644 index 0000000..ae23dfc --- /dev/null +++ gcc/testsuite/gfortran.dg/goacc/update.f95 @@ -0,0 +1,5 @@ +! 
{ dg-do compile }=20 + +program foo + !$acc update ! { dg-error "must contain at least one 'device' or 'host/s= elf' clause" } +end program foo diff --git libgomp/testsuite/libgomp.oacc-c++/template-reduction.C libgomp/= testsuite/libgomp.oacc-c++/template-reduction.C new file mode 100644 index 0000000..c158b7a --- /dev/null +++ libgomp/testsuite/libgomp.oacc-c++/template-reduction.C @@ -0,0 +1,102 @@ +/* { dg-do run } */ + +#include <cstdlib> + +const int n =3D 100; + +// Check explicit template copy map + +template<typename T> T +sum (T array[]) +{ + T s =3D 0; + +#pragma acc parallel loop vector_length (10) reduction (+:s) copy (s, arra= y[0:n]) + for (int i =3D 0; i < n; i++) + s +=3D array[i]; + + return s; +} + +// Check implicit template copy map + +template<typename T> T +sum () +{ + T s =3D 0; + T array[n]; + + for (int i =3D 0; i < n; i++) + array[i] =3D i+1; + +#pragma acc parallel loop vector_length (10) reduction (+:s) copy (s) + for (int i =3D 0; i < n; i++) + s +=3D array[i]; + + return s; +} + +// Check present and async + +template<typename T> T +async_sum (T array[]) +{ + T s =3D 0; + +#pragma acc parallel loop vector_length (10) async (1) present (array[0:n]) + for (int i =3D 0; i < n; i++) + array[i] =3D i+1; + +#pragma acc parallel loop vector_length (10) reduction (+:s) present (arra= y[0:n]) copy (s) async wait (1) + for (int i =3D 0; i < n; i++) + s +=3D array[i]; + +#pragma acc wait + + return s; +} + +// Check present and async + +template<typename T> T +async_sum (int c) +{ + T s =3D 0; + +#pragma acc parallel loop vector_length (10) reduction (+:s) copy(s) async= wait (1) + for (int i =3D 0; i < n; i++) + s +=3D i+1; + +#pragma acc wait + + return s; +} + +int +main() +{ + int a[n]; + int result =3D 0; + + for (int i =3D 0; i < n; i++) + { + a[i] =3D i+1; + result +=3D i+1; + } + + if (sum (a) !=3D result) + abort (); + + if (sum<int> () !=3D result) + abort (); + +#pragma acc enter data copyin (a) + if (async_sum (a) !=3D result) + abort (); + + if (async_sum<int> (1) !=3D result) + abort (); +#pragma
acc exit data delete (a) + + return 0; +} diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-1.c l= ibgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-1.c new file mode 100644 index 0000000..ad958cd --- /dev/null +++ libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-1.c @@ -0,0 +1,866 @@ +/* { dg-do run } */ + +#include <stdlib.h> + +int +main(int argc, char **argv) +{ + int iexp, igot; + long long lexp, lgot; + int N =3D 32; + int idata[N]; + long long ldata[N]; + float fexp, fgot; + float fdata[N]; + int i; + + igot =3D 0; + iexp =3D 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { +#pragma acc atomic capture + idata[i] =3D igot++; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 32; + iexp =3D 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { +#pragma acc atomic capture + idata[i] =3D igot--; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 0; + iexp =3D 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { +#pragma acc atomic capture + idata[i] =3D ++igot; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 32; + iexp =3D 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { +#pragma acc atomic capture + idata[i] =3D --igot; + } + } + + /* BINOP =3D + */ + igot =3D 0; + iexp =3D 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + idata[i] =3D igot +=3D expr; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 0; + iexp =3D 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + idata[i] =3D igot =3D igot + expr; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 0;
+ iexp =3D 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + idata[i] =3D igot =3D expr + igot; + } + } + + if (iexp !=3D igot) + abort (); + + /* BINOP =3D * */ + lgot =3D 1LL; + lexp =3D 1LL << N; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 2LL; + +#pragma acc atomic capture + ldata[i] =3D lgot *=3D expr; + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + lexp =3D 1LL << N; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 2LL; + +#pragma acc atomic capture + ldata[i] =3D lgot =3D lgot * expr; + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + lexp =3D 1LL << N; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 2LL; + +#pragma acc atomic capture + ldata[i] =3D lgot =3D expr * lgot; + } + } + + if (lexp !=3D lgot) + abort (); + + /* BINOP =3D - */ + igot =3D 32; + iexp =3D 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + idata[i] =3D igot -=3D expr; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 32; + iexp =3D 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + idata[i] =3D igot =3D igot - expr; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 32; + iexp =3D 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + idata[i] =3D igot =3D expr - igot; + } + } + + if (iexp !=3D igot) + abort (); + + + /* BINOP =3D / */ + lgot =3D 1LL << 32; + lexp =3D 1LL; + 
+#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 2LL; + +#pragma acc atomic capture + ldata[i] =3D lgot /=3D expr; + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL << 32; + lexp =3D 1LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 2LL; + +#pragma acc atomic capture + ldata[i] =3D lgot =3D lgot / expr; + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 2LL; + lexp =3D 2LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 1LL << N; + +#pragma acc atomic capture + ldata[i] =3D lgot =3D expr / lgot; + } + } + + if (lexp !=3D lgot) + abort (); + + /* BINOP =3D & */ + igot =3D ~0; + iexp =3D 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1 << i; + +#pragma acc atomic capture + idata[i] =3D igot &=3D expr; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D ~0; + iexp =3D 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1 << i; + +#pragma acc atomic capture + idata[i] =3D igot =3D igot & expr; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D ~0; + iexp =3D 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1 << i; + +#pragma acc atomic capture + idata[i] =3D igot =3D expr & igot; + } + } + + if (iexp !=3D igot) + abort (); + + /* BINOP =3D ^ */ + igot =3D ~0; + iexp =3D 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1 << i; + +#pragma acc atomic capture + idata[i] =3D igot ^=3D expr; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D ~0; + iexp =3D 0; + +#pragma acc data copy (igot, idata[0:N]) + { 
+#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1 << i; + +#pragma acc atomic capture + idata[i] =3D igot =3D igot ^ expr; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D ~0; + iexp =3D 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1 << i; + +#pragma acc atomic capture + idata[i] =3D igot =3D expr ^ igot; + } + } + + if (iexp !=3D igot) + abort (); + + /* BINOP =3D | */ + igot =3D 0; + iexp =3D ~0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1 << i; + +#pragma acc atomic capture + idata[i] =3D igot |=3D expr; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 0; + iexp =3D ~0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1 << i; + +#pragma acc atomic capture + idata[i] =3D igot =3D igot | expr; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 0; + iexp =3D ~0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1 << i; + +#pragma acc atomic capture + idata[i] =3D igot =3D expr | igot; + } + } + + if (iexp !=3D igot) + abort (); + + /* BINOP =3D << */ + lgot =3D 1LL; + lexp =3D 1LL << N; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + ldata[i] =3D lgot <<=3D expr; + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + lexp =3D 1LL << N; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + idata[i] =3D lgot =3D lgot << expr; + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + lexp =3D 2LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel + { + long long expr =3D 1LL; + 
+#pragma acc atomic capture + ldata[0] =3D lgot =3D expr << lgot; + } + } + + if (lexp !=3D lgot) + abort (); + + /* BINOP =3D >> */ + lgot =3D 1LL << N; + lexp =3D 1LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 1LL; + +#pragma acc atomic capture + ldata[i] =3D lgot >>=3D expr; + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL << N; + lexp =3D 1LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + ldata[i] =3D lgot =3D lgot >> expr; + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL << 63; + lexp =3D 1LL << 32; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel + { + long long expr =3D 1LL << 32; + +#pragma acc atomic capture + ldata[0] =3D lgot =3D expr >> lgot; + } + } + + if (lexp !=3D lgot) + abort (); + + fgot =3D 0.0; + fexp =3D 32.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { +#pragma acc atomic capture + fdata[i] =3D fgot++; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 32.0; + fexp =3D 0.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { +#pragma acc atomic capture + fdata[i] =3D fgot--; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 0.0; + fexp =3D 32.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { +#pragma acc atomic capture + fdata[i] =3D ++fgot; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 32.0; + fexp =3D 0.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { +#pragma acc atomic capture + fdata[i] =3D --fgot; + } + } + + if (fexp !=3D fgot) + abort (); + + /* BINOP =3D + */ + fgot =3D 0.0; + fexp =3D 32.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { 
+#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; + +#pragma acc atomic capture + fdata[i] =3D fgot +=3D expr; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 0.0; + fexp =3D 32.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; + +#pragma acc atomic capture + fdata[i] =3D fgot =3D fgot + expr; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 0.0; + fexp =3D 32.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; + +#pragma acc atomic capture + fdata[i] =3D fgot =3D expr + fgot; + } + } + + if (fexp !=3D fgot) + abort (); + + /* BINOP =3D * */ + fgot =3D 1.0; + fexp =3D 8192.0*8192.0*64.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; + +#pragma acc atomic capture + fdata[i] =3D fgot *=3D expr; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1.0; + fexp =3D 8192.0*8192.0*64.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 2LL; + +#pragma acc atomic capture + fdata[i] =3D fgot =3D fgot * expr; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1.0; + fexp =3D 8192.0*8192.0*64.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; + +#pragma acc atomic capture + fdata[i] =3D fgot =3D expr * fgot; + } + } + + if (fexp !=3D fgot) + abort (); + + /* BINOP =3D - */ + fgot =3D 32.0; + fexp =3D 0.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; + +#pragma acc atomic capture + fdata[i] =3D fgot -=3D expr; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 32.0; + fexp =3D 0.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { 
+#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; + +#pragma acc atomic capture + fdata[i] =3D fgot =3D fgot - expr; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1.0; + fexp =3D 0.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 32.0; + +#pragma acc atomic capture + fdata[i] =3D fgot =3D expr - fgot; + } + } + + for (i =3D 0; i < N; i++) + if (i % 2 =3D=3D 0) + { + if (fdata[i] !=3D 31.0) + abort (); + } + else + { + if (fdata[i] !=3D 1.0) + abort (); + } + + + /* BINOP =3D / */ + fexp =3D 1.0; + fgot =3D 8192.0*8192.0*64.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; + +#pragma acc atomic capture + fdata[i] =3D fgot /=3D expr; + } + } + + if (fexp !=3D fgot) + abort (); + + fexp =3D 1.0; + fgot =3D 8192.0*8192.0*64.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; + +#pragma acc atomic capture + fdata[i] =3D fgot =3D fgot / expr; + } + } + + if (fexp !=3D fgot) + abort (); + + fexp =3D 1.0; + fgot =3D 8192.0*8192.0*64.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel + { + float expr =3D 8192.0*8192.0*64.0; + +#pragma acc atomic capture + fdata[0] =3D fgot =3D expr / fgot; + } + } + + if (fexp !=3D fgot) + abort (); +=20=20 + return 0; +} diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-2.c l= ibgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-2.c new file mode 100644 index 0000000..842f2de --- /dev/null +++ libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-2.c @@ -0,0 +1,1626 @@ +/* { dg-do run } */ + +#include + +int +main(int argc, char **argv) +{ + int iexp, igot, imax, imin; + long long lexp, lgot; + int N =3D 32; + int i; + int idata[N]; + long long ldata[N]; + float fexp, fgot; + float fdata[N]; + + igot =3D 1234; + 
iexp =3D 31; + + for (i =3D 0; i < N; i++) + idata[i] =3D i; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) +#pragma acc atomic capture + { idata[i] =3D igot; igot =3D i; } + } + + imax =3D 0; + imin =3D N; + + for (i =3D 0; i < N; i++) + { + imax =3D idata[i] > imax ? idata[i] : imax; + imin =3D idata[i] < imin ? idata[i] : imin; + } + + if (imax !=3D 1234 || imin !=3D 0) + abort (); + + return 0; + + igot =3D 0; + iexp =3D 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) +#pragma acc atomic capture + { idata[i] =3D igot; igot++; } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 0; + iexp =3D 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) +#pragma acc atomic capture + { idata[i] =3D igot; ++igot; } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 0; + iexp =3D 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) +#pragma acc atomic capture + { ++igot; idata[i] =3D igot; } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 0; + iexp =3D 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) +#pragma acc atomic capture + { igot++; idata[i] =3D igot; } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 32; + iexp =3D 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) +#pragma acc atomic capture + { idata[i] =3D igot; igot--; } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 32; + iexp =3D 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) +#pragma acc atomic capture + { idata[i] =3D igot; --igot; } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 32; + iexp =3D 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; 
i++) +#pragma acc atomic capture + { --igot; idata[i] =3D igot; } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 32; + iexp =3D 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) +#pragma acc atomic capture + { igot--; idata[i] =3D igot; } + } + + if (iexp !=3D igot) + abort (); + + /* BINOP =3D + */ + igot =3D 0; + iexp =3D 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + { idata[i] =3D igot; igot +=3D expr; } + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 0; + iexp =3D 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + { igot +=3D expr; idata[i] =3D igot; } + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 0; + iexp =3D 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + { idata[i] =3D igot; igot =3D igot + expr; } + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 0; + iexp =3D 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + { idata[i] =3D igot; igot =3D expr + igot; } + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 0; + iexp =3D 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + { igot =3D igot + expr; idata[i] =3D igot; } + } + } + + if (iexp !=3D igot) + abort (); + + + igot =3D 0; + iexp =3D 32; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + { igot =3D expr + igot; idata[i] =3D igot; } + } + } + + if (iexp !=3D 
igot) + abort (); + + /* BINOP =3D * */ + lgot =3D 1LL; + lexp =3D 1LL << 32; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 2LL; + +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot *=3D expr; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + lexp =3D 1LL << 32; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 2LL; + +#pragma acc atomic capture + { lgot *=3D expr; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + lexp =3D 1LL << 32; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 2LL; + +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot =3D lgot * expr; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + lexp =3D 1LL << 32; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 2LL; + +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot =3D expr * lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + lexp =3D 1LL << 32; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 2LL; + +#pragma acc atomic capture + { lgot =3D lgot * expr; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + lexp =3D 1LL << 32; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 2; + +#pragma acc atomic capture + { lgot =3D expr * lgot; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + /* BINOP =3D - */ + igot =3D 32; + iexp =3D 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + { 
idata[i] =3D igot; igot -=3D expr; } + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 32; + iexp =3D 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + { igot -=3D expr; idata[i] =3D igot; } + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 32; + iexp =3D 0; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + { idata[i] =3D igot; igot =3D igot - expr; } + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 1; + iexp =3D 1; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + { idata[i] =3D igot; igot =3D expr - igot; } + } + } + + for (i =3D 0; i < N; i++) + if (i % 2 =3D=3D 0) + { + if (idata[i] !=3D 1) + abort (); + } + else + { + if (idata[i] !=3D 0) + abort (); + } + + if (iexp !=3D igot) + abort (); + + igot =3D 1; + iexp =3D -31; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + { igot =3D igot - expr; idata[i] =3D igot; } + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 1; + iexp =3D 1; + +#pragma acc data copy (igot, idata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D 1; + +#pragma acc atomic capture + { igot =3D expr - igot; idata[i] =3D igot; } + } + } + + for (i =3D 0; i < N; i++) + if (i % 2 =3D=3D 0) + { + if (idata[i] !=3D 0) + abort (); + } + else + { + if (idata[i] !=3D 1) + abort (); + } + + if (iexp !=3D igot) + abort (); + + /* BINOP =3D / */ + lgot =3D 1LL << 32; + lexp =3D 1LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 2LL; + +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot 
/=3D expr; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL << 32; + lexp =3D 1LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 2LL; + +#pragma acc atomic capture + { lgot /=3D expr; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL << 32; + lexp =3D 1LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 2LL; + +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot =3D lgot / expr; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 2LL; + lexp =3D 2LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 1LL << N; + +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot =3D expr / lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 2LL; + lexp =3D 2LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 1LL << N; + +#pragma acc atomic capture + { lgot =3D lgot / expr; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 2LL; + lexp =3D 2LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 1LL << N; + +#pragma acc atomic capture + { lgot =3D expr / lgot; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + /* BINOP =3D & */ + lgot =3D ~0LL; + lexp =3D 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D ~(1 << i); + +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot &=3D expr; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D ~0LL; + iexp =3D 0LL;=20 + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D ~(1 << 
i); + +#pragma acc atomic capture + { lgot &=3D expr; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D ~0LL; + lexp =3D 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D ~(1 << i); + +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot =3D lgot & expr; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D ~0LL; + lexp =3D 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D ~(1 << i); + +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot =3D expr & lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D ~0LL; + iexp =3D 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D ~(1 << i); + +#pragma acc atomic capture + { lgot =3D lgot & expr; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D ~0LL; + lexp =3D 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D ~(1 << i); + +#pragma acc atomic capture + { lgot =3D expr & lgot; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + /* BINOP =3D ^ */ + lgot =3D ~0LL; + lexp =3D 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 1 << i; + +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot ^=3D expr; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D ~0LL; + iexp =3D 0LL;=20 + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D ~(1 << i); + +#pragma acc atomic capture + { lgot ^=3D expr; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D ~0LL; + lexp =3D 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc 
parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D ~(1 << i); + +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot =3D lgot ^ expr; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D ~0LL; + lexp =3D 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D ~(1 << i); + +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot =3D expr ^ lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D ~0LL; + iexp =3D 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D ~(1 << i); + +#pragma acc atomic capture + { lgot =3D lgot ^ expr; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D ~0LL; + lexp =3D 0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D ~(1 << i); + +#pragma acc atomic capture + { lgot =3D expr ^ lgot; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + /* BINOP =3D | */ + lgot =3D 0LL; + lexp =3D ~0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 1 << i; + +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot |=3D expr; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 0LL; + iexp =3D ~0LL;=20 + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D ~(1 << i); + +#pragma acc atomic capture + { lgot |=3D expr; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 0LL; + lexp =3D ~0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D ~(1 << i); + +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot =3D lgot | expr; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 0LL; + 
lexp =3D ~0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D ~(1 << i); + +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot =3D expr | lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 0LL; + iexp =3D ~0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D ~(1 << i); + +#pragma acc atomic capture + { lgot =3D lgot | expr; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 0LL; + lexp =3D ~0LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D ~(1 << i); + +#pragma acc atomic capture + { lgot =3D expr | lgot; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + /* BINOP =3D << */ + lgot =3D 1LL; + lexp =3D 1LL << N; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 1LL; + +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot <<=3D expr; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + iexp =3D 1LL << N;=20 + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 1LL; + +#pragma acc atomic capture + { lgot <<=3D expr; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + lexp =3D 1LL << N; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 1LL; + +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot =3D lgot << expr; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + lexp =3D 2LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < 1; i++) + { + long long expr =3D 1LL; + +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot =3D expr << 
lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + lexp =3D 2LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < 1; i++) + { + long long expr =3D 1LL; + +#pragma acc atomic capture + { lgot =3D lgot << expr; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + lexp =3D 2LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < 1; i++) + { + long long expr =3D 1LL; + +#pragma acc atomic capture + { lgot =3D expr << lgot; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + /* BINOP =3D >> */ + lgot =3D 1LL << N; + lexp =3D 1LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 1LL; +=20=20 +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot >>=3D expr; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL << N; + iexp =3D 1LL;=20 + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 1LL; + +#pragma acc atomic capture + { lgot >>=3D expr; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL << N; + lexp =3D 1LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 1LL; + +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot =3D lgot >> expr; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + lexp =3D 1LL << (N - 1); + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < 1; i++) + { + long long expr =3D 1LL << N; + +#pragma acc atomic capture + { ldata[i] =3D lgot; lgot =3D expr >> lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL << N; + lexp =3D 1LL; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr 
=3D 1LL; + +#pragma acc atomic capture + { lgot =3D lgot >> expr; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + lexp =3D 1LL << (N - 1); + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < 1; i++) + { + long long expr =3D 1LL << N; + +#pragma acc atomic capture + { lgot =3D expr >> lgot; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + // FLOAT FLOAT FLOAT + + /* BINOP =3D + */ + fgot =3D 0.0; + fexp =3D 32.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; + +#pragma acc atomic capture + { fdata[i] =3D fgot; fgot +=3D expr; } + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 0.0; + fexp =3D 32.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; + +#pragma acc atomic capture + { fgot +=3D expr; fdata[i] =3D fgot; } + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 0.0; + fexp =3D 32.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; + +#pragma acc atomic capture + { idata[i] =3D fgot; fgot =3D fgot + expr; } + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 0.0; + fexp =3D 32.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; + +#pragma acc atomic capture + { fdata[i] =3D fgot; fgot =3D expr + fgot; } + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 0.0; + fexp =3D 32.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; + +#pragma acc atomic capture + { fgot =3D fgot + expr; fdata[i] =3D fgot; } + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 0.0; + fexp =3D 32.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel 
loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; + +#pragma acc atomic capture + { fgot =3D expr + fgot; fdata[i] =3D fgot; } + } + } + + if (fexp !=3D fgot) + abort (); + + /* BINOP =3D * */ + fgot =3D 1.0; + fexp =3D 8192.0*8192.0*64.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; + +#pragma acc atomic capture + { fdata[i] =3D fgot; fgot *=3D expr; } + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1.0; + fexp =3D 8192.0*8192.0*64.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; + +#pragma acc atomic capture + { fgot *=3D expr; fdata[i] =3D fgot; } + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1.0; + fexp =3D 8192.0*8192.0*64.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; + +#pragma acc atomic capture + { fdata[i] =3D fgot; fgot =3D fgot * expr; } + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1.0; + fexp =3D 8192.0*8192.0*64.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; + +#pragma acc atomic capture + { fdata[i] =3D fgot; fgot =3D expr * fgot; } + } + } + + if (fexp !=3D fgot) + abort (); + + lgot =3D 1LL; + lexp =3D 1LL << 32; + +#pragma acc data copy (lgot, ldata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 2LL; + +#pragma acc atomic capture + { lgot =3D lgot * expr; ldata[i] =3D lgot; } + } + } + + if (lexp !=3D lgot) + abort (); + + fgot =3D 1.0; + fexp =3D 8192.0*8192.0*64.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 2; + +#pragma acc atomic capture + { fgot =3D expr * fgot; fdata[i] =3D fgot; } + } + } + + if (fexp !=3D fgot) + abort (); + + /* BINOP =3D - 
*/ + fgot =3D 32.0; + fexp =3D 0.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; +=20=20 +#pragma acc atomic capture + { fdata[i] =3D fgot; fgot -=3D expr; } + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 32.0; + fexp =3D 0.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; + +#pragma acc atomic capture + { fgot -=3D expr; fdata[i] =3D fgot; } + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 32.0; + fexp =3D 0.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; + +#pragma acc atomic capture + { fdata[i] =3D fgot; fgot =3D fgot - expr; } + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1.0; + fexp =3D 1.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; + +#pragma acc atomic capture + { fdata[i] =3D fgot; fgot =3D expr - fgot; } + } + } + + for (i =3D 0; i < N; i++) + if (i % 2 =3D=3D 0) + { + if (fdata[i] !=3D 1.0) + abort (); + } + else + { + if (fdata[i] !=3D 0.0) + abort (); + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1.0; + fexp =3D -31.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; + +#pragma acc atomic capture + { fgot =3D fgot - expr; fdata[i] =3D fgot; } + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1.0; + fexp =3D 1.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; + +#pragma acc atomic capture + { fgot =3D expr - fgot; fdata[i] =3D fgot; } + } + } + + for (i =3D 0; i < N; i++) + if (i % 2 =3D=3D 0) + { + if (fdata[i] !=3D 0.0) + abort (); + } + else + { + if (fdata[i] !=3D 1.0) + abort (); + } + + if (fexp !=3D fgot) + abort 
(); + + /* BINOP =3D / */ + fgot =3D 8192.0*8192.0*64.0; + fexp =3D 1.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; + +#pragma acc atomic capture + { fdata[i] =3D fgot; fgot /=3D expr; } + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 8192.0*8192.0*64.0; + fexp =3D 1.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; + +#pragma acc atomic capture + { fgot /=3D expr; fdata[i] =3D fgot; } + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 8192.0*8192.0*64.0; + fexp =3D 1.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; + +#pragma acc atomic capture + { fdata[i] =3D fgot; fgot =3D fgot / expr; } + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 8192.0*8192.0*64.0; + fexp =3D 1.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; + +#pragma acc atomic capture + { fdata[i] =3D fgot; fgot =3D expr / fgot; } + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 4.0; + fexp =3D 4.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 1LL << N; + +#pragma acc atomic capture + { fgot =3D fgot / expr; fdata[i] =3D fgot; } + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 4.0; + fexp =3D 4.0; + +#pragma acc data copy (fgot, fdata[0:N]) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; + +#pragma acc atomic capture + { fgot =3D expr / fgot; fdata[i] =3D fgot; } + } + } + + if (fexp !=3D fgot) + abort (); + + return 0; +} diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_update-1.c li= bgomp/testsuite/libgomp.oacc-c-c++-common/atomic_update-1.c new file mode 100644 index 0000000..18ee3aa --- /dev/null 
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_update-1.c @@ -0,0 +1,760 @@ +/* { dg-do run } */ + +#include + +int +main(int argc, char **argv) +{ + float fexp, fgot; + int iexp, igot; + long long lexp, lgot; + int N =3D 32; + int i; + + fgot =3D 1234.0; + fexp =3D 1235.0; + +#pragma acc data copy (fgot) + { +#pragma acc parallel loop + for (i =3D 0; i < 1; i++) +#pragma acc atomic update + fgot++; + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1234.0; + fexp =3D fgot - N; + +#pragma acc data copy (fgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { +#pragma acc atomic update + fgot--; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1234.0; + fexp =3D fgot + N; + +#pragma acc data copy (fgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { +#pragma acc atomic update + ++fgot; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1234.0; + fexp =3D fgot - N; + +#pragma acc data copy (fgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { +#pragma acc atomic update + --fgot; + } + } + + if (fexp !=3D fgot) + abort (); + + /* BINOP =3D + */ + + fgot =3D 1234.0; + fexp =3D fgot + N; + +#pragma acc data copy (fgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; +#pragma acc atomic update + fgot +=3D expr; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1234.0; + fexp =3D fgot + N; + +#pragma acc data copy (fgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; +#pragma acc atomic update + fgot =3D fgot + expr; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1234.0; + fexp =3D fgot + N; + +#pragma acc data copy (fgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; +#pragma acc atomic update + fgot =3D expr + fgot; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1234.0; + +#pragma acc data copy (fgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { 
+ float expr =3D 0.5; +#pragma acc atomic update + fgot =3D (expr + expr) + fgot; + } + } + + if (fexp !=3D fgot) + abort (); + + /* BINOP =3D * */ + + fgot =3D 1234.0; + fexp =3D 1234.0; + + for (i =3D 0; i < N; i++) + fexp *=3D 2.0; + +#pragma acc data copy (fgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; +#pragma acc atomic update + fgot *=3D expr; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1234.0; + fexp =3D 1234.0; + + for (i =3D 0; i < N; i++) + fexp =3D fexp * 2.0; + +#pragma acc data copy (fgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; +#pragma acc atomic update + fgot =3D fgot * expr; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1234.0; + fexp =3D 1234.0; + + for (i =3D 0; i < N; i++) + fexp =3D 2.0 * fexp; + +#pragma acc data copy (fgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; +#pragma acc atomic update + fgot =3D expr * fgot; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1234.0; + +#pragma acc data copy (fgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; +#pragma acc atomic update + fgot =3D (expr + expr) * fgot; + } + } + + if (fexp !=3D fgot) + abort (); + + /* BINOP =3D - */ + + fgot =3D 1234.0; + fexp =3D 1234.0; + + for (i =3D 0; i < N; i++) + fexp -=3D 2.0; + +#pragma acc data copy (fgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; +#pragma acc atomic update + fgot -=3D expr; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1234.0; + fexp =3D 1234.0; + + for (i =3D 0; i < N; i++) + fexp =3D fexp - 2.0; + +#pragma acc data copy (fgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; +#pragma acc atomic update + fgot =3D fgot - expr; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1234.0; + fexp =3D 1234.0; + + for (i =3D 0; i < N; i++) + fexp =3D 2.0 - 
fexp; + +#pragma acc data copy (fgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; + +#pragma acc atomic update + fgot =3D expr - fgot; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1234.0; + +#pragma acc data copy (fgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; +#pragma acc atomic update + fgot =3D (expr + expr) - fgot; + } + } + + if (fexp !=3D fgot) + abort (); + + /* BINOP =3D / */ + + fgot =3D 1234.0; + fexp =3D 1234.0; + + for (i =3D 0; i < N; i++) + fexp /=3D 2.0; + +#pragma acc data copy (fgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; +#pragma acc atomic update + fgot /=3D expr; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1234.0; + fexp =3D 1234.0; + + for (i =3D 0; i < N; i++) + fexp =3D fexp / 2.0; + +#pragma acc data copy (fgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; +=09 +#pragma acc atomic update + fgot =3D fgot / expr; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1234.0; + fexp =3D 1234.0; + + for (i =3D 0; i < N; i++) + fexp =3D 2.0 / fexp; + +#pragma acc data copy (fgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 2.0; + +#pragma acc atomic update + fgot =3D expr / fgot; + } + } + + if (fexp !=3D fgot) + abort (); + + fgot =3D 1234.0; + fexp =3D 1234.0; + + for (i =3D 0; i < N; i++) + fexp =3D 2.0 / fexp; + +#pragma acc data copy (fgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + float expr =3D 1.0; +#pragma acc atomic update + fgot =3D (expr + expr) / fgot; + } + } + + if (fexp !=3D fgot) + abort (); + + /* BINOP =3D & */ + + igot =3D ~0; + iexp =3D 0; + +#pragma acc data copy (igot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D ~(1 << i); + +#pragma acc atomic update + igot &=3D expr; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D ~0; + iexp 
=3D 0; + +#pragma acc data copy (igot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D ~(1 << i); +#pragma acc atomic update + igot =3D igot & expr; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D ~0; + iexp =3D 0; + +#pragma acc data copy (igot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D ~(1 << i); +#pragma acc atomic update + igot =3D expr & igot; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D ~0; + iexp =3D 0; + +#pragma acc data copy (igot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D ~(1 << i); + int zero =3D 0; + +#pragma acc atomic update + igot =3D (expr + zero) & igot; + } + } + + if (iexp !=3D igot) + abort (); + + /* BINOP =3D ^ */ + + igot =3D ~0; + iexp =3D 0; + +#pragma acc data copy (igot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D (1 << i); + +#pragma acc atomic update + igot ^=3D expr; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D ~0; + iexp =3D 0; + +#pragma acc data copy (igot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D (1 << i); + +#pragma acc atomic update + igot =3D igot ^ expr; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D ~0; + iexp =3D 0; + +#pragma acc data copy (igot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D (1 << i); + +#pragma acc atomic update + igot =3D expr ^ igot; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D ~0; + iexp =3D 0; + +#pragma acc data copy (igot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D (1 << i); + int zero =3D 0; + +#pragma acc atomic update + igot =3D (expr + zero) ^ igot; + } + } + + if (iexp !=3D igot) + abort (); + + /* BINOP =3D | */ + + igot =3D 0; + iexp =3D ~0; + +#pragma acc data copy (igot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D (1 << i); + +#pragma acc atomic update + igot |=3D 
expr; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 0; + iexp =3D ~0; + +#pragma acc data copy (igot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D (1 << i); + +#pragma acc atomic update + igot =3D igot | expr; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 0; + iexp =3D ~0; + +#pragma acc data copy (igot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D (1 << i); + +#pragma acc atomic update + igot =3D expr | igot; + } + } + + if (iexp !=3D igot) + abort (); + + igot =3D 0; + iexp =3D ~0; + +#pragma acc data copy (igot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + int expr =3D (1 << i); + int zero =3D 0; + +#pragma acc atomic update + igot =3D (expr + zero) | igot; + } + } + + if (iexp !=3D igot) + abort (); + + /* BINOP =3D << */ + + lgot =3D 1LL; + lexp =3D 1LL << N; + +#pragma acc data copy (lgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 1LL; + +#pragma acc atomic update + lgot <<=3D expr; + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + lexp =3D 1LL << N; + +#pragma acc data copy (lgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 1LL; + +#pragma acc atomic update + lgot =3D lgot << expr; + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + lexp =3D 2LL; + +#pragma acc data copy (lgot) + { +#pragma acc parallel loop + for (i =3D 0; i < 1; i++) + { + long long expr =3D 1LL; + +#pragma acc atomic update + lgot =3D expr << lgot; + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + lexp =3D 2LL; + +#pragma acc data copy (lgot) + { +#pragma acc parallel loop + for (i =3D 0; i < 1; i++) + { + long long expr =3D 1LL; + long long zero =3D 0LL; + +#pragma acc atomic update + lgot =3D (expr + zero) << lgot; + } + } + + if (lexp !=3D lgot) + abort (); + + /* BINOP =3D >> */ + + lgot =3D 1LL << N; + lexp =3D 1LL; + +#pragma acc data copy (lgot) + { 
+#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 1LL; + +#pragma acc atomic update + lgot >>=3D expr; + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL << N; + lexp =3D 1LL; + +#pragma acc data copy (lgot) + { +#pragma acc parallel loop + for (i =3D 0; i < N; i++) + { + long long expr =3D 1LL; + +#pragma acc atomic update + lgot =3D lgot >> expr; + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + lexp =3D 1LL << (N - 1); + +#pragma acc data copy (lgot) + { +#pragma acc parallel loop + for (i =3D 0; i < 1; i++) + { + long long expr =3D 1LL << N; + +#pragma acc atomic update + lgot =3D expr >> lgot; + } + } + + if (lexp !=3D lgot) + abort (); + + lgot =3D 1LL; + lexp =3D 1LL << (N - 1); + +#pragma acc data copy (lgot) + { +#pragma acc parallel loop + for (i =3D 0; i < 1; i++) + { + long long expr =3D 1LL << N; + long long zero =3D 0LL; + +#pragma acc atomic update + lgot =3D (expr + zero) >> lgot; + } + } + + if (lexp !=3D lgot) + abort (); + + return 0; +} diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-1.c libgomp/= testsuite/libgomp.oacc-c-c++-common/clauses-1.c index 51c0cf5..410c46c 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-1.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-1.c @@ -586,6 +586,32 @@ main (int argc, char **argv) =20 for (i =3D 0; i < N; i++) { + a[i] =3D 6.0; + b[i] =3D 0.0; + } + +#pragma acc parallel pcopy (a[0:N], b[0:N]) + { + int ii; + + for (ii =3D 0; ii < N; ii++) + b[ii] =3D a[ii]; + } + + for (i =3D 0; i < N; i++) + { + if (b[i] !=3D 6.0) + abort (); + } + + if (acc_is_present (&a[0], (N * sizeof (float)))) + abort (); + + if (acc_is_present (&b[0], (N * sizeof (float)))) + abort (); + + for (i =3D 0; i < N; i++) + { a[i] =3D 5.0; b[i] =3D 7.0; } diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c libgomp/tes= tsuite/libgomp.oacc-c-c++-common/data-2.c index f867a66..5fc9fb6 100644 --- 
libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c @@ -25,7 +25,33 @@ main (int argc, char **argv) } =20 #pragma acc enter data copyin (a[0:N]) copyin (b[0:N]) copyin (N) async -#pragma acc parallel async wait +#pragma acc parallel async wait present (a[0:N]) present (b[0:N]) present = (N) +#pragma acc loop + for (i =3D 0; i < N; i++) + b[i] =3D a[i]; + +#pragma acc exit data copyout (a[0:N]) copyout (b[0:N]) delete (N) wait as= ync +#pragma acc wait + + for (i =3D 0; i < N; i++) + { + if (a[i] !=3D 3.0) + abort (); + + if (b[i] !=3D 3.0) + abort (); + } + + for (i =3D 0; i < N; i++) + { + a[i] =3D 3.0; + b[i] =3D 0.0; + } + +#pragma acc enter data copyin (a[0:N]) async=20 +#pragma acc enter data copyin (b[0:N]) async wait +#pragma acc enter data copyin (N) async wait +#pragma acc parallel async wait present (a[0:N]) present (b[0:N]) present = (N) #pragma acc loop for (i =3D 0; i < N; i++) b[i] =3D a[i]; @@ -49,7 +75,7 @@ main (int argc, char **argv) } =20 #pragma acc enter data copyin (a[0:N]) copyin (b[0:N]) copyin (N) async (1) -#pragma acc parallel async (1) +#pragma acc parallel async (1) present (a[0:N]) present (b[0:N]) present (= N) #pragma acc loop for (i =3D 0; i < N; i++) b[i] =3D a[i]; @@ -76,17 +102,17 @@ main (int argc, char **argv) =20 #pragma acc enter data copyin (a[0:N]) copyin (b[0:N]) copyin (c[0:N]) cop= yin (d[0:N]) copyin (N) async (1) =20 -#pragma acc parallel async (1) wait (1) +#pragma acc parallel async (1) wait (1) present (a[0:N]) present (b[0:N]) = present (c[0:N]) present (d[0:N]) present (N) #pragma acc loop for (i =3D 0; i < N; i++) b[i] =3D (a[i] * a[i] * a[i]) / a[i]; =20 -#pragma acc parallel async (2) wait (1) +#pragma acc parallel async (2) wait (1) present (a[0:N]) present (b[0:N]) = present (c[0:N]) present (d[0:N]) present (N) #pragma acc loop for (i =3D 0; i < N; i++) c[i] =3D (a[i] + a[i] + a[i] + a[i]) / a[i]; =20 -#pragma acc parallel async (3) wait (1) +#pragma 
acc parallel async (3) wait (1) present (a[0:N]) present (b[0:N]) = present (c[0:N]) present (d[0:N]) present (N) #pragma acc loop for (i =3D 0; i < N; i++) d[i] =3D ((a[i] * a[i] + a[i]) / a[i]) - a[i]; @@ -120,19 +146,19 @@ main (int argc, char **argv) =20 #pragma acc enter data copyin (a[0:N]) copyin (b[0:N]) copyin (c[0:N]) cop= yin (d[0:N]) copyin (e[0:N]) copyin (N) async (1) =20 -#pragma acc parallel async (1) wait (1) +#pragma acc parallel async (1) wait (1) present (a[0:N]) present (b[0:N]) = present (c[0:N]) present (d[0:N]) present (e[0:N]) present (N) for (int ii =3D 0; ii < N; ii++) b[ii] =3D (a[ii] * a[ii] * a[ii]) / a[ii]; =20 -#pragma acc parallel async (2) wait (1) +#pragma acc parallel async (2) wait (1) present (a[0:N]) present (b[0:N]) = present (c[0:N]) present (d[0:N]) present (e[0:N]) present (N) for (int ii =3D 0; ii < N; ii++) c[ii] =3D (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii]; =20 -#pragma acc parallel async (3) wait (1) +#pragma acc parallel async (3) wait (1) present (a[0:N]) present (b[0:N]) = present (c[0:N]) present (d[0:N]) present (e[0:N]) present (N) for (int ii =3D 0; ii < N; ii++) d[ii] =3D ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii]; =20 -#pragma acc parallel wait (1) async (4) +#pragma acc parallel wait (1) async (4) present (a[0:N]) present (b[0:N]) = present (c[0:N]) present (d[0:N]) present (e[0:N]) present (N) for (int ii =3D 0; ii < N; ii++) e[ii] =3D a[ii] + b[ii] + c[ii] + d[ii]; =20 diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c libgomp/tes= tsuite/libgomp.oacc-c-c++-common/data-3.c index 747109f..6e173d3 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c @@ -25,7 +25,7 @@ main (int argc, char **argv) } =20 #pragma acc enter data copyin (a[0:N]) copyin (b[0:N]) copyin (N) async -#pragma acc parallel async wait +#pragma acc parallel async wait present (a[0:N]) present (b[0:N]) present = (N) #pragma acc loop for (i =3D 0; i < N; i++) b[i] 
=3D a[i]; @@ -49,7 +49,7 @@ main (int argc, char **argv) } =20 #pragma acc update device (a[0:N], b[0:N]) async (1) -#pragma acc parallel async (1) +#pragma acc parallel async (1) present (a[0:N]) present (b[0:N]) present (= N) #pragma acc loop for (i =3D 0; i < N; i++) b[i] =3D a[i]; @@ -78,17 +78,17 @@ main (int argc, char **argv) #pragma acc update device (b[0:N]) async (2) #pragma acc enter data copyin (c[0:N], d[0:N]) async (3) =20 -#pragma acc parallel async (1) wait (1,2) +#pragma acc parallel async (1) wait (1,2) present (a[0:N]) present (b[0:N]= ) present (c[0:N]) present (d[0:N]) present (N) #pragma acc loop for (i =3D 0; i < N; i++) b[i] =3D (a[i] * a[i] * a[i]) / a[i]; =20 -#pragma acc parallel async (2) wait (1,3) +#pragma acc parallel async (2) wait (1,3) present (a[0:N]) present (b[0:N]= ) present (c[0:N]) present (d[0:N]) present (N) #pragma acc loop for (i =3D 0; i < N; i++) c[i] =3D (a[i] + a[i] + a[i] + a[i]) / a[i]; =20 -#pragma acc parallel async (3) wait (1,3) +#pragma acc parallel async (3) wait (1,3) present (a[0:N]) present (b[0:N]= ) present (c[0:N]) present (d[0:N]) present (N) #pragma acc loop for (i =3D 0; i < N; i++) d[i] =3D ((a[i] * a[i] + a[i]) / a[i]) - a[i]; @@ -123,19 +123,19 @@ main (int argc, char **argv) #pragma acc update device (a[0:N], b[0:N], c[0:N], d[0:N]) async (1) #pragma acc enter data copyin (e[0:N]) async (5) =20 -#pragma acc parallel async (1) wait (1) +#pragma acc parallel async (1) wait (1) present (a[0:N]) present (b[0:N]) = present (c[0:N]) present (d[0:N]) present (e[0:N]) present (N) for (int ii =3D 0; ii < N; ii++) b[ii] =3D (a[ii] * a[ii] * a[ii]) / a[ii]; =20 -#pragma acc parallel async (2) wait (1) +#pragma acc parallel async (2) wait (1) present (a[0:N]) present (b[0:N]) = present (c[0:N]) present (d[0:N]) present (e[0:N]) present (N) for (int ii =3D 0; ii < N; ii++) c[ii] =3D (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii]; =20 -#pragma acc parallel async (3) wait (1) +#pragma acc parallel async (3) wait (1) 
present (a[0:N]) present (b[0:N]) = present (c[0:N]) present (d[0:N]) present (e[0:N]) present (N) for (int ii =3D 0; ii < N; ii++) d[ii] =3D ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii]; =20 -#pragma acc parallel wait (1,5) async (4) +#pragma acc parallel wait (1,5) async (4) present (a[0:N]) present (b[0:N]= ) present (c[0:N]) present (d[0:N]) present (e[0:N]) present (N) for (int ii =3D 0; ii < N; ii++) e[ii] =3D a[ii] + b[ii] + c[ii] + d[ii]; =20 diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h libgo= mp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h new file mode 100644 index 0000000..8341053 --- /dev/null +++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h @@ -0,0 +1,202 @@ +int i; + +int main(void) +{ + int j, v; + + i =3D -1; + j =3D -2; + v =3D 0; +#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) copyin (i,= j) + { + if (i !=3D -1 || j !=3D -2) + abort (); + i =3D 2; + j =3D 1; + if (i !=3D 2 || j !=3D 1) + abort (); + v =3D 1; + } +#if ACC_MEM_SHARED + if (v !=3D 1 || i !=3D 2 || j !=3D 1) + abort (); +#else + if (v !=3D 1 || i !=3D -1 || j !=3D -2) + abort (); +#endif + + i =3D -1; + j =3D -2; + v =3D 0; +#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) copyout (i= , j) + { + i =3D 2; + j =3D 1; + if (i !=3D 2 || j !=3D 1) + abort (); + v =3D 1; + } + if (v !=3D 1 || i !=3D 2 || j !=3D 1) + abort (); + + i =3D -1; + j =3D -2; + v =3D 0; +#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) copy (i, j) + { + if (i !=3D -1 || j !=3D -2) + abort (); + i =3D 2; + j =3D 1; + if (i !=3D 2 || j !=3D 1) + abort (); + v =3D 1; + } + if (v !=3D 1 || i !=3D 2 || j !=3D 1) + abort (); + + i =3D -1; + j =3D -2; + v =3D 0; +#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) create (i,= j) + { + i =3D 2; + j =3D 1; + if (i !=3D 2 || j !=3D 1) + abort (); + v =3D 1; + } +#if ACC_MEM_SHARED + if (v !=3D 1 || i !=3D 2 || j !=3D 1) + abort (); +#else + if (v !=3D 1 || i !=3D -1 || j !=3D 
-2) + abort (); +#endif + + i =3D -1; + j =3D -2; + v =3D 0; +#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or= _copyin (i, j) + { + if (i !=3D -1 || j !=3D -2) + abort (); + i =3D 2; + j =3D 1; + if (i !=3D 2 || j !=3D 1) + abort (); + v =3D 1; + } + if (v !=3D 1) + abort (); +#if ACC_MEM_SHARED + if (v !=3D 1 || i !=3D 2 || j !=3D 1) + abort (); +#else + if (v !=3D 1 || i !=3D -1 || j !=3D -2) + abort (); +#endif + + i =3D -1; + j =3D -2; + v =3D 0; +#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or= _copyout (i, j) + { + i =3D 2; + j =3D 1; + if (i !=3D 2 || j !=3D 1) + abort (); + v =3D 1; + } + if (v !=3D 1 || i !=3D 2 || j !=3D 1) + abort (); + + i =3D -1; + j =3D -2; + v =3D 0; +#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or= _copy (i, j) + { + if (i !=3D -1 || j !=3D -2) + abort (); + i =3D 2; + j =3D 1; + if (i !=3D 2 || j !=3D 1) + abort (); + v =3D 1; + } + if (v !=3D 1 || i !=3D 2 || j !=3D 1) + abort (); + + i =3D -1; + j =3D -2; + v =3D 0; +#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or= _create (i, j) + { + i =3D 2; + j =3D 1; + if (i !=3D 2 || j !=3D 1) + abort (); + v =3D 1; + } + if (v !=3D 1) + abort (); +#if ACC_MEM_SHARED + if (v !=3D 1 || i !=3D 2 || j !=3D 1) + abort (); +#else + if (v !=3D 1 || i !=3D -1 || j !=3D -2) + abort (); +#endif + + i =3D -1; + j =3D -2; + v =3D 0; + +#pragma acc data copyin (i, j) + { +#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present (i= , j) + { + if (i !=3D -1 || j !=3D -2) + abort (); + i =3D 2; + j =3D 1; + if (i !=3D 2 || j !=3D 1) + abort (); + v =3D 1; + } + } +#if ACC_MEM_SHARED + if (v !=3D 1 || i !=3D 2 || j !=3D 1) + abort (); +#else + if (v !=3D 1 || i !=3D -1 || j !=3D -2) + abort (); +#endif + + i =3D -1; + j =3D -2; + v =3D 0; + +#pragma acc data copyin(i, j) + { +#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) + { + if (i !=3D -1 || j !=3D -2) + abort (); 
+ i =3D 2; + j =3D 1; + if (i !=3D 2 || j !=3D 1) + abort (); + v =3D 1; + } + } +#if ACC_MEM_SHARED + if (v !=3D 1 || i !=3D 2 || j !=3D 1) + abort (); +#else + if (v !=3D 1 || i !=3D -1 || j !=3D -2) + abort (); +#endif + + return 0; +} diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c libgomp/= testsuite/libgomp.oacc-c-c++-common/kernels-1.c index 3acfdf5..aeb0142 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c @@ -2,183 +2,5 @@ =20 #include =20 -int i; - -int main (void) -{ - int j, v; - -#if 0 - i =3D -1; - j =3D -2; - v =3D 0; -#pragma acc kernels /* copyout */ present_or_copyout (v) copyin (i, j) - { - if (i !=3D -1 || j !=3D -2) - abort (); - i =3D 2; - j =3D 1; - if (i !=3D 2 || j !=3D 1) - abort (); - v =3D 1; - } - if (v !=3D 1 || i !=3D -1 || j !=3D -2) - abort (); - - i =3D -1; - j =3D -2; - v =3D 0; -#pragma acc kernels /* copyout */ present_or_copyout (v) copyout (i, j) - { - i =3D 2; - j =3D 1; - if (i !=3D 2 || j !=3D 1) - abort (); - v =3D 1; - } - if (v !=3D 1 || i !=3D 2 || j !=3D 1) - abort (); - - i =3D -1; - j =3D -2; - v =3D 0; -#pragma acc kernels /* copyout */ present_or_copyout (v) copy (i, j) - { - if (i !=3D -1 || j !=3D -2) - abort (); - i =3D 2; - j =3D 1; - if (i !=3D 2 || j !=3D 1) - abort (); - v =3D 1; - } - if (v !=3D 1 || i !=3D 2 || j !=3D 1) - abort (); - - i =3D -1; - j =3D -2; - v =3D 0; -#pragma acc kernels /* copyout */ present_or_copyout (v) create (i, j) - { - i =3D 2; - j =3D 1; - if (i !=3D 2 || j !=3D 1) - abort (); - v =3D 1; - } - if (v !=3D 1 || i !=3D -1 || j !=3D -2) - abort (); -#endif - - i =3D -1; - j =3D -2; - v =3D 0; -#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copyin= (i, j) - { - if (i !=3D -1 || j !=3D -2) - abort (); - i =3D 2; - j =3D 1; - if (i !=3D 2 || j !=3D 1) - abort (); - v =3D 1; - } - if (v !=3D 1) - abort (); -#if ACC_MEM_SHARED - if (i !=3D 2 || j !=3D 1) - abort (); -#else 
- if (i !=3D -1 || j !=3D -2) - abort (); -#endif - - i =3D -1; - j =3D -2; - v =3D 0; -#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copyou= t (i, j) - { - i =3D 2; - j =3D 1; - if (i !=3D 2 || j !=3D 1) - abort (); - v =3D 1; - } - if (v !=3D 1 || i !=3D 2 || j !=3D 1) - abort (); - - i =3D -1; - j =3D -2; - v =3D 0; -#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copy (= i, j) - { - if (i !=3D -1 || j !=3D -2) - abort (); - i =3D 2; - j =3D 1; - if (i !=3D 2 || j !=3D 1) - abort (); - v =3D 1; - } - if (v !=3D 1 || i !=3D 2 || j !=3D 1) - abort (); - - i =3D -1; - j =3D -2; - v =3D 0; -#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_create= (i, j) - { - i =3D 2; - j =3D 1; - if (i !=3D 2 || j !=3D 1) - abort (); - v =3D 1; - } - if (v !=3D 1) - abort (); -#if ACC_MEM_SHARED - if (i !=3D 2 || j !=3D 1) - abort (); -#else - if (i !=3D -1 || j !=3D -2) - abort (); -#endif - -#if 0 - i =3D -1; - j =3D -2; - v =3D 0; -#pragma acc kernels /* copyout */ present_or_copyout (v) present (i, j) - { - if (i !=3D -1 || j !=3D -2) - abort (); - i =3D 2; - j =3D 1; - if (i !=3D 2 || j !=3D 1) - abort (); - v =3D 1; - } - if (v !=3D 1 || i !=3D 2 || j !=3D 1) - abort (); -#endif - -#if 0 - i =3D -1; - j =3D -2; - v =3D 0; -#pragma acc kernels /* copyout */ present_or_copyout (v) - { - if (i !=3D -1 || j !=3D -2) - abort (); - i =3D 2; - j =3D 1; - if (i !=3D 2 || j !=3D 1) - abort (); - v =3D 1; - } - if (v !=3D 1 || i !=3D 2 || j !=3D 1) - abort (); -#endif - - return 0; -} +#define EXEC_DIRECTIVE kernels +#include "data-clauses.h" diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-69.c libgomp/tes= tsuite/libgomp.oacc-c-c++-common/lib-69.c index 5462f12..78c834a 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-69.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-69.c @@ -9,46 +9,14 @@ int main (int argc, char **argv) { - CUdevice dev; CUfunction delay; CUmodule module; CUresult r; CUstream 
stream; - unsigned long *a, *d_a, dticks; - int nbytes; - float dtime; - void *kargs[2]; - int clkrate; - int devnum, nprocs; =20 acc_init (acc_device_nvidia); =20 - devnum =3D acc_get_device_num (acc_device_nvidia); - - r =3D cuDeviceGet (&dev, devnum); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGet failed: %d\n", r); - abort (); - } - - r =3D - cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUN= T, - dev); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - - r =3D cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, de= v); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - - r =3D cuModuleLoad (&module, "subr.ptx"); + r =3D cuModuleLoad (&module, "./subr.ptx"); if (r !=3D CUDA_SUCCESS) { fprintf (stderr, "cuModuleLoad failed: %d\n", r); @@ -62,20 +30,6 @@ main (int argc, char **argv) abort (); } =20 - nbytes =3D nprocs * sizeof (unsigned long); - - dtime =3D 200.0; - - dticks =3D (unsigned long) (dtime * clkrate); - - a =3D (unsigned long *) malloc (nbytes); - d_a =3D (unsigned long *) acc_malloc (nbytes); - - acc_map_data (a, d_a, nbytes); - - kargs[0] =3D (void *) &d_a; - kargs[1] =3D (void *) &dticks; - stream =3D (CUstream) acc_get_cuda_stream (0); if (stream !=3D NULL) abort (); @@ -90,31 +44,21 @@ main (int argc, char **argv) if (!acc_set_cuda_stream (0, stream)) abort (); =20 - r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0); + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, NULL, 0); if (r !=3D CUDA_SUCCESS) { fprintf (stderr, "cuLaunchKernel failed: %d\n", r); abort (); } =20 - if (acc_async_test (0) !=3D 0) - { - fprintf (stderr, "asynchronous operation not running\n"); - abort (); - } + if (acc_async_test (0) =3D=3D 1) + fprintf (stderr, "expected asynchronous operation to be running\n"); =20 - sleep (1); + acc_wait_all (); =20 - if (acc_async_test (0) !=3D 1) - 
{ - fprintf (stderr, "found asynchronous operation still running\n"); - abort (); - } + if (acc_async_test (0) =3D=3D 0) + fprintf (stderr, "expected asynchronous operation to have completed\n"); =20 - acc_unmap_data (a); - - free (a); - acc_free (d_a); =20 acc_shutdown (acc_device_nvidia); =20 diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-70.c libgomp/tes= tsuite/libgomp.oacc-c-c++-common/lib-70.c index 912b266..ee06898 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-70.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-70.c @@ -1,6 +1,7 @@ /* { dg-do run { target openacc_nvidia_accel_selected } } */ /* { dg-additional-options "-lcuda" } */ =20 +#include #include #include #include @@ -10,47 +11,17 @@ int main (int argc, char **argv) { - CUdevice dev; CUfunction delay; CUmodule module; CUresult r; - const int N =3D 10; + const int N =3D 3; int i; CUstream streams[N]; - unsigned long *a, *d_a, dticks; - int nbytes; - float dtime; - void *kargs[2]; - int clkrate; - int devnum, nprocs; + struct timeval tv1, tv2; + time_t diff; =20 acc_init (acc_device_nvidia); =20 - devnum =3D acc_get_device_num (acc_device_nvidia); - - r =3D cuDeviceGet (&dev, devnum); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGet failed: %d\n", r); - abort (); - } - - r =3D - cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUN= T, - dev); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - - r =3D cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, de= v); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - r =3D cuModuleLoad (&module, "subr.ptx"); if (r !=3D CUDA_SUCCESS) { @@ -65,20 +36,6 @@ main (int argc, char **argv) abort (); } =20 - nbytes =3D nprocs * sizeof (unsigned long); - - dtime =3D 200.0; - - dticks =3D (unsigned long) (dtime * clkrate); - - a =3D (unsigned long *) malloc (nbytes); - d_a =3D (unsigned 
long *) acc_malloc (nbytes); - - acc_map_data (a, d_a, nbytes); - - kargs[0] =3D (void *) &d_a; - kargs[1] =3D (void *) &dticks; - for (i =3D 0; i < N; i++) { streams[i] =3D (CUstream) acc_get_cuda_stream (i); @@ -96,9 +53,29 @@ main (int argc, char **argv) abort (); } =20 + gettimeofday (&tv1, NULL); + + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[0], NULL, 0); + if (r !=3D CUDA_SUCCESS) + { + fprintf (stderr, "cuLaunchKernel failed: %d\n", r); + abort (); + } + + r =3D cuCtxSynchronize (); + if (r !=3D CUDA_SUCCESS) + { + fprintf (stderr, "cuCtxSynchronize failed: %d\n", r); + abort (); + } + + gettimeofday (&tv2, NULL); + + diff =3D tv2.tv_sec - tv1.tv_sec; + for (i =3D 0; i < N; i++) { - r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], kargs,= 0); + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], NULL, = 0); if (r !=3D CUDA_SUCCESS) { fprintf (stderr, "cuLaunchKernel failed: %d\n", r); @@ -112,7 +89,7 @@ main (int argc, char **argv) } } =20 - sleep ((int) (dtime / 1000.0f) + 1); + sleep ((diff + 1) * N); =20 for (i =3D 0; i < N; i++) { @@ -123,10 +100,6 @@ main (int argc, char **argv) } } =20 - acc_unmap_data (a); - - free (a); - acc_free (d_a); =20 acc_shutdown (acc_device_nvidia); =20 diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-71.c libgomp/tes= tsuite/libgomp.oacc-c-c++-common/lib-71.c index e8584db..8db6bcb 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-71.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-71.c @@ -9,45 +9,13 @@ int main (int argc, char **argv) { - CUdevice dev; CUfunction delay; CUmodule module; CUresult r; CUstream stream; - unsigned long *a, *d_a, dticks; - int nbytes; - float dtime; - void *kargs[2]; - int clkrate; - int devnum, nprocs; =20 acc_init (acc_device_nvidia); =20 - devnum =3D acc_get_device_num (acc_device_nvidia); - - r =3D cuDeviceGet (&dev, devnum); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGet failed: %d\n", r); - abort (); - } - - r =3D - 
cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUN= T, - dev); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - - r =3D cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, de= v); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - r =3D cuModuleLoad (&module, "subr.ptx"); if (r !=3D CUDA_SUCCESS) { @@ -62,20 +30,6 @@ main (int argc, char **argv) abort (); } =20 - nbytes =3D nprocs * sizeof (unsigned long); - - dtime =3D 200.0; - - dticks =3D (unsigned long) (dtime * clkrate); - - a =3D (unsigned long *) malloc (nbytes); - d_a =3D (unsigned long *) acc_malloc (nbytes); - - acc_map_data (a, d_a, nbytes); - - kargs[0] =3D (void *) &d_a; - kargs[1] =3D (void *) &dticks; - r =3D cuStreamCreate (&stream, CU_STREAM_DEFAULT); if (r !=3D CUDA_SUCCESS) { @@ -85,7 +39,7 @@ main (int argc, char **argv) =20 acc_set_cuda_stream (0, stream); =20 - r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0); + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, NULL, 0); if (r !=3D CUDA_SUCCESS) { fprintf (stderr, "cuLaunchKernel failed: %d\n", r); @@ -98,7 +52,7 @@ main (int argc, char **argv) abort (); } =20 - sleep ((int) (dtime / 1000.0f) + 1); + sleep (1); =20 if (acc_async_test (1) !=3D 1) { @@ -106,11 +60,6 @@ main (int argc, char **argv) abort (); } =20 - acc_unmap_data (a); - - free (a); - acc_free (d_a); - acc_shutdown (acc_device_nvidia); =20 return 0; diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-72.c libgomp/tes= tsuite/libgomp.oacc-c-c++-common/lib-72.c index e383ba0..920ff5f 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-72.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-72.c @@ -10,45 +10,13 @@ int main (int argc, char **argv) { - CUdevice dev; CUfunction delay; CUmodule module; CUresult r; CUstream stream; - unsigned long *a, *d_a, dticks; - int nbytes; - float dtime; - 
void *kargs[2]; - int clkrate; - int devnum, nprocs; =20 acc_init (acc_device_nvidia); =20 - devnum =3D acc_get_device_num (acc_device_nvidia); - - r =3D cuDeviceGet (&dev, devnum); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGet failed: %d\n", r); - abort (); - } - - r =3D - cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUN= T, - dev); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - - r =3D cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, de= v); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - r =3D cuModuleLoad (&module, "subr.ptx"); if (r !=3D CUDA_SUCCESS) { @@ -63,20 +31,6 @@ main (int argc, char **argv) abort (); } =20 - nbytes =3D nprocs * sizeof (unsigned long); - - dtime =3D 200.0; - - dticks =3D (unsigned long) (dtime * clkrate); - - a =3D (unsigned long *) malloc (nbytes); - d_a =3D (unsigned long *) acc_malloc (nbytes); - - acc_map_data (a, d_a, nbytes); - - kargs[0] =3D (void *) &d_a; - kargs[1] =3D (void *) &dticks; - r =3D cuStreamCreate (&stream, CU_STREAM_DEFAULT); if (r !=3D CUDA_SUCCESS) { @@ -87,7 +41,7 @@ main (int argc, char **argv) if (!acc_set_cuda_stream (0, stream)) abort (); =20=20=20=20=20 - r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0); + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, NULL, 0); if (r !=3D CUDA_SUCCESS) { fprintf (stderr, "cuLaunchKernel failed: %d\n", r); @@ -100,7 +54,12 @@ main (int argc, char **argv) abort (); } =20 - sleep ((int) (dtime / 1000.f) + 1); + r =3D cuCtxSynchronize (); + if (r !=3D CUDA_SUCCESS) + { + fprintf (stderr, "cuCtxSynchronize () failed: %d\n", r); + abort (); + } =20 if (acc_async_test_all () !=3D 1) { @@ -108,11 +67,6 @@ main (int argc, char **argv) abort (); } =20 - acc_unmap_data (a); - - free (a); - acc_free (d_a); - acc_shutdown (acc_device_nvidia); =20 exit (0); diff --git 
libgomp/testsuite/libgomp.oacc-c-c++-common/lib-73.c libgomp/tes= tsuite/libgomp.oacc-c-c++-common/lib-73.c index 43a8b7e..4fa9d5a 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-73.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-73.c @@ -1,6 +1,7 @@ /* { dg-do run { target openacc_nvidia_accel_selected } } */ /* { dg-additional-options "-lcuda" } */ =20 +#include #include #include #include @@ -10,47 +11,15 @@ int main (int argc, char **argv) { - CUdevice dev; CUfunction delay; CUmodule module; CUresult r; - const int N =3D 10; + const int N =3D 6; int i; CUstream streams[N]; - unsigned long *a, *d_a, dticks; - int nbytes; - float dtime; - void *kargs[2]; - int clkrate; - int devnum, nprocs; =20 acc_init (acc_device_nvidia); =20 - devnum =3D acc_get_device_num (acc_device_nvidia); - - r =3D cuDeviceGet (&dev, devnum); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGet failed: %d\n", r); - abort (); - } - - r =3D - cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUN= T, - dev); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - - r =3D cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, de= v); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - r =3D cuModuleLoad (&module, "subr.ptx"); if (r !=3D CUDA_SUCCESS) { @@ -65,20 +34,6 @@ main (int argc, char **argv) abort (); } =20 - nbytes =3D nprocs * sizeof (unsigned long); - - dtime =3D 200.0; - - dticks =3D (unsigned long) (dtime * clkrate); - - a =3D (unsigned long *) malloc (nbytes); - d_a =3D (unsigned long *) acc_malloc (nbytes); - - acc_map_data (a, d_a, nbytes); - - kargs[0] =3D (void *) &d_a; - kargs[1] =3D (void *) &dticks; - for (i =3D 0; i < N; i++) { streams[i] =3D (CUstream) acc_get_cuda_stream (i); @@ -98,13 +53,12 @@ main (int argc, char **argv) =20 for (i =3D 0; i < N; i++) { - r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, 
streams[i], kargs,= 0); + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], NULL, = 0); if (r !=3D CUDA_SUCCESS) { fprintf (stderr, "cuLaunchKernel failed: %d\n", r); abort (); } - } =20 if (acc_async_test_all () !=3D 0) @@ -113,7 +67,12 @@ main (int argc, char **argv) abort (); } =20 - sleep ((int) (dtime / 1000.0f) + 1); + r =3D cuCtxSynchronize (); + if (r !=3D CUDA_SUCCESS) + { + fprintf (stderr, "cuCtxSynchronize failed: %d\n", r); + abort (); + } =20 if (acc_async_test_all () !=3D 1) { @@ -121,11 +80,6 @@ main (int argc, char **argv) abort (); } =20 - acc_unmap_data (a); - - free (a); - acc_free (d_a); - acc_shutdown (acc_device_nvidia); =20 exit (0); diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-74.c libgomp/tes= tsuite/libgomp.oacc-c-c++-common/lib-74.c index 0726ee4..e25d894 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-74.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-74.c @@ -5,50 +5,20 @@ #include #include #include -#include "timer.h" +#include =20 int main (int argc, char **argv) { - CUdevice dev; CUfunction delay; CUmodule module; CUresult r; CUstream stream; - unsigned long *a, *d_a, dticks; - int nbytes; - float atime, dtime; - void *kargs[2]; - int clkrate; - int devnum, nprocs; + struct timeval tv1, tv2; + time_t t1, t2; =20 acc_init (acc_device_nvidia); =20 - devnum =3D acc_get_device_num (acc_device_nvidia); - - r =3D cuDeviceGet (&dev, devnum); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGet failed: %d\n", r); - abort (); - } - - r =3D - cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUN= T, - dev); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - - r =3D cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, de= v); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - r =3D cuModuleLoad (&module, "subr.ptx"); if (r !=3D CUDA_SUCCESS) { @@ 
-63,19 +33,25 @@ main (int argc, char **argv) abort (); } =20 - nbytes =3D nprocs * sizeof (unsigned long); + gettimeofday (&tv1, NULL); =20 - dtime =3D 200.0; + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, 0); + if (r !=3D CUDA_SUCCESS) + { + fprintf (stderr, "cuLaunchKernel failed: %d\n", r); + abort (); + } =20 - dticks =3D (unsigned long) (dtime * clkrate); + r =3D cuCtxSynchronize (); + if (r !=3D CUDA_SUCCESS) + { + fprintf (stderr, "cuCtxSynchronize failed: %d\n", r); + abort (); + } =20 - a =3D (unsigned long *) malloc (nbytes); - d_a =3D (unsigned long *) acc_malloc (nbytes); + gettimeofday (&tv2, NULL); =20 - acc_map_data (a, d_a, nbytes); - - kargs[0] =3D (void *) &d_a; - kargs[1] =3D (void *) &dticks; + t1 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_use= c); =20 stream =3D (CUstream) acc_get_cuda_stream (0); if (stream !=3D NULL) @@ -91,11 +67,9 @@ main (int argc, char **argv) if (!acc_set_cuda_stream (0, stream)) abort (); =20 - init_timers (1); + gettimeofday (&tv1, NULL); =20 - start_timer (0); - - r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0); + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, NULL, 0); if (r !=3D CUDA_SUCCESS) { fprintf (stderr, "cuLaunchKernel failed: %d\n", r); @@ -104,33 +78,30 @@ main (int argc, char **argv) =20 acc_wait (0); =20 - atime =3D stop_timer (0); + gettimeofday (&tv2, NULL); =20 - if (atime < dtime) + t2 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_use= c); + + if (((abs (t2 - t1) / t1) * 100.0) > 1.0) { - fprintf (stderr, "actual time < delay time\n"); + fprintf (stderr, "too long 1\n"); abort (); } =20 - start_timer (0); + gettimeofday (&tv1, NULL); =20 acc_wait (0); =20 - atime =3D stop_timer (0); + gettimeofday (&tv2, NULL); =20 - if (0.010 < atime) + t2 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_use= c); + + if (t2 > 1000) { - fprintf (stderr, "actual time too long\n"); + fprintf (stderr, "too long 
2\n"); abort (); } =20 - acc_unmap_data (a); - - fini_timers (); - - free (a); - acc_free (d_a); - acc_shutdown (acc_device_nvidia); =20 exit (0); diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-75.c libgomp/tes= tsuite/libgomp.oacc-c-c++-common/lib-75.c index 1942211..53e285f 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-75.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-75.c @@ -6,52 +6,22 @@ #include #include #include -#include "timer.h" +#include =20 int main (int argc, char **argv) { - CUdevice dev; CUfunction delay; CUmodule module; CUresult r; - int N; + const int N =3D 2; int i; CUstream stream; - unsigned long *a, *d_a, dticks; - int nbytes; - float atime, dtime, hitime, lotime; - void *kargs[2]; - int clkrate; - int devnum, nprocs; + struct timeval tv1, tv2; + time_t t1, t2; =20 acc_init (acc_device_nvidia); =20 - devnum =3D acc_get_device_num (acc_device_nvidia); - - r =3D cuDeviceGet (&dev, devnum); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGet failed: %d\n", r); - abort (); - } - - r =3D - cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUN= T, - dev); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - - r =3D cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, de= v); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - r =3D cuModuleLoad (&module, "subr.ptx"); if (r !=3D CUDA_SUCCESS) { @@ -66,18 +36,25 @@ main (int argc, char **argv) abort (); } =20 - nbytes =3D nprocs * sizeof (unsigned long); + gettimeofday (&tv1, NULL); =20 - dtime =3D 200.0; + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, 0); + if (r !=3D CUDA_SUCCESS) + { + fprintf (stderr, "cuLaunchKernel failed: %d\n", r); + abort (); + } =20 - dticks =3D (unsigned long) (dtime * clkrate); + r =3D cuCtxSynchronize (); + if (r !=3D CUDA_SUCCESS) + { + fprintf (stderr, "cuCtxSynchronize 
failed: %d\n", r); + abort (); + } =20 - N =3D nprocs; + gettimeofday (&tv2, NULL); =20 - a =3D (unsigned long *) malloc (nbytes); - d_a =3D (unsigned long *) acc_malloc (nbytes); - - acc_map_data (a, d_a, nbytes); + t1 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_use= c); =20 stream =3D (CUstream) acc_get_cuda_stream (0); if (stream !=3D NULL) @@ -93,16 +70,11 @@ main (int argc, char **argv) if (!acc_set_cuda_stream (0, stream)) abort (); =20 - init_timers (1); - - kargs[0] =3D (void *) &d_a; - kargs[1] =3D (void *) &dticks; - - start_timer (0); + gettimeofday (&tv1, NULL); =20 for (i =3D 0; i < N; i++) { - r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0); + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, NULL, 0); if (r !=3D CUDA_SUCCESS) { fprintf (stderr, "cuLaunchKernel failed: %d\n", r); @@ -112,27 +84,18 @@ main (int argc, char **argv) acc_wait (0); } =20 - atime =3D stop_timer (0); + gettimeofday (&tv2, NULL); =20 - hitime =3D dtime * N; - hitime +=3D hitime * 0.02; + t2 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_use= c); =20 - lotime =3D dtime * N; - lotime -=3D lotime * 0.02; + t1 *=3D N; =20 - if (atime > hitime || atime < lotime) + if (((abs (t2 - t1) / t1) * 100.0) > 1.0) { - fprintf (stderr, "actual time < delay time\n"); + fprintf (stderr, "too long\n"); abort (); } =20 - acc_unmap_data (a); - - fini_timers (); - - free (a); - acc_free (d_a); - acc_shutdown (acc_device_nvidia); =20 exit (0); diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-76.c libgomp/tes= tsuite/libgomp.oacc-c-c++-common/lib-76.c index 11d9d62..787dcb8 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-76.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-76.c @@ -6,52 +6,22 @@ #include #include #include -#include "timer.h" +#include =20 int main (int argc, char **argv) { - CUdevice dev; CUfunction delay; CUmodule module; CUresult r; - int N; + const int N =3D 2; int i; CUstream *streams; 
- unsigned long *a, *d_a, dticks; - int nbytes; - float atime, dtime, hitime, lotime; - void *kargs[2]; - int clkrate; - int devnum, nprocs; + struct timeval tv1, tv2; + time_t t1, t2; =20 acc_init (acc_device_nvidia); =20 - devnum =3D acc_get_device_num (acc_device_nvidia); - - r =3D cuDeviceGet (&dev, devnum); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGet failed: %d\n", r); - abort (); - } - - r =3D - cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUN= T, - dev); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - - r =3D cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, de= v); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - r =3D cuModuleLoad (&module, "subr.ptx"); if (r !=3D CUDA_SUCCESS) { @@ -66,18 +36,25 @@ main (int argc, char **argv) abort (); } =20 - nbytes =3D nprocs * sizeof (unsigned long); + gettimeofday (&tv1, NULL); =20 - dtime =3D 200.0; + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, 0); + if (r !=3D CUDA_SUCCESS) + { + fprintf (stderr, "cuLaunchKernel failed: %d\n", r); + abort (); + } =20 - dticks =3D (unsigned long) (dtime * clkrate); + r =3D cuCtxSynchronize (); + if (r !=3D CUDA_SUCCESS) + { + fprintf (stderr, "cuCtxSynchronize failed: %d\n", r); + abort (); + } =20 - N =3D nprocs; + gettimeofday (&tv2, NULL); =20 - a =3D (unsigned long *) malloc (nbytes); - d_a =3D (unsigned long *) acc_malloc (nbytes); - - acc_map_data (a, d_a, nbytes); + t1 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_use= c); =20 streams =3D (CUstream *) malloc (N * sizeof (void *)); =20 @@ -98,16 +75,11 @@ main (int argc, char **argv) abort (); } =20 - init_timers (1); - - kargs[0] =3D (void *) &d_a; - kargs[1] =3D (void *) &dticks; - - start_timer (0); + gettimeofday (&tv1, NULL); =20 for (i =3D 0; i < N; i++) { - r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, 
streams[i], kargs,= 0); + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], NULL, = 0); if (r !=3D CUDA_SUCCESS) { fprintf (stderr, "cuLaunchKernel failed: %d\n", r); @@ -117,27 +89,19 @@ main (int argc, char **argv) acc_wait (i); } =20 - atime =3D stop_timer (0); + gettimeofday (&tv2, NULL); =20 - hitime =3D dtime * N; - hitime +=3D hitime * 0.02; + t2 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_use= c); =20 - lotime =3D dtime * N; - lotime -=3D lotime * 0.02; + t1 *=3D N; =20 - if (atime > hitime || atime < lotime) + if (((abs (t2 - t1) / t1) * 100.0) > 1.0) { - fprintf (stderr, "actual time < delay time\n"); + fprintf (stderr, "too long\n"); abort (); } =20 - acc_unmap_data (a); - - fini_timers (); - free (streams); - free (a); - acc_free (d_a); =20 acc_shutdown (acc_device_nvidia); =20 diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-77.c libgomp/tes= tsuite/libgomp.oacc-c-c++-common/lib-77.c index 35a0980..5ef6fd9 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-77.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-77.c @@ -6,50 +6,20 @@ #include #include #include -#include "timer.h" +#include =20 int main (int argc, char **argv) { - CUdevice dev; CUfunction delay; CUmodule module; CUresult r; CUstream stream; - unsigned long *a, *d_a, dticks; - int nbytes; - float atime, dtime; - void *kargs[2]; - int clkrate; - int devnum, nprocs; + struct timeval tv1, tv2; + time_t t1, t2; =20 acc_init (acc_device_nvidia); =20 - devnum =3D acc_get_device_num (acc_device_nvidia); - - r =3D cuDeviceGet (&dev, devnum); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGet failed: %d\n", r); - abort (); - } - - r =3D - cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUN= T, - dev); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - - r =3D cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, de= v); - if (r !=3D CUDA_SUCCESS) 
- { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - r =3D cuModuleLoad (&module, "subr.ptx"); if (r !=3D CUDA_SUCCESS) { @@ -64,19 +34,25 @@ main (int argc, char **argv) abort (); } =20 - nbytes =3D nprocs * sizeof (unsigned long); + gettimeofday (&tv1, NULL); =20 - dtime =3D 200.0; + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, 0); + if (r !=3D CUDA_SUCCESS) + { + fprintf (stderr, "cuLaunchKernel failed: %d\n", r); + abort (); + } =20 - dticks =3D (unsigned long) (dtime * clkrate); + r =3D cuCtxSynchronize(); + if (r !=3D CUDA_SUCCESS) + { + fprintf (stderr, "cuCtxSynchronize failed: %d\n", r); + abort (); + } =20 - a =3D (unsigned long *) malloc (nbytes); - d_a =3D (unsigned long *) acc_malloc (nbytes); + gettimeofday (&tv2, NULL); =20 - acc_map_data (a, d_a, nbytes); - - kargs[0] =3D (void *) &d_a; - kargs[1] =3D (void *) &dticks; + t1 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_use= c); =20 r =3D cuStreamCreate (&stream, CU_STREAM_DEFAULT); if (r !=3D CUDA_SUCCESS) @@ -87,11 +63,9 @@ main (int argc, char **argv) =20 acc_set_cuda_stream (0, stream); =20 - init_timers (1); + gettimeofday (&tv1, NULL); =20 - start_timer (0); - - r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0); + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, NULL, 0); if (r !=3D CUDA_SUCCESS) { fprintf (stderr, "cuLaunchKernel failed: %d\n", r); @@ -100,33 +74,30 @@ main (int argc, char **argv) =20 acc_wait (1); =20 - atime =3D stop_timer (0); + gettimeofday (&tv2, NULL); =20 - if (atime < dtime) + t2 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_use= c); + + if (t2 > t1) { - fprintf (stderr, "actual time < delay time\n"); + fprintf (stderr, "too long 1\n"); abort (); } =20 - start_timer (0); + gettimeofday (&tv1, NULL); =20 acc_wait (1); =20 - atime =3D stop_timer (0); + gettimeofday (&tv2, NULL); =20 - if (0.010 < atime) + t2 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + 
(tv2.tv_usec - tv1.tv_use= c); + + if (t2 > 1000) { - fprintf (stderr, "actual time < delay time\n"); + fprintf (stderr, "too long 2\n"); abort (); } =20 - acc_unmap_data (a); - - fini_timers (); - - free (a); - acc_free (d_a); - acc_shutdown (acc_device_nvidia); =20 return 0; diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-78.c libgomp/tes= tsuite/libgomp.oacc-c-c++-common/lib-78.c index 4f58fb2..0bed15f 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-78.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-78.c @@ -6,50 +6,20 @@ #include #include #include -#include "timer.h" +#include =20 int main (int argc, char **argv) { - CUdevice dev; CUfunction delay; CUmodule module; CUresult r; CUstream stream; - unsigned long *a, *d_a, dticks; - int nbytes; - float atime, dtime; - void *kargs[2]; - int clkrate; - int devnum, nprocs; + struct timeval tv1, tv2; + time_t t1, t2; =20 acc_init (acc_device_nvidia); =20 - devnum =3D acc_get_device_num (acc_device_nvidia); - - r =3D cuDeviceGet (&dev, devnum); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGet failed: %d\n", r); - abort (); - } - - r =3D - cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUN= T, - dev); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - - r =3D cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, de= v); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - r =3D cuModuleLoad (&module, "subr.ptx"); if (r !=3D CUDA_SUCCESS) { @@ -64,19 +34,25 @@ main (int argc, char **argv) abort (); } =20 - nbytes =3D nprocs * sizeof (unsigned long); + gettimeofday (&tv1, NULL); =20 - dtime =3D 200.0; + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, 0); + if (r !=3D CUDA_SUCCESS) + { + fprintf (stderr, "cuLaunchKernel failed: %d\n", r); + abort (); + } =20 - dticks =3D (unsigned long) (dtime * clkrate); + r =3D 
cuCtxSynchronize (); + if (r !=3D CUDA_SUCCESS) + { + fprintf (stderr, "cuCtxSynchronize failed: %d\n", r); + abort (); + } =20 - a =3D (unsigned long *) malloc (nbytes); - d_a =3D (unsigned long *) acc_malloc (nbytes); + gettimeofday (&tv2, NULL); =20 - acc_map_data (a, d_a, nbytes); - - kargs[0] =3D (void *) &d_a; - kargs[1] =3D (void *) &dticks; + t1 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_use= c); =20 stream =3D (CUstream) acc_get_cuda_stream (0); if (stream !=3D NULL) @@ -92,11 +68,9 @@ main (int argc, char **argv) if (!acc_set_cuda_stream (0, stream)) abort (); =20 - init_timers (1); + gettimeofday (&tv1, NULL); =20 - start_timer (0); - - r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0); + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, NULL, 0); if (r !=3D CUDA_SUCCESS) { fprintf (stderr, "cuLaunchKernel failed: %d\n", r); @@ -105,33 +79,30 @@ main (int argc, char **argv) =20 acc_wait_all (); =20 - atime =3D stop_timer (0); + gettimeofday (&tv2, NULL); =20 - if (atime < dtime) + t2 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_use= c); + + if (t2 > (t1 + (t1 * 0.10))) { - fprintf (stderr, "actual time < delay time\n"); + fprintf (stderr, "too long 1\n"); abort (); } =20 - start_timer (0); + gettimeofday (&tv1, NULL); =20 acc_wait_all (); =20 - atime =3D stop_timer (0); + gettimeofday (&tv2, NULL); =20 - if (0.010 < atime) + t2 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_use= c); + + if (t2 > 1000) { - fprintf (stderr, "actual time too long\n"); + fprintf (stderr, "too long 2\n"); abort (); } =20 - acc_unmap_data (a); - - fini_timers (); - - free (a); - acc_free (d_a); - acc_shutdown (acc_device_nvidia); =20 exit (0); diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-79.c libgomp/tes= tsuite/libgomp.oacc-c-c++-common/lib-79.c index ef3df13..5723588 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-79.c +++ 
libgomp/testsuite/libgomp.oacc-c-c++-common/lib-79.c @@ -6,54 +6,22 @@ #include #include #include -#include "timer.h" +#include =20 int main (int argc, char **argv) { - CUdevice dev; CUfunction delay; CUmodule module; CUresult r; - int N; + const int N =3D 2; int i; CUstream stream; - unsigned long *a, *d_a, dticks; - int nbytes; - float atime, dtime, hitime, lotime; - void *kargs[2]; - int clkrate; - int devnum, nprocs; - - devnum =3D 2; + struct timeval tv1, tv2; + time_t t1, t2; =20 acc_init (acc_device_nvidia); =20 - devnum =3D acc_get_device_num (acc_device_nvidia); - - r =3D cuDeviceGet (&dev, devnum); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGet failed: %d\n", r); - abort (); - } - - r =3D - cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUN= T, - dev); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - - r =3D cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, de= v); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - r =3D cuModuleLoad (&module, "subr.ptx"); if (r !=3D CUDA_SUCCESS) { @@ -68,18 +36,25 @@ main (int argc, char **argv) abort (); } =20 - nbytes =3D nprocs * sizeof (unsigned long); + gettimeofday (&tv1, NULL); =20 - dtime =3D 200.0; + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, 0); + if (r !=3D CUDA_SUCCESS) + { + fprintf (stderr, "cuLaunchKernel failed: %d\n", r); + abort (); + } =20 - dticks =3D (unsigned long) (dtime * clkrate); + r =3D cuCtxSynchronize (); + if (r !=3D CUDA_SUCCESS) + { + fprintf (stderr, "cuCtxSynchronize failed: %d\n", r); + abort (); + } =20 - N =3D nprocs; + gettimeofday (&tv2, NULL); =20 - a =3D (unsigned long *) malloc (nbytes); - d_a =3D (unsigned long *) acc_malloc (nbytes); - - acc_map_data (a, d_a, nbytes); + t1 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_use= c); =20 r =3D cuStreamCreate (&stream, 
CU_STREAM_DEFAULT); if (r !=3D CUDA_SUCCESS) @@ -105,16 +80,11 @@ main (int argc, char **argv) if (!acc_set_cuda_stream (0, stream)) abort (); =20 - init_timers (1); - - kargs[0] =3D (void *) &d_a; - kargs[1] =3D (void *) &dticks; - - start_timer (0); + gettimeofday (&tv1, NULL); =20 for (i =3D 0; i < N; i++) { - r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0); + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, NULL, 0); if (r !=3D CUDA_SUCCESS) { fprintf (stderr, "cuLaunchKernel failed: %d\n", r); @@ -132,7 +102,7 @@ main (int argc, char **argv) =20 acc_wait (1); =20 - atime =3D stop_timer (0); + gettimeofday (&tv2, NULL); =20 if (acc_async_test (0) !=3D 1) abort (); @@ -140,25 +110,16 @@ main (int argc, char **argv) if (acc_async_test (1) !=3D 1) abort (); =20 - hitime =3D dtime * N; - hitime +=3D hitime * 0.02; + t2 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_use= c); =20 - lotime =3D dtime * N; - lotime -=3D lotime * 0.02; + t1 *=3D N; =20 - if (atime > hitime || atime < lotime) + if (((abs (t2 - t1) / t1) * 100.0) > 1.0) { - fprintf (stderr, "actual time < delay time\n"); + fprintf (stderr, "too long\n"); abort (); } =20 - acc_unmap_data (a); - - fini_timers (); - - free (a); - acc_free (d_a); - acc_shutdown (acc_device_nvidia); =20 exit (0); diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-80.c libgomp/tes= tsuite/libgomp.oacc-c-c++-common/lib-80.c index d521331..ec98119 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-80.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-80.c @@ -6,52 +6,22 @@ #include #include #include -#include "timer.h" +#include =20 int main (int argc, char **argv) { - CUdevice dev; CUfunction delay; CUmodule module; CUresult r; CUstream stream; - int N; + const int N =3D 2; int i; - unsigned long *a, *d_a, dticks; - int nbytes; - float atime, dtime; - void *kargs[2]; - int clkrate; - int devnum, nprocs; + struct timeval tv1, tv2; + time_t t1, t2; =20 acc_init 
(acc_device_nvidia); =20 - devnum =3D acc_get_device_num (acc_device_nvidia); - - r =3D cuDeviceGet (&dev, devnum); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGet failed: %d\n", r); - abort (); - } - - r =3D - cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUN= T, - dev); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - - r =3D cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, de= v); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - r =3D cuModuleLoad (&module, "subr.ptx"); if (r !=3D CUDA_SUCCESS) { @@ -66,38 +36,40 @@ main (int argc, char **argv) abort (); } =20 - nbytes =3D nprocs * sizeof (unsigned long); + gettimeofday (&tv1, NULL); =20 - dtime =3D 200.0; + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, 0); + if (r !=3D CUDA_SUCCESS) + { + fprintf (stderr, "cuLaunchKernel failed: %d\n", r); + abort (); + } =20 - dticks =3D (unsigned long) (dtime * clkrate); + r =3D cuCtxSynchronize(); + if (r !=3D CUDA_SUCCESS) + { + fprintf (stderr, "cuCtxSynchronize failed: %d\n", r); + abort (); + } =20 - N =3D nprocs; + gettimeofday (&tv2, NULL); =20 - a =3D (unsigned long *) malloc (nbytes); - d_a =3D (unsigned long *) acc_malloc (nbytes); - - acc_map_data (a, d_a, nbytes); + t1 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_use= c); =20 r =3D cuStreamCreate (&stream, CU_STREAM_DEFAULT); if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuStreamCreate failed: %d\n", r); - abort (); - } + { + fprintf (stderr, "cuStreamCreate failed: %d\n", r); + abort (); + } =20 acc_set_cuda_stream (1, stream); =20 - init_timers (1); - - kargs[0] =3D (void *) &d_a; - kargs[1] =3D (void *) &dticks; - - start_timer (0); + gettimeofday (&tv1, NULL); =20 for (i =3D 0; i < N; i++) { - r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0); + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 
1, 1, 0, stream, NULL, 0); if (r !=3D CUDA_SUCCESS) { fprintf (stderr, "cuLaunchKernel failed: %d\n", r); @@ -109,21 +81,18 @@ main (int argc, char **argv) =20 acc_wait (1); =20 - atime =3D stop_timer (0); + gettimeofday (&tv2, NULL); =20 - if (atime < dtime) + t2 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_use= c); + + t1 *=3D N; + + if (((abs (t2 - t1) / t1) * 100.0) > 1.0) { - fprintf (stderr, "actual time < delay time\n"); + fprintf (stderr, "too long\n"); abort (); } =20 - acc_unmap_data (a); - - fini_timers (); - - free (a); - acc_free (d_a); - acc_shutdown (acc_device_nvidia); =20 return 0; diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-81.c libgomp/tes= tsuite/libgomp.oacc-c-c++-common/lib-81.c index d5f18f0..77de9ba 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-81.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-81.c @@ -6,52 +6,22 @@ #include #include #include -#include "timer.h" +#include =20 int main (int argc, char **argv) { - CUdevice dev; CUfunction delay; CUmodule module; CUresult r; - int N; + const int N =3D 2; int i; CUstream *streams, stream; - unsigned long *a, *d_a, dticks; - int nbytes; - float atime, dtime; - void *kargs[2]; - int clkrate; - int devnum, nprocs; + struct timeval tv1, tv2; + time_t t1, t2; =20 acc_init (acc_device_nvidia); =20 - devnum =3D acc_get_device_num (acc_device_nvidia); - - r =3D cuDeviceGet (&dev, devnum); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGet failed: %d\n", r); - abort (); - } - - r =3D - cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUN= T, - dev); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - - r =3D cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, de= v); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - r =3D cuModuleLoad (&module, "subr.ptx"); if (r !=3D CUDA_SUCCESS) { @@ -66,18 
+36,25 @@ main (int argc, char **argv) abort (); } =20 - nbytes =3D nprocs * sizeof (unsigned long); + gettimeofday (&tv1, NULL); =20 - dtime =3D 500.0; + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, 0); + if (r !=3D CUDA_SUCCESS) + { + fprintf (stderr, "cuLaunchKernel failed: %d\n", r); + abort (); + } =20 - dticks =3D (unsigned long) (dtime * clkrate); + r =3D cuCtxSynchronize (); + if (r !=3D CUDA_SUCCESS) + { + fprintf (stderr, "cuCtxSynchronize failed: %d\n", r); + abort (); + } =20 - N =3D nprocs; + gettimeofday (&tv2, NULL); =20 - a =3D (unsigned long *) malloc (nbytes); - d_a =3D (unsigned long *) acc_malloc (nbytes); - - acc_map_data (a, d_a, nbytes); + t1 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_use= c); =20 streams =3D (CUstream *) malloc (N * sizeof (void *)); =20 @@ -98,11 +75,6 @@ main (int argc, char **argv) abort (); } =20 - init_timers (1); - - kargs[0] =3D (void *) &d_a; - kargs[1] =3D (void *) &dticks; - stream =3D (CUstream) acc_get_cuda_stream (N); if (stream !=3D NULL) abort (); @@ -117,11 +89,11 @@ main (int argc, char **argv) if (!acc_set_cuda_stream (N, stream)) abort (); =20 - start_timer (0); + gettimeofday (&tv1, NULL); =20 for (i =3D 0; i < N; i++) { - r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], kargs,= 0); + r =3D cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], NULL, = 0); if (r !=3D CUDA_SUCCESS) { fprintf (stderr, "cuLaunchKernel failed: %d\n", r); @@ -129,6 +101,10 @@ main (int argc, char **argv) } } =20 + gettimeofday (&tv2, NULL); + + t2 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_use= c); + acc_wait_all_async (N); =20 for (i =3D 0; i <=3D N; i++) @@ -145,15 +121,13 @@ main (int argc, char **argv) abort (); } =20 - atime =3D stop_timer (0); - - if (atime < dtime) + if ((t1 * N) < t2) { - fprintf (stderr, "actual time < delay time\n"); + fprintf (stderr, "too long 1\n"); abort (); } =20 - start_timer (0); + gettimeofday (&tv1, NULL); =20 
stream =3D (CUstream) acc_get_cuda_stream (N + 1); if (stream !=3D NULL) @@ -173,35 +147,33 @@ main (int argc, char **argv) =20 acc_wait (N + 1); =20 - atime =3D stop_timer (0); + gettimeofday (&tv2, NULL); =20 - if (0.10 < atime) + t1 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_use= c); + + if (t1 > 1000) { - fprintf (stderr, "actual time too long\n"); + fprintf (stderr, "too long 2\n"); abort (); } =20 - start_timer (0); + gettimeofday (&tv1, NULL); =20 acc_wait_all_async (N); =20 acc_wait (N); =20 - atime =3D stop_timer (0); + gettimeofday (&tv2, NULL); =20 - if (0.10 < atime) + t1 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_use= c); + + if (t1 > 1000) { - fprintf (stderr, "actual time too long\n"); + fprintf (stderr, "too long 3\n"); abort (); } =20 - acc_unmap_data (a); - - fini_timers (); - free (streams); - free (a); - acc_free (d_a); =20 acc_shutdown (acc_device_nvidia); =20 diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-82.c libgomp/tes= tsuite/libgomp.oacc-c-c++-common/lib-82.c index be30a7f..ecf7488 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-82.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-82.c @@ -10,46 +10,18 @@ int main (int argc, char **argv) { - CUdevice dev; CUfunction delay2; CUmodule module; CUresult r; - int N; + const int N =3D 32; int i; CUstream *streams; - unsigned long **a, **d_a, *tid, ticks; + unsigned long **a, **d_a, *tid; int nbytes; - void *kargs[3]; - int clkrate; - int devnum, nprocs; + void *kargs[2]; =20 acc_init (acc_device_nvidia); =20 - devnum =3D acc_get_device_num (acc_device_nvidia); - - r =3D cuDeviceGet (&dev, devnum); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGet failed: %d\n", r); - abort (); - } - - r =3D - cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUN= T, - dev); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - - r =3D cuDeviceGetAttribute 
(&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, de= v); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r); - abort (); - } - r =3D cuModuleLoad (&module, "subr.ptx"); if (r !=3D CUDA_SUCCESS) { @@ -66,10 +38,6 @@ main (int argc, char **argv) =20 nbytes =3D sizeof (int); =20 - ticks =3D (unsigned long) (200.0 * clkrate); - - N =3D nprocs; - streams =3D (CUstream *) malloc (N * sizeof (void *)); =20 a =3D (unsigned long **) malloc (N * sizeof (unsigned long *)); @@ -103,8 +71,7 @@ main (int argc, char **argv) for (i =3D 0; i < N; i++) { kargs[0] =3D (void *) &d_a[i]; - kargs[1] =3D (void *) &ticks; - kargs[2] =3D (void *) &tid[i]; + kargs[1] =3D (void *) &tid[i]; =20 r =3D cuLaunchKernel (delay2, 1, 1, 1, 1, 1, 1, 0, streams[i], kargs= , 0); if (r !=3D CUDA_SUCCESS) @@ -112,8 +79,6 @@ main (int argc, char **argv) fprintf (stderr, "cuLaunchKernel failed: %d\n", r); abort (); } - - ticks =3D (unsigned long) (50.0 * clkrate); } =20 acc_wait_all_async (0); diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-83.c libgomp/tes= tsuite/libgomp.oacc-c-c++-common/lib-83.c index 1c2e52b..51b7ee7 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-83.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-83.c @@ -5,21 +5,19 @@ #include #include #include -#include "timer.h" +#include +#include =20 int main (int argc, char **argv) { - float atime; CUstream stream; CUresult r; + struct timeval tv1, tv2; + time_t t1; =20 acc_init (acc_device_nvidia); =20 - (void) acc_get_device_num (acc_device_nvidia); - - init_timers (1); - stream =3D (CUstream) acc_get_cuda_stream (0); if (stream !=3D NULL) abort (); @@ -34,22 +32,22 @@ main (int argc, char **argv) if (!acc_set_cuda_stream (0, stream)) abort (); =20 - start_timer (0); + gettimeofday (&tv1, NULL); =20 acc_wait_all_async (0); =20 acc_wait (0); =20 - atime =3D stop_timer (0); + gettimeofday (&tv2, NULL); =20 - if (0.010 < atime) + t1 =3D ((tv2.tv_sec - tv1.tv_sec) * 1000000) + 
(tv2.tv_usec - tv1.tv_use= c); + + if (t1 > 1000) { - fprintf (stderr, "actual time too long\n"); + fprintf (stderr, "too long\n"); abort (); } =20 - fini_timers (); - acc_shutdown (acc_device_nvidia); =20 exit (0); diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c libgomp= /testsuite/libgomp.oacc-c-c++-common/parallel-1.c index fd9df33..9a411fe 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c +++ libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c @@ -2,205 +2,5 @@ =20 #include =20 -int i; - -int main(void) -{ - int j, v; - - i =3D -1; - j =3D -2; - v =3D 0; -#pragma acc parallel /* copyout */ present_or_copyout (v) copyin (i, j) - { - if (i !=3D -1 || j !=3D -2) - abort (); - i =3D 2; - j =3D 1; - if (i !=3D 2 || j !=3D 1) - abort (); - v =3D 1; - } -#if ACC_MEM_SHARED - if (v !=3D 1 || i !=3D 2 || j !=3D 1) - abort (); -#else - if (v !=3D 1 || i !=3D -1 || j !=3D -2) - abort (); -#endif - - i =3D -1; - j =3D -2; - v =3D 0; -#pragma acc parallel /* copyout */ present_or_copyout (v) copyout (i, j) - { - i =3D 2; - j =3D 1; - if (i !=3D 2 || j !=3D 1) - abort (); - v =3D 1; - } - if (v !=3D 1 || i !=3D 2 || j !=3D 1) - abort (); - - i =3D -1; - j =3D -2; - v =3D 0; -#pragma acc parallel /* copyout */ present_or_copyout (v) copy (i, j) - { - if (i !=3D -1 || j !=3D -2) - abort (); - i =3D 2; - j =3D 1; - if (i !=3D 2 || j !=3D 1) - abort (); - v =3D 1; - } - if (v !=3D 1 || i !=3D 2 || j !=3D 1) - abort (); - - i =3D -1; - j =3D -2; - v =3D 0; -#pragma acc parallel /* copyout */ present_or_copyout (v) create (i, j) - { - i =3D 2; - j =3D 1; - if (i !=3D 2 || j !=3D 1) - abort (); - v =3D 1; - } -#if ACC_MEM_SHARED - if (v !=3D 1 || i !=3D 2 || j !=3D 1) - abort (); -#else - if (v !=3D 1 || i !=3D -1 || j !=3D -2) - abort (); -#endif - - i =3D -1; - j =3D -2; - v =3D 0; -#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copyi= n (i, j) - { - if (i !=3D -1 || j !=3D -2) - abort (); - i =3D 2; - j =3D 1; - if 
(i !=3D 2 || j !=3D 1) - abort (); - v =3D 1; - } - if (v !=3D 1) - abort (); -#if ACC_MEM_SHARED - if (v !=3D 1 || i !=3D 2 || j !=3D 1) - abort (); -#else - if (v !=3D 1 || i !=3D -1 || j !=3D -2) - abort (); -#endif - - i =3D -1; - j =3D -2; - v =3D 0; -#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copyo= ut (i, j) - { - i =3D 2; - j =3D 1; - if (i !=3D 2 || j !=3D 1) - abort (); - v =3D 1; - } - if (v !=3D 1 || i !=3D 2 || j !=3D 1) - abort (); - - i =3D -1; - j =3D -2; - v =3D 0; -#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copy = (i, j) - { - if (i !=3D -1 || j !=3D -2) - abort (); - i =3D 2; - j =3D 1; - if (i !=3D 2 || j !=3D 1) - abort (); - v =3D 1; - } - if (v !=3D 1 || i !=3D 2 || j !=3D 1) - abort (); - - i =3D -1; - j =3D -2; - v =3D 0; -#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_creat= e (i, j) - { - i =3D 2; - j =3D 1; - if (i !=3D 2 || j !=3D 1) - abort (); - v =3D 1; - } - if (v !=3D 1) - abort (); -#if ACC_MEM_SHARED - if (v !=3D 1 || i !=3D 2 || j !=3D 1) - abort (); -#else - if (v !=3D 1 || i !=3D -1 || j !=3D -2) - abort (); -#endif - - i =3D -1; - j =3D -2; - v =3D 0; - -#pragma acc data copyin (i, j) - { -#pragma acc parallel /* copyout */ present_or_copyout (v) present (i, j) - { - if (i !=3D -1 || j !=3D -2) - abort (); - i =3D 2; - j =3D 1; - if (i !=3D 2 || j !=3D 1) - abort (); - v =3D 1; - } - } -#if ACC_MEM_SHARED - if (v !=3D 1 || i !=3D 2 || j !=3D 1) - abort (); -#else - if (v !=3D 1 || i !=3D -1 || j !=3D -2) - abort (); -#endif - - i =3D -1; - j =3D -2; - v =3D 0; - -#pragma acc data copyin(i, j) - { -#pragma acc parallel /* copyout */ present_or_copyout (v) - { - if (i !=3D -1 || j !=3D -2) - abort (); - i =3D 2; - j =3D 1; - if (i !=3D 2 || j !=3D 1) - abort (); - v =3D 1; - } - } -#if ACC_MEM_SHARED - if (v !=3D 1 || i !=3D 2 || j !=3D 1) - abort (); -#else - if (v !=3D 1 || i !=3D -1 || j !=3D -2) - abort (); -#endif - - return 0; -} +#define 
EXEC_DIRECTIVE parallel +#include "data-clauses.h" diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/routine-1.c libgomp/= testsuite/libgomp.oacc-c-c++-common/routine-1.c new file mode 100644 index 0000000..a27d076 --- /dev/null +++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-1.c @@ -0,0 +1,40 @@ +/* FIXME: remove -fno-var-tracking and -fno-exceptions from dg-options. */ + +/* { dg-do run } */ +/* { dg-options "-fno-inline -fno-var-tracking -fno-exceptions" } */ + +#include +#include + +#pragma acc routine +int +fact (int n) +{ + if (n =3D=3D 0 || n =3D=3D 1) + return 1; + + return n * fact (n - 1); +} + +int +main() +{ + int *a, i, n =3D 10; + + a =3D (int *)malloc (sizeof (int) * n); + +#pragma acc parallel copy (a[0:n]) vector_length (5) + { +#pragma acc loop + for (i =3D 0; i < n; i++) + a[i] =3D fact (i); + } + + for (i =3D 0; i < n; i++) + if (a[i] !=3D fact (i)) + abort (); + + free (a); + + return 0; +} diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/routine-2.c libgomp/= testsuite/libgomp.oacc-c-c++-common/routine-2.c new file mode 100644 index 0000000..8ec4d8b --- /dev/null +++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-2.c @@ -0,0 +1,41 @@ +/* FIXME: remove -fno-var-tracking and -fno-exceptions from dg-options. 
*/ + +/* { dg-do run } */ +/* { dg-options "-fno-inline -fno-var-tracking -fno-exceptions" } */ + +#include +#include + +#pragma acc routine (fact) + + +int fact (int n) +{ + if (n =3D=3D 0 || n =3D=3D 1) + return 1; + + return n * fact (n - 1); +} + +int +main() +{ + int *a, i, n =3D 10; + + a =3D (int *)malloc (sizeof (int) * n); + +#pragma acc parallel copy (a[0:n]) vector_length (5) + { +#pragma acc loop + for (i =3D 0; i < n; i++) + a[i] =3D fact (i); + } + + for (i =3D 0; i < n; i++) + if (a[i] !=3D fact (i)) + abort (); + + free (a); + + return 0; +} diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/subr.h libgomp/tests= uite/libgomp.oacc-c-c++-common/subr.h index 9db236c..0c9096f 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/subr.h +++ libgomp/testsuite/libgomp.oacc-c-c++-common/subr.h @@ -1,46 +1,24 @@ =20 -#if ACC_DEVICE_TYPE_nvidia - #pragma acc routine nohost -static int clock (void) -{ - int thetime; - - asm __volatile__ ("mov.u32 %0, %%clock;" : "=3Dr"(thetime)); - - return thetime; -} - -#endif - void -delay (unsigned long *d_o, unsigned long delay) +delay () { - int start, ticks; + int i, sum; + const int N =3D 500000; =20 - start =3D clock (); - - ticks =3D 0; - - while (ticks < delay) - ticks =3D clock () - start; - - return; + for (i =3D 0; i < N; i++) + sum =3D sum + 1; } =20 +#pragma acc routine nohost void -delay2 (unsigned long *d_o, unsigned long delay, unsigned long tid) +delay2 (unsigned long *d_o, unsigned long tid) { - int start, ticks; - - start =3D clock (); - - ticks =3D 0; + int i, sum; + const int N =3D 500000; =20 - while (ticks < delay) - ticks =3D clock () - start; + for (i =3D 0; i < N; i++) + sum =3D sum + 1; =20 d_o[0] =3D tid; - - return; } diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/subr.ptx libgomp/tes= tsuite/libgomp.oacc-c-c++-common/subr.ptx index 6f748fc..88b63bf 100644 --- libgomp/testsuite/libgomp.oacc-c-c++-common/subr.ptx +++ libgomp/testsuite/libgomp.oacc-c-c++-common/subr.ptx @@ -1,148 
+1,90 @@ -// BEGIN PREAMBLE - .version 3.1 - .target sm_30 + .version 3.1 + .target sm_30 .address_size 64 -// END PREAMBLE =20 -// BEGIN FUNCTION DEF: clock -.func (.param.u32 %out_retval)clock -{ -.reg.u32 %retval; - .reg.u64 %hr10; - .reg.u32 %r22; - .reg.u32 %r23; - .reg.u32 %r24; - .local.align 8 .b8 %frame[8]; - // #APP=20 -// 7 "subr.c" 1 - mov.u32 %r24, %clock; -// 0 "" 2 - // #NO_APP=20 - st.local.u32 [%frame], %r24; - ld.local.u32 %r22, [%frame]; - mov.u32 %r23, %r22; - mov.u32 %retval, %r23; - st.param.u32 [%out_retval], %retval; - ret; - } -// END FUNCTION DEF -// BEGIN GLOBAL FUNCTION DEF: delay -.visible .entry delay(.param.u64 %in_ar1, .param.u64 %in_ar2) -{ - .reg.u64 %ar1; - .reg.u64 %ar2; - .reg.u64 %hr10; - .reg.u64 %r22; - .reg.u32 %r23; - .reg.u64 %r24; - .reg.u64 %r25; - .reg.u32 %r26; - .reg.u32 %r27; - .reg.u32 %r28; - .reg.u32 %r29; - .reg.u32 %r30; - .reg.u64 %r31; - .reg.pred %r32; - .local.align 8 .b8 %frame[24]; - ld.param.u64 %ar1, [%in_ar1]; - ld.param.u64 %ar2, [%in_ar2]; - mov.u64 %r24, %ar1; - st.u64 [%frame+8], %r24; - mov.u64 %r25, %ar2; - st.local.u64 [%frame+16], %r25; + .visible .entry delay { - .param.u32 %retval_in; - { - call (%retval_in), clock; - } - ld.param.u32 %r26, [%retval_in]; -} - st.local.u32 [%frame+4], %r26; - mov.u32 %r27, 0; - st.local.u32 [%frame], %r27; - bra $L4; -$L5: - { - .param.u32 %retval_in; - { - call (%retval_in), clock; - } - ld.param.u32 %r28, [%retval_in]; -} - mov.u32 %r23, %r28; - ld.local.u32 %r30, [%frame+4]; - sub.u32 %r29, %r23, %r30; - st.local.u32 [%frame], %r29; -$L4: - ld.local.s32 %r22, [%frame]; - ld.local.u64 %r31, [%frame+16]; - setp.lo.u64 %r32,%r22,%r31; - @%r32 bra $L5; + .reg .u64 %hr10; + .reg .u32 %r22; + .reg .u32 %r23; + .reg .u32 %r24; + .reg .u32 %r25; + .reg .u32 %r26; + .reg .u32 %r27; + .reg .u32 %r28; + .reg .u32 %r29; + .reg .pred %r30; + .reg .u64 %frame; + .local .align 8 .b8 %farray[16]; + cvta.local.u64 %frame,%farray; + mov.u32 %r22,500000; + st.u32 
[%frame+8],%r22; + mov.u32 %r23,0; + st.u32 [%frame],%r23; + bra $L2; + $L3: + ld.u32 %r25,[%frame+4]; + add.u32 %r24,%r25,1; + st.u32 [%frame+4],%r24; + ld.u32 %r27,[%frame]; + add.u32 %r26,%r27,1; + st.u32 [%frame],%r26; + $L2: + ld.u32 %r28,[%frame]; + ld.u32 %r29,[%frame+8]; + setp.lt.s32 %r30,%r28,%r29; + @%r30=20 + bra $L3; ret; } -// END FUNCTION DEF -// BEGIN GLOBAL FUNCTION DEF: delay2 -.visible .entry delay2(.param.u64 %in_ar1, .param.u64 %in_ar2, .param.u64 = %in_ar3) -{ - .reg.u64 %ar1; - .reg.u64 %ar2; - .reg.u64 %ar3; - .reg.u64 %hr10; - .reg.u64 %r22; - .reg.u32 %r23; - .reg.u64 %r24; - .reg.u64 %r25; - .reg.u64 %r26; - .reg.u32 %r27; - .reg.u32 %r28; - .reg.u32 %r29; - .reg.u32 %r30; - .reg.u32 %r31; - .reg.u64 %r32; - .reg.pred %r33; - .reg.u64 %r34; - .reg.u64 %r35; - .local.align 8 .b8 %frame[32]; - ld.param.u64 %ar1, [%in_ar1]; - ld.param.u64 %ar2, [%in_ar2]; - ld.param.u64 %ar3, [%in_ar3]; - mov.u64 %r24, %ar1; - st.local.u64 [%frame+8], %r24; - mov.u64 %r25, %ar2; - st.local.u64 [%frame+16], %r25; - mov.u64 %r26, %ar3; - st.local.u64 [%frame+24], %r26; - { - .param.u32 %retval_in; - { - call (%retval_in), clock; - } - ld.param.u32 %r27, [%retval_in]; -} - st.local.u32 [%frame+4], %r27; - mov.u32 %r28, 0; - st.local.u32 [%frame], %r28; - bra $L8; -$L9: - { - .param.u32 %retval_in; + + .visible .entry delay2 (.param .u64 %in_ar1, .param .u64 %in_ar2) { - call (%retval_in), clock; - } - ld.param.u32 %r29, [%retval_in]; -} - mov.u32 %r23, %r29; - ld.local.u32 %r31, [%frame+4]; - sub.u32 %r30, %r23, %r31; - st.local.u32 [%frame], %r30; -$L8: - ld.local.s32 %r22, [%frame]; - ld.local.u64 %r32, [%frame+16]; - setp.lo.u64 %r33,%r22,%r32; - @%r33 bra $L9; - ld.local.u64 %r34, [%frame+8]; - ld.local.u64 %r35, [%frame+24]; - st.u64 [%r34], %r35; + .reg .u64 %ar1; + .reg .u64 %ar2; + .reg .u64 %hr10; + .reg .u64 %r22; + .reg .u64 %r23; + .reg .u32 %r24; + .reg .u32 %r25; + .reg .u32 %r26; + .reg .u32 %r27; + .reg .u32 %r28; + .reg .u32 %r29; + .reg .u32 
%r30; + .reg .u32 %r31; + .reg .pred %r32; + .reg .u64 %r33; + .reg .u64 %r34; + .reg .u64 %frame; + .local .align 8 .b8 %farray[32]; + cvta.local.u64 %frame,%farray; + ld.param.u64 %ar1,[%in_ar1]; + ld.param.u64 %ar2,[%in_ar2]; + mov.u64 %r22,%ar1; + st.u64 [%frame+16],%r22; + mov.u64 %r23,%ar2; + st.u64 [%frame+24],%r23; + mov.u32 %r24,500000; + st.u32 [%frame+8],%r24; + mov.u32 %r25,0; + st.u32 [%frame],%r25; + bra $L5; + $L6: + ld.u32 %r27,[%frame+4]; + add.u32 %r26,%r27,1; + st.u32 [%frame+4],%r26; + ld.u32 %r29,[%frame]; + add.u32 %r28,%r29,1; + st.u32 [%frame],%r28; + $L5: + ld.u32 %r30,[%frame]; + ld.u32 %r31,[%frame+8]; + setp.lt.s32 %r32,%r30,%r31; + @%r32=20 + bra $L6; + ld.u64 %r33,[%frame+16]; + ld.u64 %r34,[%frame+24]; + st.u64 [%r33],%r34; ret; } -// END FUNCTION DEF diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/timer.h libgomp/test= suite/libgomp.oacc-c-c++-common/timer.h deleted file mode 100644 index 53749da..0000000 --- libgomp/testsuite/libgomp.oacc-c-c++-common/timer.h +++ /dev/null @@ -1,103 +0,0 @@ - -#include -#include - -static int _Tnum_timers; -static CUevent *_Tstart_events, *_Tstop_events; -static CUstream _Tstream; - -void -init_timers (int ntimers) -{ - int i; - CUresult r; - - _Tnum_timers =3D ntimers; - - _Tstart_events =3D (CUevent *) malloc (_Tnum_timers * sizeof (CUevent)); - _Tstop_events =3D (CUevent *) malloc (_Tnum_timers * sizeof (CUevent)); - - r =3D cuStreamCreate (&_Tstream, CU_STREAM_DEFAULT); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuStreamCreate failed: %d\n", r); - abort (); - } - - for (i =3D 0; i < _Tnum_timers; i++) - { - r =3D cuEventCreate (&_Tstart_events[i], CU_EVENT_DEFAULT); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuEventCreate failed: %d\n", r); - abort (); - } - - r =3D cuEventCreate (&_Tstop_events[i], CU_EVENT_DEFAULT); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuEventCreate failed: %d\n", r); - abort (); - } - } -} - -void -fini_timers (void) -{ - int i; - - for (i 
=3D 0; i < _Tnum_timers; i++) - { - cuEventDestroy (_Tstart_events[i]); - cuEventDestroy (_Tstop_events[i]); - } - - cuStreamDestroy (_Tstream); - - free (_Tstart_events); - free (_Tstop_events); -} - -void -start_timer (int timer) -{ - CUresult r; - - r =3D cuEventRecord (_Tstart_events[timer], _Tstream); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuEventRecord failed: %d\n", r); - abort (); - } -} - -float -stop_timer (int timer) -{ - CUresult r; - float etime; - - r =3D cuEventRecord (_Tstop_events[timer], _Tstream); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuEventRecord failed: %d\n", r); - abort (); - } - - r =3D cuEventSynchronize (_Tstop_events[timer]); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuEventSynchronize failed: %d\n", r); - abort (); - } - - r =3D cuEventElapsedTime (&etime, _Tstart_events[timer], _Tstop_events[t= imer]); - if (r !=3D CUDA_SUCCESS) - { - fprintf (stderr, "cuEventElapsedTime failed: %d\n", r); - abort (); - } - - return etime; -} diff --git libgomp/testsuite/libgomp.oacc-fortran/atomic_capture-1.f90 libg= omp/testsuite/libgomp.oacc-fortran/atomic_capture-1.f90 new file mode 100644 index 0000000..27c5c9e --- /dev/null +++ libgomp/testsuite/libgomp.oacc-fortran/atomic_capture-1.f90 @@ -0,0 +1,784 @@ +! 
{ dg-do run } + +program main + integer igot, iexp, itmp + real fgot, fexp, ftmp + logical lgot, lexp, ltmp + integer, parameter :: N =3D 32 + + igot =3D 0 + iexp =3D N * 2 + + !$acc parallel copy (igot, itmp) + do i =3D 1, N + !$acc atomic capture + itmp =3D igot + igot =3D i + i + !$acc end atomic + end do + !$acc end parallel + + if (igot /=3D iexp) call abort + if (itmp /=3D iexp - 2) call abort + + fgot =3D 1234.0 + fexp =3D 1266.0 + + !$acc parallel loop copy (fgot, ftmp) + do i =3D 1, N + !$acc atomic capture + ftmp =3D fgot + fgot =3D fgot + 1.0 + !$acc end atomic + end do + !$acc end parallel loop + + if (ftmp /=3D fexp - 1.0) call abort + if (fgot /=3D fexp) call abort + + fgot =3D 1.0 + fexp =3D 2.0**32 + + !$acc parallel loop copy (fgot, ftmp) + do i =3D 1, N + !$acc atomic capture + ftmp =3D fgot + fgot =3D fgot * 2.0 + !$acc end atomic + end do + !$acc end parallel loop + + if (ftmp /=3D fexp / 2.0) call abort + if (fgot /=3D fexp) call abort + + fgot =3D 32.0 + fexp =3D fgot - N + + !$acc parallel loop copy (fgot, ftmp) + do i =3D 1, N + !$acc atomic capture + ftmp =3D fgot + fgot =3D fgot - 1.0 + !$acc end atomic + end do + !$acc end parallel loop + + if (ftmp /=3D fexp + 1.0) call abort + if (fgot /=3D fexp) call abort + + fgot =3D 2**32.0 + fexp =3D 1.0 + + !$acc parallel loop copy (fgot, ftmp) + do i =3D 1, N + !$acc atomic capture + ftmp =3D fgot + fgot =3D fgot / 2.0 + !$acc end atomic + end do + !$acc end parallel loop + + if (ftmp /=3D fgot * 2.0) call abort + if (fgot /=3D fexp) call abort + + lgot =3D .TRUE. + lexp =3D .FALSE. + + !$acc parallel copy (lgot, ltmp) + !$acc atomic capture + ltmp =3D lgot + lgot =3D lgot .and. .FALSE. + !$acc end atomic + !$acc end parallel + + if (ltmp .neqv. .not. lexp) call abort + if (lgot .neqv. lexp) call abort + + lgot =3D .FALSE. + lexp =3D .FALSE. + + !$acc parallel copy (lgot, ltmp) + !$acc atomic capture + ltmp =3D lgot + lgot =3D lgot .or. .FALSE. 
+ !$acc end atomic + !$acc end parallel + + if (ltmp .neqv. lexp) call abort + if (lgot .neqv. lexp) call abort + + lgot =3D .FALSE. + lexp =3D .FALSE. + + !$acc parallel copy (lgot, ltmp) + !$acc atomic capture + ltmp =3D lgot + lgot =3D lgot .eqv. .TRUE. + !$acc end atomic + !$acc end parallel + + if (ltmp .neqv. lexp) call abort + if (lgot .neqv. lexp) call abort + + lgot =3D .FALSE. + lexp =3D .TRUE. + + !$acc parallel copy (lgot, ltmp) + !$acc atomic capture + ltmp =3D lgot + lgot =3D lgot .neqv. .TRUE. + !$acc end atomic + !$acc end parallel + + if (ltmp .neqv. .not. lexp) call abort + if (lgot .neqv. lexp) call abort + + fgot =3D 1234.0 + fexp =3D 1266.0 + + !$acc parallel loop copy (fgot, ftmp) + do i =3D 1, N + !$acc atomic capture + ftmp =3D fgot + fgot =3D 1.0 + fgot + !$acc end atomic + end do + !$acc end parallel loop + + if (ftmp /=3D fexp - 1.0) call abort=20 + if (fgot /=3D fexp) call abort + + fgot =3D 1.0 + fexp =3D 2.0**32 + + !$acc parallel loop copy (fgot, ftmp) + do i =3D 1, N + !$acc atomic capture + ftmp =3D fgot + fgot =3D 2.0 * fgot + !$acc end atomic + end do + !$acc end parallel loop + + if (ftmp /=3D fexp / 2.0) call abort + if (fgot /=3D fexp) call abort + + fgot =3D 32.0 + fexp =3D 32.0 + + !$acc parallel loop copy (fgot, ftmp) + do i =3D 1, N + !$acc atomic capture + ftmp =3D fgot + fgot =3D 2.0 - fgot + !$acc end atomic + end do + !$acc end parallel loop + + if (ftmp /=3D 2.0 - fexp) call abort + if (fgot /=3D fexp) call abort + + fgot =3D 2.0**16 + fexp =3D 2.0**16 + + !$acc parallel loop copy (fgot, ftmp) + do i =3D 1, N + !$acc atomic capture + ftmp =3D fgot + fgot =3D 2.0 / fgot + !$acc end atomic + end do + !$acc end parallel loop + + if (ftmp /=3D 2.0 / fexp) call abort + if (fgot /=3D fexp) call abort + + lgot =3D .TRUE. + lexp =3D .FALSE. + + !$acc parallel copy (lgot, ltmp) + !$acc atomic capture + ltmp =3D lgot + lgot =3D .FALSE. .and. lgot + !$acc end atomic + !$acc end parallel + + if (ltmp .neqv. .not. 
lexp) call abort + if (lgot .neqv. lexp) call abort + + lgot =3D .FALSE. + lexp =3D .FALSE. + + !$acc parallel copy (lgot, ltmp) + !$acc atomic capture + ltmp =3D lgot + lgot =3D .FALSE. .or. lgot + !$acc end atomic + !$acc end parallel + + if (ltmp .neqv. lexp) call abort + if (lgot .neqv. lexp) call abort + + lgot =3D .FALSE. + lexp =3D .FALSE. + + !$acc parallel copy (lgot, ltmp) + !$acc atomic capture + ltmp =3D lgot + lgot =3D .TRUE. .eqv. lgot + !$acc end atomic + !$acc end parallel + + if (ltmp .neqv. lexp) call abort + if (lgot .neqv. lexp) call abort + + lgot =3D .FALSE. + lexp =3D .TRUE. + + !$acc parallel copy (lgot, ltmp) + !$acc atomic capture + ltmp =3D lgot + lgot =3D .TRUE. .neqv. lgot + !$acc end atomic + !$acc end parallel + + if (ltmp .neqv. .not. lexp) call abort + if (lgot .neqv. lexp) call abort + + igot =3D 1 + iexp =3D N + + !$acc parallel loop copy (igot, itmp) + do i =3D 1, N + !$acc atomic capture + itmp =3D igot + igot =3D max (igot, i) + !$acc end atomic + end do + !$acc end parallel loop + + if (itmp /=3D iexp - 1) call abort + if (igot /=3D iexp) call abort + + igot =3D N + iexp =3D 1 + + !$acc parallel loop copy (igot, itmp) + do i =3D 1, N + !$acc atomic capture + itmp =3D igot + igot =3D min (igot, i) + !$acc end atomic + end do + !$acc end parallel loop + + if (itmp /=3D iexp) call abort + if (igot /=3D iexp) call abort + + igot =3D -1 + iexp =3D 0 + + !$acc parallel loop copy (igot, itmp) + do i =3D 0, N - 1 + iexpr =3D ibclr (-2, i) + !$acc atomic capture + itmp =3D igot + igot =3D iand (igot, iexpr) + !$acc end atomic + end do + !$acc end parallel loop + + if (itmp /=3D ibset (iexp, N - 1)) call abort + if (igot /=3D iexp) call abort + + igot =3D 0 + iexp =3D -1=20 + + !$acc parallel loop copy (igot, itmp) + do i =3D 0, N - 1 + iexpr =3D lshift (1, i) + !$acc atomic capture + itmp =3D igot + igot =3D ior (igot, iexpr) + !$acc end atomic + end do + !$acc end parallel loop + + if (itmp /=3D ieor (iexp, lshift (1, N - 1))) call 
abort + if (igot /=3D iexp) call abort + + igot =3D -1 + iexp =3D 0=20 + + !$acc parallel loop copy (igot, itmp) + do i =3D 0, N - 1 + iexpr =3D lshift (1, i) + !$acc atomic capture + itmp =3D igot + igot =3D ieor (igot, iexpr) + !$acc end atomic + end do + !$acc end parallel loop + + if (itmp /=3D ior (iexp, lshift (1, N - 1))) call abort + if (igot /=3D iexp) call abort + + igot =3D 1 + iexp =3D N + + !$acc parallel loop copy (igot, itmp) + do i =3D 1, N + !$acc atomic capture + itmp =3D igot + igot =3D max (i, igot) + !$acc end atomic + end do + !$acc end parallel loop + + if (itmp /=3D iexp - 1) call abort + if (igot /=3D iexp) call abort + + igot =3D N + iexp =3D 1 + + !$acc parallel loop copy (igot, itmp) + do i =3D 1, N + !$acc atomic capture + itmp =3D igot + igot =3D min (i, igot) + !$acc end atomic + end do + !$acc end parallel loop + + if (itmp /=3D iexp) call abort + if (igot /=3D iexp) call abort + + igot =3D -1 + iexp =3D 0 + + !$acc parallel loop copy (igot, itmp) + do i =3D 0, N - 1 + iexpr =3D ibclr (-2, i) + !$acc atomic capture + itmp =3D igot + igot =3D iand (iexpr, igot) + !$acc end atomic + end do + !$acc end parallel loop + + if (itmp /=3D ibset (iexp, N - 1)) call abort + if (igot /=3D iexp) call abort + + igot =3D 0 + iexp =3D -1=20 + !! 
+ !$acc parallel loop copy (igot, itmp) + do i =3D 0, N - 1 + iexpr =3D lshift (1, i) + !$acc atomic capture + itmp =3D igot + igot =3D ior (iexpr, igot) + !$acc end atomic + end do + !$acc end parallel loop + + if (itmp /=3D ieor (iexp, lshift (1, N - 1))) call abort + if (igot /=3D iexp) call abort + + igot =3D -1 + iexp =3D 0=20 + + !$acc parallel loop copy (igot, itmp) + do i =3D 0, N - 1 + iexpr =3D lshift (1, i) + !$acc atomic capture + itmp =3D igot + igot =3D ieor (iexpr, igot) + !$acc end atomic + end do + !$acc end parallel loop + + if (itmp /=3D ior (iexp, lshift (1, N - 1))) call abort + if (igot /=3D iexp) call abort + + fgot =3D 1234.0 + fexp =3D 1266.0 + + !$acc parallel loop copy (fgot, ftmp) + do i =3D 1, N + !$acc atomic capture + fgot =3D fgot + 1.0 + ftmp =3D fgot + !$acc end atomic + end do + !$acc end parallel loop + + if (ftmp /=3D fexp) call abort + if (fgot /=3D fexp) call abort + + fgot =3D 1.0 + fexp =3D 2.0**32 + + !$acc parallel loop copy (fgot, ftmp) + do i =3D 1, N + !$acc atomic capture + fgot =3D fgot * 2.0 + ftmp =3D fgot + !$acc end atomic + end do + !$acc end parallel loop + + if (ftmp /=3D fexp) call abort + if (fgot /=3D fexp) call abort + + fgot =3D 32.0 + fexp =3D fgot - N + + !$acc parallel loop copy (fgot, ftmp) + do i =3D 1, N + !$acc atomic capture + fgot =3D fgot - 1.0 + ftmp =3D fgot + !$acc end atomic + end do + !$acc end parallel loop + + if (ftmp /=3D fexp) call abort + if (fgot /=3D fexp) call abort + + fgot =3D 2**32.0 + fexp =3D 1.0 + + !$acc parallel loop copy (fgot, ftmp) + do i =3D 1, N + !$acc atomic capture + fgot =3D fgot / 2.0 + ftmp =3D fgot + !$acc end atomic + end do + !$acc end parallel loop + + if (ftmp /=3D fexp) call abort + if (fgot /=3D fexp) call abort + + lgot =3D .TRUE. + lexp =3D .FALSE. + + !$acc parallel copy (lgot, ltmp) + !$acc atomic capture + lgot =3D lgot .and. .FALSE. + ltmp =3D lgot + !$acc end atomic + !$acc end parallel + + if (ltmp .neqv. lexp) call abort + if (lgot .neqv. 
lexp) call abort + + lgot =3D .FALSE. + lexp =3D .FALSE. + + !$acc parallel copy (lgot, ltmp) + !$acc atomic capture + lgot =3D lgot .or. .FALSE. + ltmp =3D lgot + !$acc end atomic + !$acc end parallel + + if (ltmp .neqv. lexp) call abort + if (lgot .neqv. lexp) call abort + + lgot =3D .FALSE. + lexp =3D .FALSE. + + !$acc parallel copy (lgot, ltmp) + !$acc atomic capture + lgot =3D lgot .eqv. .TRUE. + ltmp =3D lgot + !$acc end atomic + !$acc end parallel + + if (ltmp .neqv. lexp) call abort + if (lgot .neqv. lexp) call abort + + lgot =3D .FALSE. + lexp =3D .TRUE. + + !$acc parallel copy (lgot, ltmp) + !$acc atomic capture + lgot =3D lgot .neqv. .TRUE. + ltmp =3D lgot + !$acc end atomic + !$acc end parallel + + if (ltmp .neqv. lexp) call abort + if (lgot .neqv. lexp) call abort + + fgot =3D 1234.0 + fexp =3D 1266.0 + + !$acc parallel loop copy (fgot, ftmp) + do i =3D 1, N + !$acc atomic capture + fgot =3D 1.0 + fgot + ftmp =3D fgot + !$acc end atomic + end do + !$acc end parallel loop + + if (ftmp /=3D fexp) call abort + if (fgot /=3D fexp) call abort + + fgot =3D 1.0 + fexp =3D 2.0**32 + + !$acc parallel loop copy (fgot, ftmp) + do i =3D 1, N + !$acc atomic capture + fgot =3D 2.0 * fgot + ftmp =3D fgot + !$acc end atomic + end do + !$acc end parallel loop + + if (ftmp /=3D fexp) call abort + if (fgot /=3D fexp) call abort + + fgot =3D 32.0 + fexp =3D 32.0 + + !$acc parallel loop copy (fgot, ftmp) + do i =3D 1, N + !$acc atomic capture + fgot =3D 2.0 - fgot + ftmp =3D fgot + !$acc end atomic + end do + !$acc end parallel loop + + if (ftmp /=3D fexp) call abort + if (fgot /=3D fexp) call abort + + fgot =3D 2.0**16 + fexp =3D 2.0**16 + + !$acc parallel loop copy (fgot, ftmp) + do i =3D 1, N + !$acc atomic capture + fgot =3D 2.0 / fgot + ftmp =3D fgot + !$acc end atomic + end do + !$acc end parallel loop + + if (ftmp /=3D fexp) call abort + if (fgot /=3D fexp) call abort + + lgot =3D .TRUE. + lexp =3D .FALSE. 
+ + !$acc parallel copy (lgot, ltmp) + !$acc atomic capture + lgot =3D .FALSE. .and. lgot + ltmp =3D lgot + !$acc end atomic + !$acc end parallel + + if (ltmp .neqv. lexp) call abort + if (lgot .neqv. lexp) call abort + + lgot =3D .FALSE. + lexp =3D .FALSE. + + !$acc parallel copy (lgot, ltmp) + !$acc atomic capture + lgot =3D .FALSE. .or. lgot + ltmp =3D lgot + !$acc end atomic + !$acc end parallel + + if (ltmp .neqv. lexp) call abort + if (lgot .neqv. lexp) call abort + + lgot =3D .FALSE. + lexp =3D .FALSE. + + !$acc parallel copy (lgot, ltmp) + !$acc atomic capture + lgot =3D .TRUE. .eqv. lgot + ltmp =3D lgot + !$acc end atomic + !$acc end parallel + + if (ltmp .neqv. lexp) call abort + if (lgot .neqv. lexp) call abort + + lgot =3D .FALSE. + lexp =3D .TRUE. + + !$acc parallel copy (lgot, ltmp) + !$acc atomic capture + lgot =3D .TRUE. .neqv. lgot + ltmp =3D lgot + !$acc end atomic + !$acc end parallel + + if (ltmp .neqv. lexp) call abort + if (lgot .neqv. lexp) call abort + + igot =3D 1 + iexp =3D N + + !$acc parallel loop copy (igot, itmp) + do i =3D 1, N + !$acc atomic capture + igot =3D max (igot, i) + itmp =3D igot + !$acc end atomic + end do + !$acc end parallel loop + + if (itmp /=3D iexp) call abort + if (igot /=3D iexp) call abort + + igot =3D N + iexp =3D 1 + + !$acc parallel loop copy (igot, itmp) + do i =3D 1, N + !$acc atomic capture + igot =3D min (igot, i) + itmp =3D igot + !$acc end atomic + end do + !$acc end parallel loop + + if (itmp /=3D iexp) call abort + if (igot /=3D iexp) call abort + + igot =3D -1 + iexp =3D 0 + + !$acc parallel loop copy (igot, itmp) + do i =3D 0, N - 1 + iexpr =3D ibclr (-2, i) + !$acc atomic capture + igot =3D iand (igot, iexpr) + itmp =3D igot + !$acc end atomic + end do + !$acc end parallel loop + + if (itmp /=3D iexp) call abort + if (igot /=3D iexp) call abort + + igot =3D 0 + iexp =3D -1=20 + + !$acc parallel loop copy (igot, itmp) + do i =3D 0, N - 1 + iexpr =3D lshift (1, i) + !$acc atomic capture + igot =3D ior 
(igot, iexpr) + itmp =3D igot + !$acc end atomic + end do + !$acc end parallel loop + + if (itmp /=3D iexp) call abort + if (igot /=3D iexp) call abort + + igot =3D -1 + iexp =3D 0=20 + + !$acc parallel loop copy (igot, itmp) + do i =3D 0, N - 1 + iexpr =3D lshift (1, i) + !$acc atomic capture + igot =3D ieor (igot, iexpr) + itmp =3D igot + !$acc end atomic + end do + !$acc end parallel loop + + if (itmp /=3D iexp) call abort + if (igot /=3D iexp) call abort + + igot =3D 1 + iexp =3D N + + !$acc parallel loop copy (igot, itmp) + do i =3D 1, N + !$acc atomic capture + igot =3D max (i, igot) + itmp =3D igot + !$acc end atomic + end do + !$acc end parallel loop + + if (itmp /=3D iexp) call abort + if (igot /=3D iexp) call abort + + igot =3D N + iexp =3D 1 + + !$acc parallel loop copy (igot, itmp) + do i =3D 1, N + !$acc atomic capture + igot =3D min (i, igot) + itmp =3D igot + !$acc end atomic + end do + !$acc end parallel loop + + if (itmp /=3D iexp) call abort + if (igot /=3D iexp) call abort + + igot =3D -1 + iexp =3D 0 + + !$acc parallel loop copy (igot, itmp) + do i =3D 0, N - 1 + iexpr =3D ibclr (-2, i) + !$acc atomic capture + igot =3D iand (iexpr, igot) + itmp =3D igot + !$acc end atomic + end do + !$acc end parallel loop + + if (itmp /=3D iexp) call abort + if (igot /=3D iexp) call abort + + igot =3D 0 + iexp =3D -1=20 + + !$acc parallel loop copy (igot, itmp) + do i =3D 0, N - 1 + iexpr =3D lshift (1, i) + !$acc atomic capture + igot =3D ior (iexpr, igot) + itmp =3D igot + !$acc end atomic + end do + !$acc end parallel loop + + if (itmp /=3D iexp) call abort + if (igot /=3D iexp) call abort + + igot =3D -1 + iexp =3D 0=20 + + !$acc parallel loop copy (igot, itmp) + do i =3D 0, N - 1 + iexpr =3D lshift (1, i) + !$acc atomic capture + igot =3D ieor (iexpr, igot) + itmp =3D igot + !$acc end atomic + end do + !$acc end parallel loop + + if (itmp /=3D iexp) call abort + if (igot /=3D iexp) call abort + +end program diff --git 
libgomp/testsuite/libgomp.oacc-fortran/atomic_update-1.f90 libgo= mp/testsuite/libgomp.oacc-fortran/atomic_update-1.f90 new file mode 100644 index 0000000..6607c77 --- /dev/null +++ libgomp/testsuite/libgomp.oacc-fortran/atomic_update-1.f90 @@ -0,0 +1,338 @@ +! { dg-do run } + +program main + integer igot, iexp, iexpr + real fgot, fexp + integer i + integer, parameter :: N =3D 32 + logical lgot, lexp + + fgot =3D 1234.0 + fexp =3D 1266.0 + + !$acc parallel loop copy (fgot) + do i =3D 1, N + !$acc atomic update + fgot =3D fgot + 1.0 + !$acc end atomic + end do + !$acc end parallel loop + + if (fgot /=3D fexp) call abort + + fgot =3D 1.0 + fexp =3D 2.0**32 + + !$acc parallel loop copy (fgot) + do i =3D 1, N + !$acc atomic update + fgot =3D fgot * 2.0 + !$acc end atomic + end do + !$acc end parallel loop + + if (fgot /=3D fexp) call abort + + fgot =3D 32.0 + fexp =3D fgot - N + + !$acc parallel loop copy (fgot) + do i =3D 1, N + !$acc atomic update + fgot =3D fgot - 1.0 + !$acc end atomic + end do + !$acc end parallel loop + + if (fgot /=3D fexp) call abort + + fgot =3D 2**32.0 + fexp =3D 1.0 + + !$acc parallel loop copy (fgot) + do i =3D 1, N + !$acc atomic update + fgot =3D fgot / 2.0 + !$acc end atomic + end do + !$acc end parallel loop + + if (fgot /=3D fexp) call abort + + lgot =3D .TRUE. + lexp =3D .FALSE. + + !$acc parallel copy (lgot) + !$acc atomic update + lgot =3D lgot .and. .FALSE. + !$acc end atomic + !$acc end parallel + + if (lgot .neqv. lexp) call abort + + lgot =3D .FALSE. + lexp =3D .FALSE. + + !$acc parallel copy (lgot) + !$acc atomic update + lgot =3D lgot .or. .FALSE. + !$acc end atomic + !$acc end parallel + + if (lgot .neqv. lexp) call abort + + lgot =3D .FALSE. + lexp =3D .FALSE. + + !$acc parallel copy (lgot) + !$acc atomic update + lgot =3D lgot .eqv. .TRUE. + !$acc end atomic + !$acc end parallel + + if (lgot .neqv. lexp) call abort + + lgot =3D .FALSE. + lexp =3D .TRUE. 
+ + !$acc parallel copy (lgot) + !$acc atomic update + lgot =3D lgot .neqv. .TRUE. + !$acc end atomic + !$acc end parallel + + if (lgot .neqv. lexp) call abort + + fgot =3D 1234.0 + fexp =3D 1266.0 + + !$acc parallel loop copy (fgot) + do i =3D 1, N + !$acc atomic update + fgot =3D 1.0 + fgot + !$acc end atomic + end do + !$acc end parallel loop + + if (fgot /=3D fexp) call abort + + fgot =3D 1.0 + fexp =3D 2.0**32 + + !$acc parallel loop copy (fgot) + do i =3D 1, N + !$acc atomic update + fgot =3D 2.0 * fgot + !$acc end atomic + end do + !$acc end parallel loop + + if (fgot /=3D fexp) call abort + + fgot =3D 32.0 + fexp =3D 32.0 + + !$acc parallel loop copy (fgot) + do i =3D 1, N + !$acc atomic update + fgot =3D 2.0 - fgot + !$acc end atomic + end do + !$acc end parallel loop + + if (fgot /=3D fexp) call abort + + fgot =3D 2.0**16 + fexp =3D 2.0**16 + + !$acc parallel loop copy (fgot) + do i =3D 1, N + !$acc atomic update + fgot =3D 2.0 / fgot + !$acc end atomic + end do + !$acc end parallel loop + + if (fgot /=3D fexp) call abort + + lgot =3D .TRUE. + lexp =3D .FALSE. + + !$acc parallel copy (lgot) + !$acc atomic update + lgot =3D .FALSE. .and. lgot + !$acc end atomic + !$acc end parallel + + if (lgot .neqv. lexp) call abort + + lgot =3D .FALSE. + lexp =3D .FALSE. + + !$acc parallel copy (lgot) + !$acc atomic update + lgot =3D .FALSE. .or. lgot + !$acc end atomic + !$acc end parallel + + if (lgot .neqv. lexp) call abort + + lgot =3D .FALSE. + lexp =3D .FALSE. + + !$acc parallel copy (lgot) + !$acc atomic update + lgot =3D .TRUE. .eqv. lgot + !$acc end atomic + !$acc end parallel + + if (lgot .neqv. lexp) call abort + + lgot =3D .FALSE. + lexp =3D .TRUE. + + !$acc parallel copy (lgot) + !$acc atomic update + lgot =3D .TRUE. .neqv. lgot + !$acc end atomic + !$acc end parallel + + if (lgot .neqv. 
lexp) call abort + + igot =3D 1 + iexp =3D N + + !$acc parallel loop copy (igot) + do i =3D 1, N + !$acc atomic update + igot =3D max (igot, i) + !$acc end atomic + end do + !$acc end parallel loop + + if (igot /=3D iexp) call abort + + igot =3D N + iexp =3D 1 + + !$acc parallel loop copy (igot) + do i =3D 1, N + !$acc atomic update + igot =3D min (igot, i) + !$acc end atomic + end do + !$acc end parallel loop + + if (igot /=3D iexp) call abort + + igot =3D -1 + iexp =3D 0 + + !$acc parallel loop copy (igot) + do i =3D 0, N - 1 + iexpr =3D ibclr (-2, i) + !$acc atomic update + igot =3D iand (igot, iexpr) + !$acc end atomic + end do + !$acc end parallel loop + + if (igot /=3D iexp) call abort + + igot =3D 0 + iexp =3D -1=20 + + !$acc parallel loop copy (igot) + do i =3D 0, N - 1 + iexpr =3D lshift (1, i) + !$acc atomic update + igot =3D ior (igot, iexpr) + !$acc end atomic + end do + !$acc end parallel loop + + if (igot /=3D iexp) call abort + + igot =3D -1 + iexp =3D 0=20 + + !$acc parallel loop copy (igot) + do i =3D 0, N - 1 + iexpr =3D lshift (1, i) + !$acc atomic update + igot =3D ieor (igot, iexpr) + !$acc end atomic + end do + !$acc end parallel loop + + if (igot /=3D iexp) call abort + + igot =3D 1 + iexp =3D N + + !$acc parallel loop copy (igot) + do i =3D 1, N + !$acc atomic update + igot =3D max (i, igot) + !$acc end atomic + end do + !$acc end parallel loop + + if (igot /=3D iexp) call abort + + igot =3D N + iexp =3D 1 + + !$acc parallel loop copy (igot) + do i =3D 1, N + !$acc atomic update + igot =3D min (i, igot) + !$acc end atomic + end do + !$acc end parallel loop + + if (igot /=3D iexp) call abort + + igot =3D -1 + iexp =3D 0 + + !$acc parallel loop copy (igot) + do i =3D 0, N - 1 + iexpr =3D ibclr (-2, i) + !$acc atomic update + igot =3D iand (iexpr, igot) + !$acc end atomic + end do + !$acc end parallel loop + + if (igot /=3D iexp) call abort + + igot =3D 0 + iexp =3D -1=20 + + !$acc parallel loop copy (igot) + do i =3D 0, N - 1 + iexpr =3D 
lshift (1, i) + !$acc atomic update + igot =3D ior (iexpr, igot) + !$acc end atomic + end do + !$acc end parallel loop + + if (igot /=3D iexp) call abort + + igot =3D -1 + iexp =3D 0=20 + + !$acc parallel loop copy (igot) + do i =3D 0, N - 1 + iexpr =3D lshift (1, i) + !$acc atomic update + igot =3D ieor (iexpr, igot) + !$acc end atomic + end do + !$acc end parallel loop + + if (igot /=3D iexp) call abort + +end program diff --git libgomp/testsuite/libgomp.oacc-fortran/cache-1.f90 libgomp/tests= uite/libgomp.oacc-fortran/cache-1.f90 new file mode 100644 index 0000000..f01b8e9 --- /dev/null +++ libgomp/testsuite/libgomp.oacc-fortran/cache-1.f90 @@ -0,0 +1,26 @@ + +program main + integer, parameter :: N =3D 8 + integer, dimension (N) :: a, b + integer :: i + integer :: idx, len + + idx =3D 1 + len =3D 2 + + !$acc parallel copyin (a(1:N)) copyout (b(1:N)) + do i =3D 1, N + + !$acc cache (a(1:N)) + !$acc cache (a(0:N)) + !$acc cache (a(0:N), b(0:N)) + !$acc cache (a(0)) + !$acc cache (a(0), a(1), b(0:N)) + !$acc cache (a(idx)) + !$acc cache (a(idx:len)) + + b(i) =3D a(i) + end do + !$acc end parallel + +end program diff --git libgomp/testsuite/libgomp.oacc-fortran/clauses-1.f90 libgomp/tes= tsuite/libgomp.oacc-fortran/clauses-1.f90 new file mode 100644 index 0000000..e6ab78d --- /dev/null +++ libgomp/testsuite/libgomp.oacc-fortran/clauses-1.f90 @@ -0,0 +1,290 @@ +! { dg-do run } +! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=3D0" } } + +program main + use openacc + implicit none + + integer, parameter :: N =3D 32 + real, allocatable :: a(:), b(:), c(:) + integer i + + i =3D 0 + + allocate (a(N)) + allocate (b(N)) + allocate (c(N)) + + a(:) =3D 3.0 + b(:) =3D 0.0 + + !$acc parallel copyin (a(1:N)) copyout (b(1:N)) + do i =3D 1, N + b(i) =3D a(i) + end do + !$acc end parallel + + do i =3D 1, N + if (b(i) .ne. 3.0) call abort + end do + + if (acc_is_present (a) .eqv. .TRUE.) call abort + if (acc_is_present (b) .eqv. .TRUE.) 
call abort + + a(:) =3D 5.0 + b(:) =3D 1.0 + + !$acc parallel copyin (a(1:N)) copyout (b(1:N)) + do i =3D 1, N + b(i) =3D a(i) + end do + !$acc end parallel + + do i =3D 1, N + if (b(i) .ne. 5.0) call abort + end do + + if (acc_is_present (a) .eqv. .TRUE.) call abort + if (acc_is_present (b) .eqv. .TRUE.) call abort + + a(:) =3D 6.0 + b(:) =3D 0.0 + + call acc_copyin (a, sizeof (a)) + + a(:) =3D 9.0 + + !$acc parallel present_or_copyin (a(1:N)) copyout (b(1:N)) + do i =3D 1, N + b(i) =3D a(i) + end do + !$acc end parallel + + do i =3D 1, N + if (b(i) .ne. 6.0) call abort + end do + + call acc_copyout (a, sizeof (a)) + + if (acc_is_present (a) .eqv. .TRUE.) call abort + if (acc_is_present (b) .eqv. .TRUE.) call abort + + a(:) =3D 6.0 + b(:) =3D 0.0 + + !$acc parallel copyin (a(1:N)) present_or_copyout (b(1:N)) + do i =3D 1, N + b(i) =3D a(i) + end do + !$acc end parallel + + do i =3D 1, N + if (b(i) .ne. 6.0) call abort + end do + + if (acc_is_present (a) .eqv. .TRUE.) call abort + if (acc_is_present (b) .eqv. .TRUE.) call abort + + a(:) =3D 5.0 + b(:) =3D 2.0 + + call acc_copyin (b, sizeof (b)) + + !$acc parallel copyin (a(1:N)) present_or_copyout (b(1:N)) + do i =3D 1, N + b(i) =3D a(i) + end do + !$acc end parallel + + do i =3D 1, N + if (a(i) .ne. 5.0) call abort + if (b(i) .ne. 2.0) call abort + end do + + call acc_copyout (b, sizeof (b)) + + if (acc_is_present (a) .eqv. .TRUE.) call abort + if (acc_is_present (b) .eqv. .TRUE.) call abort + + a(:) =3D 3.0; + b(:) =3D 4.0; + + !$acc parallel copy (a(1:N)) copyout (b(1:N)) + do i =3D 1, N + a(i) =3D a(i) + 1 + b(i) =3D a(i) + 2 + end do + !$acc end parallel + + do i =3D 1, N + if (a(i) .ne. 4.0) call abort + if (b(i) .ne. 6.0) call abort + end do + + if (acc_is_present (a) .eqv. .TRUE.) call abort + if (acc_is_present (b) .eqv. .TRUE.) 
call abort + + a(:) =3D 4.0 + b(:) =3D 7.0 + + !$acc parallel present_or_copy (a(1:N)) present_or_copy (b(1:N)) + do i =3D 1, N + a(i) =3D a(i) + 1 + b(i) =3D b(i) + 2 + end do + !$acc end parallel + + do i =3D 1, N + if (a(i) .ne. 5.0) call abort + if (b(i) .ne. 9.0) call abort + end do + + if (acc_is_present (a) .eqv. .TRUE.) call abort + if (acc_is_present (b) .eqv. .TRUE.) call abort + + a(:) =3D 3.0 + b(:) =3D 7.0 + + call acc_copyin (a, sizeof (a)) + call acc_copyin (b, sizeof (b)) + + !$acc parallel present_or_copy (a(1:N)) present_or_copy (b(1:N)) + do i =3D 1, N + a(i) =3D a(i) + 1 + b(i) =3D b(i) + 2 + end do + !$acc end parallel + + do i =3D 1, N + if (a(i) .ne. 3.0) call abort + if (b(i) .ne. 7.0) call abort + end do + + call acc_copyout (a, sizeof (a)) + call acc_copyout (b, sizeof (b)) + + if (acc_is_present (a) .eqv. .TRUE.) call abort + if (acc_is_present (b) .eqv. .TRUE.) call abort + + a(:) =3D 3.0 + b(:) =3D 7.0 + + !$acc parallel copyin (a(1:N)) create (c(1:N)) copyout (b(1:N)) + do i =3D 1, N + c(i) =3D a(i) + b(i) =3D c(i) + end do + !$acc end parallel + + do i =3D 1, N + if (a(i) .ne. 3.0) call abort + if (b(i) .ne. 3.0) call abort + end do + + if (acc_is_present (a) .eqv. .TRUE.) call abort + if (acc_is_present (b) .eqv. .TRUE.) call abort + if (acc_is_present (c) .eqv. .TRUE.) call abort + + a(:) =3D 4.0 + b(:) =3D 8.0 + + !$acc parallel copyin (a(1:N)) present_or_create (c(1:N)) copyout (b(1:N= )) + do i =3D 1, N + c(i) =3D a(i) + b(i) =3D c(i) + end do + !$acc end parallel + + do i =3D 1, N + if (a(i) .ne. 4.0) call abort + if (b(i) .ne. 4.0) call abort + end do + + if (acc_is_present (a) .eqv. .TRUE.) call abort + if (acc_is_present (b) .eqv. .TRUE.) call abort + if (acc_is_present (c) .eqv. .TRUE.) 
call abort + + a(:) =3D 4.0 + + call acc_copyin (a, sizeof (a)) + call acc_copyin (b, sizeof (b)) + call acc_copyin (c, sizeof (c)) + + !$acc parallel present (a(1:N)) present (c(1:N)) present (b(1:N)) + do i =3D 1, N + c(i) =3D a(i) + b(i) =3D c(i) + end do + !$acc end parallel + + call acc_copyout (a, sizeof (a)) + call acc_copyout (b, sizeof (b)) + call acc_copyout (c, sizeof (c)) +=20=20 + do i =3D 1, N + if (a(i) .ne. 4.0) call abort + if (b(i) .ne. 4.0) call abort + end do + + if (acc_is_present (a) .eqv. .TRUE.) call abort + if (acc_is_present (b) .eqv. .TRUE.) call abort + if (acc_is_present (c) .eqv. .TRUE.) call abort + + a(:) =3D 6.0 + b(:) =3D 0.0 + + call acc_copyin (a, sizeof (a)) + + a(:) =3D 9.0 + + !$acc parallel pcopyin (a(1:N)) copyout (b(1:N)) + do i =3D 1, N + b(i) =3D a(i) + end do + !$acc end parallel + + do i =3D 1, N + if (b(i) .ne. 6.0) call abort + end do +=20=20 + call acc_copyout (a, sizeof (a)) + + if (acc_is_present (a) .eqv. .TRUE.) call abort + if (acc_is_present (b) .eqv. .TRUE.) call abort + + a(:) =3D 6.0 + b(:) =3D 0.0 + + !$acc parallel copyin (a(1:N)) pcopyout (b(1:N)) + do i =3D 1, N + b(i) =3D a(i) + end do + !$acc end parallel + + do i =3D 1, N + if (b(i) .ne. 6.0) call abort + end do + + if (acc_is_present (a) .eqv. .TRUE.) call abort + if (acc_is_present (b) .eqv. .TRUE.) call abort + + a(:) =3D 5.0 + b(:) =3D 7.0 + + !$acc parallel copyin (a(1:N)) pcreate (c(1:N)) copyout (b(1:N)) + do i =3D 1, N + c(i) =3D a(i) + b(i) =3D c(i) + end do + !$acc end parallel + + do i =3D 1, N + if (a(i) .ne. 5.0) call abort + if (b(i) .ne. 5.0) call abort + end do + + if (acc_is_present (a) .eqv. .TRUE.) call abort + if (acc_is_present (b) .eqv. .TRUE.) call abort + if (acc_is_present (c) .eqv. .TRUE.) 
call abort + +end program main diff --git libgomp/testsuite/libgomp.oacc-fortran/data-1.f90 libgomp/testsu= ite/libgomp.oacc-fortran/data-1.f90 index 5e94e2d..bf323b3 100644 --- libgomp/testsuite/libgomp.oacc-fortran/data-1.f90 +++ libgomp/testsuite/libgomp.oacc-fortran/data-1.f90 @@ -1,45 +1,212 @@ ! { dg-do run } +! { dg-additional-options "-cpp" } =20 -program test - integer, parameter :: N =3D 8 - real, allocatable :: a(:), b(:) +function is_mapped (n) result (rc) + use openacc =20 - allocate (a(N)) - allocate (b(N)) + integer, intent (in) :: n + logical rc =20 - a(:) =3D 3.0 - b(:) =3D 0.0 +#if ACC_MEM_SHARED + integer i =20 - !$acc enter data copyin (a(1:N), b(1:N)) + rc =3D .TRUE. + i =3D n +#else + rc =3D acc_is_present (n, sizeof (n)) +#endif =20 - !$acc parallel - do i =3D 1, n - b(i) =3D a (i) - end do - !$acc end parallel +end function is_mapped =20 - !$acc exit data copyout (a(1:N), b(1:N)) +program main + integer i, j + logical is_mapped =20 - do i =3D 1, n - if (a(i) .ne. 3.0) call abort - if (b(i) .ne. 3.0) call abort - end do + i =3D -1 + j =3D -2 =20 - a(:) =3D 5.0 - b(:) =3D 1.0 + !$acc data copyin (i, j) + if (is_mapped (i) .eqv. .FALSE.) call abort + if (is_mapped (j) .eqv. .FALSE.) call abort =20 - !$acc enter data copyin (a(1:N), b(1:N)) + if (i .ne. -1 .or. j .ne. -2) call abort =20 - !$acc parallel - do i =3D 1, n - b(i) =3D a (i) - end do - !$acc end parallel + i =3D 2 + j =3D 1 =20 - !$acc exit data copyout (a(1:N), b(1:N)) + if (i .ne. 2 .or. j .ne. 1) call abort + !$acc end data =20 - do i =3D 1, n - if (a(i) .ne. 5.0) call abort - if (b(i) .ne. 5.0) call abort - end do -end program test + if (i .ne. 2 .or. j .ne. 1) call abort + + i =3D -1 + j =3D -2 + + !$acc data copyout (i, j) + if (is_mapped (i) .eqv. .FALSE.) call abort + if (is_mapped (j) .eqv. .FALSE.) call abort + + if (i .ne. -1 .or. j .ne. -2) call abort + + i =3D 2 + j =3D 1 + + if (i .ne. 2 .or. j .ne. 
1) call abort + + !$acc parallel present (i, j) + i =3D 4 + j =3D 2 + !$acc end parallel + !$acc end data + + if (i .ne. 4 .or. j .ne. 2) call abort + + i =3D -1 + j =3D -2 + + !$acc data create (i, j) + if (is_mapped (i) .eqv. .FALSE.) call abort + if (is_mapped (j) .eqv. .FALSE.) call abort + + if (i .ne. -1 .or. j .ne. -2) call abort + + i =3D 2 + j =3D 1 + + if (i .ne. 2 .or. j .ne. 1) call abort + !$acc end data + + if (i .ne. 2 .or. j .ne. 1) call abort + + i =3D -1 + j =3D -2 + + !$acc data present_or_copyin (i, j) + if (is_mapped (i) .eqv. .FALSE.) call abort + if (is_mapped (j) .eqv. .FALSE.) call abort + + if (i .ne. -1 .or. j .ne. -2) call abort + + i =3D 2 + j =3D 1 + + if (i .ne. 2 .or. j .ne. 1) call abort + !$acc end data + + if (i .ne. 2 .or. j .ne. 1) call abort + + i =3D -1 + j =3D -2 + + !$acc data present_or_copyout (i, j) + if (is_mapped (i) .eqv. .FALSE.) call abort + if (is_mapped (j) .eqv. .FALSE.) call abort + + if (i .ne. -1 .or. j .ne. -2) call abort + + i =3D 2 + j =3D 1 + + if (i .ne. 2 .or. j .ne. 1) call abort + + !$acc parallel present (i, j) + i =3D 4 + j =3D 2 + !$acc end parallel + !$acc end data + + if (i .ne. 4 .or. j .ne. 2) call abort + + i =3D -1 + j =3D -2 + + !$acc data present_or_copy (i, j) + if (is_mapped (i) .eqv. .FALSE.) call abort + if (is_mapped (j) .eqv. .FALSE.) call abort + + if (i .ne. -1 .or. j .ne. -2) call abort + + i =3D 2 + j =3D 1 + + if (i .ne. 2 .or. j .ne. 1) call abort + !$acc end data + +#if ACC_MEM_SHARED + if (i .ne. 2 .or. j .ne. 1) call abort +#else + if (i .ne. -1 .or. j .ne. -2) call abort +#endif + + i =3D -1 + j =3D -2 + + !$acc data present_or_create (i, j) + if (is_mapped (i) .eqv. .FALSE.) call abort + if (is_mapped (j) .eqv. .FALSE.) call abort + + i =3D 2 + j =3D 1 + + if (i .ne. 2 .or. j .ne. 1) call abort + !$acc end data + + if (i .ne. 2 .or. j .ne. 1) call abort + + i =3D -1 + j =3D -2 + + !$acc data copyin (i, j) + !$acc data present (i, j) + if (is_mapped (i) .eqv. .FALSE.) 
call abort + if (is_mapped (j) .eqv. .FALSE.) call abort + + if (i .ne. -1 .or. j .ne. -2) call abort + + i =3D 2 + j =3D 1 + + if (i .ne. 2 .or. j .ne. 1) call abort + !$acc end data + !$acc end data + + if (i .ne. 2 .or. j .ne. 1) call abort + + i =3D -1 + j =3D -2 + + !$acc data copyin (i, j) + !$acc data present (i, j) + if (is_mapped (i) .eqv. .FALSE.) call abort + if (is_mapped (j) .eqv. .FALSE.) call abort + + if (i .ne. -1 .or. j .ne. -2) call abort + + i =3D 2 + j =3D 1 + + if (i .ne. 2 .or. j .ne. 1) call abort + !$acc end data + !$acc end data + + if (i .ne. 2 .or. j .ne. 1) call abort + + i =3D -1 + j =3D -2 + + !$acc data +#if !ACC_MEM_SHARED + if (is_mapped (i) .eqv. .TRUE.) call abort + if (is_mapped (j) .eqv. .TRUE.) call abort +#endif + if (i .ne. -1 .or. j .ne. -2) call abort + + i =3D 2 + j =3D 1 + + if (i .ne. 2 .or. j .ne. 1) call abort + !$acc end data + + if (i .ne. 2 .or. j .ne. 1) call abort + +end program main diff --git libgomp/testsuite/libgomp.oacc-fortran/data-2.f90 libgomp/testsu= ite/libgomp.oacc-fortran/data-2.f90 index 8736c2a..d190700 100644 --- libgomp/testsuite/libgomp.oacc-fortran/data-2.f90 +++ libgomp/testsuite/libgomp.oacc-fortran/data-2.f90 @@ -1,8 +1,14 @@ ! { dg-do run } =20 program test + use openacc integer, parameter :: N =3D 8 real, allocatable :: a(:,:), b(:,:) + real, allocatable :: c(:), d(:) + integer i, j + + i =3D 0 + j =3D 0 =20 allocate (a(N,N)) allocate (b(N,N)) @@ -28,4 +34,48 @@ program test if (b(j,i) .ne. 3.0) call abort end do end do + + allocate (c(N)) + allocate (d(N)) + + c(:) =3D 3.0 + d(:) =3D 0.0 + + !$acc enter data copyin (c(1:N)) create (d(1:N)) async + !$acc wait +=20=20 + !$acc parallel=20 + do i =3D 1, N + d(i) =3D c(i) + 1 + end do + !$acc end parallel + + !$acc exit data copyout (c(1:N), d(1:N)) async + !$acc wait + + do i =3D 1, N + if (d(i) .ne. 
4.0) call abort + end do + + c(:) =3D 3.0 + d(:) =3D 0.0 + + !$acc enter data copyin (c(1:N)) async + !$acc enter data create (d(1:N)) wait + !$acc wait + + !$acc parallel=20 + do i =3D 1, N + d(i) =3D c(i) + 1 + end do + !$acc end parallel +=20=20 + !$acc exit data copyout (d(1:N)) async + !$acc exit data async + !$acc wait + + do i =3D 1, N + if (d(i) .ne. 4.0) call abort + end do + end program test diff --git libgomp/testsuite/libgomp.oacc-fortran/data-3.f90 libgomp/testsu= ite/libgomp.oacc-fortran/data-3.f90 index 9868cb0..daf20a5 100644 --- libgomp/testsuite/libgomp.oacc-fortran/data-3.f90 +++ libgomp/testsuite/libgomp.oacc-fortran/data-3.f90 @@ -17,7 +17,7 @@ program asyncwait =20 !$acc enter data copyin (a(1:N)) copyin (b(1:N)) copyin (N) async =20 - !$acc parallel async wait + !$acc parallel async wait present (a(1:N)) present (b(1:N)) present (N) do i =3D 1, N b(i) =3D a(i) end do @@ -36,7 +36,7 @@ program asyncwait =20 !$acc enter data copyin (a(1:N)) copyin (b(1:N)) async (1) =20 - !$acc parallel async (1) wait (1) + !$acc parallel async (1) wait (1) present (a(1:N), b(1:N), N) do i =3D 1, N b(i) =3D a(i) end do @@ -55,28 +55,30 @@ program asyncwait c(:) =3D 0.0 d(:) =3D 0.0 =20 - !$acc enter data copyin (a(1:N)) create (b(1:N)) create (c(1:N)) create = (d(1:N)) + !$acc enter data copyin (a(1:N)) create (b(1:N)) create (c(1:N)) & + !$acc& create (d(1:N)) =20 - !$acc parallel async (1) + !$acc parallel async (1) present (a(1:N), b(1:N), c(1:N), N) do i =3D 1, N b(i) =3D (a(i) * a(i) * a(i)) / a(i) end do !$acc end parallel =20 - !$acc parallel async (1) + !$acc parallel async (1) present (a(1:N), b(1:N), c(1:N), N) do i =3D 1, N c(i) =3D (a(i) * 4) / a(i) end do !$acc end parallel =20 - !$acc parallel async (1) + !$acc parallel async (1) present (a(1:N), b(1:N), c(1:N), d(1:N), N) do i =3D 1, N d(i) =3D ((a(i) * a(i) + a(i)) / a(i)) - a(i) end do !$acc end parallel =20 !$acc wait (1) - !$acc exit data copyout (a(1:N)) copyout (b(1:N)) copyout (c(1:N)) 
copyo= ut (d(1:N)) + !$acc exit data copyout (a(1:N)) copyout (b(1:N)) copyout (c(1:N)) & + !$acc& copyout (d(1:N)) =20 do i =3D 1, N if (a(i) .ne. 3.0) call abort @@ -91,34 +93,40 @@ program asyncwait d(:) =3D 0.0 e(:) =3D 0.0 =20 - !$acc enter data copyin (a(1:N)) create (b(1:N)) create (c(1:N)) create = (d(1:N)) copyin (e(1:N)) + !$acc enter data copyin (a(1:N)) create (b(1:N)) create (c(1:N)) & + !$acc& create (d(1:N)) copyin (e(1:N)) =20 - !$acc parallel async (1) + !$acc parallel async (1) present (a(1:N), b(1:N), c(1:N), d(1:N)) & + !$acc& present (e(1:N), N) do i =3D 1, N b(i) =3D (a(i) * a(i) * a(i)) / a(i) end do !$acc end parallel =20 - !$acc parallel async (1) + !$acc parallel async (1) present (a(1:N), b(1:N), c(1:N), d(1:N)) & + !$acc& present (e(1:N), N) do i =3D 1, N c(i) =3D (a(i) * 4) / a(i) end do !$acc end parallel =20 - !$acc parallel async (1) + !$acc parallel async (1) present (a(1:N), b(1:N), c(1:N), d(1:N)) & + !$acc& present (e(1:N), N) do i =3D 1, N d(i) =3D ((a(i) * a(i) + a(i)) / a(i)) - a(i) end do !$acc end parallel =20 - !$acc parallel wait (1) async (1) + !$acc parallel wait (1) async (1) present (a(1:N), b(1:N), c(1:N)) & + !$acc& present (d(1:N), e(1:N), N) do i =3D 1, N e(i) =3D a(i) + b(i) + c(i) + d(i) end do !$acc end parallel =20 !$acc wait (1) - !$acc exit data copyout (a(1:N)) copyout (b(1:N)) copyout (c(1:N)) copyo= ut (d(1:N)) copyout (e(1:N)) + !$acc exit data copyout (a(1:N)) copyout (b(1:N)) copyout (c(1:N)) & + !$acc& copyout (d(1:N)) copyout (e(1:N)) !$acc exit data delete (N) =20 do i =3D 1, N diff --git libgomp/testsuite/libgomp.oacc-fortran/data-4-2.f90 libgomp/test= suite/libgomp.oacc-fortran/data-4-2.f90 index 16a8598..d1ecf0a 100644 --- libgomp/testsuite/libgomp.oacc-fortran/data-4-2.f90 +++ libgomp/testsuite/libgomp.oacc-fortran/data-4-2.f90 @@ -19,7 +19,7 @@ program asyncwait =20 !$acc enter data copyin (a(1:N)) copyin (b(1:N)) copyin (N) async =20 - !$acc parallel async wait + !$acc parallel async wait 
present (a(1:N), b(1:N), N) !$acc loop do i =3D 1, N b(i) =3D a(i) @@ -39,7 +39,7 @@ program asyncwait =20 !$acc update device (a(1:N), b(1:N)) async (1) =20 - !$acc parallel async (1) wait (1) + !$acc parallel async (1) wait (1) present (a(1:N), b(1:N), N) !$acc loop do i =3D 1, N b(i) =3D a(i) @@ -62,19 +62,19 @@ program asyncwait !$acc enter data copyin (c(1:N), d(1:N)) async (1) !$acc update device (a(1:N), b(1:N)) async (1) =20 - !$acc parallel async (1) + !$acc parallel async (1) present (a(1:N), b(1:N), N) do i =3D 1, N b(i) =3D (a(i) * a(i) * a(i)) / a(i) end do !$acc end parallel =20 - !$acc parallel async (1) + !$acc parallel async (1) present (a(1:N), c(1:N), N) do i =3D 1, N c(i) =3D (a(i) * 4) / a(i) end do !$acc end parallel =20 - !$acc parallel async (1) + !$acc parallel async (1) present (a(1:N), d(1:N), N) do i =3D 1, N d(i) =3D ((a(i) * a(i) + a(i)) / a(i)) - a(i) end do @@ -100,25 +100,26 @@ program asyncwait !$acc enter data copyin (e(1:N)) async (1) !$acc update device (a(1:N), b(1:N), c(1:N), d(1:N)) async (1) =20 - !$acc parallel async (1) + !$acc parallel async (1) present (a(1:N), b(1:N), N) do i =3D 1, N b(i) =3D (a(i) * a(i) * a(i)) / a(i) end do !$acc end parallel =20 - !$acc parallel async (1) + !$acc parallel async (1) present (a(1:N), c(1:N), N) do i =3D 1, N c(i) =3D (a(i) * 4) / a(i) end do !$acc end parallel =20 - !$acc parallel async (1) + !$acc parallel async (1) present (a(1:N), d(1:N), N) do i =3D 1, N d(i) =3D ((a(i) * a(i) + a(i)) / a(i)) - a(i) end do !$acc end parallel =20 - !$acc parallel wait (1) async (1) + !$acc parallel wait (1) async (1) present (a(1:N), b(1:N), c(1:N)) & + !$acc& present (d(1:N), e(1:N), N) do i =3D 1, N e(i) =3D a(i) + b(i) + c(i) + d(i) end do diff --git libgomp/testsuite/libgomp.oacc-fortran/data-4.f90 libgomp/testsu= ite/libgomp.oacc-fortran/data-4.f90 index f6886b0..4e95a9c 100644 --- libgomp/testsuite/libgomp.oacc-fortran/data-4.f90 +++ libgomp/testsuite/libgomp.oacc-fortran/data-4.f90 @@ -17,7 
+17,7 @@ program asyncwait =20 !$acc enter data copyin (a(1:N)) copyin (b(1:N)) copyin (N) async =20 - !$acc parallel async wait + !$acc parallel async wait present (a(1:N), b(1:N), N) !$acc loop do i =3D 1, N b(i) =3D a(i) @@ -37,7 +37,7 @@ program asyncwait =20 !$acc update device (a(1:N), b(1:N)) async (1) =20 - !$acc parallel async (1) wait (1) + !$acc parallel async (1) wait (1) present (a(1:N), b(1:N), N) !$acc loop do i =3D 1, N b(i) =3D a(i) @@ -60,19 +60,19 @@ program asyncwait !$acc enter data copyin (c(1:N), d(1:N)) async (1) !$acc update device (a(1:N), b(1:N)) async (1) =20 - !$acc parallel async (1) + !$acc parallel async (1) present (a(1:N), b(1:N), N) do i =3D 1, N b(i) =3D (a(i) * a(i) * a(i)) / a(i) end do !$acc end parallel =20 - !$acc parallel async (1) + !$acc parallel async (1) present (a(1:N), c(1:N), N) do i =3D 1, N c(i) =3D (a(i) * 4) / a(i) end do !$acc end parallel =20 - !$acc parallel async (1) + !$acc parallel async (1) present (a(1:N), d(1:N), N) do i =3D 1, N d(i) =3D ((a(i) * a(i) + a(i)) / a(i)) - a(i) end do @@ -98,25 +98,26 @@ program asyncwait !$acc enter data copyin (e(1:N)) async (1) !$acc update device (a(1:N), b(1:N), c(1:N), d(1:N)) async (1) =20 - !$acc parallel async (1) + !$acc parallel async (1) present (a(1:N), b(1:N), N) do i =3D 1, N b(i) =3D (a(i) * a(i) * a(i)) / a(i) end do !$acc end parallel =20 - !$acc parallel async (1) + !$acc parallel async (1) present (a(1:N), c(1:N), N) do i =3D 1, N c(i) =3D (a(i) * 4) / a(i) end do !$acc end parallel =20 - !$acc parallel async (1) + !$acc parallel async (1) present (a(1:N), d(1:N), N) do i =3D 1, N d(i) =3D ((a(i) * a(i) + a(i)) / a(i)) - a(i) end do !$acc end parallel =20 - !$acc parallel wait (1) async (1) + !$acc parallel wait (1) async (1) present (a(1:N), b(1:N), c(1:N)) & + !$acc& present (d(1:N), e(1:N), N) do i =3D 1, N e(i) =3D a(i) + b(i) + c(i) + d(i) end do diff --git libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90 libgomp/tes= 
tsuite/libgomp.oacc-fortran/declare-1.f90 new file mode 100644 index 0000000..0bab5bd --- /dev/null +++ libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90 @@ -0,0 +1,229 @@ +! { dg-do run { target openacc_nvidia_accel_selected } } + +subroutine subr6 (a, d) + integer, parameter :: N =3D 8 + integer :: i + integer :: a(N) + !$acc declare deviceptr (a) + integer :: d(N) + + i =3D 0 + + !$acc parallel copy (d) + do i =3D 1, N + d(i) =3D a(i) + a(i) + end do + !$acc end parallel + +end subroutine + +subroutine subr5 (a, b, c, d) + integer, parameter :: N =3D 8 + integer :: i + integer :: a(N) + !$acc declare present_or_copyin (a) + integer :: b(N) + !$acc declare present_or_create (b) + integer :: c(N) + !$acc declare present_or_copyout (c) + integer :: d(N) + !$acc declare present_or_copy (d) + + i =3D 0 + + !$acc parallel + do i =3D 1, N + b(i) =3D a(i) + c(i) =3D b(i) + d(i) =3D d(i) + b(i) + end do + !$acc end parallel + +end subroutine + +subroutine subr4 (a, b) + integer, parameter :: N =3D 8 + integer :: i + integer :: a(N) + !$acc declare present (a) + integer :: b(N) + !$acc declare copyout (b) + + i =3D 0 + + !$acc parallel + do i =3D 1, N + b(i) =3D a(i) + end do + !$acc end parallel + +end subroutine + +subroutine subr3 (a, c) + integer, parameter :: N =3D 8 + integer :: i + integer :: a(N) + !$acc declare present (a) + integer :: c(N) + !$acc declare copyin (c) + + i =3D 0 + + !$acc parallel + do i =3D 1, N + a(i) =3D c(i) + c(i) =3D 0 + end do + !$acc end parallel + +end subroutine + +subroutine subr2 (a, b, c) + integer, parameter :: N =3D 8 + integer :: i + integer :: a(N) + !$acc declare present (a) + integer :: b(N) + !$acc declare create (b) + integer :: c(N) + !$acc declare copy (c) + + i =3D 0 + + !$acc parallel + do i =3D 1, N + b(i) =3D a(i) + c(i) =3D b(i) + c(i) + 1 + end do + !$acc end parallel + +end subroutine + +subroutine subr1 (a, b, c) + integer, parameter :: N =3D 8 + integer :: i + integer :: a(N) + !$acc declare present (a) + 
integer :: b(N) + integer :: c(N) + + i =3D 0 + + !$acc parallel + do i =3D 1, N + a(i) =3D a(i) + 1 + end do + !$acc end parallel + +end subroutine + +subroutine test (a, e) + use openacc + logical :: e + integer, parameter :: N =3D 8 + integer :: a(N) + + if (acc_is_present (a) .neqv. e) call abort + +end subroutine + +subroutine subr0 (a, b, c, d) + integer, parameter :: N =3D 8 + integer :: a(N) + !$acc declare copy (a) + integer :: b(N) + integer :: c(N) + integer :: d(N) + + call test (a, .true.) + call test (b, .false.) + call test (c, .false.) + + call subr1 (a, b, c) + + call test (a, .true.) + call test (b, .false.) + call test (c, .false.) + + call subr2 (a, b, c) + + call test (a, .true.) + call test (b, .false.) + call test (c, .false.) + + do i =3D 1, N + if (c(i) .ne. 8) call abort + end do + + call subr3 (a, c) + + call test (a, .true.) + call test (b, .false.) + call test (c, .false.) + + do i =3D 1, N + if (a(i) .ne. 2) call abort + if (c(i) .ne. 8) call abort + end do + + call subr4 (a, b) + + call test (a, .true.) + call test (b, .false.) + call test (c, .false.) + + do i =3D 1, N + if (b(i) .ne. 8) call abort + end do + + call subr5 (a, b, c, d) + + call test (a, .true.) + call test (b, .false.) + call test (c, .false.) + call test (d, .false.) + + do i =3D 1, N + if (c(i) .ne. 8) call abort + if (d(i) .ne. 13) call abort + end do + + call subr6 (a, d) + + call test (a, .true.) + call test (d, .false.) + + do i =3D 1, N + if (d(i) .ne. 16) call abort + end do + +end subroutine + +program main + use openacc + integer, parameter :: N =3D 8 + integer :: a(N) + integer :: b(N) + integer :: c(N) + integer :: d(N) + + a(:) =3D 2 + b(:) =3D 3 + c(:) =3D 4 + d(:) =3D 5 + + call subr0 (a, b, c, d) + + call test (a, .false.) + call test (b, .false.) + call test (c, .false.) + call test (d, .false.) + + do i =3D 1, N + if (a(i) .ne. 8) call abort + if (b(i) .ne. 8) call abort + if (c(i) .ne. 8) call abort + if (d(i) .ne. 
16) call abort + end do + +end program diff --git libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90 libgomp/testsu= ite/libgomp.oacc-fortran/lib-12.f90 new file mode 100644 index 0000000..593cde6 --- /dev/null +++ libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90 @@ -0,0 +1,24 @@ +! { dg-do run } + +program main + use openacc + implicit none + + integer :: i, n + + n =3D 1000000 + + !$acc parallel async (0) + do i =3D 1, 1000000 + end do + !$acc end parallel + + call acc_wait_async (0, 1) + + if (acc_async_test (0) .neqv. .TRUE.) call abort + + if (acc_async_test (1) .neqv. .TRUE.) call abort + + call acc_wait (1) + +end program diff --git libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90 libgomp/testsu= ite/libgomp.oacc-fortran/lib-13.f90 new file mode 100644 index 0000000..cffda87 --- /dev/null +++ libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90 @@ -0,0 +1,28 @@ +! { dg-do run } + +program main + use openacc + implicit none + + integer :: i, j, nprocs + integer, parameter :: N =3D 1000000 + + nprocs =3D 2 + + do j =3D 1, nprocs + !$acc parallel async (j) + do i =3D 1, N + end do + !$acc end parallel + end do + + if (acc_async_test (1) .neqv. .TRUE.) call abort + if (acc_async_test (2) .neqv. .TRUE.) call abort + + call acc_wait_all_async (nprocs + 1) + + if (acc_async_test (nprocs + 1) .neqv. .TRUE.) call abort + + call acc_wait_all () + +end program diff --git libgomp/testsuite/libgomp.oacc-fortran/lib-14.f90 libgomp/testsu= ite/libgomp.oacc-fortran/lib-14.f90 new file mode 100644 index 0000000..72a2b49 --- /dev/null +++ libgomp/testsuite/libgomp.oacc-fortran/lib-14.f90 @@ -0,0 +1,79 @@ +! { dg-do run } + +program main + use openacc + implicit none + + integer, parameter :: N =3D 256 + integer, allocatable :: h(:) + integer :: i + + allocate (h(N)) + + do i =3D 1, N + h(i) =3D i + end do=20 + + call acc_present_or_copyin (h) + + if (acc_is_present (h) .neqv. .TRUE.) call abort + + call acc_copyout (h) + + if (acc_is_present (h) .neqv. .FALSE.) 
call abort + + do i =3D 1, N + if (h(i) /=3D i) call abort + end do + + do i =3D 1, N + h(i) =3D i + i + end do=20 + + call acc_pcopyin (h, sizeof (h)) + + if (acc_is_present (h) .neqv. .TRUE.) call abort + + call acc_copyout (h) + + if (acc_is_present (h) .neqv. .FALSE.) call abort + + do i =3D 1, N + if (h(i) /=3D i + i) call abort + end do + + call acc_create (h) + + if (acc_is_present (h) .neqv. .TRUE.) call abort + + !$acc parallel loop + do i =3D 1, N + h(i) =3D i + end do + !$acc end parallel loop + + call acc_copyout (h) + + if (acc_is_present (h) .neqv. .FALSE.) call abort + + do i =3D 1, N + if (h(i) /=3D i) call abort + end do + + call acc_present_or_create (h, sizeof (h)) + + if (acc_is_present (h) .neqv. .TRUE.) call abort + + call acc_delete (h) + + if (acc_is_present (h) .neqv. .FALSE.) call abort + + call acc_pcreate (h) + + if (acc_is_present (h) .neqv. .TRUE.) call abort + + call acc_delete (h) + + if (acc_is_present (h) .neqv. .FALSE.) call abort + +end program diff --git libgomp/testsuite/libgomp.oacc-fortran/lib-15.f90 libgomp/testsu= ite/libgomp.oacc-fortran/lib-15.f90 new file mode 100644 index 0000000..3a834db --- /dev/null +++ libgomp/testsuite/libgomp.oacc-fortran/lib-15.f90 @@ -0,0 +1,52 @@ +! { dg-do run } +! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=3D0" } } + +program main + use openacc + implicit none + + integer, parameter :: N =3D 256 + integer, allocatable :: h(:) + integer :: i + + allocate (h(N)) + + do i =3D 1, N + h(i) =3D i + end do=20 + + call acc_copyin (h) + + do i =3D 1, N + h(i) =3D i + i + end do=20 + + call acc_update_device (h, sizeof (h)) + + if (acc_is_present (h) .neqv. .TRUE.) call abort + + h(:) =3D 0 + + call acc_copyout (h, sizeof (h)) + + do i =3D 1, N + if (h(i) /=3D i + i) call abort + end do=20 + + call acc_copyin (h, sizeof (h)) + + h(:) =3D 0 + + call acc_update_self (h, sizeof (h)) +=20=20 + if (acc_is_present (h) .neqv. .TRUE.)
call abort + + do i =3D 1, N + if (h(i) /=3D i + i) call abort + end do=20 + + call acc_delete (h) + + if (acc_is_present (h) .neqv. .FALSE.) call abort +=20=20 +end program diff --git libgomp/testsuite/libgomp.oacc-fortran/routine-5.f90 libgomp/tes= tsuite/libgomp.oacc-fortran/routine-5.f90 new file mode 100644 index 0000000..aaeb994 --- /dev/null +++ libgomp/testsuite/libgomp.oacc-fortran/routine-5.f90 @@ -0,0 +1,27 @@ +! { dg-do run } +! { dg-options "-fno-inline" } + +program main + integer :: n + + n =3D 5 + + !$acc parallel copy (n) + n =3D func (n) + !$acc end parallel + + if (n .ne. 6) call abort + +contains + + function func (n) result (rc) + !$acc routine gang worker vector seq nohost + integer, intent (in) :: n + integer :: rc + + rc =3D n + rc =3D rc + 1 + + end function + +end program Gr=C3=BC=C3=9Fe, Thomas --=-=-= Content-Type: application/pgp-signature Content-length: 472 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAEBAgAGBQJVSIaCAAoJEPoxNhtoi6COWxQH/RQAZhGT355LXKpm+tKC4ypH 8R8eQ8lsNK8LBtW3AEgolWDGk/YHK49+2y0BWFJO+9UJI38JogvTMB36KXXsckDd DvtgkH6AgBLzED1QrLUDdgjszev7WzBxWPUH8/aaFw2jfo+hrbyEHgTu4KCo41ic rgqRM6n+88a80XF1fepaFV8wUL8tV3EGLPEwqq8W97KR/Jbqk1cHp89X8S2hTshA V5S9D4cxke65S0U/WZoe8X68hER78dyrefS+v1vYiooF0HPc5/NFxLJxohVHAZss m/p46PMZYDPE8ifepExrQk4vPWb/eEuoL58urZYQZ75W97jNJBbIhZz87KXgrPc= =3kcP -----END PGP SIGNATURE----- --=-=-=--