From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 125899 invoked by alias); 11 Nov 2019 11:54:45 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 125884 invoked by uid 89); 11 Nov 2019 11:54:44 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-18.0 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_0,GIT_PATCH_1,GIT_PATCH_2,GIT_PATCH_3,KAM_SHORT,SPF_PASS autolearn=ham version=3.3.1 spammy=dimensions, we've, ordering, 23,7 X-HELO: esa2.mentor.iphmx.com Received: from esa2.mentor.iphmx.com (HELO esa2.mentor.iphmx.com) (68.232.141.98) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 11 Nov 2019 11:54:38 +0000 IronPort-SDR: flDrQvAHhgAtWey0xDJi3JoYFhfsna28lkLfCg8zRbD2qZ2ButItMn5GbOPIyo0HQuzwtRh0uJ 5WlJGOeqsXSPem1IecXbTSYKjYnGZiipbGzWw5/ZbteiUBg6O7krhTN1PCNA8VB7rFLl2PEHkh lX6x+WXZvAD8A3gjrWRhPfFh5tmgVbBig9nJqvRqRIhiNHtIV2rIoXmUmjbKcWYDMsAJKcngv5 Mi4r8rHEfEKuoY2jQB07ndmjlKgLI/3Vi6luzmJb2Yfr6AP2oRf/2wT/FcKQk5M70SLNiVfhOM uT8= Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa2.mentor.iphmx.com with ESMTP; 11 Nov 2019 03:54:36 -0800 IronPort-SDR: Ua4GpLcTJEkfbn/L4DdJ36bDidDFTZ241SdbH98AB/uXGodjMBug+HQYICbRtgzmqcCZSYzBfm N5rPxzNWlJNLbT/S5O58YszZMDnhIRhB3kb1v8D7t5vxUWKDKiX21QUgOecVkEOKc1TCOvGoCZ GcJh0AdYln0nItu/cVujMiIMyJJo2pGdSFWAVl/bYMVASUAtPqADSCWLbcdyxBY08xbEwE1hLG rQ+O5YocMXmVn2E8dwikbe9vYD180fs0C03pIaDJEt62T+okLI5wT7mNJrt8zHOlfxLQipunPh Ikc= From: Thomas Schwinge To: Frederik Harwath CC: , , , Subject: Re: [PATCH] Add OpenACC 2.6 `serial' construct support In-Reply-To: <20191107095213.11618-1-frederik@codesourcery.com> References: <20191107095213.11618-1-frederik@codesourcery.com> User-Agent: Notmuch/0.29.1+93~g67ed7df (https://notmuchmail.org) Emacs/26.1 (x86_64-pc-linux-gnu) Date: Mon, 11 Nov 2019 11:56:00 -0000 Message-ID: <87pnhyocgd.fsf@euler.schwinge.homeip.net> MIME-Version: 1.0 Content-Type: multipart/signed; boundary="==-=-="; micalg=pgp-sha512; protocol="application/pgp-signature" Return-Path: tschwing@mentor.com X-SW-Source: 2019-11/txt/msg00727.txt.bz2 --==-=-= Content-Type: multipart/mixed; boundary="=-=-=" --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Content-length: 18867 Hi Frederik! On 2019-11-07T10:52:13+0100, Frederik Harwath w= rote: > this patch implements the OpenACC 2.6 "serial" construct. Thanks for taking on that one. > It has been tested by running the testsuite with nvptx-none > offloading on x86_64-pc-linux-gnu. This is OK for trunk with the attached "incremental, into Add OpenACC 2.6 `serial' construct support" merged in. (No need to re-test; I've just done that.) In the incremental patch, I'm streamlining some code, format/handle 'serial' the same as existing 'parallel', etc., plus a few more things, see my comments in the patch review below. To record the review effort, please include "Reviewed-by: Thomas Schwinge " in the commit log, see . I'm working on an additional patch to handle 'serial' in more cases where it's wrong to diverge from 'parallel' (this tells us: a lot of testsuite coverage is missing...), etc. Thus I'm adding a lot of testsuite coverage. I'm not asking you to work on that, as that's not a feasible task for someone who's still new to all this, to figure out the appropriate tests that should be augmented/duplicated for 'serial'. And, coming up with a list for you to work though, I suppose would be more time consuming for me instead of just doing it myself. ;-) However, you're of course always encouraged to learn from reading such patches, and ask questions for any things unclear, of course. > The `serial' construct (cf. section 2.5.3 of the OpenACC 2.6 standard) > is equivalent to a `parallel' construct with clauses `num_gangs(1) > num_workers(1) vector_length(1)' implied. ..., and that's how it -- basically -- is implemented, and thus every usage of 'serial' gets an annoying 'warning: using vector_length (32), ignoring 1' for nvptx offloading compilation. I wonder if we should sinply disable that nvptx back end warning when an 'oacc serial' attribute is present? Or, if we should not, to highlight the issue that I recently filed "OpenACC 'serial' construct might not actually be serial", discovered during this review process? (Summary: by GCC have a default of 'vector_length (32)', we do get vector parallelism with 'loop vector', or 'routine vector' inside 'serial' regions -- not clear if that's intentional, and/or correct.) > These clauses are therefore not supported with the `serial' > construct. All the remaining clauses accepted with `parallel' are also > accepted with `serial'. > > The `serial' construct is implemented like `parallel', except for > hardcoding dimensions rather than taking them from the relevant > clauses, in `expand_omp_target'. > Separate codes are used to denote the `serial' construct throughout t= he > middle end, even though the mapping of `serial' to an equivalent > `parallel' construct could have been done in the individual language > frontends. Yeah, I'd pointed this out early on, and I still wonder if early translating 'serial' into 'parallel num_gangs (1) num_workers (1) vector_length (1)' (if that's really just what it is) would be better? Would save quite some effort (duplicate all 'parallel' handling for 'serial'). On the other hand, we'd then need a different mechanism for: > In particular, this allows to distinguish between `parallel' > and `serial' in warnings, error messages, dumps etc. ... that (or just say "compute construct" instead of 'parallel', 'kernels', 'serial'). But we'll eventually want such a more general mechnisma anyway; "Adapt OpenMP diagnostic messages for OpenACC". So -- we've now got that implementation, and we can still clean it up later on. > * omp-builtins.def (BUILT_IN_GOACC_PARALLEL): Add parameter. Not anymore. > create mode 100644 gcc/testsuite/gfortran.dg/goacc/serial-dims.f90 > create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/serial-dims-au= x.c > create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/serial-dims.f90 Thanks to you (and/or Tobias, I suppose) for adding some Fortran testsuite coversage, because: > --- a/gcc/fortran/parse.c > +++ b/gcc/fortran/parse.c > @@ -683,6 +683,9 @@ decode_oacc_directive (void) > matcha ("end parallel loop", gfc_match_omp_eos_error, > ST_OACC_END_PARALLEL_LOOP); > matcha ("end parallel", gfc_match_omp_eos_error, ST_OACC_END_PARAL= LEL); > + matcha ("end serial loop", gfc_match_omp_eos_error, > + ST_OACC_END_SERIAL_LOOP); > + matcha ("end serial", gfc_match_omp_eos_error, ST_OACC_END_SERIAL); > matcha ("enter data", gfc_match_oacc_enter_data, ST_OACC_ENTER_DAT= A); > matcha ("exit data", gfc_match_oacc_exit_data, ST_OACC_EXIT_DATA); > break; Wow, wow. I see this has not been present in the og8 and og9 commits of the OpenACC 'serial' changes. This tells us: the OpenACC 'serial' construct has *not at all* been tested with Fortran; any compilation attempt would've stopped early in the front end: 25 | !$acc end serial loop | 1 Error: Unclassifiable OpenACC directive at (1) 28 | !$acc end serial | 1 Error: Unclassifiable OpenACC directive at (1) Thanks for fixing that. > --- a/gcc/gimple.h > +++ b/gcc/gimple.h > @@ -182,6 +182,7 @@ enum gf_mask { > GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA =3D 9, > GF_OMP_TARGET_KIND_OACC_DECLARE =3D 10, > GF_OMP_TARGET_KIND_OACC_HOST_DATA =3D 11, > + GF_OMP_TARGET_KIND_OACC_SERIAL =3D 12, That's not wrong, but I've still moved 'GF_OMP_TARGET_KIND_OACC_SERIAL' next to/after the existing 'GF_OMP_TARGET_KIND_OACC_PARALLEL', 'GF_OMP_TARGET_KIND_OACC_KERNELS' (it's OK to renumber 'enum gf_mask' items), so that there's (at least some) consistency in the the 'parallel', 'kernels', 'serial' ordering (which is the order they appear in the current specification), that we shall use unless alphabetical ordering is used. > --- a/gcc/omp-low.c > +++ b/gcc/omp-low.c > @@ -7518,7 +7526,7 @@ lower_oacc_head_mark (location_t loc, tree ddvar, t= ree clauses, >=20=20 > /* In a parallel region, loops are implicitly INDEPENDENT. */ > omp_context *tgt =3D enclosing_target_ctx (ctx); > - if (!tgt || is_oacc_parallel (tgt)) > + if (!tgt || is_oacc_parallel_or_serial (tgt)) > tag |=3D OLF_INDEPENDENT; I would agree, but from a (very) quick look, I don't think the OpenACC specification actually says anything on that topic. Something I'll get that clarified. > --- /dev/null > +++ b/gcc/testsuite/c-c++-common/goacc/serial-dims.c > @@ -0,0 +1,12 @@ > +/* Invalid use of OpenACC parallelism dimensions clauses: num_gangs, > + num_workers, vector_length with the serial construct. */ > + > +void f(void) > +{ > +#pragma acc serial num_gangs (1) /* { dg-error "'num_gangs' is not valid= for '#pragma acc serial'" } */ > + ; > +#pragma acc serial num_workers (1) /* { dg-error "'num_workers' is not v= alid for '#pragma acc serial'" } */ > + ; > +#pragma acc serial vector_length (1) /* { dg-error "'vector_length' is n= ot valid for '#pragma acc serial'" } */ > + ; > +} I've merged that into the existing 'c-c++-common/goacc/parallel-dims-2.c'. > --- /dev/null > +++ b/gcc/testsuite/gfortran.dg/goacc/serial-dims.f90 > @@ -0,0 +1,40 @@ > +! Invalid use of OpenACC parallelism dimensions clauses: num_gangs, > +! num_workers, vector_length with the serial construct. > + > +subroutine s() > + integer :: i > + !$acc parallel > + !$acc end parallel > + > + !$acc parallel loop > + do i =3D 1, 5 > + end do > + > + !$acc parallel loop > + do i =3D 1, 5 > + end do > + !$acc end parallel loop > + > + !$acc serial loop > + do i =3D 1, 5 > + end do > + > + !$acc serial loop > + do i =3D 1, 5 > + end do > + !$acc end serial loop > + > + !$acc serial > + !$acc end serial > +end subroutine s > + > +subroutine f() > +!$acc serial num_gangs (1) ! { dg-error "Failed to match clause at" } > +!$acc end serial ! { dg-error "Unexpected !.ACC END SERIAL statement" } > + > +!$acc serial num_workers (1) ! { dg-error "Failed to match clause at" } > +!$acc end serial ! { dg-error "Unexpected !.ACC END SERIAL statement" } > + > +!$acc serial vector_length (1) ! { dg-error "Failed to match clause at"= } > +!$acc end serial ! { dg-error "Unexpected !.ACC END SERIAL statement" } > +end subroutine f Similarly, for symmetry, moved into (new) 'gfortran.dg/goacc/parallel-dims-2.f90'. > --- /dev/null > +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c > @@ -0,0 +1,92 @@ > +/* OpenACC dimensions with the serial construct. */ This I've merged into the existing 'libgomp.oacc-c-c++-common/parallel-dims.c', instead of duplicating infrastructure here, and doing some things slightly differently (possibly due to incorrect divergence between 'serial' and 'parallel' handling, as I mentioned above, which I shall soon fix). > + /* Serial OpenACC constructs must get launched as 1 x 1 x 1. */ > + { > + int gangs_min, gangs_max; > + int workers_min, workers_max; > + int vectors_min, vectors_max; > + int gangs_actual, workers_actual, vectors_actual; > + int i, j, k; > + > + gangs_min =3D workers_min =3D vectors_min =3D INT_MAX; > + gangs_max =3D workers_max =3D vectors_max =3D INT_MIN; > + gangs_actual =3D workers_actual =3D vectors_actual =3D 1; > +#pragma acc serial /* { dg-warning "using vector_length \\(32\\), ignori= ng 1" "" { target openacc_nvidia_accel_selected } } */ > + { > + if (acc_on_device (acc_device_nvidia)) > + { > + /* The GCC nvptx back end enforces vector_length (32). */ > + vectors_actual =3D 32; So, that's actually a good question, whether that is permissible -- that's "OpenACC 'serial' construct might not actually be serial", as mentioned above. > + } > + else if (!acc_on_device (acc_device_host)) > + __builtin_abort (); > +#pragma acc loop gang \ > + reduction (min: gangs_min, workers_min, vectors_min) \ > + reduction (max: gangs_max, workers_max, vectors_max) > + for (i =3D 100 * gangs_actual; i > -100 * gangs_actual; i--) > +#pragma acc loop worker \ > + reduction (min: gangs_min, workers_min, vectors_min) \ > + reduction (max: gangs_max, workers_max, vectors_max) > + for (j =3D 100 * workers_actual; j > -100 * workers_actual; j--) > +#pragma acc loop vector \ > + reduction (min: gangs_min, workers_min, vectors_min) \ > + reduction (max: gangs_max, workers_max, vectors_max) > + for (k =3D 100 * vectors_actual; k > -100 * vectors_actual; k--) > + { > + gangs_min =3D gangs_max =3D acc_gang (); > + workers_min =3D workers_max =3D acc_worker (); > + vectors_min =3D vectors_max =3D acc_vector (); > + } > + if (gangs_min !=3D 0 || gangs_max !=3D gangs_actual - 1 > + || workers_min !=3D 0 || workers_max !=3D workers_actual - 1 > + || vectors_min !=3D 0 || vectors_max !=3D vectors_actual - 1) > + __builtin_abort (); > + } > + } Per the OpenACC 'loop' directives specified here, that's testing gang-partitioned, worker-partitioned, vector-partitioned execution mode. We should also test gang-redundant, worker-single, vector-single execution mode, which I've added. The Fortran counter part (thanks for creating that!), I suppose, had not yet been tested? > --- /dev/null > +++ b/libgomp/testsuite/libgomp.oacc-fortran/serial-dims-aux.c > @@ -0,0 +1,41 @@ > +/* OpenACC dimensions with the serial construct. */ > +/* Used by serial-dims.f90. */ I indicate where this has been copied from. (Generally, getting rid of these wrapper functions is for another day.) > +#include > +#include > +#include > + > +/* TODO: "(int) acc_device_*" casts because of the C++ acc_on_device wra= pper > + not behaving as expected for -O0. */ > +#pragma acc routine seq > +static unsigned int __attribute__ ((optimize ("O2"))) acc_gang () > +{ > + if (acc_on_device ((int) acc_device_host)) > + return 0; > + else if (acc_on_device ((int) acc_device_nvidia)) > + return __builtin_goacc_parlevel_id (GOMP_DIM_GANG); > + else > + __builtin_abort (); > +} > + > +#pragma acc routine seq > +static unsigned int __attribute__ ((optimize ("O2"))) acc_worker () > +{ > + if (acc_on_device ((int) acc_device_host)) > + return 0; > + else if (acc_on_device ((int) acc_device_nvidia)) > + return __builtin_goacc_parlevel_id (GOMP_DIM_WORKER); > + else > + __builtin_abort (); > +} > + > +#pragma acc routine seq > +static unsigned int __attribute__ ((optimize ("O2"))) acc_vector () > +{ > + if (acc_on_device ((int) acc_device_host)) > + return 0; > + else if (acc_on_device ((int) acc_device_nvidia)) > + return __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR); > + else > + __builtin_abort (); > +} Compilation of 'libgomp.oacc-fortran/serial-dims.f90' fails: serial-dims.f90:(.text+0x124): undefined reference to `acc_gang' serial-dims.f90:(.text+0x130): undefined reference to `acc_gang' serial-dims.f90:(.text+0x13c): undefined reference to `acc_worker' serial-dims.f90:(.text+0x148): undefined reference to `acc_worker' serial-dims.f90:(.text+0x154): undefined reference to `acc_vector' serial-dims.f90:(.text+0x160): undefined reference to `acc_vector' Have to remove 'static' from 'acc_gang', 'acc_worker', 'acc_vector'. > --- /dev/null > +++ b/libgomp/testsuite/libgomp.oacc-fortran/serial-dims.f90 > @@ -0,0 +1,89 @@ > +! OpenACC dimensions with the serial construct. This needs '{ dg-do run }' for torture testing. > + > +! { dg-additional-sources serial-dims-aux.c } > +! { dg-warning "command line option '-fintrinsic-modules-path=3D.*' is v= alid for Fortran but not for C" } We get: FAIL: libgomp.oacc-fortran/serial-dims.f90 -DACC_DEVICE_TYPE_host=3D1 -= DACC_MEM_SHARED=3D1 -foffload=3Ddisable -O (test for warnings, line 4) FAIL: libgomp.oacc-fortran/serial-dims.f90 -DACC_DEVICE_TYPE_host=3D1 -= DACC_MEM_SHARED=3D1 -foffload=3Ddisable -O (test for excess errors) ..., with: Excess errors: cc1: warning: command-line option '-fintrinsic-modules-path=3D[...]' is= valid for Fortran but not for C That's because that diagnostic doesn't appear on the line where the 'dg-warning' directive is present (line 4). I changed that to 'dg-prune-output', but I wonder if there's a better way, so that we can specify to expect/match a diagnostic without line number information -- I can't remember whether such a thing exists. However, that still fails: "command[-]line option" typo. ;-) > +module acc_routines > + implicit none (type, external) > + > + interface > + integer function acc_gang() bind(C) > + !$acc routine seq > + end function acc_gang > + > + integer function acc_worker() bind(C) > + !$acc routine seq > + end function acc_worker > + > + integer function acc_vector() bind(C) > + !$acc routine seq > + end function acc_vector > + end interface > +end module acc_routines With '-Wall', we're told: 14 | integer function acc_gang() bind(C) | 1 Warning: Variable 'acc_gang' at (1) may not be a C interoperable kind b= ut it is BIND(C) [-Wc-binding-type] 22 | integer function acc_vector() bind(C) | 1 Warning: Variable 'acc_vector' at (1) may not be a C interoperable kind= but it is BIND(C) [-Wc-binding-type] 18 | integer function acc_worker() bind(C) | 1 Warning: Variable 'acc_worker' at (1) may not be a C interoperable kind= but it is BIND(C) [-Wc-binding-type] I have not yet looked into that. > +program main > + use iso_c_binding > + use openacc > + use acc_routines > + implicit none (type, external) > + > + integer :: gangs_min, gangs_max > + integer :: workers_min, workers_max > + integer :: vectors_min, vectors_max > + integer :: gangs_actual, workers_actual, vectors_actual > + integer :: i, j, k > + > + call acc_init (acc_device_default) > + > + ! Serial OpenACC constructs must get launched as 1 x 1 x 1. > + gangs_min =3D huge(gangs_min) > + workers_min =3D huge(workers_min) > + vectors_min =3D huge(vectors_min) > + gangs_max =3D -huge(gangs_max) - 1 ! INT_MIN > + workers_max =3D -huge(gangs_max) - 1 > + vectors_max =3D -huge(gangs_max) - 1 Indeed the C/C++ initializes '*_min' variables with 'INT_MAX', and '*_max' variables with 'INT_MIN'. Is the above the generic Fortran counter part for that? > + gangs_actual =3D 1 > + workers_actual =3D 1 > + vectors_actual =3D 1 > + > + !$acc serial ! { dg-warning "using vector_length \\(32\\), ignoring 1"= "" { target openacc_nvidia_accel_selected } } > + if (acc_on_device (acc_device_nvidia)) then > + ! The GCC nvptx back end enforces vector_length (32). > + vectors_actual =3D 32 > + elseif (acc_on_device (acc_device_gcn)) then > + ! AMD GCN relies on the autovectorizer for the vector dimension: > + ! the loop below isn't likely to be vectorized, so vectors_actual > + ! is effectively 1. > + vectors_actual =3D 1 We're told: [...]/libgomp.oacc-fortran/serial-dims.f90:53:41: Error: Symbol 'acc_de= vice_gcn' at (1) has no IMPLICIT type; did you mean 'acc_device_kind'? AMD GCN offloading support doesn't exist on trunk yet, so removed that here. > + elseif (.not. acc_on_device (acc_device_host)) then > + stop 1 > + end if > + > +!$acc loop gang & > +!$acc & reduction (min: gangs_min, workers_min, vectors_min) & > +!$acc & reduction (max: gangs_max, workers_max, vectors_max) > + do i =3D 100 * gangs_actual, -99 * gangs_actual, -1 > +!$acc loop worker & > +!$acc & reduction (min: gangs_min, workers_min, vectors_min) & > +!$acc & reduction (max: gangs_max, workers_max, vectors_max) > + do j =3D 100 * workers_actual, -99 * workers_actual, -1 > +!$acc loop vector & > +!$acc & reduction (min: gangs_min, workers_min, vectors_min) & > +!$acc & reduction (max: gangs_max, workers_max, vectors_max) > + do k =3D 100 * vectors_actual, -99 * vectors_actual, -1 > + gangs_min =3D acc_gang (); > + gangs_max =3D acc_gang (); > + workers_min =3D acc_worker (); > + workers_max =3D acc_worker (); > + vectors_min =3D acc_vector (); > + vectors_max =3D acc_vector (); > + end do > + end do > + end do > + if (gangs_min /=3D 0 .or. gangs_max /=3D gangs_actual - 1 & > + .or. workers_min /=3D 0 .or. workers_max /=3D workers_actual - 1 & > + .or. vectors_min /=3D 0 .or. vectors_max /=3D vectors_actual - 1) & > + stop 2 > +!$acc end serial > + > +end program main Gr=C3=BC=C3=9Fe Thomas --=-=-= Content-Type: text/x-diff Content-Disposition: inline; filename=0001-incremental-into-Add-OpenACC-2.6-serial-construct-su.patch Content-Transfer-Encoding: quoted-printable Content-length: 33003 =46rom 788b2ec11009e4c36b28834914cb251134c3b761 Mon Sep 17 00:00:00 2001 From: Thomas Schwinge Date: Sun, 10 Nov 2019 22:33:43 +0100 Subject: [PATCH] incremental, into Add OpenACC 2.6 `serial' construct suppo= rt --- gcc/fortran/match.h | 2 +- gcc/fortran/openmp.c | 16 +-- gcc/fortran/parse.c | 2 +- gcc/fortran/trans-openmp.c | 8 +- gcc/gimple.def | 2 +- gcc/gimple.h | 12 +- gcc/gimplify.c | 4 +- gcc/omp-expand.c | 8 +- gcc/omp-low.c | 7 +- .../c-c++-common/goacc/parallel-dims-2.c | 16 ++- .../c-c++-common/goacc/serial-dims.c | 12 -- .../{serial-dims.f90 =3D> parallel-dims-2.f90} | 34 ++--- gcc/tree.h | 3 +- .../libgomp.oacc-c-c++-common/parallel-dims.c | 73 +++++++++++ .../libgomp.oacc-c-c++-common/serial-dims.c | 92 -------------- ...{serial-dims-aux.c =3D> parallel-dims-aux.c} | 14 +- .../libgomp.oacc-fortran/parallel-dims.f90 | 120 ++++++++++++++++++ .../libgomp.oacc-fortran/serial-dims.f90 | 89 ------------- 18 files changed, 257 insertions(+), 257 deletions(-) delete mode 100644 gcc/testsuite/c-c++-common/goacc/serial-dims.c rename gcc/testsuite/gfortran.dg/goacc/{serial-dims.f90 =3D> parallel-dims= -2.f90} (50%) delete mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims= .c rename libgomp/testsuite/libgomp.oacc-fortran/{serial-dims-aux.c =3D> para= llel-dims-aux.c} (67%) create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90 delete mode 100644 libgomp/testsuite/libgomp.oacc-fortran/serial-dims.f90 diff --git a/gcc/fortran/match.h b/gcc/fortran/match.h index 954af72f0e07..7f3d356cbe49 100644 --- a/gcc/fortran/match.h +++ b/gcc/fortran/match.h @@ -146,9 +146,9 @@ match gfc_match_oacc_kernels (void); match gfc_match_oacc_kernels_loop (void); match gfc_match_oacc_parallel (void); match gfc_match_oacc_parallel_loop (void); -match gfc_match_oacc_enter_data (void); match gfc_match_oacc_serial (void); match gfc_match_oacc_serial_loop (void); +match gfc_match_oacc_enter_data (void); match gfc_match_oacc_exit_data (void); match gfc_match_oacc_routine (void); =20 diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c index 198facce636d..dc0521b40f0b 100644 --- a/gcc/fortran/openmp.c +++ b/gcc/fortran/openmp.c @@ -1965,14 +1965,11 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const = omp_mask mask, | OMP_CLAUSE_CREATE | OMP_CLAUSE_PRESENT | OMP_CLAUSE_DEFAULT \ | OMP_CLAUSE_WAIT) #define OACC_SERIAL_CLAUSES \ - (omp_mask (OMP_CLAUSE_ASYNC) | OMP_CLAUSE_WAIT \ - | OMP_CLAUSE_IF \ - | OMP_CLAUSE_REDUCTION \ + (omp_mask (OMP_CLAUSE_IF) | OMP_CLAUSE_ASYNC | OMP_CLAUSE_REDUCTION = \ | OMP_CLAUSE_COPY | OMP_CLAUSE_COPYIN | OMP_CLAUSE_COPYOUT \ - | OMP_CLAUSE_CREATE | OMP_CLAUSE_PRESENT \ - | OMP_CLAUSE_DEVICEPTR \ - | OMP_CLAUSE_PRIVATE | OMP_CLAUSE_FIRSTPRIVATE \ - | OMP_CLAUSE_DEFAULT) + | OMP_CLAUSE_CREATE | OMP_CLAUSE_PRESENT | OMP_CLAUSE_DEVICEPTR \ + | OMP_CLAUSE_PRIVATE | OMP_CLAUSE_FIRSTPRIVATE | OMP_CLAUSE_DEFAULT = \ + | OMP_CLAUSE_WAIT) #define OACC_DATA_CLAUSES \ (omp_mask (OMP_CLAUSE_IF) | OMP_CLAUSE_DEVICEPTR | OMP_CLAUSE_COPY = \ | OMP_CLAUSE_COPYIN | OMP_CLAUSE_COPYOUT | OMP_CLAUSE_CREATE \ @@ -1986,6 +1983,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, const om= p_mask mask, (OACC_LOOP_CLAUSES | OACC_PARALLEL_CLAUSES) #define OACC_KERNELS_LOOP_CLAUSES \ (OACC_LOOP_CLAUSES | OACC_KERNELS_CLAUSES) +#define OACC_SERIAL_LOOP_CLAUSES \ + (OACC_LOOP_CLAUSES | OACC_SERIAL_CLAUSES) #define OACC_HOST_DATA_CLAUSES omp_mask (OMP_CLAUSE_USE_DEVICE) #define OACC_DECLARE_CLAUSES \ (omp_mask (OMP_CLAUSE_COPY) | OMP_CLAUSE_COPYIN | OMP_CLAUSE_COPYOUT = \ @@ -2050,8 +2049,7 @@ gfc_match_oacc_kernels (void) match gfc_match_oacc_serial_loop (void) { - return match_acc (EXEC_OACC_SERIAL_LOOP, - OACC_SERIAL_CLAUSES | OACC_LOOP_CLAUSES); + return match_acc (EXEC_OACC_SERIAL_LOOP, OACC_SERIAL_LOOP_CLAUSES); } =20 =20 diff --git a/gcc/fortran/parse.c b/gcc/fortran/parse.c index 1a38606682ca..e44cc6971983 100644 --- a/gcc/fortran/parse.c +++ b/gcc/fortran/parse.c @@ -5119,7 +5119,7 @@ parse_oacc_structured_block (gfc_statement acc_st) pop_state (); } =20 -/* Parse the statements of OpenACC loop/parallel loop/kernels loop. */ +/* Parse the statements of OpenACC 'loop', or combined compute 'loop'. */ =20 static gfc_statement parse_oacc_loop (gfc_statement acc_st) diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c index 573b55b066f3..d9dfcabc65ef 100644 --- a/gcc/fortran/trans-openmp.c +++ b/gcc/fortran/trans-openmp.c @@ -3193,8 +3193,9 @@ gfc_trans_omp_code (gfc_code *code, bool force_empty) return stmt; } =20 -/* Trans OpenACC directives. */ -/* parallel, serial, kernels, data and host_data. */ +/* Translate OpenACC 'parallel', 'kernels', 'serial', 'data', 'host_data' + construct. */ + static tree gfc_trans_oacc_construct (gfc_code *code) { @@ -4020,7 +4021,8 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stm= tblock_t *pblock, return gfc_finish_block (&block); } =20 -/* Combined OpenACC parallel loop, kernels loop and serial loop. */ +/* Translate combined OpenACC 'parallel loop', 'kernels loop', 'serial loo= p' + construct. */ =20 static tree gfc_trans_oacc_combined_directive (gfc_code *code) diff --git a/gcc/gimple.def b/gcc/gimple.def index dd64419e8eb6..38c11f41156d 100644 --- a/gcc/gimple.def +++ b/gcc/gimple.def @@ -359,7 +359,7 @@ DEFGSCODE(GIMPLE_OMP_SECTIONS_SWITCH, "gimple_omp_secti= ons_switch", GSS_BASE) DEFGSCODE(GIMPLE_OMP_SINGLE, "gimple_omp_single", GSS_OMP_SINGLE_LAYOUT) =20 /* GIMPLE_OMP_TARGET represents - #pragma acc {kernels,parallel,data,enter data,exit data,update} + #pragma acc {kernels,parallel,serial,data,enter data,exit data,update} #pragma omp target {,data,update} BODY is the sequence of statements inside the construct (NULL for some variants). diff --git a/gcc/gimple.h b/gcc/gimple.h index 83a449be3643..5a190b1714dc 100644 --- a/gcc/gimple.h +++ b/gcc/gimple.h @@ -177,12 +177,12 @@ enum gf_mask { GF_OMP_TARGET_KIND_EXIT_DATA =3D 4, GF_OMP_TARGET_KIND_OACC_PARALLEL =3D 5, GF_OMP_TARGET_KIND_OACC_KERNELS =3D 6, - GF_OMP_TARGET_KIND_OACC_DATA =3D 7, - GF_OMP_TARGET_KIND_OACC_UPDATE =3D 8, - GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA =3D 9, - GF_OMP_TARGET_KIND_OACC_DECLARE =3D 10, - GF_OMP_TARGET_KIND_OACC_HOST_DATA =3D 11, - GF_OMP_TARGET_KIND_OACC_SERIAL =3D 12, + GF_OMP_TARGET_KIND_OACC_SERIAL =3D 7, + GF_OMP_TARGET_KIND_OACC_DATA =3D 8, + GF_OMP_TARGET_KIND_OACC_UPDATE =3D 9, + GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA =3D 10, + GF_OMP_TARGET_KIND_OACC_DECLARE =3D 11, + GF_OMP_TARGET_KIND_OACC_HOST_DATA =3D 12, GF_OMP_TEAMS_GRID_PHONY =3D 1 << 0, GF_OMP_TEAMS_HOST =3D 1 << 1, =20 diff --git a/gcc/gimplify.c b/gcc/gimplify.c index 31429d5ac3ba..87a640545141 100644 --- a/gcc/gimplify.c +++ b/gcc/gimplify.c @@ -161,7 +161,7 @@ enum omp_region_type ORT_ACC_DATA =3D ORT_ACC | ORT_TARGET_DATA, /* Data construct. */ ORT_ACC_PARALLEL =3D ORT_ACC | ORT_TARGET, /* Parallel construct */ ORT_ACC_KERNELS =3D ORT_ACC | ORT_TARGET | 2, /* Kernels construct. */ - ORT_ACC_SERIAL =3D ORT_ACC | ORT_TARGET | 4, /* Serial construct. */ + ORT_ACC_SERIAL =3D ORT_ACC | ORT_TARGET | 4, /* Serial construct. */ ORT_ACC_HOST_DATA =3D ORT_ACC | ORT_TARGET_DATA | 2, /* Host data. */ =20 /* Dummy OpenMP region, used to disable expansion of @@ -10101,7 +10101,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gim= ple_seq body, tree *list_p, break; } decl =3D OMP_CLAUSE_DECL (c); - /* Data clauses associated with acc parallel reductions must be + /* Data clauses associated with reductions must be compatible with present_or_copy. Warn and adjust the clause if that is not the case. */ if (ctx->region_type =3D=3D ORT_ACC_PARALLEL diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c index d242f4e1ae99..6f945011cf5a 100644 --- a/gcc/omp-expand.c +++ b/gcc/omp-expand.c @@ -7914,8 +7914,8 @@ expand_omp_target (struct omp_region *region) case GF_OMP_TARGET_KIND_UPDATE: case GF_OMP_TARGET_KIND_ENTER_DATA: case GF_OMP_TARGET_KIND_EXIT_DATA: - case GF_OMP_TARGET_KIND_OACC_KERNELS: case GF_OMP_TARGET_KIND_OACC_PARALLEL: + case GF_OMP_TARGET_KIND_OACC_KERNELS: case GF_OMP_TARGET_KIND_OACC_SERIAL: case GF_OMP_TARGET_KIND_OACC_UPDATE: case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA: @@ -8171,8 +8171,8 @@ expand_omp_target (struct omp_region *region) start_ix =3D BUILT_IN_GOMP_TARGET_ENTER_EXIT_DATA; flags_i |=3D GOMP_TARGET_FLAG_EXIT_DATA; break; - case GF_OMP_TARGET_KIND_OACC_KERNELS: case GF_OMP_TARGET_KIND_OACC_PARALLEL: + case GF_OMP_TARGET_KIND_OACC_KERNELS: case GF_OMP_TARGET_KIND_OACC_SERIAL: start_ix =3D BUILT_IN_GOACC_PARALLEL; break; @@ -8938,8 +8938,8 @@ build_omp_regions_1 (basic_block bb, struct omp_regio= n *parent, { case GF_OMP_TARGET_KIND_REGION: case GF_OMP_TARGET_KIND_DATA: - case GF_OMP_TARGET_KIND_OACC_KERNELS: case GF_OMP_TARGET_KIND_OACC_PARALLEL: + case GF_OMP_TARGET_KIND_OACC_KERNELS: case GF_OMP_TARGET_KIND_OACC_SERIAL: case GF_OMP_TARGET_KIND_OACC_DATA: case GF_OMP_TARGET_KIND_OACC_HOST_DATA: @@ -9193,8 +9193,8 @@ omp_make_gimple_edges (basic_block bb, struct omp_reg= ion **region, { case GF_OMP_TARGET_KIND_REGION: case GF_OMP_TARGET_KIND_DATA: - case GF_OMP_TARGET_KIND_OACC_KERNELS: case GF_OMP_TARGET_KIND_OACC_PARALLEL: + case GF_OMP_TARGET_KIND_OACC_KERNELS: case GF_OMP_TARGET_KIND_OACC_SERIAL: case GF_OMP_TARGET_KIND_OACC_DATA: case GF_OMP_TARGET_KIND_OACC_HOST_DATA: diff --git a/gcc/omp-low.c b/gcc/omp-low.c index abf63334ca05..781e7cbf27a2 100644 --- a/gcc/omp-low.c +++ b/gcc/omp-low.c @@ -185,7 +185,8 @@ static tree scan_omp_1_op (tree *, int *, void *); *handled_ops_p =3D false; \ break; =20 -/* Return true if CTX corresponds to an oacc parallel or serial region. */ +/* Return true if CTX corresponds to an OpenACC 'parallel' or 'serial' + region. */ =20 static bool is_oacc_parallel_or_serial (omp_context *ctx) @@ -2419,7 +2420,7 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx) if (check && OMP_CLAUSE_OPERAND (c, 0)) error_at (gimple_location (stmt), "argument not permitted on %qs clause in" - " OpenACC %", check); + " OpenACC % or %", check); } =20 if (tgt && is_oacc_kernels (tgt)) @@ -11498,7 +11499,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_= context *ctx) if (!maybe_lookup_field (var, ctx)) continue; =20 - /* Don't remap oacc parallel reduction variables, because the + /* Don't remap compute constructs' reduction variables, because the intermediate result must be local to each gang. */ if (offloaded && !(OMP_CLAUSE_CODE (c) =3D=3D OMP_CLAUSE_MAP && OMP_CLAUSE_MAP_IN_REDUCTION (c))) diff --git a/gcc/testsuite/c-c++-common/goacc/parallel-dims-2.c b/gcc/tests= uite/c-c++-common/goacc/parallel-dims-2.c index acfbe7ff031a..31c4ee349f2c 100644 --- a/gcc/testsuite/c-c++-common/goacc/parallel-dims-2.c +++ b/gcc/testsuite/c-c++-common/goacc/parallel-dims-2.c @@ -1,5 +1,7 @@ -/* Invalid use of OpenACC parallelism dimensions clauses: num_gangs, - num_workers, vector_length. */ +/* Invalid use of OpenACC parallelism dimensions clauses: 'num_gangs', + 'num_workers', 'vector_length'. */ + +/* See also '../../gfortran.dg/goacc/parallel-dims-2.f90'. */ =20 void f(int i, float f) { @@ -255,4 +257,14 @@ void f(int i, float f) vector_length(&f) /* { dg-error "'vector_length' expression must be inte= gral" } */ \ num_gangs( /* { dg-error "expected (primary-|)expression before end of l= ine" "TODO" { xfail c } } */ ; + + + /* The 'serial' construct doesn't allow these at all. */ + +#pragma acc serial num_gangs (1) /* { dg-error "'num_gangs' is not valid f= or '#pragma acc serial'" } */ + ; +#pragma acc serial num_workers (1) /* { dg-error "'num_workers' is not val= id for '#pragma acc serial'" } */ + ; +#pragma acc serial vector_length (1) /* { dg-error "'vector_length' is not= valid for '#pragma acc serial'" } */ + ; } diff --git a/gcc/testsuite/c-c++-common/goacc/serial-dims.c b/gcc/testsuite= /c-c++-common/goacc/serial-dims.c deleted file mode 100644 index 41698d279c98..000000000000 --- a/gcc/testsuite/c-c++-common/goacc/serial-dims.c +++ /dev/null @@ -1,12 +0,0 @@ -/* Invalid use of OpenACC parallelism dimensions clauses: num_gangs, - num_workers, vector_length with the serial construct. */ - -void f(void) -{ -#pragma acc serial num_gangs (1) /* { dg-error "'num_gangs' is not valid f= or '#pragma acc serial'" } */ - ; -#pragma acc serial num_workers (1) /* { dg-error "'num_workers' is not val= id for '#pragma acc serial'" } */ - ; -#pragma acc serial vector_length (1) /* { dg-error "'vector_length' is not= valid for '#pragma acc serial'" } */ - ; -} diff --git a/gcc/testsuite/gfortran.dg/goacc/serial-dims.f90 b/gcc/testsuit= e/gfortran.dg/goacc/parallel-dims-2.f90 similarity index 50% rename from gcc/testsuite/gfortran.dg/goacc/serial-dims.f90 rename to gcc/testsuite/gfortran.dg/goacc/parallel-dims-2.f90 index 72b4a8361776..91a5c300a94c 100644 --- a/gcc/testsuite/gfortran.dg/goacc/serial-dims.f90 +++ b/gcc/testsuite/gfortran.dg/goacc/parallel-dims-2.f90 @@ -1,34 +1,15 @@ -! Invalid use of OpenACC parallelism dimensions clauses: num_gangs, -! num_workers, vector_length with the serial construct. +! Invalid use of OpenACC parallelism dimensions clauses: 'num_gangs', +! 'num_workers', 'vector_length'. =20 -subroutine s() - integer :: i - !$acc parallel - !$acc end parallel +! See also '../../c-c++-common/goacc/parallel-dims-2.c'. =20 - !$acc parallel loop - do i =3D 1, 5 - end do - - !$acc parallel loop - do i =3D 1, 5 - end do - !$acc end parallel loop - - !$acc serial loop - do i =3D 1, 5 - end do +subroutine f() + !TODO 'kernels', 'parallel' testing per '../../c-c++-common/goacc/parall= el-dims-2.c'. + !TODO This should incorporate some of the testing done in 'sie.f95'. =20 - !$acc serial loop - do i =3D 1, 5 - end do - !$acc end serial loop =20 - !$acc serial - !$acc end serial -end subroutine s + ! The 'serial' construct doesn't allow these at all. =20 -subroutine f() !$acc serial num_gangs (1) ! { dg-error "Failed to match clause at" } !$acc end serial ! { dg-error "Unexpected !.ACC END SERIAL statement" } =20 @@ -37,4 +18,5 @@ subroutine f() =20 !$acc serial vector_length (1) ! { dg-error "Failed to match clause at" } !$acc end serial ! { dg-error "Unexpected !.ACC END SERIAL statement" } + end subroutine f diff --git a/gcc/tree.h b/gcc/tree.h index a7d39c3a74df..4bec90d9a729 100644 --- a/gcc/tree.h +++ b/gcc/tree.h @@ -1622,7 +1622,8 @@ class auto_suppress_location_wrappers treatment if OMP_CLAUSE_SIZE is zero. */ #define OMP_CLAUSE_MAP_MAYBE_ZERO_LENGTH_ARRAY_SECTION(NODE) \ TREE_PROTECTED (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)) -/* Nonzero if this map clause is for an ACC parallel reduction variable. = */ +/* Nonzero if this map clause is for an OpenACC compute construct's reduct= ion + variable. */ #define OMP_CLAUSE_MAP_IN_REDUCTION(NODE) \ TREE_PRIVATE (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_MAP)) =20 diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c b/= libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c index 7e699f476b21..a5edfc6ca164 100644 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-dims.c @@ -1,6 +1,8 @@ /* OpenACC parallelism dimensions clauses: num_gangs, num_workers, vector_length. */ =20 +/* See also '../libgomp.oacc-fortran/parallel-dims.f90'. */ + #include #include #include @@ -45,6 +47,8 @@ int main () { acc_init (acc_device_default); =20 + /* OpenACC parallel construct. */ + /* Non-positive value. */ =20 /* GR, WS, VS. */ @@ -478,6 +482,8 @@ int main () } =20 =20 + /* OpenACC kernels construct. */ + /* We can't test parallelized OpenACC kernels constructs in this way: us= e of the acc_gang, acc_worker, acc_vector functions will make the construct unparallelizable. */ @@ -544,5 +550,72 @@ int main () } =20 =20 + /* OpenACC serial construct. */ + + /* GR, WS, VS. */ + { + int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vecto= rs_max; + gangs_min =3D workers_min =3D vectors_min =3D INT_MAX; + gangs_max =3D workers_max =3D vectors_max =3D INT_MIN; +#pragma acc serial /* { dg-warning "using vector_length \\(32\\), ignoring= 1" "" { target openacc_nvidia_accel_selected } } */ \ + reduction (min: gangs_min, workers_min, vectors_min) reduction (max: gan= gs_max, workers_max, vectors_max) + { + for (int i =3D 100; i > -100; i--) + { + gangs_min =3D gangs_max =3D acc_gang (); + workers_min =3D workers_max =3D acc_worker (); + vectors_min =3D vectors_max =3D acc_vector (); + } + } + if (gangs_min !=3D 0 || gangs_max !=3D 1 - 1 + || workers_min !=3D 0 || workers_max !=3D 1 - 1 + || vectors_min !=3D 0 || vectors_max !=3D 1 - 1) + __builtin_abort (); + } + + /* Composition of GP, WP, VP. */ + { + int vectors_actual =3D 1; /* Implicit 'vector_length (1)' clause. */ + int gangs_min, gangs_max, workers_min, workers_max, vectors_min, vecto= rs_max; + gangs_min =3D workers_min =3D vectors_min =3D INT_MAX; + gangs_max =3D workers_max =3D vectors_max =3D INT_MIN; +#pragma acc serial copy (vectors_actual) /* { dg-warning "using vector_len= gth \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } } */ \ + copy (gangs_min, gangs_max, workers_min, workers_max, vectors_min, vecto= rs_max) + { + if (acc_on_device (acc_device_nvidia)) + { + /* The GCC nvptx back end enforces vector_length (32). */ + /* It's unclear if that's actually permissible here; + "OpenACC + 'serial' construct might not actually be serial". */ + vectors_actual =3D 32; + } +#pragma acc loop gang reduction (min: gangs_min, workers_min, vectors_min)= reduction (max: gangs_max, workers_max, vectors_max) + for (int i =3D 100; i > -100; i--) +#pragma acc loop worker reduction (min: gangs_min, workers_min, vectors_mi= n) reduction (max: gangs_max, workers_max, vectors_max) + for (int j =3D 100; j > -100; j--) +#pragma acc loop vector reduction (min: gangs_min, workers_min, vectors_mi= n) reduction (max: gangs_max, workers_max, vectors_max) + for (int k =3D 100 * vectors_actual; k > -100 * vectors_actual; k--) + { + gangs_min =3D gangs_max =3D acc_gang (); + workers_min =3D workers_max =3D acc_worker (); + vectors_min =3D vectors_max =3D acc_vector (); + } + } + if (acc_get_device_type () =3D=3D acc_device_nvidia) + { + if (vectors_actual !=3D 32) + __builtin_abort (); + } + else + if (vectors_actual !=3D 1) + __builtin_abort (); + if (gangs_min !=3D 0 || gangs_max !=3D 1 - 1 + || workers_min !=3D 0 || workers_max !=3D 1 - 1 + || vectors_min !=3D 0 || vectors_max !=3D vectors_actual - 1) + __builtin_abort (); + } + + return 0; } diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c b/li= bgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c deleted file mode 100644 index bb91c9221f89..000000000000 --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/serial-dims.c +++ /dev/null @@ -1,92 +0,0 @@ -/* OpenACC dimensions with the serial construct. */ - -#include -#include -#include - -/* TODO: "(int) acc_device_*" casts because of the C++ acc_on_device wrapp= er - not behaving as expected for -O0. */ -#pragma acc routine seq -static unsigned int __attribute__ ((optimize ("O2"))) acc_gang () -{ - if (acc_on_device ((int) acc_device_host)) - return 0; - else if (acc_on_device ((int) acc_device_nvidia)) - return __builtin_goacc_parlevel_id (GOMP_DIM_GANG); - else - __builtin_abort (); -} - -#pragma acc routine seq -static unsigned int __attribute__ ((optimize ("O2"))) acc_worker () -{ - if (acc_on_device ((int) acc_device_host)) - return 0; - else if (acc_on_device ((int) acc_device_nvidia)) - return __builtin_goacc_parlevel_id (GOMP_DIM_WORKER); - else - __builtin_abort (); -} - -#pragma acc routine seq -static unsigned int __attribute__ ((optimize ("O2"))) acc_vector () -{ - if (acc_on_device ((int) acc_device_host)) - return 0; - else if (acc_on_device ((int) acc_device_nvidia)) - return __builtin_goacc_parlevel_id (GOMP_DIM_VECTOR); - else - __builtin_abort (); -} - - -int main () -{ - acc_init (acc_device_default); - - /* Serial OpenACC constructs must get launched as 1 x 1 x 1. */ - { - int gangs_min, gangs_max; - int workers_min, workers_max; - int vectors_min, vectors_max; - int gangs_actual, workers_actual, vectors_actual; - int i, j, k; - - gangs_min =3D workers_min =3D vectors_min =3D INT_MAX; - gangs_max =3D workers_max =3D vectors_max =3D INT_MIN; - gangs_actual =3D workers_actual =3D vectors_actual =3D 1; -#pragma acc serial /* { dg-warning "using vector_length \\(32\\), ignoring= 1" "" { target openacc_nvidia_accel_selected } } */ - { - if (acc_on_device (acc_device_nvidia)) - { - /* The GCC nvptx back end enforces vector_length (32). */ - vectors_actual =3D 32; - } - else if (!acc_on_device (acc_device_host)) - __builtin_abort (); -#pragma acc loop gang \ - reduction (min: gangs_min, workers_min, vectors_min) \ - reduction (max: gangs_max, workers_max, vectors_max) - for (i =3D 100 * gangs_actual; i > -100 * gangs_actual; i--) -#pragma acc loop worker \ - reduction (min: gangs_min, workers_min, vectors_min) \ - reduction (max: gangs_max, workers_max, vectors_max) - for (j =3D 100 * workers_actual; j > -100 * workers_actual; j--) -#pragma acc loop vector \ - reduction (min: gangs_min, workers_min, vectors_min) \ - reduction (max: gangs_max, workers_max, vectors_max) - for (k =3D 100 * vectors_actual; k > -100 * vectors_actual; k--) - { - gangs_min =3D gangs_max =3D acc_gang (); - workers_min =3D workers_max =3D acc_worker (); - vectors_min =3D vectors_max =3D acc_vector (); - } - if (gangs_min !=3D 0 || gangs_max !=3D gangs_actual - 1 - || workers_min !=3D 0 || workers_max !=3D workers_actual - 1 - || vectors_min !=3D 0 || vectors_max !=3D vectors_actual - 1) - __builtin_abort (); - } - } - - return 0; -} diff --git a/libgomp/testsuite/libgomp.oacc-fortran/serial-dims-aux.c b/lib= gomp/testsuite/libgomp.oacc-fortran/parallel-dims-aux.c similarity index 67% rename from libgomp/testsuite/libgomp.oacc-fortran/serial-dims-aux.c rename to libgomp/testsuite/libgomp.oacc-fortran/parallel-dims-aux.c index 45c260510c29..b5986f4afef7 100644 --- a/libgomp/testsuite/libgomp.oacc-fortran/serial-dims-aux.c +++ b/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims-aux.c @@ -1,5 +1,9 @@ -/* OpenACC dimensions with the serial construct. */ -/* Used by serial-dims.f90. */ +/* OpenACC parallelism dimensions clauses: num_gangs, num_workers, + vector_length. */ + +/* Copied from '../libgomp.oacc-c-c++-common/parallel-dims.c'. */ + +/* Used by 'parallel-dims.f90'. */ =20 #include #include @@ -8,7 +12,7 @@ /* TODO: "(int) acc_device_*" casts because of the C++ acc_on_device wrapp= er not behaving as expected for -O0. */ #pragma acc routine seq -static unsigned int __attribute__ ((optimize ("O2"))) acc_gang () +/* static */ unsigned int __attribute__ ((optimize ("O2"))) acc_gang () { if (acc_on_device ((int) acc_device_host)) return 0; @@ -19,7 +23,7 @@ static unsigned int __attribute__ ((optimize ("O2"))) acc= _gang () } =20 #pragma acc routine seq -static unsigned int __attribute__ ((optimize ("O2"))) acc_worker () +/* static */ unsigned int __attribute__ ((optimize ("O2"))) acc_worker () { if (acc_on_device ((int) acc_device_host)) return 0; @@ -30,7 +34,7 @@ static unsigned int __attribute__ ((optimize ("O2"))) acc= _worker () } =20 #pragma acc routine seq -static unsigned int __attribute__ ((optimize ("O2"))) acc_vector () +/* static */ unsigned int __attribute__ ((optimize ("O2"))) acc_vector () { if (acc_on_device ((int) acc_device_host)) return 0; diff --git a/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90 b/lib= gomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90 new file mode 100644 index 000000000000..1bfcd6ce0998 --- /dev/null +++ b/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90 @@ -0,0 +1,120 @@ +! OpenACC parallelism dimensions clauses: num_gangs, num_workers, +! vector_length. + +! { dg-additional-sources parallel-dims-aux.c } +! { dg-do run } +! { dg-prune-output "command-line option '-fintrinsic-modules-path=3D.*' i= s valid for Fortran but not for C" } + +! See also '../libgomp.oacc-c-c++-common/parallel-dims.c'. + +module acc_routines + implicit none (type, external) + + interface + integer function acc_gang() bind(C) + !$acc routine seq + end function acc_gang + + integer function acc_worker() bind(C) + !$acc routine seq + end function acc_worker + + integer function acc_vector() bind(C) + !$acc routine seq + end function acc_vector + end interface +end module acc_routines + +program main + use iso_c_binding + use openacc + use acc_routines + implicit none (type, external) + + integer :: gangs_min, gangs_max, workers_min, workers_max, vectors_min, = vectors_max + integer :: vectors_actual + integer :: i, j, k + + call acc_init (acc_device_default) + + ! OpenACC parallel construct. + + !TODO + + + ! OpenACC kernels construct. + + !TODO + + + ! OpenACC serial construct. + + ! GR, WS, VS. + + gangs_min =3D huge(gangs_min) ! INT_MAX + workers_min =3D huge(workers_min) ! INT_MAX + vectors_min =3D huge(vectors_min) ! INT_MAX + gangs_max =3D -huge(gangs_max) - 1 ! INT_MIN + workers_max =3D -huge(gangs_max) - 1 ! INT_MIN + vectors_max =3D -huge(gangs_max) - 1 ! INT_MIN + !$acc serial & + !$acc reduction (min: gangs_min, workers_min, vectors_min) reduction (= max: gangs_max, workers_max, vectors_max) ! { dg-warning "using vector_leng= th \\(32\\), ignoring 1" "" { target openacc_nvidia_accel_selected } } + do i =3D 100, -99, -1 + gangs_min =3D acc_gang (); + gangs_max =3D acc_gang (); + workers_min =3D acc_worker (); + workers_max =3D acc_worker (); + vectors_min =3D acc_vector (); + vectors_max =3D acc_vector (); + end do + !$acc end serial + if (gangs_min /=3D 0 .or. gangs_max /=3D 1 - 1 & + .or. workers_min /=3D 0 .or. workers_max /=3D 1 - 1 & + .or. vectors_min /=3D 0 .or. vectors_max /=3D 1 - 1) & + stop 1 + + ! Composition of GP, WP, VP. + + vectors_actual =3D 1 ! Implicit 'vector_length (1)' clause. + gangs_min =3D huge(gangs_min) ! INT_MAX + workers_min =3D huge(workers_min) ! INT_MAX + vectors_min =3D huge(vectors_min) ! INT_MAX + gangs_max =3D -huge(gangs_max) - 1 ! INT_MIN + workers_max =3D -huge(gangs_max) - 1 ! INT_MIN + vectors_max =3D -huge(gangs_max) - 1 ! INT_MIN + !$acc serial copy (vectors_actual) & + !$acc copy (gangs_min, gangs_max, workers_min, workers_max, vectors_mi= n, vectors_max) ! { dg-warning "using vector_length \\(32\\), ignoring 1" "= " { target openacc_nvidia_accel_selected } } + if (acc_on_device (acc_device_nvidia)) then + ! The GCC nvptx back end enforces vector_length (32). + ! It's unclear if that's actually permissible here; + ! "OpenACC 'seri= al' + ! construct might not actually be serial". + vectors_actual =3D 32 + end if + !$acc loop gang reduction (min: gangs_min, workers_min, vectors_min) red= uction (max: gangs_max, workers_max, vectors_max) + do i =3D 100, -99, -1 + !$acc loop worker reduction (min: gangs_min, workers_min, vectors_min= ) reduction (max: gangs_max, workers_max, vectors_max) + do j =3D 100, -99, -1 + !$acc loop vector reduction (min: gangs_min, workers_min, vectors_= min) reduction (max: gangs_max, workers_max, vectors_max) + do k =3D 100 * vectors_actual, -99 * vectors_actual, -1 + gangs_min =3D acc_gang (); + gangs_max =3D acc_gang (); + workers_min =3D acc_worker (); + workers_max =3D acc_worker (); + vectors_min =3D acc_vector (); + vectors_max =3D acc_vector (); + end do + end do + end do + !$acc end serial + if (acc_get_device_type () .eq. acc_device_nvidia) then + if (vectors_actual /=3D 32) stop 2 + else + if (vectors_actual /=3D 1) stop 3 + end if + if (gangs_min /=3D 0 .or. gangs_max /=3D 1 - 1 & + .or. workers_min /=3D 0 .or. workers_max /=3D 1 - 1 & + .or. vectors_min /=3D 0 .or. vectors_max /=3D vectors_actual - 1) & + stop 4 + +end program main diff --git a/libgomp/testsuite/libgomp.oacc-fortran/serial-dims.f90 b/libgo= mp/testsuite/libgomp.oacc-fortran/serial-dims.f90 deleted file mode 100644 index 25c933629045..000000000000 --- a/libgomp/testsuite/libgomp.oacc-fortran/serial-dims.f90 +++ /dev/null @@ -1,89 +0,0 @@ -! OpenACC dimensions with the serial construct. - -! { dg-additional-sources serial-dims-aux.c } -! { dg-warning "command line option '-fintrinsic-modules-path=3D.*' is val= id for Fortran but not for C" } - -module acc_routines - implicit none (type, external) - - interface - integer function acc_gang() bind(C) - !$acc routine seq - end function acc_gang - - integer function acc_worker() bind(C) - !$acc routine seq - end function acc_worker - - integer function acc_vector() bind(C) - !$acc routine seq - end function acc_vector - end interface -end module acc_routines - -program main - use iso_c_binding - use openacc - use acc_routines - implicit none (type, external) - - integer :: gangs_min, gangs_max - integer :: workers_min, workers_max - integer :: vectors_min, vectors_max - integer :: gangs_actual, workers_actual, vectors_actual - integer :: i, j, k - - call acc_init (acc_device_default) - - ! Serial OpenACC constructs must get launched as 1 x 1 x 1. - gangs_min =3D huge(gangs_min) - workers_min =3D huge(workers_min) - vectors_min =3D huge(vectors_min) - gangs_max =3D -huge(gangs_max) - 1 ! INT_MIN - workers_max =3D -huge(gangs_max) - 1 - vectors_max =3D -huge(gangs_max) - 1 - gangs_actual =3D 1 - workers_actual =3D 1 - vectors_actual =3D 1 - - !$acc serial ! { dg-warning "using vector_length \\(32\\), ignoring 1" "= " { target openacc_nvidia_accel_selected } } - if (acc_on_device (acc_device_nvidia)) then - ! The GCC nvptx back end enforces vector_length (32). - vectors_actual =3D 32 - elseif (acc_on_device (acc_device_gcn)) then - ! AMD GCN relies on the autovectorizer for the vector dimension: - ! the loop below isn't likely to be vectorized, so vectors_actual - ! is effectively 1. - vectors_actual =3D 1 - elseif (.not. acc_on_device (acc_device_host)) then - stop 1 - end if - -!$acc loop gang & -!$acc & reduction (min: gangs_min, workers_min, vectors_min) & -!$acc & reduction (max: gangs_max, workers_max, vectors_max) - do i =3D 100 * gangs_actual, -99 * gangs_actual, -1 -!$acc loop worker & -!$acc & reduction (min: gangs_min, workers_min, vectors_min) & -!$acc & reduction (max: gangs_max, workers_max, vectors_max) - do j =3D 100 * workers_actual, -99 * workers_actual, -1 -!$acc loop vector & -!$acc & reduction (min: gangs_min, workers_min, vectors_min) & -!$acc & reduction (max: gangs_max, workers_max, vectors_max) - do k =3D 100 * vectors_actual, -99 * vectors_actual, -1 - gangs_min =3D acc_gang (); - gangs_max =3D acc_gang (); - workers_min =3D acc_worker (); - workers_max =3D acc_worker (); - vectors_min =3D acc_vector (); - vectors_max =3D acc_vector (); - end do - end do - end do - if (gangs_min /=3D 0 .or. gangs_max /=3D gangs_actual - 1 & - .or. workers_min /=3D 0 .or. workers_max /=3D workers_actual - 1 & - .or. vectors_min /=3D 0 .or. vectors_max /=3D vectors_actual - 1) & - stop 2 -!$acc end serial - -end program main --=20 2.17.1 --=-=-=-- --==-=-= Content-Type: application/pgp-signature; name="signature.asc" Content-length: 658 -----BEGIN PGP SIGNATURE----- iQGzBAEBCgAdFiEEU9WEfWKGQazCmycCAKI7+41Q4XkFAl3JS/IACgkQAKI7+41Q 4Xmn/QwAzjom2z1bS9IvHMo2ikdniUYnADKHdjuPTywJFKQNZO25gMPU44FCJGfx cgx9CsdFC2BqfAwtsmzfA7kl73u+kFVumjNcVIKs5t6zBy1LEb7qyTQsyTsfJF2b 0JkfdLE0eAaVJi9xIxQFHdFPNt4Ni91pfEk1BwWsNZOrC/8F/zW+KQt3zywud9zR VzWQhodzdWROeBND/UtlNEBeiouU+zXMjGwjQ0Z4dADh99h+T3olgDlD9k7ViDHZ t9JefvKgQ8aoZlUSPHznSpfNEnfOMNTf/v9WYCr/G11ls9may1bjBuZEBTBwAIVy 29MlpiwPRpcGEbxJWjwyqj5Ym1IRyBv4G8ojwuSpXmz6Sk19jS7Vx/hn7R0YbVSJ k0kJIWVzDAe700hWZXGLEYyy48BZi9UY585rMOf7hv0oG55hxMZcpgCBMrzKEjPj eop65aP4MQ1nYA0z7MP44gGu5OcwPWB6J66iOyQlyVGVDBzeCBqC655iov/T3rzH A7367jrw =WYO2 -----END PGP SIGNATURE----- --==-=-=--