From: Thomas Schwinge
Subject: Re: [gomp4] Make OpenACC orphan gang reductions errors
In-Reply-To: <91521931-373f-a3a6-7a3f-1752f08858a1@codesourcery.com>
Date: Tue, 30 Nov 2021 13:05:46 +0100
Message-ID: <87mtllx2zp.fsf@euler.schwinge.homeip.net>
X-BeenThere: fortran@gcc.gnu.org
List-Id: Fortran mailing list

Hi!

On 2017-05-01T18:27:59-0700, Cesar Philippidis wrote:
> This patch promotes all OpenACC gang reductions on orphan loops to
> errors.  According to the spec, orphan loops are those which are not
> lexically nested inside an OpenACC parallel or kernels region, i.e.,
> acc loops inside acc routines.
>
> At first I thought this could be a warning because the gang reduction
> finalizer uses an atomic update.  However, because there is no
> synchronization between gangs, there is no way to guarantee that the
> reduction will have completed once a single gang entity returns from
> the acc routine call.
>
> I've applied this patch to gomp-4_0-branch.

... which I've now adapted (with several things to be fixed in follow-up
commits) and pushed to master branch in commit
2b7dac2c0dcb087da9e4018943c023c0678234a3 "Make OpenACC orphan gang
reductions errors", see attached.
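The orphan-loop rule quoted above can be illustrated with a small, self-contained C model. This is a sketch under stated assumptions, not GCC code. The types and functions (`enum construct_kind`, `struct construct`, `is_orphan_loop`, `gang_reduction_is_error`, `orphan_rule_examples`) are hypothetical. The context walk is loosely modeled on the check the attached patch adds to `resolve_oacc_loop_blocks` in the Fortran front end: enclosing `loop` constructs are skipped, and the loop counts as orphaned unless the first remaining construct is `parallel` or `kernels`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical construct kinds; these are not GCC's internal names.  */
enum construct_kind { K_PARALLEL, K_KERNELS, K_LOOP, K_OTHER };

/* One level of lexical nesting.  OUTER points to the enclosing construct,
   or is NULL at function level (hypothetical type).  */
struct construct
{
  enum construct_kind kind;
  const struct construct *outer;
};

/* A loop is orphaned unless, after skipping any enclosing loop
   constructs, the nearest enclosing construct is a compute construct
   (parallel or kernels).  ENCLOSING is the construct immediately around
   the loop, or NULL if the loop sits directly in a function body, as in
   an acc routine.  */
bool
is_orphan_loop (const struct construct *enclosing)
{
  const struct construct *c = enclosing;
  while (c != NULL && c->kind == K_LOOP)
    c = c->outer;
  return c == NULL || (c->kind != K_PARALLEL && c->kind != K_KERNELS);
}

/* Only the combination of gang partitioning, a reduction clause and an
   orphan loop is rejected.  */
bool
gang_reduction_is_error (const struct construct *enclosing, bool gang,
                         bool reduction)
{
  return gang && reduction && is_orphan_loop (enclosing);
}

/* Scenarios resembling the new orphan-reductions-1 tests.  */
void
orphan_rule_examples (void)
{
  struct construct parallel = { K_PARALLEL, NULL };
  struct construct loop_in_parallel = { K_LOOP, &parallel };
  struct construct loop_in_routine = { K_LOOP, NULL };

  /* 'acc loop gang reduction' directly inside 'acc parallel'.  */
  assert (!gang_reduction_is_error (&parallel, true, true));
  /* ... or nested in another loop inside 'acc parallel'.  */
  assert (!gang_reduction_is_error (&loop_in_parallel, true, true));
  /* 'acc loop gang reduction' at the top level of an acc routine.  */
  assert (gang_reduction_is_error (NULL, true, true));
  /* ... or nested in another loop of an acc routine.  */
  assert (gang_reduction_is_error (&loop_in_routine, true, true));
  /* Worker or vector reductions in an acc routine stay valid.  */
  assert (!gang_reduction_is_error (NULL, false, true));
}
```

As far as the diff shows, the C and C++ front ends use a simpler test than this context walk. They reject a reduction clause that appears together with a `gang` clause when the current function itself carries OpenACC function attributes (`oacc_get_fn_attrib (current_function_decl)`).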
Regards,
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company (Gesellschaft mit beschränkter Haftung); Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Register court München, HRB 106955

From 2b7dac2c0dcb087da9e4018943c023c0678234a3 Mon Sep 17 00:00:00 2001
From: Cesar Philippidis
Date: Mon, 1 May 2017 18:27:59 -0700
Subject: [PATCH] Make OpenACC orphan gang reductions errors

This patch promotes all OpenACC gang reductions on orphan loops to
errors.  According to the spec, orphan loops are those which are not
lexically nested inside an OpenACC parallel or kernels region, i.e.,
acc loops inside acc routines.

At first I thought this could be a warning because the gang reduction
finalizer uses an atomic update.  However, because there is no
synchronization between gangs, there is no way to guarantee that the
reduction will have completed once a single gang entity returns from
the acc routine call.

gcc/c/
	* c-typeck.c (c_finish_omp_clauses): Emit an error on orphan OpenACC
	gang reductions.
gcc/cp/
	* semantics.c (finish_omp_clauses): Emit an error on orphan OpenACC
	gang reductions.
gcc/fortran/
	* openmp.c (oacc_is_parallel, oacc_is_kernels): New 'static'
	functions.
	(resolve_oacc_loop_blocks): Emit an error on orphan OpenACC
	gang reductions.
gcc/
	* omp-general.h (enum oacc_loop_flags): Add OLF_REDUCTION enum.
	* omp-low.c (lower_oacc_head_mark): Use it to mark OpenACC
	reductions.
	* omp-offload.c (oacc_loop_auto_partitions): Don't assign gang
	level parallelism to orphan reductions.
gcc/testsuite/
	* c-c++-common/goacc/nested-reductions-1-routine.c: Adjust.
	* c-c++-common/goacc/nested-reductions-2-routine.c: Likewise.
	* gcc.dg/goacc/loop-processing-1.c: Likewise.
	* gfortran.dg/goacc/nested-reductions-1-routine.f90: Likewise.
	* gfortran.dg/goacc/nested-reductions-2-routine.f90: Likewise.
	* c-c++-common/goacc/orphan-reductions-1.c: New test.
	* c-c++-common/goacc/orphan-reductions-2.c: New test.
	* gfortran.dg/goacc/orphan-reductions-1.f90: New test.
	* gfortran.dg/goacc/orphan-reductions-2.f90: New test.
libgomp/
	* testsuite/libgomp.oacc-fortran/parallel-dims.f90: Temporarily
	skip.

Co-Authored-By: Thomas Schwinge
---
 gcc/c/c-typeck.c                              |   8 +
 gcc/cp/semantics.c                            |   8 +
 gcc/fortran/openmp.c                          |  24 ++
 gcc/omp-general.h                             |   3 +-
 gcc/omp-low.c                                 |   4 +
 gcc/omp-offload.c                             |   7 +
 .../goacc/nested-reductions-1-routine.c       |   3 +
 .../goacc/nested-reductions-2-routine.c       |   9 +
 .../c-c++-common/goacc/orphan-reductions-1.c  |  56 +++++
 .../c-c++-common/goacc/orphan-reductions-2.c  |  87 ++++++++
 .../gcc.dg/goacc/loop-processing-1.c          |   2 +-
 .../goacc/nested-reductions-1-routine.f90     |   3 +
 .../goacc/nested-reductions-2-routine.f90     |   9 +
 .../gfortran.dg/goacc/orphan-reductions-1.f90 | 206 ++++++++++++++++++
 .../gfortran.dg/goacc/orphan-reductions-2.f90 |  89 ++++++++
 .../libgomp.oacc-fortran/parallel-dims.f90    |   1 +
 16 files changed, 517 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/orphan-reductions-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/orphan-reductions-2.f90

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 7524304f2bd..a025740e618 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -14135,6 +14135,14 @@ c_finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
           goto check_dup_generic;
 
         case OMP_CLAUSE_REDUCTION:
+          if (ort == C_ORT_ACC && oacc_get_fn_attrib (current_function_decl)
+              && omp_find_clause (clauses, OMP_CLAUSE_GANG))
+            {
+              error_at (OMP_CLAUSE_LOCATION (c),
+                        "gang reduction on an orphan loop");
+              remove = true;
+              break;
+            }
           if (reduction_seen == 0)
             reduction_seen = OMP_CLAUSE_REDUCTION_INSCAN (c) ? -1 : 1;
           else if (reduction_seen != -2
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index cd1956497f8..c84caf43251 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -6667,6 +6667,14 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type ort)
           field_ok = ((ort & C_ORT_OMP_DECLARE_SIMD) == C_ORT_OMP);
           goto check_dup_generic;
         case OMP_CLAUSE_REDUCTION:
+          if (ort == C_ORT_ACC && oacc_get_fn_attrib (current_function_decl)
+              && omp_find_clause (clauses, OMP_CLAUSE_GANG))
+            {
+              error_at (OMP_CLAUSE_LOCATION (c),
+                        "gang reduction on an orphan loop");
+              remove = true;
+              break;
+            }
           if (reduction_seen == 0)
             reduction_seen = OMP_CLAUSE_REDUCTION_INSCAN (c) ? -1 : 1;
           else if (reduction_seen != -2
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index d120be81467..4fa38691c01 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -8322,6 +8322,17 @@ resolve_omp_do (gfc_code *code)
       }
 }
 
+static bool
+oacc_is_parallel (gfc_code *code)
+{
+  return code->op == EXEC_OACC_PARALLEL || code->op == EXEC_OACC_PARALLEL_LOOP;
+}
+
+static bool
+oacc_is_kernels (gfc_code *code)
+{
+  return code->op == EXEC_OACC_KERNELS || code->op == EXEC_OACC_KERNELS_LOOP;
+}
 
 static gfc_statement
 omp_code_to_statement (gfc_code *code)
@@ -8625,6 +8636,19 @@ resolve_oacc_loop_blocks (gfc_code *code)
   if (!oacc_is_loop (code))
     return;
 
+  if (code->op == EXEC_OACC_LOOP
+      && code->ext.omp_clauses->lists[OMP_LIST_REDUCTION]
+      && code->ext.omp_clauses->gang)
+    {
+      fortran_omp_context *c;
+      for (c = omp_current_ctx; c; c = c->previous)
+        if (!oacc_is_loop (c->code))
+          break;
+      if (c == NULL || !(oacc_is_parallel (c->code)
+                         || oacc_is_kernels (c->code)))
+        gfc_error ("gang reduction on an orphan loop at %L", &code->loc);
+    }
+
   if (code->ext.omp_clauses->tile_list && code->ext.omp_clauses->gang
       && code->ext.omp_clauses->worker && code->ext.omp_clauses->vector)
     gfc_error ("Tiled loop cannot be parallelized across gangs, workers and "
diff --git a/gcc/omp-general.h b/gcc/omp-general.h
index 8fe744c6a7a..a0c7c71148c 100644
--- a/gcc/omp-general.h
+++ b/gcc/omp-general.h
@@ -32,9 +32,10 @@ enum oacc_loop_flags {
   OLF_INDEPENDENT = 1u << 2, /* Iterations are known independent.  */
   OLF_GANG_STATIC = 1u << 3, /* Gang partitioning is static (has op). */
   OLF_TILE = 1u << 4,        /* Tiled loop. */
+  OLF_REDUCTION = 1u << 5,   /* Reduction loop.  */
   
   /* Explicitly specified loop axes.  */
-  OLF_DIM_BASE = 5,
+  OLF_DIM_BASE = 6,
   OLF_DIM_GANG   = 1u << (OLF_DIM_BASE + GOMP_DIM_GANG),
   OLF_DIM_WORKER = 1u << (OLF_DIM_BASE + GOMP_DIM_WORKER),
   OLF_DIM_VECTOR = 1u << (OLF_DIM_BASE + GOMP_DIM_VECTOR),
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 63a47f62d08..de3a26e08fc 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -8271,6 +8271,10 @@ lower_oacc_head_mark (location_t loc, tree ddvar, tree clauses,
           tag |= OLF_TILE;
           break;
 
+        case OMP_CLAUSE_REDUCTION:
+          tag |= OLF_REDUCTION;
+          break;
+
         default:
           continue;
         }
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 833f7ddea58..0aec26b04e7 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1611,6 +1611,13 @@ oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask,
          non-innermost available level.  */
       unsigned this_mask = GOMP_DIM_MASK (GOMP_DIM_GANG);
 
+      /* Orphan reductions cannot have gang partitioning.  */
+      if ((loop->flags & OLF_REDUCTION)
+          && oacc_get_fn_attrib (current_function_decl)
+          && !lookup_attribute ("omp target entrypoint",
+                                DECL_ATTRIBUTES (current_function_decl)))
+        this_mask = GOMP_DIM_MASK (GOMP_DIM_WORKER);
+
       /* Find the first outermost available partition. */
       while (this_mask <= outer_mask)
         this_mask <<= 1;
diff --git a/gcc/testsuite/c-c++-common/goacc/nested-reductions-1-routine.c b/gcc/testsuite/c-c++-common/goacc/nested-reductions-1-routine.c
index 83d39950295..9e34614eb15 100644
--- a/gcc/testsuite/c-c++-common/goacc/nested-reductions-1-routine.c
+++ b/gcc/testsuite/c-c++-common/goacc/nested-reductions-1-routine.c
@@ -44,6 +44,7 @@ void acc_routine (void)
 #pragma acc loop reduction(+:sum)
   for (i = 0; i < 10; i++)
 #pragma acc loop reduction(+:sum)
+    // { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
     for (j = 0; j < 10; j++)
 #pragma acc loop reduction(+:sum)
       for (k = 0; k < 10; k++)
@@ -53,12 +54,14 @@ void acc_routine (void)
   for (i = 0; i < 10; i++)
     {
 #pragma acc loop reduction(+:sum)
+      // { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
       for (j = 0; j < 10; j++)
 #pragma acc loop reduction(+:sum)
         for (k = 0; k < 10; k++)
           sum = 1;
 
 #pragma acc loop reduction(-:diff)
+      // { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
       for (j = 0; j < 10; j++)
 #pragma acc loop reduction(-:diff)
         for (k = 0; k < 10; k++)
diff --git a/gcc/testsuite/c-c++-common/goacc/nested-reductions-2-routine.c b/gcc/testsuite/c-c++-common/goacc/nested-reductions-2-routine.c
index 5988d509bec..9bd79dea4cf 100644
--- a/gcc/testsuite/c-c++-common/goacc/nested-reductions-2-routine.c
+++ b/gcc/testsuite/c-c++-common/goacc/nested-reductions-2-routine.c
@@ -11,6 +11,7 @@ void acc_routine (void)
 #pragma acc loop reduction(+:sum)
   for (i = 0; i < 10; i++)
 #pragma acc loop // { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+    // { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
     for (j = 0; j < 10; j++)
 #pragma acc loop reduction(+:sum)
       for (k = 0; k < 10; k++)
@@ -19,6 +20,7 @@ void acc_routine (void)
 #pragma acc loop reduction(+:sum)
   for (i = 0; i < 10; i++)
 #pragma acc loop collapse(2) // { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+    // { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
     for (j = 0; j < 10; j++)
       for (k = 0; k < 10; k++)
 #pragma acc loop reduction(+:sum)
@@ -28,6 +30,7 @@ void acc_routine (void)
 #pragma acc loop reduction(+:sum)
   for (i = 0; i < 10; i++)
 #pragma acc loop // { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+    // { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
     for (j = 0; j < 10; j++)
 #pragma acc loop // { dg-warning "nested loop in reduction needs reduction clause for .sum." }
       // { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
@@ -39,6 +42,7 @@ void acc_routine (void)
 #pragma acc loop reduction(+:sum)
   for (i = 0; i < 10; i++)
 #pragma acc loop reduction(-:sum) // { dg-warning "conflicting reduction operations for .sum." }
+    // { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
     for (j = 0; j < 10; j++)
 #pragma acc loop reduction(+:sum) // { dg-warning "conflicting reduction operations for .sum." }
       for (k = 0; k < 10; k++)
@@ -47,6 +51,7 @@ void acc_routine (void)
 #pragma acc loop reduction(+:sum)
   for (i = 0; i < 10; i++)
 #pragma acc loop reduction(-:sum) // { dg-warning "conflicting reduction operations for .sum." }
+    // { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
     for (j = 0; j < 10; j++)
 #pragma acc loop reduction(-:sum)
       for (k = 0; k < 10; k++)
@@ -55,6 +60,7 @@ void acc_routine (void)
 #pragma acc loop reduction(+:sum)
   for (i = 0; i < 10; i++)
 #pragma acc loop reduction(-:sum) // { dg-warning "conflicting reduction operations for .sum." }
+    // { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
     for (j = 0; j < 10; j++)
 #pragma acc loop // { dg-warning "nested loop in reduction needs reduction clause for .sum." }
       // { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
@@ -66,6 +72,7 @@ void acc_routine (void)
 #pragma acc loop reduction(+:sum)
   for (i = 0; i < 10; i++)
 #pragma acc loop reduction(-:sum) // { dg-warning "conflicting reduction operations for .sum." }
+    // { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
     for (j = 0; j < 10; j++)
 #pragma acc loop reduction(+:sum) // { dg-warning "conflicting reduction operations for .sum." })
       // { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
@@ -78,12 +85,14 @@ void acc_routine (void)
   for (i = 0; i < 10; i++)
     {
 #pragma acc loop reduction(-:diff) // { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+      // { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
       for (j = 0; j < 10; j++)
 #pragma acc loop reduction(+:sum)
         for (k = 0; k < 10; k++)
           sum = 1;
 
 #pragma acc loop reduction(+:sum) // { dg-warning "nested loop in reduction needs reduction clause for .diff." }
+      // { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
       for (j = 0; j < 10; j++)
 #pragma acc loop reduction(-:diff)
         for (k = 0; k < 10; k++)
diff --git a/gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c
new file mode 100644
index 00000000000..d2fec108214
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c
@@ -0,0 +1,56 @@
+/* Test orphan reductions.  */
+
+/* { dg-do compile } */
+
+#pragma acc routine seq
+int
+seq_reduction (int n)
+{
+  int i, sum = 0;
+#pragma acc loop seq reduction(+:sum)
+  for (i = 0; i < n; i++)
+    sum = sum + 1;
+
+  return sum;
+}
+
+#pragma acc routine gang
+int
+gang_reduction (int n)
+{
+  int i, s1 = 0, s2 = 0;
+#pragma acc loop gang reduction(+:s1) /* { dg-error "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+    s1 = s1 + 2;
+
+#pragma acc loop gang reduction(+:s2) /* { dg-error "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+    s2 = s2 + 2;
+
+
+  return s1 + s2;
+}
+
+#pragma acc routine worker
+int
+worker_reduction (int n)
+{
+  int i, sum = 0;
+#pragma acc loop worker reduction(+:sum)
+  for (i = 0; i < n; i++)
+    sum = sum + 3;
+
+  return sum;
+}
+
+#pragma acc routine vector
+int
+vector_reduction (int n)
+{
+  int i, sum = 0;
+#pragma acc loop vector reduction(+:sum)
+  for (i = 0; i < n; i++)
+    sum = sum + 4;
+
+  return sum;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c
new file mode 100644
index 00000000000..941e5c6126a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-2.c
@@ -0,0 +1,87 @@
+/* Ensure that the middle end does not assign gang level parallelism
+   to orphan loop containing reductions.  */
+
+/* { dg-do compile } */
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+/* { dg-additional-options "-Wopenacc-parallelism" } */
+
+#pragma acc routine gang
+int
+f1 () /* { dg-warning "region is gang partitioned but does not contain gang partitioned code" } */
+{
+  int sum = 0, i;
+
+#pragma acc loop reduction (+:sum) /* { dg-optimized "assigned OpenACC worker vector loop parallelism" } */
+  for (i = 0; i < 100; i++)
+    sum++;
+
+  return sum;
+}
+
+#pragma acc routine gang
+int
+f2 () /* { dg-warning "region is gang partitioned but does not contain gang partitioned code" } */
+{
+  int sum = 0, i, j;
+
+#pragma acc loop reduction (+:sum) /* { dg-optimized "assigned OpenACC worker loop parallelism" } */
+  for (i = 0; i < 100; i++)
+#pragma acc loop reduction (+:sum) /* { dg-optimized "assigned OpenACC vector loop parallelism" } */
+    for (j = 0; j < 100; j++)
+      sum++;
+
+  return sum;
+}
+
+#pragma acc routine gang
+int
+f3 () /* { dg-warning "region is gang partitioned but does not contain gang partitioned code" } */
+{
+  int sum = 0, i, j, k;
+
+#pragma acc loop reduction (+:sum) /* { dg-optimized "assigned OpenACC worker loop parallelism" } */
+  for (i = 0; i < 100; i++)
+#pragma acc loop reduction (+:sum) /* { dg-optimized "assigned OpenACC seq loop parallelism" } */
+    /* { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 } */
+    for (j = 0; j < 100; j++)
+#pragma acc loop reduction (+:sum) /* { dg-optimized "assigned OpenACC vector loop parallelism" } */
+      for (k = 0; k < 100; k++)
+        sum++;
+
+  return sum;
+}
+
+int
+main ()
+{
+  int sum = 0, i, j, k;
+
+#pragma acc parallel copy (sum)
+  {
+#pragma acc loop reduction (+:sum) /* { dg-optimized "assigned OpenACC gang vector loop parallelism" } */
+    for (i = 0; i < 100; i++)
+      sum++;
+  }
+
+#pragma acc parallel copy (sum)
+  {
+#pragma acc loop reduction (+:sum) /* { dg-optimized "assigned OpenACC gang worker loop parallelism" } */
+    for (i = 0; i < 100; i++)
+#pragma acc loop reduction (+:sum) /* { dg-optimized "assigned OpenACC vector loop parallelism" } */
+      for (j = 0; j < 100; j++)
+        sum++;
+  }
+
+#pragma acc parallel copy (sum)
+  {
+#pragma acc loop reduction (+:sum) /* { dg-optimized "assigned OpenACC gang loop parallelism" } */
+    for (i = 0; i < 100; i++)
+#pragma acc loop reduction (+:sum) /* { dg-optimized "assigned OpenACC worker loop parallelism" } */
+      for (j = 0; j < 100; j++)
+#pragma acc loop reduction (+:sum) /* { dg-optimized "assigned OpenACC vector loop parallelism" } */
+        for (k = 0; k < 100; k++)
+          sum++;
+  }
+
+  return sum;
+}
diff --git a/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c b/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
index 78b9aed89be..f6e25151e1e 100644
--- a/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
+++ b/gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
@@ -15,4 +15,4 @@ void vector_1 (int *ary, int size)
   }
 }
 
-/* { dg-final { scan-tree-dump {OpenACC loops.*Loop 0\(0\).*Loop 24\(1\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 36\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 0\);.*Loop 6\(6\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 1\);.*Head-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 2\);.*Tail-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 2\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 2\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 1\);} "oaccloops" } } */
+/* { dg-final { scan-tree-dump {OpenACC loops.*Loop 0\(0\).*Loop 44\(1\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 68\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 1, 68\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 0\);.*Loop 6\(6\).*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*Head-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, 0, 2, 6\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 1\);.*Head-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_HEAD_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 2\);.*Tail-1:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 2\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 2\);.*Tail-0:.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_TAIL_MARK, \.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = \.UNIQUE \(OACC_JOIN, \.data_dep\.[0-9_]+, 1\);} "oaccloops" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/nested-reductions-1-routine.f90 b/gcc/testsuite/gfortran.dg/goacc/nested-reductions-1-routine.f90
index 17a586152c7..e8264114714 100644
--- a/gcc/testsuite/gfortran.dg/goacc/nested-reductions-1-routine.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/nested-reductions-1-routine.f90
@@ -59,6 +59,7 @@ subroutine acc_routine ()
   !$acc loop reduction(+:sum)
   do i = 1, 10
     !$acc loop reduction(+:sum)
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
     do j = 1, 10
       !$acc loop reduction(+:sum)
       do k = 1, 10
@@ -70,6 +71,7 @@ subroutine acc_routine ()
   !$acc loop reduction(+:sum) reduction(-:diff)
   do i = 1, 10
     !$acc loop reduction(+:sum)
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
     do j = 1, 10
       !$acc loop reduction(+:sum)
       do k = 1, 10
@@ -78,6 +80,7 @@ subroutine acc_routine ()
     end do
 
     !$acc loop reduction(-:diff)
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
     do j = 1, 10
       !$acc loop reduction(-:diff)
       do k = 1, 10
diff --git a/gcc/testsuite/gfortran.dg/goacc/nested-reductions-2-routine.f90 b/gcc/testsuite/gfortran.dg/goacc/nested-reductions-2-routine.f90
index cc7802ecd10..98b1aa641c0 100644
--- a/gcc/testsuite/gfortran.dg/goacc/nested-reductions-2-routine.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/nested-reductions-2-routine.f90
@@ -10,6 +10,7 @@ subroutine acc_routine ()
   !$acc loop reduction(+:sum)
   do i = 1, 10
     !$acc loop ! { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
     do j = 1, 10
       !$acc loop reduction(+:sum)
       do k = 1, 10
@@ -21,6 +22,7 @@ subroutine acc_routine ()
   !$acc loop reduction(+:sum)
   do i = 1, 10
     !$acc loop collapse(2) ! { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
     do j = 1, 10
       do k = 1, 10
         !$acc loop reduction(+:sum)
@@ -34,6 +36,7 @@ subroutine acc_routine ()
   !$acc loop reduction(+:sum)
   do i = 1, 10
     !$acc loop ! { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
     do j = 1, 10
       !$acc loop ! { dg-warning "nested loop in reduction needs reduction clause for .sum." }
       ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
@@ -49,6 +52,7 @@ subroutine acc_routine ()
   !$acc loop reduction(+:sum)
   do i = 1, 10
     !$acc loop reduction(-:sum) ! { dg-warning "conflicting reduction operations for .sum." }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
     do j = 1, 10
       !$acc loop reduction(+:sum) ! { dg-warning "conflicting reduction operations for .sum." }
       do k = 1, 10
@@ -60,6 +64,7 @@ subroutine acc_routine ()
   !$acc loop reduction(+:sum)
   do i = 1, 10
     !$acc loop reduction(-:sum) ! { dg-warning "conflicting reduction operations for .sum." }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
     do j = 1, 10
       !$acc loop reduction(-:sum)
       do k = 1, 10
@@ -71,6 +76,7 @@ subroutine acc_routine ()
   !$acc loop reduction(+:sum)
   do i = 1, 10
     !$acc loop reduction(-:sum) ! { dg-warning "conflicting reduction operations for .sum." }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
     do j = 1, 10
       !$acc loop ! { dg-warning "nested loop in reduction needs reduction clause for .sum." }
       ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
@@ -86,6 +92,7 @@ subroutine acc_routine ()
   !$acc loop reduction(+:sum)
   do i = 1, 10
     !$acc loop reduction(-:sum) ! { dg-warning "conflicting reduction operations for .sum." }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
     do j = 1, 10
       !$acc loop reduction(+:sum) ! { dg-warning "conflicting reduction operations for .sum." }
       ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
@@ -101,6 +108,7 @@ subroutine acc_routine ()
   !$acc loop reduction(+:sum) reduction(-:diff)
   do i = 1, 10
     !$acc loop reduction(-:diff) ! { dg-warning "nested loop in reduction needs reduction clause for .sum." }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
     do j = 1, 10
       !$acc loop reduction(+:sum)
       do k = 1, 10
@@ -109,6 +117,7 @@ subroutine acc_routine ()
     end do
 
     !$acc loop reduction(+:sum) ! { dg-warning "nested loop in reduction needs reduction clause for .diff." }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
     do j = 1, 10
       !$acc loop reduction(-:diff)
       do k = 1, 10
diff --git a/gcc/testsuite/gfortran.dg/goacc/orphan-reductions-1.f90 b/gcc/testsuite/gfortran.dg/goacc/orphan-reductions-1.f90
new file mode 100644
index 00000000000..c7fcc9d4ac5
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/orphan-reductions-1.f90
@@ -0,0 +1,206 @@
+! Verify that gang reduction on orphan OpenACC loops reported as errors.
+
+! { dg-do compile }
+
+subroutine s1
+  implicit none
+
+  integer, parameter :: n = 100
+  integer :: i, sum
+  sum = 0
+
+  !$acc parallel reduction(+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+  !$acc end parallel
+
+  !$acc parallel loop gang reduction(+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+
+  !$acc parallel
+  !$acc loop gang reduction(+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+  !$acc end parallel
+end subroutine s1
+
+subroutine s2
+  implicit none
+  !$acc routine worker
+
+  integer, parameter :: n = 100
+  integer :: i, j, sum
+  sum = 0
+
+  !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+  do i = 1, n
+     sum = sum + 1
+  end do
+
+  !$acc loop reduction(+:sum)
+  do i = 1, n
+     !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+     do j = 1, n
+        sum = sum + 1
+     end do
+  end do
+end subroutine s2
+
+integer function f1 ()
+  implicit none
+
+  integer, parameter :: n = 100
+  integer :: i, sum
+  sum = 0
+
+  !$acc parallel reduction(+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+  !$acc end parallel
+
+  !$acc parallel loop gang reduction(+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+
+  !$acc parallel
+  !$acc loop gang reduction(+:sum)
+  do i = 1, n
+     sum = sum + 1
+  end do
+  !$acc end parallel
+
+  f1 = sum
+end function f1
+
+integer function f2 ()
+  implicit none
+  !$acc routine worker
+
+  integer, parameter :: n = 100
+  integer :: i, j, sum
+  sum = 0
+
+  !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+  do i = 1, n
+     sum = sum + 1
+  end do
+
+  !$acc loop reduction(+:sum)
+  do i = 1, n
+     !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+     do j = 1, n
+        sum = sum + 1
+     end do
+  end do
+
+  f2 = sum
+end function f2
+
+module m
+contains
+  subroutine s3
+    implicit none
+
+    integer, parameter :: n = 100
+    integer :: i, sum
+    sum = 0
+
+    !$acc parallel reduction(+:sum)
+    do i = 1, n
+       sum = sum + 1
+    end do
+    !$acc end parallel
+
+    !$acc parallel loop gang reduction(+:sum)
+    do i = 1, n
+       sum = sum + 1
+    end do
+
+    !$acc parallel
+    !$acc loop gang reduction(+:sum)
+    do i = 1, n
+       sum = sum + 1
+    end do
+    !$acc end parallel
+  end subroutine s3
+
+  subroutine s4
+    implicit none
+    !$acc routine worker
+
+    integer, parameter :: n = 100
+    integer :: i, j, sum
+    sum = 0
+
+    !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+    do i = 1, n
+       sum = sum + 1
+    end do
+
+    !$acc loop reduction(+:sum)
+    do i = 1, n
+       !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+       do j = 1, n
+          sum = sum + 1
+       end do
+    end do
+  end subroutine s4
+
+  integer function f3 ()
+    implicit none
+
+    integer, parameter :: n = 100
+    integer :: i, sum
+    sum = 0
+
+    !$acc parallel reduction(+:sum)
+    do i = 1, n
+       sum = sum + 1
+    end do
+    !$acc end parallel
+
+    !$acc parallel loop gang reduction(+:sum)
+    do i = 1, n
+       sum = sum + 1
+    end do
+
+    !$acc parallel
+    !$acc loop gang reduction(+:sum)
+    do i = 1, n
+       sum = sum + 1
+    end do
+    !$acc end parallel
+
+    f3 = sum
+  end function f3
+
+  integer function f4 ()
+    implicit none
+    !$acc routine worker
+
+    integer, parameter :: n = 100
+    integer :: i, j, sum
+    sum = 0
+
+    !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+    do i = 1, n
+       sum = sum + 1
+    end do
+
+    !$acc loop reduction(+:sum)
+    do i = 1, n
+       !$acc loop gang reduction(+:sum) ! { dg-error "gang reduction on an orphan loop" }
+       do j = 1, n
+          sum = sum + 1
+       end do
+    end do
+
+    f4 = sum
+  end function f4
+end module m
diff --git a/gcc/testsuite/gfortran.dg/goacc/orphan-reductions-2.f90 b/gcc/testsuite/gfortran.dg/goacc/orphan-reductions-2.f90
new file mode 100644
index 00000000000..7ff0a57e620
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/orphan-reductions-2.f90
@@ -0,0 +1,89 @@
+! Ensure that the middle end does not assign gang level parallelism to
+! orphan loop containing reductions.
+
+! { dg-do compile }
+! { dg-additional-options "-fopt-info-optimized-omp" }
+! { dg-additional-options "-Wopenacc-parallelism" }
+
+subroutine s1 ! { dg-warning "region is gang partitioned but does not contain gang partitioned code" }
+  implicit none
+  !$acc routine gang
+  integer i, sum
+
+  sum = 0
+  !$acc loop reduction (+:sum) ! { dg-optimized "assigned OpenACC worker vector loop parallelism" }
+  do i = 1, 10
+     sum = sum + 1
+  end do
+end subroutine s1
+
+subroutine s2 ! { dg-warning "region is gang partitioned but does not contain gang partitioned code" }
+  implicit none
+  !$acc routine gang
+  integer i, j, sum
+
+  sum = 0
+  !$acc loop reduction (+:sum) ! { dg-optimized "assigned OpenACC worker loop parallelism" }
+  do i = 1, 10
+     !$acc loop reduction (+:sum) ! { dg-optimized "assigned OpenACC vector loop parallelism" }
+     do j = 1, 10
+        sum = sum + 1
+     end do
+  end do
+end subroutine s2
+
+subroutine s3 ! { dg-warning "region is gang partitioned but does not contain gang partitioned code" }
+  implicit none
+  !$acc routine gang
+  integer i, j, k, sum
+
+  sum = 0
+  !$acc loop reduction (+:sum) ! { dg-optimized "assigned OpenACC worker loop parallelism" }
+  do i = 1, 10
+     !$acc loop reduction (+:sum) ! { dg-optimized "assigned OpenACC seq loop parallelism" }
+     ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+     do j = 1, 10
+        !$acc loop reduction (+:sum) ! { dg-optimized "assigned OpenACC vector loop parallelism" }
+        do k = 1, 10
+           sum = sum + 1
+        end do
+     end do
+  end do
+end subroutine s3
+
+subroutine s4
+  implicit none
+
+  integer i, j, k, sum
+
+  sum = 0
+  !$acc parallel copy(sum)
+  !$acc loop reduction (+:sum) ! { dg-optimized "assigned OpenACC gang vector loop parallelism" }
+  do i = 1, 10
+     sum = sum + 1
+  end do
+  !$acc end parallel
+
+  !$acc parallel copy(sum)
+  !$acc loop reduction (+:sum) ! { dg-optimized "assigned OpenACC gang worker loop parallelism" }
+  do i = 1, 10
+     !$acc loop reduction (+:sum) ! { dg-optimized "assigned OpenACC vector loop parallelism" }
+     do j = 1, 10
+        sum = sum + 1
+     end do
+  end do
+  !$acc end parallel
+
+  !$acc parallel copy(sum)
+  !$acc loop reduction (+:sum) ! { dg-optimized "assigned OpenACC gang loop parallelism" }
+  do i = 1, 10
+     !$acc loop reduction (+:sum) ! { dg-optimized "assigned OpenACC worker loop parallelism" }
+     do j = 1, 10
+        !$acc loop reduction (+:sum) ! { dg-optimized "assigned OpenACC vector loop parallelism" }
+        do k = 1, 10
+           sum = sum + 1
+        end do
+     end do
+  end do
+  !$acc end parallel
+end subroutine s4
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90 b/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90
index fad3d9d6a80..80d64030414 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/parallel-dims.f90
@@ -3,6 +3,7 @@
 
 ! { dg-additional-sources parallel-dims-aux.c }
 ! { dg-do run }
+! { dg-skip-if TODO { *-*-* } }
 ! { dg-prune-output "command-line option '-fintrinsic-modules-path=.*' is valid for Fortran but not for C" }
 
 ! { dg-additional-options "-fopt-info-note-omp" }
-- 
2.33.0