public inbox for gcc-patches@gcc.gnu.org
* [og8] OpenACC 'kernels' construct changes: splitting of the construct into several regions
From: Thomas Schwinge @ 2019-02-01  0:00 UTC
  To: gcc-patches; +Cc: Gergö Barany, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 430 bytes --]

Hi!

I've just pushed the attached nine patches to openacc-gcc-8-branch:
OpenACC 'kernels' construct changes: splitting of the construct into
several regions.

There's more work to be done there, and we're aware of a number of TODO
items, but nevertheless: it's a good first step.


(Tom, CCed you just for your information, as you've been working on the
OpenACC 'kernels' construct before.)


Regards
 Thomas
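
For illustration only (this is not part of the pushed patches): a test
case annotated the way patch 1/9 annotates the existing "parloops" tests
might look roughly like the sketch below.  Only the two
dg-additional-options header comments are taken from the patch; the
array names, N, and the functions init_inputs, vec_add and check_result
are made up for this example.  When compiled without -fopenacc, the
pragma is ignored and the loop simply runs sequentially on the host.

```c
/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
   specifically testing "parloops" handling.  */
/* { dg-additional-options "-O2" } */

#include <assert.h>

#define N 1024  /* Hypothetical problem size.  */

static unsigned int a[N], b[N], c[N];

/* Hypothetical helper: initialize the input arrays on the host.  */
void
init_inputs (void)
{
  for (unsigned int i = 0; i < N; i++)
    {
      a[i] = i * 2;
      b[i] = i * 4;
    }
}

/* A single loop inside an OpenACC 'kernels' construct, the kind of code
   that the "parloops" handling is expected to analyze.  */
void
vec_add (void)
{
#pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N])
  {
    for (unsigned int i = 0; i < N; i++)
      c[i] = a[i] + b[i];
  }
}

/* Hypothetical helper: verify the result on the host.  */
void
check_result (void)
{
  for (unsigned int i = 0; i < N; i++)
    assert (c[i] == a[i] + b[i]);
}
```

A caller would run init_inputs, then vec_add, then check_result.  The
explicit "-fopenacc-kernels=parloops" line documents that the test
depends on "parloops" handling rather than on whatever the default is.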



[-- Attachment #2: 0001-Use-fopenacc-kernels-parloops-to-document-parloops-t.patch --]
[-- Type: text/x-diff, Size: 77711 bytes --]

From 07b7c5d9de8a3e77adca53a8eb3f4235903ea368 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 30 Jan 2019 10:32:10 +0100
Subject: [PATCH 1/9] Use "-fopenacc-kernels=parloops" to document "parloops"
 test cases

	gcc/
	* flag-types.h (enum openacc_kernels): New type.
	gcc/c-family/
	* c.opt (fopenacc-kernels): New flag.
	gcc/fortran/
	* lang.opt (fopenacc-kernels): New flag.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-1.c: Add
	"-fopenacc-kernels=parloops".
	* c-c++-common/goacc/kernels-acc-loop-reduction.c: Likewise.
	* c-c++-common/goacc/kernels-acc-loop-smaller-equal.c: Likewise.
	* c-c++-common/goacc/kernels-alias-2.c: Likewise.
	* c-c++-common/goacc/kernels-alias-3.c: Likewise.
	* c-c++-common/goacc/kernels-alias-4.c: Likewise.
	* c-c++-common/goacc/kernels-alias-5.c: Likewise.
	* c-c++-common/goacc/kernels-alias-6.c: Likewise.
	* c-c++-common/goacc/kernels-alias-7.c: Likewise.
	* c-c++-common/goacc/kernels-alias-8.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta-2.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta-3.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta-4.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta.c: Likewise.
	* c-c++-common/goacc/kernels-alias.c: Likewise.
	* c-c++-common/goacc/kernels-counter-var-redundant-load.c:
	Likewise.
	* c-c++-common/goacc/kernels-counter-vars-function-scope.c:
	Likewise.
	* c-c++-common/goacc/kernels-double-reduction-n.c: Likewise.
	* c-c++-common/goacc/kernels-double-reduction.c: Likewise.
	* c-c++-common/goacc/kernels-loop-2-acc-loop.c: Likewise.
	* c-c++-common/goacc/kernels-loop-2.c: Likewise.
	* c-c++-common/goacc/kernels-loop-3-acc-loop.c: Likewise.
	* c-c++-common/goacc/kernels-loop-3.c: Likewise.
	* c-c++-common/goacc/kernels-loop-acc-loop.c: Likewise.
	* c-c++-common/goacc/kernels-loop-data-2.c: Likewise.
	* c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: Likewise.
	* c-c++-common/goacc/kernels-loop-data-enter-exit.c: Likewise.
	* c-c++-common/goacc/kernels-loop-data-update.c: Likewise.
	* c-c++-common/goacc/kernels-loop-data.c: Likewise.
	* c-c++-common/goacc/kernels-loop-g.c: Likewise.
	* c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
	* c-c++-common/goacc/kernels-loop-n-acc-loop.c: Likewise.
	* c-c++-common/goacc/kernels-loop-n.c: Likewise.
	* c-c++-common/goacc/kernels-loop-nest.c: Likewise.
	* c-c++-common/goacc/kernels-loop-offload-alias-none.c: Likewise.
	* c-c++-common/goacc/kernels-loop-offload-alias-ptr.c: Likewise.
	* c-c++-common/goacc/kernels-loop.c: Likewise.
	* c-c++-common/goacc/kernels-offload-alias-2.c: Likewise.
	* c-c++-common/goacc/kernels-offload-alias-3.c: Likewise.
	* c-c++-common/goacc/kernels-offload-alias-4.c: Likewise.
	* c-c++-common/goacc/kernels-offload-alias-5.c: Likewise.
	* c-c++-common/goacc/kernels-offload-alias-6.c: Likewise.
	* c-c++-common/goacc/kernels-offload-alias.c: Likewise.
	* c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
	* c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c:
	Likewise.
	* c-c++-common/goacc/kernels-reduction.c: Likewise.
	* gcc.dg/goacc/kern-1.c: Likewise.
	* gfortran.dg/goacc/kernels-alias-2.f95: Likewise.
	* gfortran.dg/goacc/kernels-alias-3.f95: Likewise.
	* gfortran.dg/goacc/kernels-alias-4.f95: Likewise.
	* gfortran.dg/goacc/kernels-alias.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-2.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-data.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-n.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop.f95: Likewise.
	* gfortran.dg/goacc/kernels-loops-adjacent.f95: Likewise.
	* gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95:
	Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c:
	Add "-fopenacc-kernels=parloops".
	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-empty.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-update.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-parallel-loop-data-enter-exit.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-loop-data.f95: Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-loop.f95: Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90:
	Likewise.
---
 gcc/ChangeLog.openacc                         |  4 ++
 gcc/c-family/ChangeLog.openacc                |  4 ++
 gcc/c-family/c.opt                            |  9 +++
 gcc/flag-types.h                              |  6 ++
 gcc/fortran/ChangeLog.openacc                 |  4 ++
 gcc/fortran/lang.opt                          |  3 +
 gcc/testsuite/ChangeLog.openacc               | 68 +++++++++++++++++++
 gcc/testsuite/c-c++-common/goacc/kernels-1.c  |  2 +
 .../goacc/kernels-acc-loop-reduction.c        |  2 +
 .../goacc/kernels-acc-loop-smaller-equal.c    |  2 +
 .../c-c++-common/goacc/kernels-alias-2.c      |  2 +
 .../c-c++-common/goacc/kernels-alias-3.c      |  2 +
 .../c-c++-common/goacc/kernels-alias-4.c      |  2 +
 .../c-c++-common/goacc/kernels-alias-5.c      |  2 +
 .../c-c++-common/goacc/kernels-alias-6.c      |  2 +
 .../c-c++-common/goacc/kernels-alias-7.c      |  2 +
 .../c-c++-common/goacc/kernels-alias-8.c      |  2 +
 .../goacc/kernels-alias-ipa-pta-2.c           |  2 +
 .../goacc/kernels-alias-ipa-pta-3.c           |  2 +
 .../goacc/kernels-alias-ipa-pta-4.c           |  2 +
 .../goacc/kernels-alias-ipa-pta.c             |  2 +
 .../c-c++-common/goacc/kernels-alias.c        |  2 +
 .../kernels-counter-var-redundant-load.c      |  2 +
 .../kernels-counter-vars-function-scope.c     |  2 +
 .../goacc/kernels-double-reduction-n.c        |  2 +
 .../goacc/kernels-double-reduction.c          |  2 +
 .../goacc/kernels-loop-2-acc-loop.c           |  2 +
 .../c-c++-common/goacc/kernels-loop-2.c       |  2 +
 .../goacc/kernels-loop-3-acc-loop.c           |  2 +
 .../c-c++-common/goacc/kernels-loop-3.c       |  2 +
 .../goacc/kernels-loop-acc-loop.c             |  2 +
 .../c-c++-common/goacc/kernels-loop-data-2.c  |  2 +
 .../goacc/kernels-loop-data-enter-exit-2.c    |  2 +
 .../goacc/kernels-loop-data-enter-exit.c      |  2 +
 .../goacc/kernels-loop-data-update.c          |  2 +
 .../c-c++-common/goacc/kernels-loop-data.c    |  2 +
 .../c-c++-common/goacc/kernels-loop-g.c       |  2 +
 .../goacc/kernels-loop-mod-not-zero.c         |  2 +
 .../goacc/kernels-loop-n-acc-loop.c           |  2 +
 .../c-c++-common/goacc/kernels-loop-n.c       |  2 +
 .../c-c++-common/goacc/kernels-loop-nest.c    |  2 +
 .../goacc/kernels-loop-offload-alias-none.c   |  2 +
 .../goacc/kernels-loop-offload-alias-ptr.c    |  2 +
 .../c-c++-common/goacc/kernels-loop.c         |  2 +
 .../goacc/kernels-offload-alias-2.c           |  2 +
 .../goacc/kernels-offload-alias-3.c           |  2 +
 .../goacc/kernels-offload-alias-4.c           |  2 +
 .../goacc/kernels-offload-alias-5.c           |  2 +
 .../goacc/kernels-offload-alias-6.c           |  2 +
 .../goacc/kernels-offload-alias.c             |  2 +
 .../goacc/kernels-one-counter-var.c           |  2 +
 .../kernels-parallel-loop-data-enter-exit.c   |  2 +
 .../c-c++-common/goacc/kernels-reduction.c    |  2 +
 gcc/testsuite/gcc.dg/goacc/kern-1.c           |  2 +
 .../gfortran.dg/goacc/kernels-alias-2.f95     |  2 +
 .../gfortran.dg/goacc/kernels-alias-3.f95     |  2 +
 .../gfortran.dg/goacc/kernels-alias-4.f95     |  2 +
 .../gfortran.dg/goacc/kernels-alias.f95       |  2 +
 .../gfortran.dg/goacc/kernels-loop-2.f95      |  2 +
 .../gfortran.dg/goacc/kernels-loop-data-2.f95 |  2 +
 .../goacc/kernels-loop-data-enter-exit-2.f95  |  2 +
 .../goacc/kernels-loop-data-enter-exit.f95    |  2 +
 .../goacc/kernels-loop-data-update.f95        |  2 +
 .../gfortran.dg/goacc/kernels-loop-data.f95   |  2 +
 .../gfortran.dg/goacc/kernels-loop-inner.f95  |  2 +
 .../gfortran.dg/goacc/kernels-loop-n.f95      |  2 +
 .../gfortran.dg/goacc/kernels-loop.f95        |  2 +
 .../goacc/kernels-loops-adjacent.f95          |  2 +
 .../kernels-parallel-loop-data-enter-exit.f95 |  2 +
 libgomp/ChangeLog.openacc                     | 62 +++++++++++++++++
 .../kernels-alias-ipa-pta-2.c                 |  2 +
 .../kernels-alias-ipa-pta-3.c                 |  2 +
 .../kernels-alias-ipa-pta.c                   |  2 +
 .../libgomp.oacc-c-c++-common/kernels-empty.c |  2 +
 .../kernels-loop-2.c                          |  3 +
 .../kernels-loop-3.c                          |  3 +
 .../kernels-loop-and-seq-2.c                  |  2 +
 .../kernels-loop-and-seq-3.c                  |  3 +
 .../kernels-loop-and-seq-4.c                  |  3 +
 .../kernels-loop-and-seq-5.c                  |  2 +
 .../kernels-loop-and-seq-6.c                  |  2 +
 .../kernels-loop-and-seq.c                    |  2 +
 .../kernels-loop-collapse.c                   |  2 +
 .../kernels-loop-data-2.c                     |  3 +
 .../kernels-loop-data-enter-exit-2.c          |  3 +
 .../kernels-loop-data-enter-exit.c            |  3 +
 .../kernels-loop-data-update.c                |  3 +
 .../kernels-loop-data.c                       |  3 +
 .../kernels-loop-g.c                          |  2 +
 .../kernels-loop-mod-not-zero.c               |  3 +
 .../kernels-loop-n.c                          |  3 +
 .../kernels-loop-nest.c                       |  3 +
 .../libgomp.oacc-c-c++-common/kernels-loop.c  |  3 +
 .../kernels-parallel-loop-data-enter-exit.c   |  3 +
 .../kernels-reduction-1.c                     |  2 +
 .../kernels-reduction.c                       |  3 +
 .../libgomp.oacc-fortran/kernels-loop-2.f95   |  2 +
 .../kernels-loop-data-2.f95                   |  2 +
 .../kernels-loop-data-enter-exit-2.f95        |  2 +
 .../kernels-loop-data-enter-exit.f95          |  2 +
 .../kernels-loop-data-update.f95              |  2 +
 .../kernels-loop-data.f95                     |  2 +
 .../libgomp.oacc-fortran/kernels-loop.f95     |  2 +
 .../kernels-parallel-loop-data-enter-exit.f95 |  2 +
 .../kernels-reduction-1.f90                   |  2 +
 105 files changed, 369 insertions(+)

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index 58258fd7a1d..9316130243c 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,7 @@
+2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* flag-types.h (enum openacc_kernels): New type.
+
 2019-01-31  Kwok Cheung Yeung  <kcy@codesourcery.com>
 
 	* omp-low.c (lower_omp_target): For use_device clauses, generate
diff --git a/gcc/c-family/ChangeLog.openacc b/gcc/c-family/ChangeLog.openacc
index 08dc56107bb..5b60c3a0dee 100644
--- a/gcc/c-family/ChangeLog.openacc
+++ b/gcc/c-family/ChangeLog.openacc
@@ -1,3 +1,7 @@
+2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* c.opt (fopenacc-kernels): New flag.
+
 2019-01-09  Julian Brown  <julian@codesourcery.com>
 
 	* c-cppbuiltin.c (c_cpp_builtins): Update _OPENACC define to 201711.
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index a4c8c8ffcb3..73b01598377 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1617,6 +1617,15 @@ fopenacc-dim=
 C ObjC C++ ObjC++ LTO Joined Var(flag_openacc_dims)
 Specify default OpenACC compute dimensions.
 
+fopenacc-kernels=
+C ObjC C++ ObjC++ RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS) Undocumented
+
+Enum
+Name(openacc_kernels) Type(enum openacc_kernels)
+
+EnumValue
+Enum(openacc_kernels) String(parloops) Value(OPENACC_KERNELS_PARLOOPS)
+
 fopenmp
 C ObjC C++ ObjC++ LTO Var(flag_openmp)
 Enable OpenMP (implies -frecursive in Fortran).
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 6261f6e1106..910be7c7fd4 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -343,4 +343,10 @@ enum cf_protection_level
   CF_FULL = CF_BRANCH | CF_RETURN,
   CF_SET = 1 << 2
 };
+
+/* OpenACC 'kernels' constructs handling.  */
+enum openacc_kernels
+{
+  OPENACC_KERNELS_PARLOOPS
+};
 #endif /* ! GCC_FLAG_TYPES_H */
diff --git a/gcc/fortran/ChangeLog.openacc b/gcc/fortran/ChangeLog.openacc
index 05462a0173c..acb2177f22f 100644
--- a/gcc/fortran/ChangeLog.openacc
+++ b/gcc/fortran/ChangeLog.openacc
@@ -1,3 +1,7 @@
+2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* lang.opt (fopenacc-kernels): New flag.
+
 2019-01-29  Gergö Barany  <gergo@codesourcery.com>
 
 	* trans-openmp.c (gfc_privatize_nodesc_array_clauses): Renamed from
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index 1cb7b6b4f84..097c623eb50 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -642,6 +642,9 @@ fopenacc-dim=
 Fortran LTO Joined Var(flag_openacc_dims)
 ; Documented in C
 
+fopenacc-kernels=
+Fortran RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS) Undocumented
+
 fopenmp
 Fortran LTO
 ; Documented in C
diff --git a/gcc/testsuite/ChangeLog.openacc b/gcc/testsuite/ChangeLog.openacc
index 2479367dce4..4acd174dca9 100644
--- a/gcc/testsuite/ChangeLog.openacc
+++ b/gcc/testsuite/ChangeLog.openacc
@@ -1,5 +1,73 @@
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* c-c++-common/goacc/kernels-1.c: Add
+	"-fopenacc-kernels=parloops".
+	* c-c++-common/goacc/kernels-acc-loop-reduction.c: Likewise.
+	* c-c++-common/goacc/kernels-acc-loop-smaller-equal.c: Likewise.
+	* c-c++-common/goacc/kernels-alias-2.c: Likewise.
+	* c-c++-common/goacc/kernels-alias-3.c: Likewise.
+	* c-c++-common/goacc/kernels-alias-4.c: Likewise.
+	* c-c++-common/goacc/kernels-alias-5.c: Likewise.
+	* c-c++-common/goacc/kernels-alias-6.c: Likewise.
+	* c-c++-common/goacc/kernels-alias-7.c: Likewise.
+	* c-c++-common/goacc/kernels-alias-8.c: Likewise.
+	* c-c++-common/goacc/kernels-alias-ipa-pta-2.c: Likewise.
+	* c-c++-common/goacc/kernels-alias-ipa-pta-3.c: Likewise.
+	* c-c++-common/goacc/kernels-alias-ipa-pta-4.c: Likewise.
+	* c-c++-common/goacc/kernels-alias-ipa-pta.c: Likewise.
+	* c-c++-common/goacc/kernels-alias.c: Likewise.
+	* c-c++-common/goacc/kernels-counter-var-redundant-load.c:
+	Likewise.
+	* c-c++-common/goacc/kernels-counter-vars-function-scope.c:
+	Likewise.
+	* c-c++-common/goacc/kernels-double-reduction-n.c: Likewise.
+	* c-c++-common/goacc/kernels-double-reduction.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-2-acc-loop.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-2.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-3-acc-loop.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-3.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-acc-loop.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-data-2.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-data-enter-exit.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-data-update.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-data.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-g.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-n-acc-loop.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-n.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-nest.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-offload-alias-none.c: Likewise.
+	* c-c++-common/goacc/kernels-loop-offload-alias-ptr.c: Likewise.
+	* c-c++-common/goacc/kernels-loop.c: Likewise.
+	* c-c++-common/goacc/kernels-offload-alias-2.c: Likewise.
+	* c-c++-common/goacc/kernels-offload-alias-3.c: Likewise.
+	* c-c++-common/goacc/kernels-offload-alias-4.c: Likewise.
+	* c-c++-common/goacc/kernels-offload-alias-5.c: Likewise.
+	* c-c++-common/goacc/kernels-offload-alias-6.c: Likewise.
+	* c-c++-common/goacc/kernels-offload-alias.c: Likewise.
+	* c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
+	* c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c:
+	Likewise.
+	* c-c++-common/goacc/kernels-reduction.c: Likewise.
+	* gcc.dg/goacc/kern-1.c: Likewise.
+	* gfortran.dg/goacc/kernels-alias-2.f95: Likewise.
+	* gfortran.dg/goacc/kernels-alias-3.f95: Likewise.
+	* gfortran.dg/goacc/kernels-alias-4.f95: Likewise.
+	* gfortran.dg/goacc/kernels-alias.f95: Likewise.
+	* gfortran.dg/goacc/kernels-loop-2.f95: Likewise.
+	* gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise.
+	* gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise.
+	* gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise.
+	* gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise.
+	* gfortran.dg/goacc/kernels-loop-data.f95: Likewise.
+	* gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
+	* gfortran.dg/goacc/kernels-loop-n.f95: Likewise.
+	* gfortran.dg/goacc/kernels-loop.f95: Likewise.
+	* gfortran.dg/goacc/kernels-loops-adjacent.f95: Likewise.
+	* gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95:
+	Likewise.
+
 	* lib/target-supports.exp
 	(check_effective_target_opt_levels_2_plus)
 	(check_effective_target_opt_levels_size): New.
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-1.c
index 0a4bd854611..8512f349aeb 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-1.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-fopt-info-optimized-omp" } */
 
 int
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c
index 4824e530925..19739ade492 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c
index d70afb0e662..0dd95b32ed8 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-2.c
index d437c47779d..99682498037 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-2.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-ealias-all" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-3.c
index b051481cdfd..fc492a27e95 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-3.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-3.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2 -foffload-alias=all" } */
 /* { dg-additional-options "-fdump-tree-ealias-all" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-4.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-4.c
index 1f626750495..7fa84e5f0eb 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-4.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-4.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2 -foffload-alias=all" } */
 /* { dg-additional-options "-fdump-tree-ealias-all" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-5.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-5.c
index ff0044df683..19b2a6705a3 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-5.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-5.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2 -foffload-alias=all" } */
 /* { dg-additional-options "-fdump-tree-ealias-all" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-6.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-6.c
index 861dc2c75b4..f4205fb6d3c 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-6.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-6.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2 -foffload-alias=all" } */
 /* { dg-additional-options "-fdump-tree-ealias-all" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-7.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-7.c
index d39128eec90..9356f436e8e 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-7.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-7.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2 -foffload-alias=all" } */
 /* { dg-additional-options "-fdump-tree-ealias-all" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c
index 0896b732235..5f451a56846 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2 -foffload-alias=all" } */
 /* { dg-additional-options "-fdump-tree-ealias-all" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-2.c
index f16d698af0d..7a57477ca25 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-2.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fipa-pta -fdump-tree-optimized" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-3.c
index e177abfabed..31ba223aacc 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-3.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-3.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2 -foffload-alias=all" } */
 /* { dg-additional-options "-fipa-pta -fdump-tree-optimized" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-4.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-4.c
index 20b21dcb577..41f901585cf 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-4.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-4.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fipa-pta -fdump-tree-optimized" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta.c
index 969b466e8a8..e587f96efe8 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fipa-pta -fdump-tree-optimized" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias.c
index 25821ab2aea..21285839dad 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-ealias-all" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c b/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c
index 1fd355b487c..642040f2c83 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2 -foffload-alias=all" } */
 /* { dg-additional-options "-fdump-tree-dom3" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c b/gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
index c475333f1ae..f40de679fe7 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c b/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
index dd3b7c8b144..4c1984ecbc7 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fopt-info-optimized-omp" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c b/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
index 0175434a20c..1da2af4dddc 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fopt-info-optimized-omp" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c
index 7b127cb6fd9..c6e66fcfc93 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
index acef6a1a179..238956c51ef 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c
index a040e096fc1..4d041492a05 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
index 75e2bb78cea..2bbb0711abe 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-acc-loop.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-acc-loop.c
index 070a5b5bf3d..54b5946d85e 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-acc-loop.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-acc-loop.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c
index 71800217991..e7830b6b80a 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c
index 0c9f8331240..b5c26705338 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c
index 0bd21b68d31..84f92a901ab 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c
index dd5a84146a8..dbdce4547c1 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c
index a658182de90..23f4e221f54 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
index 73b469d7061..2cbd6becfbf 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-g" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
index 55926230d57..28480aead92 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-n-acc-loop.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-n-acc-loop.c
index 1f25e63fbbb..164c3949789 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-n-acc-loop.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-n-acc-loop.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
index e86be1b1cdc..26bc3e011bf 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
index 2b0e186ae29..b3fdde197b6 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-none.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-none.c
index db5f94c96dd..288c4dedde4 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-none.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-none.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 /* { dg-additional-options "-fdump-tree-alias-all" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-ptr.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-ptr.c
index 30ad2c8ee9a..4256656932d 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-ptr.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-offload-alias-ptr.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 /* { dg-additional-options "-fdump-tree-alias-all" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop.c
index 9619d53b43d..d0423a88ed1 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-2.c
index 554c2b72667..a7d6a6baa0f 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-2.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2 -foffload-alias=pointer" } */
 /* { dg-additional-options "-fdump-tree-ealias-all -fdump-tree-optimized" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-3.c
index 7236be95635..de8ffd9b0f1 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-3.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-3.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2 -foffload-alias=pointer" } */
 /* { dg-additional-options "-fdump-tree-ealias-all -fdump-tree-optimized" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-4.c b/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-4.c
index 797575bab7b..9901cc8d9cf 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-4.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-4.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2 -foffload-alias=pointer" } */
 /* { dg-additional-options "-fdump-tree-ealias-all -fdump-tree-optimized" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-5.c b/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-5.c
index 746494c4b61..30d95ebf371 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-5.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-5.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2 -foffload-alias=pointer" } */
 /* { dg-additional-options "-fdump-tree-ealias-all -fdump-tree-optimized" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-6.c b/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-6.c
index 10d2897fb3d..f1df43f9037 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-6.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias-6.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2 -foffload-alias=pointer" } */
 /* { dg-additional-options "-fdump-tree-ealias-all -fdump-tree-optimized" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias.c b/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias.c
index 65095b72d90..bdaff2ef7b0 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-offload-alias.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2 -foffload-alias=pointer" } */
 /* { dg-additional-options "-fdump-tree-ealias-all -fdump-tree-optimized" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c b/gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
index 69539b24a78..15a8d3732fc 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c b/gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c
index 81b0fee5a44..457a79abf79 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c b/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
index 5921b88920f..76039888b55 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/gcc.dg/goacc/kern-1.c b/gcc/testsuite/gcc.dg/goacc/kern-1.c
index c48e826e306..948eb507825 100644
--- a/gcc/testsuite/gcc.dg/goacc/kern-1.c
+++ b/gcc/testsuite/gcc.dg/goacc/kern-1.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-O2" } */
 
 /* The reduction on sum could cause an ICE with a non-simple latch loop.   */
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-alias-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-alias-2.f95
index 7e348dde2bd..2ca68a1a8e5 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-alias-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-alias-2.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 ! { dg-additional-options "-O2" }
 ! { dg-additional-options "-fdump-tree-ealias-all" }
 
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-alias-3.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-alias-3.f95
index 36b06d34e58..21f9ea58e63 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-alias-3.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-alias-3.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 ! { dg-additional-options "-O2" }
 ! { dg-additional-options "-foffload-alias=all" }
 ! { dg-additional-options "-fdump-tree-ealias-all" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-alias-4.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-alias-4.f95
index e41da824550..da0c5311c6a 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-alias-4.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-alias-4.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 ! { dg-additional-options "-O2" }
 ! { dg-additional-options "-foffload-alias=all" }
 ! { dg-additional-options "-fdump-tree-ealias-all" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-alias.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-alias.f95
index 8d6ccb338b9..f1a2ece47b9 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-alias.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-alias.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 ! { dg-additional-options "-O2" }
 ! { dg-additional-options "-fdump-tree-ealias-all" }
 
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
index ef53324dd2a..59001e4734d 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 ! { dg-additional-options "-O2" }
 ! { dg-additional-options "-fdump-tree-parloops1-all" }
 ! { dg-additional-options "-fdump-tree-optimized" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
index 2f1dcd603a1..b6f50cbd2d6 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 ! { dg-additional-options "-O2" }
 ! { dg-additional-options "-fdump-tree-parloops1-all" }
 ! { dg-additional-options "-fdump-tree-optimized" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
index 447e85d6448..779073a1f3b 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 ! { dg-additional-options "-O2" }
 ! { dg-additional-options "-fdump-tree-parloops1-all" }
 ! { dg-additional-options "-fdump-tree-optimized" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
index 4edb2889b7b..30ae2cb0f2a 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 ! { dg-additional-options "-O2" }
 ! { dg-additional-options "-fdump-tree-parloops1-all" }
 ! { dg-additional-options "-fdump-tree-optimized" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
index fc113e1f660..b68945a91ac 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 ! { dg-additional-options "-O2" }
 ! { dg-additional-options "-fdump-tree-parloops1-all" }
 ! { dg-additional-options "-fdump-tree-optimized" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
index 94522f58636..f5c6688a05b 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 ! { dg-additional-options "-O2" }
 ! { dg-additional-options "-fdump-tree-parloops1-all" }
 ! { dg-additional-options "-fdump-tree-optimized" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
index 16a64e6d76d..b93c89ab9c1 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 ! { dg-additional-options "-O2" }
 ! { dg-additional-options "-fopt-info-optimized-omp" }
 
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95
index aaef42974a6..1bb363ce880 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 ! { dg-additional-options "-O2" }
 ! { dg-additional-options "-fdump-tree-parloops1-all" }
 ! { dg-additional-options "-fdump-tree-optimized" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
index 6dc7b2e0f28..4da70409c8f 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 ! { dg-additional-options "-O2" }
 ! { dg-additional-options "-fdump-tree-parloops1-all" }
 ! { dg-additional-options "-fdump-tree-optimized" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95
index fb92da8c08b..a83ff951457 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 ! { dg-additional-options "-O2" }
 
 program main
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95
index 16c9b80b295..260b11446ad 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 ! { dg-additional-options "-O2" }
 ! { dg-additional-options "-fdump-tree-parloops1-all" }
 ! { dg-additional-options "-fdump-tree-optimized" }
diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index edb7d3f76eb..bf572ee8922 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,5 +1,67 @@
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c:
+	Add "-fopenacc-kernels=parloops".
+	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-empty.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-2.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit-2.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-update.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-parallel-loop-data-enter-exit.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-loop-data.f95: Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-loop.f95: Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90:
+	Likewise.
+
 	* testsuite/libgomp.oacc-c-c++-common/lib-43.c: Remove.
 	* testsuite/libgomp.oacc-c-c++-common/lib-47.c: Likewise.
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c
index 0a280d58d8d..1bb2da110b0 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-fipa-pta" } */
 /* Override the compiler's "avoid offloading" decision.
    { dg-additional-options "-foffload-force" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c
index 6a50e626c02..42e27fb4aa9 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-foffload-alias=all -fipa-pta" } */
 /* Override the compiler's "avoid offloading" decision.
    { dg-additional-options "-foffload-force" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c
index 2a89cd3a4a1..38f34810105 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-fipa-pta" } */
 /* Override the compiler's "avoid offloading" decision.
    { dg-additional-options "-foffload-force" } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c
index d527e1497d7..cb1dd42afbc 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* Override the compiler's "avoid offloading" decision.
    { dg-additional-options "-foffload-force" } */
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
index b84088879c6..4aeeed1f674 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
 #include <stdlib.h>
 
 #define N (1024 * 512)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c
index 31114ac86d7..9cbace156a1 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
 #include <stdlib.h>
 
 #define N (1024 * 512)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
index 6a85bf5b50f..fe344903904 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* Override the compiler's "avoid offloading" decision.
    { dg-additional-options "-foffload-force" } */
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
index e62297129fd..d53e39304a0 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
 #include <stdlib.h>
 
 #define N 32
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
index c73127897a9..7435c854172 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
 #include <stdlib.h>
 
 #define N 32
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
index 98a017fc55d..58c6402b3a7 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* Override the compiler's "avoid offloading" decision.
    { dg-additional-options "-foffload-force" } */
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
index 131aa40b2c8..6361001b358 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* Override the compiler's "avoid offloading" decision.
    { dg-additional-options "-foffload-force" } */
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
index 377f9dcde76..41984e3d354 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* Override the compiler's "avoid offloading" decision.
    { dg-additional-options "-foffload-force" } */
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
index bdc10dbe5bd..97658c0b8c5 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* Override the compiler's "avoid offloading" decision.
    { dg-additional-options "-foffload-force" } */
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-2.c
index 607c35018df..337ad91507f 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-2.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
 #include <stdlib.h>
 
 #define N (1024 * 512)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit-2.c
index 8b9dd5f815a..214dd7e3e69 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit-2.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
 #include <stdlib.h>
 
 #define N (1024 * 512)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit.c
index 5d5da6fcc01..7d097dadb62 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
 #include <stdlib.h>
 
 #define N (1024 * 512)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-update.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-update.c
index c111c8f56e7..661cb286a0d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-update.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-update.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
 #include <stdlib.h>
 
 #define N (1024 * 512)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data.c
index 947bcdac452..2f4f699195d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
 #include <stdlib.h>
 
 #define N (1024 * 512)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
index 88258be16bd..e5a556b36b7 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* { dg-additional-options "-g" } */
 
 #include "kernels-loop.c"
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
index 147ebb59945..eeb318e7bda 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
 #include <stdlib.h>
 
 #define N ((1024 * 512) + 1)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
index 9a3eaca1380..eeccc1ddb06 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
 #include <stdlib.h>
 
 #define N ((1024 * 512) + 1)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c
index 28c725a61a2..c59c47ecd4a 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
 #include <stdlib.h>
 
 #define N 1000
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
index 355123c6088..36eabb959ca 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
 #include <stdlib.h>
 
 #define N (1024 * 512)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-parallel-loop-data-enter-exit.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-parallel-loop-data-enter-exit.c
index 374014a1e86..eeb08106dc8 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-parallel-loop-data-enter-exit.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-parallel-loop-data-enter-exit.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
 #include <stdlib.h>
 
 #define N (1024 * 512)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c
index 23ed7e9b801..d988d9f4277 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c
@@ -1,6 +1,8 @@
 /* Verify that a simple, explicit acc loop reduction works inside
  a kernels region.  */
 
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
 /* Override the compiler's "avoid offloading" decision.
    { dg-additional-options "-foffload-force" } */
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c
index 8647a9432fc..e67340cca20 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
 #include <stdlib.h>
 
 #define n 10000
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-2.f95 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-2.f95
index 8becc159dd1..14055a36284 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-2.f95
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-2.f95
@@ -1,4 +1,6 @@
 ! { dg-do run }
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 
 program main
   implicit none
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95
index 2191ebedee3..ec3e74a6c51 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95
@@ -1,4 +1,6 @@
 ! { dg-do run }
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 
 program main
   implicit none
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95
index 75fb8a32beb..f6e6e21ef24 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95
@@ -1,4 +1,6 @@
 ! { dg-do run }
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 
 program main
   implicit none
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95
index 8ea34bf7bf8..b6b3a7aa3da 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95
@@ -1,4 +1,6 @@
 ! { dg-do run }
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 
 program main
   implicit none
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95
index 710068a707a..686dfddcf3c 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95
@@ -1,4 +1,6 @@
 ! { dg-do run }
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 
 program main
   implicit none
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data.f95 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data.f95
index c1dec2c89c5..b2aa0e7f3db 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data.f95
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data.f95
@@ -1,4 +1,6 @@
 ! { dg-do run }
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 
 program main
   implicit none
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop.f95 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop.f95
index c9d3c4adc95..96cf48b14f1 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop.f95
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop.f95
@@ -1,4 +1,6 @@
 ! { dg-do run }
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 
 program main
   implicit none
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95
index 99300ec88b1..ea752c348dc 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95
@@ -1,4 +1,6 @@
 ! { dg-do run }
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 
 program main
   implicit none
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90
index a25e8a80f05..a2cdb2a79e8 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90
@@ -1,6 +1,8 @@
 ! Test a simple acc loop reduction inside a kernels region. 
 
 ! { dg-do run }
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
 ! Override the compiler's "avoid offloading" decision.
 ! { dg-additional-options "-foffload-force" }
 
-- 
2.17.1


[-- Attachment #3: 0002-Add-OpenACC-target-kinds-for-decomposed-kernels-regi.patch --]
[-- Type: text/x-diff, Size: 16731 bytes --]

From 60cee29a41b1c42b8a621d436b0c843447ce4dfa Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 23 Jan 2019 06:56:52 -0800
Subject: [PATCH 2/9] Add OpenACC target kinds for decomposed kernels regions

This patch prepares for changes that will split OpenACC kernels regions
into individual parts. For the new sub-regions that will be generated, it
adds the following new kinds of OpenACC regions for internal use:

- GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED for parts of kernels
  regions to be executed in gang-redundant mode
- GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE for parts of kernels
  regions to be executed in gang-single mode
- GF_OMP_TARGET_KIND_OACC_DATA_KERNELS for data regions generated around the
  body of a kernels region

    gcc/
    * gimple.h (enum gf_mask): Add new target kinds
    GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED,
    GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE, and
    GF_OMP_TARGET_KIND_OACC_DATA_KERNELS.
    (is_gimple_omp_oacc): Handle new target kinds.
    (is_gimple_omp_offloaded): Likewise.
    * gimple-pretty-print.c (dump_gimple_omp_target): Likewise.
    * omp-expand.c (expand_omp_target): Likewise.
    (build_omp_regions_1): Likewise.
    (omp_make_gimple_edges): Likewise.
    * omp-low.c (is_oacc_parallel_or_serial): Likewise.
    (was_originally_oacc_kernels): New function.
    (scan_omp_for): Update check for illegal nesting.
    (check_omp_nesting_restrictions): Handle new target kinds.
    (lower_oacc_reductions): Likewise.
    (lower_omp_target): Likewise.
    * omp-offload.c (execute_oacc_device_lower): Likewise.
---
 gcc/ChangeLog.openacc     | 21 ++++++++++++++++++
 gcc/gimple-pretty-print.c |  9 ++++++++
 gcc/gimple.h              | 14 ++++++++++++
 gcc/omp-expand.c          | 31 +++++++++++++++++++++-----
 gcc/omp-low.c             | 46 ++++++++++++++++++++++++++++++++++++---
 gcc/omp-offload.c         | 20 +++++++++++++++++
 6 files changed, 132 insertions(+), 9 deletions(-)
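
A note for readers, not part of the patch: the sketch below restates, as
standalone C, how the three new kinds are classified by the switch
statements touched in this patch.  The enum 'oacc_kind', its enumerator
values, and the helper names are hypothetical stand-ins, not GCC's 'enum
gf_mask' or its predicates; only the grouping of kinds and the attribute
strings are taken from the hunks below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the OpenACC part of 'enum gf_mask'.  The
   enumerator values are arbitrary and do not match gimple.h.  */
enum oacc_kind
{
  OACC_PARALLEL,
  OACC_KERNELS,
  OACC_SERIAL,
  OACC_DATA,
  OACC_HOST_DATA,
  OACC_PARALLEL_KERNELS_PARALLELIZED,
  OACC_PARALLEL_KERNELS_GANG_SINGLE,
  OACC_DATA_KERNELS
};

/* Mirrors the OpenACC cases in the is_gimple_omp_offloaded hunk.  */
bool
kind_is_offloaded (enum oacc_kind kind)
{
  switch (kind)
    {
    case OACC_PARALLEL:
    case OACC_KERNELS:
    case OACC_SERIAL:
    case OACC_PARALLEL_KERNELS_PARALLELIZED:
    case OACC_PARALLEL_KERNELS_GANG_SINGLE:
      return true;
    default:
      return false;
    }
}

/* Mirrors the OpenACC 'data_region = true' cases in expand_omp_target
   and lower_omp_target.  */
bool
kind_is_data_region (enum oacc_kind kind)
{
  return (kind == OACC_DATA
	  || kind == OACC_HOST_DATA
	  || kind == OACC_DATA_KERNELS);
}

/* Mirrors the kinds tested by was_originally_oacc_kernels.  */
bool
kind_was_originally_kernels (enum oacc_kind kind)
{
  return (kind == OACC_PARALLEL_KERNELS_PARALLELIZED
	  || kind == OACC_PARALLEL_KERNELS_GANG_SINGLE
	  || kind == OACC_DATA_KERNELS);
}

/* Attribute name attached to the child function in expand_omp_target,
   or NULL if that switch attaches none for KIND.  */
const char *
kind_attribute_name (enum oacc_kind kind)
{
  switch (kind)
    {
    case OACC_KERNELS:
      return "oacc kernels";
    case OACC_SERIAL:
      return "oacc serial";
    case OACC_PARALLEL_KERNELS_PARALLELIZED:
      return "oacc parallel_kernels_parallelized";
    case OACC_PARALLEL_KERNELS_GANG_SINGLE:
      return "oacc parallel_kernels_gang_single";
    default:
      return NULL;
    }
}

/* Usage: the two new compute kinds are offloaded, the new data kind is
   a data region, and all three count as "originally kernels".  */
void
check_kind_tables (void)
{
  assert (kind_is_offloaded (OACC_PARALLEL_KERNELS_PARALLELIZED));
  assert (kind_is_data_region (OACC_DATA_KERNELS));
  assert (!kind_is_offloaded (OACC_DATA_KERNELS));
  assert (kind_was_originally_kernels (OACC_PARALLEL_KERNELS_GANG_SINGLE));
  assert (!kind_was_originally_kernels (OACC_PARALLEL));
}
```

The point of the separate kinds is that the two compute kinds otherwise
follow the same paths as an ordinary 'parallel', and the data kind those of
an ordinary 'data', while later passes can still tell that a region came
from a 'kernels' construct.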

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index 9316130243c..80f7efe18b1 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,24 @@
+2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
+	    Gergö Barany  <gergo@codesourcery.com>
+
+	* gimple.h (enum gf_mask): Add new target kinds
+	GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED,
+	GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE, and
+	GF_OMP_TARGET_KIND_OACC_DATA_KERNELS.
+	(is_gimple_omp_oacc): Handle new target kinds.
+	(is_gimple_omp_offloaded): Likewise.
+	* gimple-pretty-print.c (dump_gimple_omp_target): Likewise.
+	* omp-expand.c (expand_omp_target): Likewise.
+	(build_omp_regions_1): Likewise.
+	(omp_make_gimple_edges): Likewise.
+	* omp-low.c (is_oacc_parallel_or_serial): Likewise.
+	(was_originally_oacc_kernels): New function.
+	(scan_omp_for): Update check for illegal nesting.
+	(check_omp_nesting_restrictions): Handle new target kinds.
+	(lower_oacc_reductions): Likewise.
+	(lower_omp_target): Likewise.
+	* omp-offload.c (execute_oacc_device_lower): Likewise.
+
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* flag-types.h (enum openacc_kernels): New type.
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index d2e3fddd7d8..cc6ee1860fc 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -1623,6 +1623,15 @@ dump_gimple_omp_target (pretty_printer *buffer, gomp_target *gs,
     case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
       kind = " oacc_host_data";
       break;
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+      kind = " oacc_parallel_kernels_parallelized";
+      break;
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+      kind = " oacc_parallel_kernels_gang_single";
+      break;
+    case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
+      kind = " oacc_data_kernels";
+      break;
     default:
       gcc_unreachable ();
     }
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 072b09b1fad..6d126aab3fd 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -184,6 +184,15 @@ enum gf_mask {
     GF_OMP_TARGET_KIND_OACC_DECLARE = 10,
     GF_OMP_TARGET_KIND_OACC_HOST_DATA = 11,
     GF_OMP_TARGET_KIND_OACC_SERIAL = 12,
+    /* A GF_OMP_TARGET_KIND_OACC_PARALLEL that originates from a 'kernels'
+       construct, parallelized.  */
+    GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED = 13,
+    /* A GF_OMP_TARGET_KIND_OACC_PARALLEL that originates from a 'kernels'
+       construct, "gang-single".  */
+    GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE = 14,
+    /* A GF_OMP_TARGET_KIND_OACC_DATA that originates from a 'kernels'
+       construct.  */
+    GF_OMP_TARGET_KIND_OACC_DATA_KERNELS = 15,
     GF_OMP_TEAMS_GRID_PHONY	= 1 << 0,
 
     /* True on an GIMPLE_OMP_RETURN statement if the return does not require
@@ -6306,6 +6315,9 @@ is_gimple_omp_oacc (const gimple *stmt)
 	case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
 	case GF_OMP_TARGET_KIND_OACC_DECLARE:
 	case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
+	case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+	case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+	case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
 	  return true;
 	default:
 	  return false;
@@ -6331,6 +6343,8 @@ is_gimple_omp_offloaded (const gimple *stmt)
 	case GF_OMP_TARGET_KIND_OACC_PARALLEL:
 	case GF_OMP_TARGET_KIND_OACC_KERNELS:
 	case GF_OMP_TARGET_KIND_OACC_SERIAL:
+	case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+	case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
 	  return true;
 	default:
 	  return false;
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index 42c491099fc..9de47de8a84 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -6960,6 +6960,8 @@ expand_omp_target (struct omp_region *region)
     {
     case GF_OMP_TARGET_KIND_OACC_PARALLEL:
     case GF_OMP_TARGET_KIND_OACC_SERIAL:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
       oacc_parallel = true;
       gcc_fallthrough ();
     case GF_OMP_TARGET_KIND_REGION:
@@ -6975,6 +6977,7 @@ expand_omp_target (struct omp_region *region)
     case GF_OMP_TARGET_KIND_DATA:
     case GF_OMP_TARGET_KIND_OACC_DATA:
     case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
+    case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
       data_region = true;
       break;
     default:
@@ -6997,26 +7000,33 @@ expand_omp_target (struct omp_region *region)
   entry_bb = region->entry;
   exit_bb = region->exit;
 
+  /* Further down, all OpenACC compute constructs will be mapped to
+     BUILT_IN_GOACC_PARALLEL, and to distinguish between them, we now attach
+     attributes.  */
   switch (gimple_omp_target_kind (entry_stmt))
     {
     case GF_OMP_TARGET_KIND_OACC_KERNELS:
       mark_loops_in_oacc_kernels_region (region->entry, region->exit);
 
-      /* Further down, all OpenACC compute constructs will be mapped to
-	 BUILT_IN_GOACC_PARALLEL, and to distinguish between them, there
-	 is an "oacc kernels" attribute set for OpenACC kernels.  */
       DECL_ATTRIBUTES (child_fn)
 	= tree_cons (get_identifier ("oacc kernels"),
 		     NULL_TREE, DECL_ATTRIBUTES (child_fn));
       break;
     case GF_OMP_TARGET_KIND_OACC_SERIAL:
-      /* Further down, all OpenACC compute constructs will be mapped to
-	 BUILT_IN_GOACC_PARALLEL, and to distinguish between them, there
-	 is an "oacc serial" attribute set for OpenACC serial.  */
       DECL_ATTRIBUTES (child_fn)
 	= tree_cons (get_identifier ("oacc serial"),
 		     NULL_TREE, DECL_ATTRIBUTES (child_fn));
       break;
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+      DECL_ATTRIBUTES (child_fn)
+	= tree_cons (get_identifier ("oacc parallel_kernels_parallelized"),
+		     NULL_TREE, DECL_ATTRIBUTES (child_fn));
+      break;
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+      DECL_ATTRIBUTES (child_fn)
+	= tree_cons (get_identifier ("oacc parallel_kernels_gang_single"),
+		     NULL_TREE, DECL_ATTRIBUTES (child_fn));
+      break;
     default:
       break;
     }
@@ -7228,10 +7238,13 @@ expand_omp_target (struct omp_region *region)
     case GF_OMP_TARGET_KIND_OACC_KERNELS:
     case GF_OMP_TARGET_KIND_OACC_PARALLEL:
     case GF_OMP_TARGET_KIND_OACC_SERIAL:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
       start_ix = BUILT_IN_GOACC_PARALLEL;
       break;
     case GF_OMP_TARGET_KIND_OACC_DATA:
     case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
+    case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
       start_ix = BUILT_IN_GOACC_DATA_START;
       break;
     case GF_OMP_TARGET_KIND_OACC_UPDATE:
@@ -8072,6 +8085,9 @@ build_omp_regions_1 (basic_block bb, struct omp_region *parent,
 		case GF_OMP_TARGET_KIND_OACC_SERIAL:
 		case GF_OMP_TARGET_KIND_OACC_DATA:
 		case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
+		case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+		case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+		case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
 		  if (is_gimple_omp_oacc (stmt))
 		    region->kind = gimple_omp_target_kind (stmt);
 		  break;
@@ -8321,6 +8337,9 @@ omp_make_gimple_edges (basic_block bb, struct omp_region **region,
 	case GF_OMP_TARGET_KIND_OACC_SERIAL:
 	case GF_OMP_TARGET_KIND_OACC_DATA:
 	case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
+	case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+	case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+	case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
 	  break;
 	case GF_OMP_TARGET_KIND_UPDATE:
 	case GF_OMP_TARGET_KIND_ENTER_DATA:
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 3ae39c33c9d..636a4a00307 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -166,7 +166,11 @@ is_oacc_parallel_or_serial (omp_context *ctx)
 	  && ((gimple_omp_target_kind (ctx->stmt)
 	       == GF_OMP_TARGET_KIND_OACC_PARALLEL)
 	      || (gimple_omp_target_kind (ctx->stmt)
-		  == GF_OMP_TARGET_KIND_OACC_SERIAL)));
+		  == GF_OMP_TARGET_KIND_OACC_SERIAL)
+	      || (gimple_omp_target_kind (ctx->stmt)
+		  == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED)
+	      || (gimple_omp_target_kind (ctx->stmt)
+		  == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE)));
 }
 
 /* Return true if CTX corresponds to an oacc kernels region.  */
@@ -180,6 +184,22 @@ is_oacc_kernels (omp_context *ctx)
 	      == GF_OMP_TARGET_KIND_OACC_KERNELS));
 }
 
+/* Return true if CTX corresponds to an oacc region that was generated from
+   an original kernels region that has been lowered to parallel regions.  */
+
+static bool
+was_originally_oacc_kernels (omp_context *ctx)
+{
+  enum gimple_code outer_type = gimple_code (ctx->stmt);
+  return ((outer_type == GIMPLE_OMP_TARGET)
+	  && ((gimple_omp_target_kind (ctx->stmt)
+	       == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED)
+	      || (gimple_omp_target_kind (ctx->stmt)
+		  == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE)
+	      || (gimple_omp_target_kind (ctx->stmt)
+		  == GF_OMP_TARGET_KIND_OACC_DATA_KERNELS)));
+}
+
 /* If DECL is the artificial dummy VAR_DECL created for non-static
    data member privatization, return the underlying "this" parameter,
    otherwise return NULL.  */
@@ -2421,7 +2441,8 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
     {
       omp_context *tgt = enclosing_target_ctx (outer_ctx);
 
-      if (!tgt || is_oacc_parallel_or_serial (tgt))
+      if (!tgt || (is_oacc_parallel_or_serial (tgt)
+                    && !was_originally_oacc_kernels (tgt)))
 	for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
 	  {
 	    char const *check = NULL;
@@ -2908,6 +2929,8 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
 		  case GF_OMP_TARGET_KIND_OACC_PARALLEL:
 		  case GF_OMP_TARGET_KIND_OACC_KERNELS:
 		  case GF_OMP_TARGET_KIND_OACC_SERIAL:
+		  case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+		  case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
 		    ok = true;
 		    break;
 
@@ -3331,6 +3354,11 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
 	      stmt_name = "enter/exit data"; break;
 	    case GF_OMP_TARGET_KIND_OACC_HOST_DATA: stmt_name = "host_data";
 	      break;
+	    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+	    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+	    case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
+	      /* These three cases arise from kernels conversion.  */
+	      stmt_name = "kernels"; break;
 	    default: gcc_unreachable ();
 	    }
 	  switch (gimple_omp_target_kind (ctx->stmt))
@@ -3346,6 +3374,11 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
 	    case GF_OMP_TARGET_KIND_OACC_DATA: ctx_stmt_name = "data"; break;
 	    case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
 	      ctx_stmt_name = "host_data"; break;
+	    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+	    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+	    case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
+	      /* These three cases arise from kernels conversion.  */
+	      ctx_stmt_name = "kernels"; break;
 	    default: gcc_unreachable ();
 	    }
 
@@ -5374,7 +5407,11 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
 		    if ((gimple_omp_target_kind (probe->stmt)
 			 != GF_OMP_TARGET_KIND_OACC_PARALLEL)
 			&& (gimple_omp_target_kind (probe->stmt)
-			    != GF_OMP_TARGET_KIND_OACC_SERIAL))
+			    != GF_OMP_TARGET_KIND_OACC_SERIAL)
+			&& (gimple_omp_target_kind (probe->stmt)
+			    != GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED)
+			&& (gimple_omp_target_kind (probe->stmt)
+			    != GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE))
 		      goto do_lookup;
 
 		    cls = gimple_omp_target_clauses (probe->stmt);
@@ -8149,11 +8186,14 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
     case GF_OMP_TARGET_KIND_OACC_UPDATE:
     case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
     case GF_OMP_TARGET_KIND_OACC_DECLARE:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
       data_region = false;
       break;
     case GF_OMP_TARGET_KIND_DATA:
     case GF_OMP_TARGET_KIND_OACC_DATA:
     case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
+    case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
       data_region = true;
       break;
     default:
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 02b366ed4f7..2d265c22c3c 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1604,6 +1604,20 @@ execute_oacc_device_lower ()
   bool is_oacc_kernels_parallelized
     = (lookup_attribute ("oacc kernels parallelized",
 			 DECL_ATTRIBUTES (current_function_decl)) != NULL);
+  if (is_oacc_kernels_parallelized)
+    gcc_checking_assert (is_oacc_kernels);
+  bool is_oacc_parallel_kernels_parallelized
+    = (lookup_attribute ("oacc parallel_kernels_parallelized",
+			 DECL_ATTRIBUTES (current_function_decl)) != NULL);
+  if (is_oacc_parallel_kernels_parallelized)
+    gcc_checking_assert (!is_oacc_kernels);
+  bool is_oacc_parallel_kernels_gang_single
+    = (lookup_attribute ("oacc parallel_kernels_gang_single",
+			 DECL_ATTRIBUTES (current_function_decl)) != NULL);
+  if (is_oacc_parallel_kernels_gang_single)
+    gcc_checking_assert (!is_oacc_kernels);
+  gcc_checking_assert (!(is_oacc_parallel_kernels_parallelized
+			 && is_oacc_parallel_kernels_gang_single));
 
   /* Unparallelized OpenACC kernels constructs must get launched as 1 x 1 x 1
      kernels, so remove the parallelism dimensions function attributes
@@ -1627,6 +1641,12 @@ execute_oacc_device_lower ()
 	fprintf (dump_file, "Function is %s OpenACC kernels offload\n",
 		 (is_oacc_kernels_parallelized
 		  ? "parallelized" : "unparallelized"));
+      else if (is_oacc_parallel_kernels_parallelized)
+	fprintf (dump_file, "Function is %s OpenACC kernels offload\n",
+		 "parallel_kernels_parallelized");
+      else if (is_oacc_parallel_kernels_gang_single)
+	fprintf (dump_file, "Function is %s OpenACC kernels offload\n",
+		 "parallel_kernels_gang_single");
       else
 	fprintf (dump_file, "Function is OpenACC parallel offload\n");
     }
-- 
2.17.1


[-- Attachment #4: 0003-Separate-OpenACC-kernels-regions-in-data-and-paralle.patch --]
[-- Type: text/x-diff, Size: 24144 bytes --]

From b900846645ddacbe957cf0f6acdab03bcde68caa Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gerg=C3=B6=20Barany?= <gergo@codesourcery.com>
Date: Mon, 21 Jan 2019 05:28:20 -0800
Subject: [PATCH 3/9] Separate OpenACC kernels regions in data and parallel
 parts

This is the first in a series of patches that completely rework the handling
of the OpenACC "kernels" directive. In the future, kernels regions will be
transformed into data regions containing a sequence of serial and parallel
offloaded regions. This first patch sets up a new pass that is responsible
for this transformation and, as a first step, constructs the new data
region containing a single parallel region that holds the original kernels
region's body.

	gcc/
	* Makefile.in: Add...
	* omp-oacc-kernels.c: ... this new file for the kernels conversion
	pass.
	* flag-types.h (enum openacc_kernels): Add "split" style.  Adjust
	all users.
	* doc/invoke.texi (-fopenacc-kernels): Update.
	* passes.def: Add pass_convert_oacc_kernels to pipeline.
	* tree-pass.h (make_pass_convert_oacc_kernels): Add declaration.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-conversion.c: New test.
	* gfortran.dg/goacc/kernels-conversion.f95: Likewise.
	* c-c++-common/goacc/if-clause-2.c: Update.
	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-fortran/initialize_kernels_loops.f90:
	Update.
---
 gcc/ChangeLog.openacc                         |  12 +
 gcc/Makefile.in                               |   2 +
 gcc/c-family/c.opt                            |   6 +-
 gcc/doc/invoke.texi                           |  13 +-
 gcc/flag-types.h                              |   1 +
 gcc/fortran/lang.opt                          |   3 +-
 gcc/omp-oacc-kernels.c                        | 245 ++++++++++++++++++
 gcc/passes.def                                |   1 +
 gcc/testsuite/ChangeLog.openacc               |   8 +
 .../c-c++-common/goacc/if-clause-2.c          |   7 +
 .../c-c++-common/goacc/kernels-conversion.c   |  36 +++
 .../gfortran.dg/goacc/kernels-conversion.f95  |  33 +++
 .../gfortran.dg/goacc/kernels-tree.f95        |   6 +
 gcc/tree-pass.h                               |   1 +
 libgomp/ChangeLog.openacc                     |   3 +
 .../initialize_kernels_loops.f90              |   8 +-
 16 files changed, 379 insertions(+), 6 deletions(-)
 create mode 100644 gcc/omp-oacc-kernels.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index 80f7efe18b1..a330410f41d 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,15 @@
+2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
+	    Gergö Barany  <gergo@codesourcery.com>
+
+	* Makefile.in: Add...
+	* omp-oacc-kernels.c: ... this new file for the kernels conversion
+	pass.
+	* flag-types.h (enum openacc_kernels): Add "split" style.  Adjust
+	all users.
+	* doc/invoke.texi (-fopenacc-kernels): Update.
+	* passes.def: Add pass_convert_oacc_kernels to pipeline.
+	* tree-pass.h (make_pass_convert_oacc_kernels): Add declaration.
+
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 	    Gergö Barany  <gergo@codesourcery.com>
 
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 20bee0494b1..09685481787 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1420,6 +1420,7 @@ OBJS = \
 	omp-general.o \
 	omp-grid.o \
 	omp-low.o \
+	omp-oacc-kernels.o \
 	omp-simd-clone.o \
 	optabs.o \
 	optabs-libfuncs.o \
@@ -2565,6 +2566,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \
   $(srcdir)/omp-offload.c \
   $(srcdir)/omp-expand.c \
   $(srcdir)/omp-low.c \
+  $(srcdir)/omp-oacc-kernels.c \
   $(srcdir)/targhooks.c $(out_file) $(srcdir)/passes.c $(srcdir)/cgraphunit.c \
   $(srcdir)/cgraphclones.c \
   $(srcdir)/tree-phinodes.c \
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 73b01598377..12f8f55c50f 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1618,11 +1618,15 @@ C ObjC C++ ObjC++ LTO Joined Var(flag_openacc_dims)
 Specify default OpenACC compute dimensions.
 
 fopenacc-kernels=
-C ObjC C++ ObjC++ RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS) Undocumented
+C ObjC C++ ObjC++ RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS)
+-fopenacc-kernels=[split|parloops]	Configure OpenACC 'kernels' constructs handling.
 
 Enum
 Name(openacc_kernels) Type(enum openacc_kernels)
 
+EnumValue
+Enum(openacc_kernels) String(split) Value(OPENACC_KERNELS_SPLIT)
+
 EnumValue
 Enum(openacc_kernels) String(parloops) Value(OPENACC_KERNELS_PARLOOPS)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 59421b84bac..3bbeb8c6839 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -183,7 +183,7 @@ in the following sections.
 -aux-info @var{filename}  -fallow-parameterless-variadic-functions @gol
 -fno-asm  -fno-builtin  -fno-builtin-@var{function}  -fgimple@gol
 -fhosted  -ffreestanding @gol
--fopenacc  -fopenacc-dim=@var{geom} @gol
+-fopenacc  -fopenacc-dim=@var{geom}  -fopenacc-kernels=@var{style} @gol
 -fopenmp  -fopenmp-simd @gol
 -fms-extensions  -fplan9-extensions  -fsso-struct=@var{endianness} @gol
 -fallow-single-precision  -fcond-mismatch  -flax-vector-conversions @gol
@@ -2158,6 +2158,17 @@ to runtime, the environment variable @var{GOMP_OPENACC_DIM} can be set.
 It has the same format as the option value, except that '-' is not
 permitted.
 
+@item -fopenacc-kernels=@var{style}
+@opindex fopenacc-kernels
+@cindex OpenACC accelerator programming
+Configure OpenACC 'kernels' constructs handling.
+With @option{-fopenacc-kernels=split}, OpenACC 'kernels' constructs
+are split into a sequence of compute constructs, each then handled
+individually.
+With @option{-fopenacc-kernels=parloops}, the whole OpenACC
+'kernels' construct is handled by the @samp{parloops} pass.
+This is the default.
+
 @item -fopenmp
 @opindex fopenmp
 @cindex OpenMP parallel
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 910be7c7fd4..d5c655f648f 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -347,6 +347,7 @@ enum cf_protection_level
 /* OpenACC 'kernels' constructs handling.  */
 enum openacc_kernels
 {
+  OPENACC_KERNELS_SPLIT,
   OPENACC_KERNELS_PARLOOPS
 };
 #endif /* ! GCC_FLAG_TYPES_H */
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index 097c623eb50..b3c9cdb425f 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -643,7 +643,8 @@ Fortran LTO Joined Var(flag_openacc_dims)
 ; Documented in C
 
 fopenacc-kernels=
-Fortran RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS) Undocumented
+Fortran RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS)
+; Documented in C
 
 fopenmp
 Fortran LTO
diff --git a/gcc/omp-oacc-kernels.c b/gcc/omp-oacc-kernels.c
new file mode 100644
index 00000000000..d1803774442
--- /dev/null
+++ b/gcc/omp-oacc-kernels.c
@@ -0,0 +1,245 @@
+/* Transformation pass for OpenACC kernels regions.  Converts a kernels
+   region into a series of smaller parallel regions.  There is a parallel
+   region for each parallelizable loop nest, as well as a "gang-single"
+   parallel region for each non-parallelizable piece of code.
+
+   Contributed by Gergö Barany <gergo@codesourcery.com> and
+                  Thomas Schwinge <thomas@codesourcery.com>
+
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "target.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "cgraph.h"
+#include "fold-const.h"
+#include "gimplify.h"
+#include "gimple-iterator.h"
+#include "gimple-walk.h"
+#include "gomp-constants.h"
+
+/* This is a preprocessing pass to be run immediately before lower_omp.  It
+   will convert OpenACC "kernels" regions into sequences of "parallel"
+   regions.
+   For now, the translation is as follows:
+   - The entire kernels region is turned into a data region with clauses
+     taken from the kernels region.  New "create" clauses are added for all
+     variables declared at the top level in the kernels region.  */
+
+/* Transform KERNELS_REGION, which is an OpenACC kernels region, into a data
+   region containing the original kernels region.  */
+
+static gimple *
+transform_kernels_region (gimple *kernels_region)
+{
+  gcc_checking_assert (gimple_omp_target_kind (kernels_region)
+                        == GF_OMP_TARGET_KIND_OACC_KERNELS);
+
+  /* Collect the kernels region's data clauses and create the new data
+     region with those clauses.  */
+  tree kernels_clauses = gimple_omp_target_clauses (kernels_region);
+  tree data_clauses = NULL;
+  for (tree c = kernels_clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      /* Certain map clauses are copied to the enclosing data region.  Any
+         non-data clause remains on the kernels region.  */
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP)
+        {
+          tree decl = OMP_CLAUSE_DECL (c);
+          HOST_WIDE_INT kind = OMP_CLAUSE_MAP_KIND (c);
+          switch (kind)
+            {
+            default:
+              if (kind == GOMP_MAP_ALLOC &&
+                  integer_zerop (OMP_CLAUSE_SIZE (c)))
+                /* ??? This is an alloc clause for mapping a pointer whose
+                   target is already mapped.  We leave these on the inner
+                   parallel regions because moving them to the outer data
+                   region causes runtime errors.  */
+                break;
+
+              /* For non-artificial variables, and for non-declaration
+                 expressions like A[0:n], copy the clause to the data
+                 region.  */
+              if ((DECL_P (decl) && !DECL_ARTIFICIAL (decl))
+                  || !DECL_P (decl))
+                {
+                  tree new_clause = build_omp_clause (OMP_CLAUSE_LOCATION (c),
+                                                      OMP_CLAUSE_MAP);
+                  OMP_CLAUSE_SET_MAP_KIND (new_clause, kind);
+                  /* This must be unshared here to avoid "incorrect sharing
+                     of tree nodes" errors from verify_gimple.  */
+                  OMP_CLAUSE_DECL (new_clause) = unshare_expr (decl);
+                  OMP_CLAUSE_SIZE (new_clause) = OMP_CLAUSE_SIZE (c);
+                  OMP_CLAUSE_CHAIN (new_clause) = data_clauses;
+                  data_clauses = new_clause;
+
+                  /* Now that this data is mapped, the inner data clause on
+                     the kernels region can become a present clause.  */
+                  OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_FORCE_PRESENT);
+                }
+              break;
+
+            case GOMP_MAP_POINTER:
+            case GOMP_MAP_TO_PSET:
+            case GOMP_MAP_FORCE_TOFROM:
+            case GOMP_MAP_FIRSTPRIVATE_POINTER:
+            case GOMP_MAP_FIRSTPRIVATE_REFERENCE:
+              /* ??? Copying these map kinds leads to internal compiler
+                 errors in later passes.  */
+              break;
+            }
+        }
+      else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_IF)
+        {
+          /* If there is an if clause, it must also be present on the
+             enclosing data region.  Temporarily remove the if clause's
+             chain to avoid copying it.  */
+          tree saved_chain = OMP_CLAUSE_CHAIN (c);
+          OMP_CLAUSE_CHAIN (c) = NULL;
+          tree new_if_clause = unshare_expr (c);
+          OMP_CLAUSE_CHAIN (c) = saved_chain;
+          OMP_CLAUSE_CHAIN (new_if_clause) = data_clauses;
+          data_clauses = new_if_clause;
+        }
+    }
+  /* Restore the original order of the clauses.  */
+  data_clauses = nreverse (data_clauses);
+
+  gimple *data_region
+    = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_OACC_DATA_KERNELS,
+                               data_clauses);
+  gimple_set_location (data_region, gimple_location (kernels_region));
+
+  /* For now, just construct a new parallel region inside the data region.  */
+  gimple *inner_region
+    = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_OACC_PARALLEL,
+                               kernels_clauses);
+  gimple_set_location (inner_region, gimple_location (kernels_region));
+  gimple_omp_set_body (inner_region, gimple_omp_body (kernels_region));
+
+  gbind *bind = gimple_build_bind (NULL, NULL, NULL);
+  gimple_bind_add_stmt (bind, inner_region);
+
+  /* Put the transformed pieces together.  The entire body of the region is
+     wrapped in a try-finally statement that calls __builtin_GOACC_data_end
+     for cleanup.  */
+  tree data_end_fn = builtin_decl_explicit (BUILT_IN_GOACC_DATA_END);
+  gimple *call = gimple_build_call (data_end_fn, 0);
+  gimple_seq cleanup = NULL;
+  gimple_seq_add_stmt (&cleanup, call);
+  gimple *try_stmt = gimple_build_try (bind, cleanup, GIMPLE_TRY_FINALLY);
+  gimple_omp_set_body (data_region, try_stmt);
+
+  return data_region;
+}
+
+/* Helper function of convert_oacc_kernels for walking the tree, calling
+   transform_kernels_region on each kernels region found.  */
+
+static tree
+scan_kernels (gimple_stmt_iterator *gsi_p, bool *handled_ops_p,
+              struct walk_stmt_info *)
+{
+  gimple *stmt = gsi_stmt (*gsi_p);
+  *handled_ops_p = false;
+
+  int kind;
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_OMP_TARGET:
+      kind = gimple_omp_target_kind (stmt);
+      if (kind == GF_OMP_TARGET_KIND_OACC_KERNELS)
+        {
+          gimple *new_region = transform_kernels_region (stmt);
+          gsi_replace (gsi_p, new_region, false);
+          *handled_ops_p = true;
+        }
+      break;
+
+    default:
+      break;
+    }
+
+  return NULL;
+}
+
+/* Find and transform OpenACC kernels regions in the current function.  */
+
+static unsigned int
+convert_oacc_kernels (void)
+{
+  struct walk_stmt_info wi;
+  gimple_seq body = gimple_body (current_function_decl);
+
+  memset (&wi, 0, sizeof (wi));
+  walk_gimple_seq_mod (&body, scan_kernels, NULL, &wi);
+
+  gimple_set_body (current_function_decl, body);
+
+  return 0;
+}
+
+namespace {
+
+const pass_data pass_data_convert_oacc_kernels =
+{
+  GIMPLE_PASS, /* type */
+  "convert_oacc_kernels", /* name */
+  OPTGROUP_OMP, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_gimple_any, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_convert_oacc_kernels : public gimple_opt_pass
+{
+public:
+  pass_convert_oacc_kernels (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_convert_oacc_kernels, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+  {
+    return (flag_openacc
+	    && flag_openacc_kernels == OPENACC_KERNELS_SPLIT);
+  }
+  virtual unsigned int execute (function *)
+  {
+    return convert_oacc_kernels ();
+  }
+
+}; // class pass_convert_oacc_kernels
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_convert_oacc_kernels (gcc::context *ctxt)
+{
+  return new pass_convert_oacc_kernels (ctxt);
+}
diff --git a/gcc/passes.def b/gcc/passes.def
index 3ebcfc30349..4840bb6cff7 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_warn_unused_result);
   NEXT_PASS (pass_diagnose_omp_blocks);
   NEXT_PASS (pass_diagnose_tm_blocks);
+  NEXT_PASS (pass_convert_oacc_kernels);
   NEXT_PASS (pass_lower_omp);
   NEXT_PASS (pass_lower_cf);
   NEXT_PASS (pass_lower_tm);
diff --git a/gcc/testsuite/ChangeLog.openacc b/gcc/testsuite/ChangeLog.openacc
index 4acd174dca9..887011e7d1f 100644
--- a/gcc/testsuite/ChangeLog.openacc
+++ b/gcc/testsuite/ChangeLog.openacc
@@ -1,3 +1,11 @@
+2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
+	    Gergö Barany  <gergo@codesourcery.com>
+
+	* c-c++-common/goacc/kernels-conversion.c: New test.
+	* gfortran.dg/goacc/kernels-conversion.f95: Likewise.
+	* c-c++-common/goacc/if-clause-2.c: Update.
+	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
+
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* c-c++-common/goacc/kernels-1.c: Add
diff --git a/gcc/testsuite/c-c++-common/goacc/if-clause-2.c b/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
index 5ab8459d732..e17b5dd1107 100644
--- a/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=split" } */
+/* { dg-additional-options "-fdump-tree-convert_oacc_kernels" } */
+
 void
 f (short c)
 {
@@ -9,3 +12,7 @@ f (short c)
   ;
 #pragma acc update device(c) if(c)
 }
+
+/* Verify that the 'if' clause gets duplicated.
+   { dg-final { scan-tree-dump-times "#pragma omp target oacc_data_kernels if\\(" 1 "convert_oacc_kernels" } }
+   { dg-final { scan-tree-dump-times "#pragma omp target oacc_parallel_kernels_gang_single .* if\\(" 1 "convert_oacc_kernels" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
new file mode 100644
index 00000000000..c75db375f26
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
@@ -0,0 +1,36 @@
+/* { dg-additional-options "-fopenacc-kernels=split" } */
+/* { dg-additional-options "-fdump-tree-convert_oacc_kernels" } */
+
+#define N 1024
+
+unsigned int a[N];
+
+int
+main (void)
+{
+  int i;
+  unsigned int sum = 1;
+
+#pragma acc kernels copyin(a[0:N]) copy(sum)
+  {
+    #pragma acc loop
+    for (i = 0; i < N; ++i)
+      sum += a[i];
+
+    sum++;
+
+    #pragma acc loop
+    for (i = 0; i < N; ++i)
+      sum += a[i];
+  }
+
+  return 0;
+}
+
+/* Check that the kernels region is split into a data region and an enclosed
+   parallel region.  */ 
+/* { dg-final { scan-tree-dump-times "oacc_data_kernels" 1 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "oacc_parallel" 1 "convert_oacc_kernels" } } */
+
+/* Check that the original kernels region is removed.  */
+/* { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
new file mode 100644
index 00000000000..8c663302a6f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
@@ -0,0 +1,33 @@
+! { dg-additional-options "-fopenacc-kernels=split" }
+! { dg-additional-options "-fdump-tree-convert_oacc_kernels" }
+
+program main
+  implicit none
+  integer, parameter         :: N = 1024
+  integer, dimension (1:N)   :: a
+  integer                    :: i, sum
+
+  !$acc kernels copyin(a(1:N)) copy(sum)
+
+  !$acc loop
+  do i = 1, N
+    sum = sum + a(i)
+  end do
+
+  sum = sum + 1
+
+  !$acc loop
+  do i = 1, N
+    sum = sum + a(i)
+  end do
+
+  !$acc end kernels
+end program main
+
+! Check that the kernels region is split into a data region and an enclosed
+! parallel region.
+! { dg-final { scan-tree-dump-times "oacc_data_kernels" 1 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "oacc_parallel" 1 "convert_oacc_kernels" } }
+
+! Check that the original kernels region is removed.
+! { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
index a70f1e737bd..b83ca2d8f06 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
@@ -1,5 +1,7 @@
 ! { dg-do compile } 
 ! { dg-additional-options "-fdump-tree-original" } 
+! { dg-additional-options "-fopenacc-kernels=split" }
+! { dg-additional-options "-fdump-tree-convert_oacc_kernels" }
 
 program test
   implicit none
@@ -33,3 +35,7 @@ end program test
 ! { dg-final { scan-tree-dump-times "map\\(alloc:t\\)" 1 "original" } } 
 
 ! { dg-final { scan-tree-dump-times "map\\(force_deviceptr:u\\)" 1 "original" } } 
+
+! Verify that the 'if' clause gets duplicated.
+! { dg-final { scan-tree-dump-times "#pragma omp target oacc_data_kernels if\\(" 1 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target oacc_parallel_kernels_gang_single .* if\\(" 1 "convert_oacc_kernels" } }
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 93a6a99eb7a..24f2110bb1d 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -416,6 +416,7 @@ extern gimple_opt_pass *make_pass_lower_switch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_vector (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_vector_ssa (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_omp (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_convert_oacc_kernels (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index bf572ee8922..f2ff2ee32d2 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,5 +1,8 @@
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* testsuite/libgomp.oacc-fortran/initialize_kernels_loops.f90:
+	Update.
+
 	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c:
 	Add "-fopenacc-kernels=parloops".
 	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c:
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/initialize_kernels_loops.f90 b/libgomp/testsuite/libgomp.oacc-fortran/initialize_kernels_loops.f90
index 8eb02b88d25..35e909f8278 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/initialize_kernels_loops.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/initialize_kernels_loops.f90
@@ -1,16 +1,18 @@
 ! { dg-do run }
+! { dg-additional-options "-fopenacc-kernels=split" }
+! { dg-additional-options "-fopt-info-optimized-omp" }
 
 subroutine kernel(lo, hi, a, b, c)
     implicit none
     integer :: lo, hi, i
     real, dimension(lo:hi) :: a, b, c
 
-!$acc kernels ! { dg-bogus "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "TODO" { xfail { openacc_nvidia_accel_selected && opt_levels_2_plus } } }
-!$acc loop independent
+!$acc kernels
+!$acc loop independent ! { dg-warning "note: parallelized loop nest in OpenACC .kernels. construct" }
     do i = lo, hi
       b(i) = a(i)
     end do
-!$acc loop independent
+!$acc loop independent ! { dg-warning "note: parallelized loop nest in OpenACC .kernels. construct" }
     do i = lo, hi
       c(i) = b(i)
     end do
-- 
2.17.1


[-- Attachment #5: 0004-Turn-OpenACC-kernels-regions-into-a-sequence-of-para.patch --]
[-- Type: text/x-diff, Size: 31628 bytes --]

From 2e63393893fac8e9abcf9665a2055f90dce76e86 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gerg=C3=B6=20Barany?= <gergo@codesourcery.com>
Date: Mon, 21 Jan 2019 07:16:06 -0800
Subject: [PATCH 4/9] Turn OpenACC kernels regions into a sequence of parallel
 regions

This patch decomposes each OpenACC kernels region into a sequence of
parallel regions. Each OpenACC loop nest turns into its own region; any code
between such loop nests is gathered up into a region as well. The loop
regions can be distributed across gangs if the original kernels region had a
num_gangs clause, while the other regions are executed in "gang-single"
mode. The implied default "auto" clause on kernels loops is made explicit
unless there is a conflicting clause.
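
As a rough source-level illustration (a hand-written sketch, not output of
the pass and not part of this patch), the decomposition resembles rewriting
the first function below into the second.  The function names and the
choice of loop bodies are made up for this example.  Each loop nest gets
its own parallel region with an explicit "auto" clause, and the code
between the loops lands in a num_gangs(1) region.  The real pass works on
GIMPLE, so this is only an approximation.  Built without -fopenacc, the
pragmas are ignored and both functions run on the host.

```c
#include <assert.h>

/* Original form: one 'kernels' construct holding two loops with a
   scalar statement between them.  */
void
scale_with_kernels (const int *a, int *b, int *c, int n)
{
  assert (n > 0);
#pragma acc kernels copyin(a[0:n]) copyout(b[0:n], c[0:n])
  {
#pragma acc loop
    for (int i = 0; i < n; ++i)
      b[i] = a[i] * 2;

    b[0] += 1;

#pragma acc loop
    for (int i = 0; i < n; ++i)
      c[i] = b[i] + 1;
  }
}

/* Hand-written approximation of the decomposed form: a data region
   enclosing one parallel region per loop nest plus a gang-single region
   for the code in between.  */
void
scale_decomposed (const int *a, int *b, int *c, int n)
{
  assert (n > 0);
#pragma acc data copyin(a[0:n]) copyout(b[0:n], c[0:n])
  {
#pragma acc parallel present(a[0:n], b[0:n], c[0:n])
    {
#pragma acc loop auto
      for (int i = 0; i < n; ++i)
        b[i] = a[i] * 2;
    }

    /* Non-loop code: forced to a single gang.  */
#pragma acc parallel num_gangs(1) present(a[0:n], b[0:n], c[0:n])
    {
      b[0] += 1;
    }

#pragma acc parallel present(a[0:n], b[0:n], c[0:n])
    {
#pragma acc loop auto
      for (int i = 0; i < n; ++i)
        c[i] = b[i] + 1;
    }
  }
}
```

Both functions should produce the same b and c; only the region structure
differs.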

    gcc/
    * omp-oacc-kernels.c (top_level_omp_for_in_stmt): New function.
    (make_gang_single_region): Likewise.
    (transform_kernels_loop_clauses, make_gang_parallel_loop_region):
    Likewise.
    (flatten_binds): Likewise.
    (make_data_region_try_statement): Likewise.
    (maybe_build_inner_data_region): Likewise.
    (decompose_kernels_region_body): Likewise.
    (transform_kernels_region): Delegate to decompose_kernels_region_body
    and make_data_region_try_statement.

    gcc/testsuite/
    * c-c++-common/goacc/kernels-conversion.c: Test for a gang-single
    region.
    * gfortran.dg/goacc/kernels-conversion.f95: Likewise.
---
 gcc/ChangeLog.openacc                         |  14 +
 gcc/omp-oacc-kernels.c                        | 558 +++++++++++++++++-
 gcc/testsuite/ChangeLog.openacc               |   7 +
 .../c-c++-common/goacc/kernels-conversion.c   |  11 +-
 .../gfortran.dg/goacc/kernels-conversion.f95  |  11 +-
 5 files changed, 578 insertions(+), 23 deletions(-)

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index a330410f41d..6fa92ee2731 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,17 @@
+2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
+	    Gergö Barany  <gergo@codesourcery.com>
+
+	* omp-oacc-kernels.c (top_level_omp_for_in_stmt): New function.
+	(make_gang_single_region): Likewise.
+	(transform_kernels_loop_clauses, make_gang_parallel_loop_region):
+	Likewise.
+	(flatten_binds): Likewise.
+	(make_data_region_try_statement): Likewise.
+	(maybe_build_inner_data_region): Likewise.
+	(decompose_kernels_region_body): Likewise.
+	(transform_kernels_region): Delegate to decompose_kernels_region_body
+	and make_data_region_try_statement.
+
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 	    Gergö Barany  <gergo@codesourcery.com>
 
diff --git a/gcc/omp-oacc-kernels.c b/gcc/omp-oacc-kernels.c
index d1803774442..6e083666a17 100644
--- a/gcc/omp-oacc-kernels.c
+++ b/gcc/omp-oacc-kernels.c
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "backend.h"
 #include "target.h"
 #include "tree.h"
+#include "cp/cp-tree.h"
 #include "gimple.h"
 #include "tree-pass.h"
 #include "cgraph.h"
@@ -45,16 +46,548 @@ along with GCC; see the file COPYING3.  If not see
    For now, the translation is as follows:
    - The entire kernels region is turned into a data region with clauses
      taken from the kernels region.  New "create" clauses are added for all
-     variables declared at the top level in the kernels region.  */
+     variables declared at the top level in the kernels region.
+   - Any loop annotated with an OpenACC loop directive is wrapped in a new
+     parallel region.  Gang/worker/vector annotations are copied from the
+     original kernels region if present.
+     * Loops without an explicit "independent" or "seq" annotation get an
+       "auto" annotation; other annotations are preserved on the loop or
+       moved to the new surrounding parallel region.  Which annotations are
+       moved is determined by the constraints in the OpenACC spec; for
+       example, loops in the kernels region may have a gang clause, but
+       such annotations must now be moved to the new parallel region.
+   - Any sequences of other code (non-loops, non-OpenACC loops) are wrapped
+     in new "gang-single" parallel regions: Worker/vector annotations are
+     copied from the original kernels region if present, but num_gangs is
+     explicitly set to 1.  */
+
+/* Helper function for decompose_kernels_region_body.  If STMT contains a
+   "top-level" OMP_FOR statement, returns a pointer to that statement;
+   returns NULL otherwise.
+
+   A "top-level" OMP_FOR statement is one that is possibly accompanied by
+   small snippets of setup code.  Specifically, this function accepts an
+   OMP_FOR possibly wrapped in a singleton bind and a singleton try
+   statement to allow for a local loop variable, but not an OMP_FOR
+   statement nested in any other constructs.  Alternatively, it accepts a
+   non-singleton bind containing only assignments and then an OMP_FOR
+   statement at the very end.  The former style can be generated by the C
+   frontend, the latter by the Fortran frontend.  */
+
+static gimple *
+top_level_omp_for_in_stmt (gimple *stmt)
+{
+  if (gimple_code (stmt) == GIMPLE_OMP_FOR)
+    return stmt;
+
+  if (gimple_code (stmt) == GIMPLE_BIND)
+    {
+      gimple_seq body = gimple_bind_body (as_a <gbind *> (stmt));
+      if (gimple_seq_singleton_p (body))
+        {
+          /* Accept an OMP_FOR statement, or a try statement containing only
+             a single OMP_FOR.  */
+          gimple *maybe_for_or_try = gimple_seq_first_stmt (body);
+          if (gimple_code (maybe_for_or_try) == GIMPLE_OMP_FOR)
+            return maybe_for_or_try;
+          else if (gimple_code (maybe_for_or_try) == GIMPLE_TRY)
+            {
+              gimple_seq try_body = gimple_try_eval (maybe_for_or_try);
+              if (!gimple_seq_singleton_p (try_body))
+                return NULL;
+              gimple *maybe_omp_for_stmt = gimple_seq_first_stmt (try_body);
+              if (gimple_code (maybe_omp_for_stmt) == GIMPLE_OMP_FOR)
+                return maybe_omp_for_stmt;
+            }
+        }
+      else
+        {
+          gimple_stmt_iterator gsi;
+          /* Accept only a block of optional assignments followed by an
+             OMP_FOR at the end.  No other kinds of statements allowed.  */
+          for (gsi = gsi_start (body); !gsi_end_p (gsi); gsi_next (&gsi))
+            {
+              gimple *body_stmt = gsi_stmt (gsi);
+              if (gimple_code (body_stmt) == GIMPLE_ASSIGN)
+                continue;
+              else if (gimple_code (body_stmt) == GIMPLE_OMP_FOR
+                        && gsi_one_before_end_p (gsi))
+                return body_stmt;
+              else
+                return NULL;
+            }
+        }
+    }
+
+  return NULL;
+}
+
+/* Construct a "gang-single" OpenACC parallel region at LOC containing the
+   STMTS.  The newly created region is annotated with CLAUSES, which must
+   not contain a num_gangs clause, and an additional "num_gangs(1)" clause
+   to force gang-single execution.  */
+
+static gimple *
+make_gang_single_region (location_t loc, gimple_seq stmts, tree clauses)
+{
+  /* This correctly unshares the entire clause chain rooted here.  */
+  clauses = unshare_expr (clauses);
+  /* Make a num_gangs(1) clause.  */
+  tree gang_single_clause = build_omp_clause (loc, OMP_CLAUSE_NUM_GANGS);
+  OMP_CLAUSE_OPERAND (gang_single_clause, 0) = integer_one_node;
+  OMP_CLAUSE_CHAIN (gang_single_clause) = clauses;
+
+  /* Build the gang-single region.  */
+  gimple *single_region
+    = gimple_build_omp_target (
+        NULL,
+        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE,
+        gang_single_clause);
+  gimple_set_location (single_region, loc);
+  gbind *single_body = gimple_build_bind (NULL, stmts, make_node (BLOCK));
+  gimple_omp_set_body (single_region, single_body);
+
+  return single_region;
+}
+
+/* Helper for make_gang_parallel_loop_region.  Transform 'kernels'/'loop'
+   construct clauses into OpenACC 'parallel'/'loop' construct ones.  */
+
+static tree
+transform_kernels_loop_clauses (gimple *omp_for,
+				tree num_gangs_clause,
+				tree clauses)
+{
+  /* If this loop in a kernels region does not have an explicit
+     "independent", "seq", or "auto" clause, we must give it an explicit
+     "auto" clause.  */
+  bool add_auto_clause = true;
+  tree loop_clauses = gimple_omp_for_clauses (omp_for);
+  for (tree c = loop_clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_AUTO
+          || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_INDEPENDENT
+          || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SEQ)
+        {
+          add_auto_clause = false;
+          break;
+        }
+    }
+  if (add_auto_clause)
+    {
+      tree auto_clause = build_omp_clause (gimple_location (omp_for),
+                                           OMP_CLAUSE_AUTO);
+      OMP_CLAUSE_CHAIN (auto_clause) = loop_clauses;
+      gimple_omp_for_set_clauses (omp_for, auto_clause);
+    }
+
+  /* If the kernels region had a num_gangs clause, add that to this new
+     parallel region.  */
+  if (num_gangs_clause != NULL)
+    {
+      tree parallel_num_gangs_clause = unshare_expr (num_gangs_clause);
+      OMP_CLAUSE_CHAIN (parallel_num_gangs_clause) = clauses;
+      clauses = parallel_num_gangs_clause;
+    }
+
+  return clauses;
+}
+
+/* Construct a possibly gang-parallel OpenACC parallel region containing the
+   STMT, which must be identical to, or a bind containing, the loop OMP_FOR
+   with OpenACC loop annotations.
+
+   The newly created region is annotated with the optional NUM_GANGS_CLAUSE
+   as well as the other CLAUSES, which must not contain a num_gangs clause.  */
+
+static gimple *
+make_gang_parallel_loop_region (gimple *omp_for, gimple *stmt,
+                                tree num_gangs_clause, tree clauses)
+{
+  /* This correctly unshares the entire clause chain rooted here.  */
+  clauses = unshare_expr (clauses);
+
+  clauses = transform_kernels_loop_clauses (omp_for,
+					    num_gangs_clause,
+					    clauses);
+
+  /* Now build the parallel region containing this loop.  */
+  gimple_seq parallel_body = NULL;
+  gimple_seq_add_stmt (&parallel_body, stmt);
+  gimple *parallel_body_bind
+    = gimple_build_bind (NULL, parallel_body, make_node (BLOCK));
+  gimple *parallel_region
+    = gimple_build_omp_target (
+        parallel_body_bind,
+        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED,
+        clauses);
+  gimple_set_location (parallel_region, gimple_location (stmt));
+
+  return parallel_region;
+}
+
+/* Eliminate any binds directly inside BIND by adding their statements to
+   BIND (i.e., modifying it in place), excluding binds that hold only an
+   OMP_FOR loop and associated setup/cleanup code.  Recurse into binds but
+   not other statements.  Return a chain of the local variables of eliminated
+   binds, i.e., the local variables found in nested binds.  If
+   INCLUDE_TOPLEVEL_VARS is true, this also includes the variables belonging
+   to BIND itself.  */
+
+static tree
+flatten_binds (gbind *bind, bool include_toplevel_vars = false)
+{
+  tree vars = NULL, last_var = NULL;
+
+  if (include_toplevel_vars)
+    {
+      vars = gimple_bind_vars (bind);
+      last_var = vars;
+    }
+
+  gimple_seq new_body = NULL;
+  gimple_seq body_sequence = gimple_bind_body (bind);
+  gimple_stmt_iterator gsi, gsi_n;
+  for (gsi = gsi_start (body_sequence); !gsi_end_p (gsi); gsi = gsi_n)
+    {
+      /* Advance the iterator here because otherwise it would be invalidated
+         by moving statements below.  */
+      gsi_n = gsi;
+      gsi_next (&gsi_n);
+
+      gimple *stmt = gsi_stmt (gsi);
+      /* Flatten bind statements, except the ones that contain only an
+         OpenACC for loop.  */
+      if (gimple_code (stmt) == GIMPLE_BIND
+          && !top_level_omp_for_in_stmt (stmt))
+        {
+          gbind *inner_bind = as_a <gbind *> (stmt);
+          /* Flatten recursively, and collect all variables.  */
+          tree inner_vars = flatten_binds (inner_bind, true);
+          gimple_seq inner_sequence = gimple_bind_body (inner_bind);
+          gcc_assert (gimple_code (inner_sequence) != GIMPLE_BIND
+                      || top_level_omp_for_in_stmt (inner_sequence));
+          gimple_seq_add_seq (&new_body, inner_sequence);
+          /* Find the last variable; we will append others to it.  */
+          while (last_var != NULL && TREE_CHAIN (last_var) != NULL)
+            last_var = TREE_CHAIN (last_var);
+          if (last_var != NULL)
+            {
+              TREE_CHAIN (last_var) = inner_vars;
+              last_var = inner_vars;
+            }
+          else
+            {
+              vars = inner_vars;
+              last_var = vars;
+            }
+        }
+      else
+        gimple_seq_add_stmt (&new_body, stmt);
+    }
+
+  /* Put the possibly transformed body back into the bind.  */
+  gimple_bind_set_body (bind, new_body);
+  return vars;
+}
+
+/* Helper function for places where we construct data regions.  Wraps the BODY
+   inside a try-finally construct at LOC that calls __builtin_GOACC_data_end
+   in its cleanup block.  Returns this try statement.  */
+
+static gimple *
+make_data_region_try_statement (location_t loc, gimple *body)
+{
+  tree data_end_fn = builtin_decl_explicit (BUILT_IN_GOACC_DATA_END);
+  gimple *call = gimple_build_call (data_end_fn, 0);
+  gimple_seq cleanup = NULL;
+  gimple_seq_add_stmt (&cleanup, call);
+  gimple *try_stmt = gimple_build_try (body, cleanup, GIMPLE_TRY_FINALLY);
+  gimple_set_location (body, loc);
+  return try_stmt;
+}
+
+/* If INNER_BIND_VARS holds variables, build an OpenACC data region with
+   location LOC containing BODY and having "create(var)" clauses for each
+   variable.  If INNER_CLEANUP is present, add a try-finally statement with
+   this cleanup code in the finally block.  Return the new data region, or
+   the original BODY if no data region was needed.  */
+
+static gimple *
+maybe_build_inner_data_region (location_t loc, gimple *body,
+                               tree inner_bind_vars, gimple *inner_cleanup)
+{
+  /* Build data "create(var)" clauses for these local variables.
+     Below we will add these to a data region enclosing the entire body
+     of the decomposed kernels region.  */
+  tree prev_mapped_var = NULL, next = NULL, artificial_vars = NULL,
+       inner_data_clauses = NULL;
+  for (tree v = inner_bind_vars; v; v = next)
+    {
+      next = TREE_CHAIN (v);
+      if (DECL_ARTIFICIAL (v)
+          || TREE_CODE (v) == CONST_DECL
+          || (DECL_LANG_SPECIFIC (current_function_decl)
+              && DECL_TEMPLATE_INSTANTIATION (current_function_decl)))
+        {
+          /* If this is an artificial temporary, it need not be mapped.  We
+             move its declaration into the bind inside the data region.
+             Also avoid mapping variables if we are inside a template
+             instantiation; the code does not contain all the copies to
+             temporaries that would make this legal.  */
+          TREE_CHAIN (v) = artificial_vars;
+          artificial_vars = v;
+          if (prev_mapped_var != NULL)
+            TREE_CHAIN (prev_mapped_var) = next;
+          else
+            inner_bind_vars = next;
+        }
+      else
+        {
+          /* Otherwise, build the map clause.  */
+          tree new_clause = build_omp_clause (loc, OMP_CLAUSE_MAP);
+          OMP_CLAUSE_SET_MAP_KIND (new_clause, GOMP_MAP_ALLOC);
+          OMP_CLAUSE_DECL (new_clause) = v;
+          OMP_CLAUSE_SIZE (new_clause) = DECL_SIZE_UNIT (v);
+          OMP_CLAUSE_CHAIN (new_clause) = inner_data_clauses;
+          inner_data_clauses = new_clause;
+
+          prev_mapped_var = v;
+        }
+    }
+
+  if (artificial_vars)
+    body = gimple_build_bind (artificial_vars, body, make_node (BLOCK));
+
+  /* If we determined above that there are variables that need to be created
+     on the device, construct a data region for them and wrap the body
+     inside that.  */
+  if (inner_data_clauses != NULL)
+    {
+      gcc_assert (inner_bind_vars != NULL);
+      gimple *inner_data_region
+        = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_OACC_DATA_KERNELS,
+                                   inner_data_clauses);
+      gimple_set_location (inner_data_region, loc);
+      /* Make sure __builtin_GOACC_data_end is called at the end.  */
+      gimple *try_stmt = make_data_region_try_statement (loc, body);
+      gimple_omp_set_body (inner_data_region, try_stmt);
+      gimple *bind_body;
+      if (inner_cleanup != NULL)
+          /* Clobber all the inner variables that need to be clobbered.  */
+          bind_body = gimple_build_try (inner_data_region, inner_cleanup,
+                                        GIMPLE_TRY_FINALLY);
+      else
+          bind_body = inner_data_region;
+      body = gimple_build_bind (inner_bind_vars, bind_body, make_node (BLOCK));
+    }
+
+  return body;
+}
+
+/* Decompose the body of the KERNELS_REGION, which was originally annotated
+   with the KERNELS_CLAUSES, into a series of parallel regions.  */
+
+static gimple *
+decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
+{
+  location_t loc = gimple_location (kernels_region);
+
+  /* The kernels clauses will be propagated to the child clauses unmodified,
+     except that the num_gangs clause will only be added to loop regions.
+     The other regions are "gang-single" and get an explicit num_gangs(1)
+     clause.  So separate out the num_gangs clause here.  */
+  tree num_gangs_clause = NULL, prev_clause = NULL;
+  tree parallel_clauses = kernels_clauses;
+  for (tree c = parallel_clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_NUM_GANGS)
+        {
+          /* Cut this clause out of the chain.  */
+          num_gangs_clause = c;
+          if (prev_clause != NULL)
+            OMP_CLAUSE_CHAIN (prev_clause) = OMP_CLAUSE_CHAIN (c);
+          else
+            kernels_clauses = OMP_CLAUSE_CHAIN (c);
+          OMP_CLAUSE_CHAIN (num_gangs_clause) = NULL;
+          break;
+        }
+      else
+        prev_clause = c;
+    }
+
+  gimple *kernels_body = gimple_omp_body (kernels_region);
+  gbind *kernels_bind = as_a <gbind *> (kernels_body);
+
+  /* The body of the region may contain other nested binds declaring inner
+     local variables.  Collapse all these binds into one to ensure that we
+     have a single sequence of statements to iterate over; also, collect all
+     inner variables.  */
+  tree inner_bind_vars = flatten_binds (kernels_bind);
+  gimple_seq body_sequence = gimple_bind_body (kernels_bind);
+
+  /* All these inner variables will get allocated on the device (below, by
+     calling maybe_build_inner_data_region).  Here we create "present"
+     clauses for them and add these clauses to the list of clauses to be
+     attached to each inner parallel region.  */
+  tree present_clauses = kernels_clauses;
+  for (tree var = inner_bind_vars; var; var = TREE_CHAIN (var))
+    {
+      if (!DECL_ARTIFICIAL (var) && TREE_CODE (var) != CONST_DECL)
+        {
+          tree present_clause = build_omp_clause (loc, OMP_CLAUSE_MAP);
+          OMP_CLAUSE_SET_MAP_KIND (present_clause, GOMP_MAP_FORCE_PRESENT);
+          OMP_CLAUSE_DECL (present_clause) = var;
+          OMP_CLAUSE_SIZE (present_clause) = DECL_SIZE_UNIT (var);
+          OMP_CLAUSE_CHAIN (present_clause) = present_clauses;
+          present_clauses = present_clause;
+        }
+    }
+  kernels_clauses = present_clauses;
+
+  /* In addition to nested binds, the "real" body of the region may be
+     nested inside a try-finally block.  Find its cleanup block, which
+     contains code to clobber the local variables that must be clobbered.  */
+  gimple *inner_cleanup = NULL;
+  if (body_sequence != NULL && gimple_code (body_sequence) == GIMPLE_TRY)
+    {
+      if (gimple_seq_singleton_p (body_sequence))
+        {
+          /* The try statement is the only thing inside the bind.  */
+          inner_cleanup = gimple_try_cleanup (body_sequence);
+          body_sequence = gimple_try_eval (body_sequence);
+        }
+      else
+        {
+          /* The bind's body starts with a try statement, but it is followed
+             by other things.  */
+          gimple_stmt_iterator gsi = gsi_start (body_sequence);
+          gimple *try_stmt = gsi_stmt (gsi);
+          inner_cleanup = gimple_try_cleanup (try_stmt);
+          gimple *try_body = gimple_try_eval (try_stmt);
+
+          gsi_remove (&gsi, false);
+          /* Now gsi indicates the sequence of statements after the try
+             statement in the bind.  Append the statements in the try body and
+             the trailing statements from gsi.  */
+          gsi_insert_seq_before (&gsi, try_body, GSI_CONTINUE_LINKING);
+          body_sequence = gsi_stmt (gsi);
+        }
+    }
+
+  /* This sequence will collect all the top-level statements in the body of
+     the data region we are about to construct.  */
+  gimple_seq region_body = NULL;
+  /* This sequence will collect consecutive statements to be put into a
+     gang-single region.  */
+  gimple_seq gang_single_seq = NULL;
+  /* Flag recording whether the gang_single_seq only contains copies to
+     local variables.  These may be loop setup code that should not be
+     separated from the loop.  */
+  bool only_simple_assignments = true;
+
+  /* Iterate over the statements in the kernels region's body.  */
+  gimple_stmt_iterator gsi, gsi_n;
+  for (gsi = gsi_start (body_sequence); !gsi_end_p (gsi); gsi = gsi_n)
+    {
+      /* Advance the iterator here because otherwise it would be invalidated
+         by moving statements below.  */
+      gsi_n = gsi;
+      gsi_next (&gsi_n);
+
+      gimple *stmt = gsi_stmt (gsi);
+      gimple *omp_for = top_level_omp_for_in_stmt (stmt);
+      if (omp_for != NULL)
+        {
+          /* This is an OMP for statement, put it into a parallel region.
+             But first, construct a gang-single region containing any
+             complex sequential statements we may have seen.  */
+          if (gang_single_seq != NULL && !only_simple_assignments)
+            {
+              gimple *single_region
+                = make_gang_single_region (loc, gang_single_seq,
+                                           kernels_clauses);
+              gimple_seq_add_stmt (&region_body, single_region);
+            }
+          else if (gang_single_seq != NULL && only_simple_assignments)
+            {
+              /* There is a sequence of sequential statements preceding this
+                 loop, but they are all simple assignments.  This is
+                 probably setup code for the loop; in particular, Fortran DO
+                 loops are preceded by code to copy the loop limit variable
+                 to a temporary.  Group this code together with the loop
+                 itself.  */
+              gimple_seq_add_stmt (&gang_single_seq, stmt);
+              stmt = gimple_build_bind (NULL, gang_single_seq,
+                                        make_node (BLOCK));
+            }
+          gang_single_seq = NULL;
+          only_simple_assignments = true;
+
+          gimple *parallel_region
+            = make_gang_parallel_loop_region (omp_for, stmt,
+                                              num_gangs_clause,
+                                              kernels_clauses);
+          gimple_seq_add_stmt (&region_body, parallel_region);
+        }
+      else
+        {
+          /* This is not an OMP for statement, so it will be put into a
+             gang-single region.  */
+          gimple_seq_add_stmt (&gang_single_seq, stmt);
+          /* Is this a simple assignment? We call it simple if it is an
+             assignment to an artificial local variable.  This captures
+             Fortran loop setup code computing loop bounds and offsets.  */
+          bool is_simple_assignment
+            = (gimple_code (stmt) == GIMPLE_ASSIGN
+                && TREE_CODE (gimple_assign_lhs (stmt)) == VAR_DECL
+                && DECL_ARTIFICIAL (gimple_assign_lhs (stmt)));
+          if (!is_simple_assignment)
+            only_simple_assignments = false;
+        }
+    }
+
+  /* If we did not emit a new region, and are not going to emit one now
+     (that is, the original region was empty), prepare to emit a dummy so as
+     to preserve the original construct, which other processing (at least
+     test cases) depends on.  */
+  if (region_body == NULL && gang_single_seq == NULL)
+    {
+      gimple *stmt = gimple_build_nop ();
+      gimple_set_location (stmt, loc);
+      gimple_seq_add_stmt (&gang_single_seq, stmt);
+    }
+
+  /* Gather up any remaining gang-single statements.  */
+  if (gang_single_seq != NULL)
+    {
+      gimple *single_region
+        = make_gang_single_region (loc, gang_single_seq, kernels_clauses);
+      gimple_seq_add_stmt (&region_body, single_region);
+    }
+
+  tree kernels_locals = gimple_bind_vars (as_a <gbind *> (kernels_body));
+  gimple *body = gimple_build_bind (kernels_locals, region_body,
+                                    make_node (BLOCK));
+
+  /* If we found variables declared in nested scopes, build a data region to
+     map them to the device.  */
+  body = maybe_build_inner_data_region (loc, body, inner_bind_vars,
+                                        inner_cleanup);
+
+  return body;
+}
 
 /* Transform KERNELS_REGION, which is an OpenACC kernels region, into a data
-   region containing the original kernels region.  */
+   region containing the original kernels region's body cut up into a
+   sequence of parallel regions.  */
 
 static gimple *
 transform_kernels_region (gimple *kernels_region)
 {
   gcc_checking_assert (gimple_omp_target_kind (kernels_region)
                         == GF_OMP_TARGET_KIND_OACC_KERNELS);
+  location_t loc = gimple_location (kernels_region);
 
   /* Collect the kernels region's data clauses and create the new data
      region with those clauses.  */
@@ -130,26 +663,17 @@ transform_kernels_region (gimple *kernels_region)
   gimple *data_region
     = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_OACC_DATA_KERNELS,
                                data_clauses);
-  gimple_set_location (data_region, gimple_location (kernels_region));
-
-  /* For now, just construct a new parallel region inside the data region.  */
-  gimple *inner_region
-    = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_OACC_PARALLEL,
-                               kernels_clauses);
-  gimple_set_location (inner_region, gimple_location (kernels_region));
-  gimple_omp_set_body (inner_region, gimple_omp_body (kernels_region));
+  gimple_set_location (data_region, loc);
 
-  gbind *bind = gimple_build_bind (NULL, NULL, NULL);
-  gimple_bind_add_stmt (bind, inner_region);
+  /* Transform the body of the kernels region into a sequence of parallel
+     regions.  */
+  gimple *body = decompose_kernels_region_body (kernels_region,
+                                                kernels_clauses);
 
   /* Put the transformed pieces together.  The entire body of the region is
      wrapped in a try-finally statement that calls __builtin_GOACC_data_end
      for cleanup.  */
-  tree data_end_fn = builtin_decl_explicit (BUILT_IN_GOACC_DATA_END);
-  gimple *call = gimple_build_call (data_end_fn, 0);
-  gimple_seq cleanup = NULL;
-  gimple_seq_add_stmt (&cleanup, call);
-  gimple *try_stmt = gimple_build_try (bind, cleanup, GIMPLE_TRY_FINALLY);
+  gimple *try_stmt = make_data_region_try_statement (loc, body);
   gimple_omp_set_body (data_region, try_stmt);
 
   return data_region;
diff --git a/gcc/testsuite/ChangeLog.openacc b/gcc/testsuite/ChangeLog.openacc
index 887011e7d1f..7000b099bad 100644
--- a/gcc/testsuite/ChangeLog.openacc
+++ b/gcc/testsuite/ChangeLog.openacc
@@ -1,3 +1,10 @@
+2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
+	    Gergö Barany  <gergo@codesourcery.com>
+
+	* c-c++-common/goacc/kernels-conversion.c: Test for a gang-single
+	region.
+	* gfortran.dg/goacc/kernels-conversion.f95: Likewise.
+
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 	    Gergö Barany  <gergo@codesourcery.com>
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
index c75db375f26..ec5db0201f3 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
@@ -18,6 +18,7 @@ main (void)
       sum += a[i];
 
     sum++;
+    a[0]++;
 
     #pragma acc loop
     for (i = 0; i < N; ++i)
@@ -27,10 +28,14 @@ main (void)
   return 0;
 }
 
-/* Check that the kernels region is split into a data region and an enclosed
-   parallel region.  */ 
+/* Check that the kernels region is split into a data region and enclosed
+   parallel regions.  */ 
 /* { dg-final { scan-tree-dump-times "oacc_data_kernels" 1 "convert_oacc_kernels" } } */
-/* { dg-final { scan-tree-dump-times "oacc_parallel" 1 "convert_oacc_kernels" } } */
+
+/* The two loop regions are parallelized, the sequential part in between is
+   made gang-single.  */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 2 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 1 "convert_oacc_kernels" } } */
 
 /* Check that the original kernels region is removed.  */
 /* { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
index 8c663302a6f..4aba2b1beeb 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
@@ -15,6 +15,7 @@ program main
   end do
 
   sum = sum + 1
+  a(1) = a(1) + 1
 
   !$acc loop
   do i = 1, N
@@ -24,10 +25,14 @@ program main
   !$acc end kernels
 end program main
 
-! Check that the kernels region is split into a data region and an enclosed
-! parallel region.
+! Check that the kernels region is split into a data region and enclosed
+! parallel regions.
 ! { dg-final { scan-tree-dump-times "oacc_data_kernels" 1 "convert_oacc_kernels" } }
-! { dg-final { scan-tree-dump-times "oacc_parallel" 1 "convert_oacc_kernels" } }
+
+! The two loop regions are parallelized, the sequential part in between is
+! made gang-single.
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 2 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 1 "convert_oacc_kernels" } }
 
 ! Check that the original kernels region is removed.
 ! { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } }
-- 
2.17.1
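
For readers who want the gist of decompose_kernels_region_body without the
GIMPLE plumbing, here is a minimal standalone sketch of the grouping logic
only.  It is not GCC code: Stmt, Region and decompose are hypothetical names
made up for illustration, statements are plain strings, and clauses, binds
and data regions are left out entirely.  It assumes each statement has
already been classified as an OpenACC loop, a "simple" assignment to an
artificial temporary, or anything else.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for GIMPLE statements and for the
// regions the pass emits.  None of these are GCC types.
struct Stmt
{
  enum Kind { LOOP, SIMPLE_ASSIGN, OTHER };
  Kind kind;
  std::string text;
};

struct Region
{
  bool parallel;                   // true: gang-parallel, false: gang-single
  std::vector<std::string> stmts;
};

// Split BODY into alternating gang-single and gang-parallel regions.
std::vector<Region>
decompose (const std::vector<Stmt> &body)
{
  std::vector<Region> regions;
  std::vector<std::string> pending;  // statements awaiting a gang-single region
  bool only_simple = true;           // pending holds only simple assignments

  for (const Stmt &s : body)
    {
      if (s.kind == Stmt::LOOP)
        {
          Region par{true, {}};
          if (!pending.empty () && !only_simple)
            // Real sequential work precedes the loop: emit it on its own.
            regions.push_back (Region{false, pending});
          else
            // Only loop setup code (possibly none): keep it with the loop.
            par.stmts = pending;
          par.stmts.push_back (s.text);
          regions.push_back (par);
          pending.clear ();
          only_simple = true;
        }
      else
        {
          pending.push_back (s.text);
          if (s.kind != Stmt::SIMPLE_ASSIGN)
            only_simple = false;
        }
    }

  // An empty body still yields one (dummy) gang-single region.
  if (regions.empty () && pending.empty ())
    pending.push_back ("nop");
  // Gather up any trailing sequential statements.
  if (!pending.empty ())
    regions.push_back (Region{false, pending});
  return regions;
}
```

Fed the shape of the kernels-conversion.c test (a loop, then "sum++" and
"a[0]++", then a second loop), this sketch produces a parallel region, one
gang-single region holding both sequential statements, and another parallel
region, which is the 2 + 1 split the dg-final scans above check for.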


[-- Attachment #6: 0005-Handle-conditional-execution-of-loops-in-OpenACC-ker.patch --]
[-- Type: text/x-diff, Size: 16850 bytes --]

From d66b17791ae1fb458834f0dd5a326acc7a5cd51a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gerg=C3=B6=20Barany?= <gergo@codesourcery.com>
Date: Wed, 23 Jan 2019 14:32:57 -0800
Subject: [PATCH 5/9] Handle conditional execution of loops in OpenACC kernels
 regions

Any OpenACC loop controlled by an if statement or a non-OpenACC loop must be
executed in a gang-single region. Detecting such loops is not trivial as
OpenACC kernels expansion is done on GIMPLE but before computation of the
control flow graph. This patch adds an auxiliary analysis for determining
whether a statement is inside a conditionally executed region (relative to
the kernels region's entry).

    gcc/
    * omp-oacc-kernels.c (control_flow_regions): New class.
    (control_flow_regions::control_flow_regions): New constructor.
    (control_flow_regions::is_unconditional_oacc_for_loop): New method.
    (control_flow_regions::find_rep): Likewise.
    (control_flow_regions::union_reps): Likewise.
    (control_flow_regions::compute_regions): Likewise.
    (decompose_kernels_region_body): Use test for conditional execution.

    gcc/testsuite/
    * c-c++-common/goacc/kernels-conversion.c: Add test for conditionally
    executed code.
    * gfortran.dg/goacc/kernels-conversion.f95: Likewise.
---
 gcc/ChangeLog.openacc                         |  11 +
 gcc/omp-oacc-kernels.c                        | 216 +++++++++++++++++-
 gcc/testsuite/ChangeLog.openacc               |   7 +
 .../c-c++-common/goacc/kernels-conversion.c   |  20 +-
 .../gfortran.dg/goacc/kernels-conversion.f95  |  21 +-
 5 files changed, 263 insertions(+), 12 deletions(-)
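
Not part of the commit: a small standalone model of the region analysis
described in the log message, in case it helps review.  This is a sketch
under simplifying assumptions, not the code in the patch.  Stmt,
ControlFlowRegions and example_sequence are hypothetical names, statements
are reduced to four kinds, and labels are plain integers.  One deliberate
difference: when the backward scan reaches index 0, the sketch checks
whether statement 0 is itself a loop before concluding that all
predecessors are loops.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical statement model; not a GCC type.
struct Stmt
{
  enum Kind { OTHER, LOOP, JUMP, LABEL };
  Kind kind;
  int label;                 // LABEL: this label's id, otherwise unused
  std::vector<int> targets;  // JUMP: ids of the labels it may branch to
};

class ControlFlowRegions
{
public:
  explicit ControlFlowRegions (const std::vector<Stmt> &seq)
  {
    std::map<int, std::size_t> label_region;
    std::size_t current = 0;
    // Pass 1: each statement joins its predecessor's region, except that a
    // loop gets a region of its own and its successor starts a new one.
    for (std::size_t idx = 0; idx < seq.size (); idx++)
      {
        is_loop.push_back (seq[idx].kind == Stmt::LOOP);
        if (seq[idx].kind == Stmt::LOOP)
          {
            current = idx;
            rep.push_back (current);
            current++;
          }
        else
          {
            rep.push_back (current);
            if (seq[idx].kind == Stmt::LABEL)
              label_region[seq[idx].label] = current;
          }
      }
    // Pass 2: unite every jump with the regions of its target labels.
    for (std::size_t idx = 0; idx < seq.size (); idx++)
      if (seq[idx].kind == Stmt::JUMP)
        for (int target : seq[idx].targets)
          unite (idx, label_region.at (target));
  }

  // True if statement IDX is a loop that no jump can skip over.
  bool is_unconditional_loop (std::size_t idx)
  {
    if (!is_loop[idx])
      return false;
    if (idx == 0 || idx == rep.size () - 1)
      return true;
    std::size_t prev = idx - 1;
    while (prev > 0 && is_loop[prev])
      prev--;
    if (prev == 0 && is_loop[0])
      return true;  // everything before IDX is a loop
    std::size_t succ = idx + 1;
    while (succ < is_loop.size () && is_loop[succ])
      succ++;
    if (succ == is_loop.size ())
      return true;  // everything after IDX is a loop
    // A jump crosses the loop iff both neighbors ended up in one region.
    return find (prev) != find (succ);
  }

private:
  // Union-find lookup with path compression.
  std::size_t find (std::size_t i)
  {
    std::size_t root = i;
    while (rep[root] != root)
      root = rep[root];
    while (rep[i] != root)
      {
        std::size_t next = rep[i];
        rep[i] = root;
        i = next;
      }
    return root;
  }

  void unite (std::size_t a, std::size_t b)
  {
    a = find (a);
    b = find (b);
    rep[b] = a;
  }

  std::vector<std::size_t> rep;  // statement index -> region representative
  std::vector<bool> is_loop;     // statement index -> is an OpenACC loop
};

// Models: x = ...; loop; if (c) { loop; } y = ...;
inline std::vector<Stmt>
example_sequence ()
{
  return {
    Stmt{Stmt::OTHER, -1, {}},
    Stmt{Stmt::LOOP, -1, {}},
    Stmt{Stmt::JUMP, -1, {10, 11}},  // if (c) goto L10; else goto L11;
    Stmt{Stmt::LABEL, 10, {}},
    Stmt{Stmt::LOOP, -1, {}},
    Stmt{Stmt::LABEL, 11, {}},
    Stmt{Stmt::OTHER, -1, {}},
  };
}
```

On example_sequence, the loop at index 1 is classified as unconditional,
while the loop at index 4 is not: the branch to label 11 places the
statements on both sides of it in the same region, so it would go into a
gang-single region.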

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index 6fa92ee2731..75eea0b1ef3 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,14 @@
+2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
+	    Gergö Barany  <gergo@codesourcery.com>
+
+	* omp-oacc-kernels.c (control_flow_regions): New class.
+	(control_flow_regions::control_flow_regions): New constructor.
+	(control_flow_regions::is_unconditional_oacc_for_loop): New method.
+	(control_flow_regions::find_rep): Likewise.
+	(control_flow_regions::union_reps): Likewise.
+	(control_flow_regions::compute_regions): Likewise.
+	(decompose_kernels_region_body): Use test for conditional execution.
+
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 	    Gergö Barany  <gergo@codesourcery.com>
 
diff --git a/gcc/omp-oacc-kernels.c b/gcc/omp-oacc-kernels.c
index 6e083666a17..80a82fa6e1d 100644
--- a/gcc/omp-oacc-kernels.c
+++ b/gcc/omp-oacc-kernels.c
@@ -385,6 +385,208 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
   return body;
 }
 
+/* Auxiliary analysis of the body of a kernels region, to determine for each
+   OpenACC loop whether it is control-dependent (i.e., not necessarily
+   executed every time the kernels region is entered) or not.
+   We say that a loop is control-dependent if there is some cond, switch, or
+   goto statement that jumps over it, forwards or backwards.  For example,
+   if the loop is controlled by an if statement, then a jump to the true
+   block, the false block, or from one of those blocks to the control flow
+   join point will necessarily jump over the loop.
+   This analysis implements an ad-hoc union-find data structure classifying
+   statements into "control-flow regions" as follows: Most statements are in
+   the same region as their predecessor, except that each OpenACC loop is in
+   a region of its own, and each OpenACC loop's successor starts a new
+   region.  We then unite the regions of any statements linked by jumps,
+   placing any cond, switch, or goto statement in the same region as its
+   target label(s).
+   In the end, control dependence of OpenACC loops can be determined by
+   comparing their immediate predecessor and successor statements' regions.
+   A jump crosses the loop if and only if the predecessor and successor are
+   in the same region.  (If there is no predecessor or successor, the loop
+   is executed unconditionally.)
+   The methods in this class identify statements by their index in the
+   kernels region's body.  */
+
+class control_flow_regions
+{
+  public:
+    /* Initialize an instance and pre-compute the control-flow region
+       information for the statement sequence SEQ.  */
+    control_flow_regions (gimple_seq seq);
+
+    /* Return true if the STMT with the given index IDX in the analyzed
+       statement sequence is an unconditionally executed OpenACC loop.  */
+    bool is_unconditional_oacc_for_loop (gimple *stmt, size_t idx);
+
+  private:
+    /* Find the region representative for the statement identified by index
+       STMT_IDX.  */
+    size_t find_rep (size_t stmt_idx);
+
+    /* Union the regions containing the statements represented by
+       representatives A and B.  */
+    void union_reps (size_t a, size_t b);
+
+    /* Helper for the constructor.  Performs the actual computation of the
+       control-flow regions in the statement sequence SEQ.  */
+    void compute_regions (gimple_seq seq);
+
+    /* The mapping from statement indices to region representatives.  */
+    vec <size_t> representatives;
+
+    /* A cache mapping statement indices to a flag indicating whether the
+       statement is a top level OpenACC for loop.  */
+    vec <bool> omp_for_loops;
+};
+
+control_flow_regions::control_flow_regions (gimple_seq seq)
+{
+  representatives.create (1);
+  omp_for_loops.create (1);
+  compute_regions (seq);
+}
+
+bool
+control_flow_regions::is_unconditional_oacc_for_loop (gimple *stmt, size_t idx)
+{
+  if (top_level_omp_for_in_stmt (stmt) == NULL)
+    /* Not an OpenACC for loop.  */
+    return false;
+  if (idx == 0 || idx == representatives.length () - 1)
+    /* The first or last statement in the kernels region.  This means that
+       there is no room before or after it for a jump or a label.  Thus
+       there cannot be a jump across it, so it is unconditional.  */
+    return true;
+  /* Otherwise, the loop is unconditional if the statements before and after
+     it are in different control flow regions.  Scan forward and backward,
+     skipping over neighboring OpenACC for loops, to find the nearest
+     non-loop statements on either side.  */
+  size_t prev_index = idx - 1;
+  while (prev_index > 0 && omp_for_loops [prev_index] == true)
+    prev_index--;
+  /* If all preceding statements are also OpenACC loops, all of these are
+     unconditional.  */
+  if (prev_index == 0)
+    return true;
+  size_t succ_index = idx + 1;
+  while (succ_index < omp_for_loops.length ()
+         && omp_for_loops [succ_index] == true)
+    succ_index++;
+  /* If all following statements are also OpenACC loops, all of these are
+     unconditional.  */
+  if (succ_index == omp_for_loops.length ())
+    return true;
+  return (find_rep (prev_index) != find_rep (succ_index));
+}
+
+size_t
+control_flow_regions::find_rep (size_t stmt_idx)
+{
+  size_t rep = stmt_idx, aux = stmt_idx;
+  /* Find the root representative of this statement.  */
+  while (representatives[rep] != rep)
+    rep = representatives[rep];
+  /* Compress the path from the original statement to the representative.  */
+  while (representatives[aux] != rep)
+    {
+      size_t tmp = representatives[aux];
+      representatives[aux] = rep;
+      aux = tmp;
+    }
+  return rep;
+}
+
+void
+control_flow_regions::union_reps (size_t a, size_t b)
+{
+  a = find_rep (a);
+  b = find_rep (b);
+  representatives[b] = a;
+}
+
+void
+control_flow_regions::compute_regions (gimple_seq seq)
+{
+  hash_map <gimple *, size_t> control_flow_reps;
+  hash_map <tree, size_t> label_reps;
+  size_t current_region = 0, idx = 0;
+
+  /* In a first pass, assign an initial region to each statement.  Except in
+     the case of OpenACC loops, each statement simply gets the same region
+     representative as its predecessor.  */
+  for (gimple_stmt_iterator gsi = gsi_start (seq);
+       !gsi_end_p (gsi);
+       gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      gimple *omp_for = top_level_omp_for_in_stmt (stmt);
+      omp_for_loops.safe_push (omp_for != NULL);
+      if (omp_for != NULL)
+        {
+          /* Assign a new region to this loop and to its successor.  */
+          current_region = idx;
+          representatives.safe_push (current_region);
+          current_region++;
+        }
+      else
+        {
+          representatives.safe_push (current_region);
+          /* Remember any jumps and labels for the second pass below.  */
+          if (gimple_code (stmt) == GIMPLE_COND
+              || gimple_code (stmt) == GIMPLE_SWITCH
+              || gimple_code (stmt) == GIMPLE_GOTO)
+            control_flow_reps.put (stmt, current_region);
+          else if (gimple_code (stmt) == GIMPLE_LABEL)
+            label_reps.put (gimple_label_label (as_a <glabel *> (stmt)),
+                            current_region);
+        }
+      idx++;
+    }
+  gcc_assert (representatives.length () == omp_for_loops.length ());
+
+  /* Revisit all the control flow statements and union the region of each
+     cond, switch, or goto statement with the target labels' regions.  */
+  for (hash_map <gimple *, size_t>::iterator it = control_flow_reps.begin ();
+       it != control_flow_reps.end ();
+       ++it)
+    {
+      gimple *stmt = (*it).first;
+      size_t stmt_rep = (*it).second;
+      switch (gimple_code (stmt))
+        {
+          tree label;
+          unsigned int n;
+
+        case GIMPLE_COND:
+          label = gimple_cond_true_label (as_a <gcond *> (stmt));
+          union_reps (stmt_rep, *label_reps.get (label));
+          label = gimple_cond_false_label (as_a <gcond *> (stmt));
+          union_reps (stmt_rep, *label_reps.get (label));
+          break;
+
+        case GIMPLE_SWITCH:
+          n = gimple_switch_num_labels (as_a <gswitch *> (stmt));
+          for (unsigned int i = 0; i < n; i++)
+            {
+              tree switch_case
+                = gimple_switch_label (as_a <gswitch *> (stmt), i);
+              label = CASE_LABEL (switch_case);
+              union_reps (stmt_rep, *label_reps.get (label));
+            }
+          break;
+
+        case GIMPLE_GOTO:
+          label = gimple_goto_dest (stmt);
+          union_reps (stmt_rep, *label_reps.get (label));
+          break;
+
+        default:
+          gcc_unreachable ();
+        }
+    }
+}
+
 /* Decompose the body of the KERNELS_REGION, which was originally annotated
    with the KERNELS_CLAUSES, into a series of parallel regions.  */
 
@@ -486,9 +688,14 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
      separated from the loop.  */
   bool only_simple_assignments = true;
 
+  /* Precompute the control flow region information to determine whether an
+     OpenACC loop is executed conditionally or unconditionally.  */
+  control_flow_regions cf_regions (body_sequence);
+
   /* Iterate over the statements in the kernels region's body.  */
+  size_t idx = 0;
   gimple_stmt_iterator gsi, gsi_n;
-  for (gsi = gsi_start (body_sequence); !gsi_end_p (gsi); gsi = gsi_n)
+  for (gsi = gsi_start (body_sequence); !gsi_end_p (gsi); gsi = gsi_n, idx++)
     {
       /* Advance the iterator here because otherwise it would be invalidated
          by moving statements below.  */
@@ -497,7 +704,8 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
 
       gimple *stmt = gsi_stmt (gsi);
       gimple *omp_for = top_level_omp_for_in_stmt (stmt);
-      if (omp_for != NULL)
+      if (omp_for != NULL
+          && cf_regions.is_unconditional_oacc_for_loop (stmt, idx))
         {
           /* This is an OMP for statement, put it into a parallel region.
              But first, construct a gang-single region containing any
@@ -532,8 +740,8 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
         }
       else
         {
-          /* This is not an OMP for statement, so it will be put into a
-             gang-single region.  */
+          /* This is not an unconditional OMP for statement, so it will be
+             put into a gang-single region.  */
           gimple_seq_add_stmt (&gang_single_seq, stmt);
           /* Is this a simple assignment? We call it simple if it is an
              assignment to an artificial local variable.  This captures
diff --git a/gcc/testsuite/ChangeLog.openacc b/gcc/testsuite/ChangeLog.openacc
index 7000b099bad..7091f321ffd 100644
--- a/gcc/testsuite/ChangeLog.openacc
+++ b/gcc/testsuite/ChangeLog.openacc
@@ -1,3 +1,10 @@
+2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
+	    Gergö Barany  <gergo@codesourcery.com>
+
+	* c-c++-common/goacc/kernels-conversion.c: Add test for conditionally
+	executed code.
+	* gfortran.dg/goacc/kernels-conversion.f95: Likewise.
+
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 	    Gergö Barany  <gergo@codesourcery.com>
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
index ec5db0201f3..ed4d6429c65 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
@@ -12,6 +12,7 @@ main (void)
   unsigned int sum = 1;
 
 #pragma acc kernels copyin(a[0:N]) copy(sum)
+  /* { dg-bogus "region contains gang partitoned code but is not gang partitioned" "gang partitioned" { xfail *-*-* } .-1 } */
   {
     #pragma acc loop
     for (i = 0; i < N; ++i)
@@ -23,6 +24,17 @@ main (void)
     #pragma acc loop
     for (i = 0; i < N; ++i)
       sum += a[i];
+
+    if (sum > 10)
+      {
+        #pragma acc loop
+        for (i = 0; i < N; ++i)
+          sum += a[i];
+      }
+
+    #pragma acc loop
+    for (i = 0; i < N; ++i)
+      sum += a[i];
   }
 
   return 0;
@@ -32,10 +44,10 @@ main (void)
    parallel regions.  */ 
 /* { dg-final { scan-tree-dump-times "oacc_data_kernels" 1 "convert_oacc_kernels" } } */
 
-/* The two loop regions are parallelized, the sequential part in between is
-   made gang-single.  */
-/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 2 "convert_oacc_kernels" } } */
-/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 1 "convert_oacc_kernels" } } */
+/* The three unconditional loop regions are parallelized, the sequential
+   part in between and the conditional loop are made gang-single.  */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 3 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 2 "convert_oacc_kernels" } } */
 
 /* Check that the original kernels region is removed.  */
 /* { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
index 4aba2b1beeb..f89e46b4d3b 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
@@ -22,6 +22,19 @@ program main
     sum = sum + a(i)
   end do
 
+  if (sum .gt. 10) then
+    !$acc loop
+    do i = 1, N
+      sum = sum + a(i)
+    end do
+  end if
+
+  !$acc loop
+  ! { dg-bogus "region contains gang partitoned code but is not gang partitioned" "gang partitioned" { xfail *-*-* } .-1 }
+  do i = 1, N
+    sum = sum + a(i)
+  end do
+
   !$acc end kernels
 end program main
 
@@ -29,10 +42,10 @@ end program main
 ! parallel regions.
 ! { dg-final { scan-tree-dump-times "oacc_data_kernels" 1 "convert_oacc_kernels" } }
 
-! The two loop regions are parallelized, the sequential part in between is
-! made gang-single.
-! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 2 "convert_oacc_kernels" } }
-! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 1 "convert_oacc_kernels" } }
+! The three unconditional loop regions are parallelized, the sequential part
+! in between and the conditional loop are made gang-single.
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 3 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 2 "convert_oacc_kernels" } }
 
 ! Check that the original kernels region is removed.
 ! { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } }
-- 
2.17.1


[-- Attachment #7: 0006-Adjust-parallelism-of-loops-in-gang-single-parts-of-.patch --]
[-- Type: text/x-diff, Size: 26194 bytes --]

From c7713be32fc5eace2b1e9c20447da849d23f6076 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gerg=C3=B6=20Barany?= <gergo@codesourcery.com>
Date: Wed, 23 Jan 2019 22:11:11 -0800
Subject: [PATCH 6/9] Adjust parallelism of loops in gang-single parts of
 OpenACC kernels regions

Loops in gang-single parts of kernels regions cannot be executed in
gang-redundant mode. If the user specified gang clauses on such loops, emit
a warning and remove these clauses. Adjust automatic partitioning to exclude
gang partitioning in gang-single regions.
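
For illustration only (not part of the patch), here is a minimal sketch of
user code that this change is expected to affect. The function name
sum_if_large and the size N are made up for this example. The second loop is
nested inside an "if", so the decomposition keeps it in a gang-single region,
where its "gang" clause should be dropped with a warning. Built without
-fopenacc, the pragmas are ignored and the code simply runs sequentially.

```c
#define N 16

/* Hypothetical example: sum the N elements of A; if that sum exceeds 10,
   add all elements a second time.  */
unsigned int
sum_if_large (const unsigned int *a)
{
  unsigned int sum = 0;
  int i;

#pragma acc kernels copyin(a[0:N]) copy(sum)
  {
    /* Unconditional top-level loop: a candidate for its own parallel
       region.  */
    #pragma acc loop
    for (i = 0; i < N; ++i)
      sum += a[i];

    if (sum > 10)
      {
        /* Conditionally executed loop: it stays inside a gang-single
           region, so the 'gang' clause cannot be honored.  */
        #pragma acc loop gang
        for (i = 0; i < N; ++i)
          sum += a[i];
      }
  }

  return sum;
}
```

Calling sum_if_large on an array of sixteen ones takes the conditional
branch; calling it on an all-zero array skips it.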

    gcc/
    * omp-oacc-kernels.c (add_parent_or_loop_num_clause): New function.
    (adjust_nested_loop_clauses): Likewise.
    (transform_kernels_loop_clauses, make_gang_parallel_loop_region):
    Add worker and vector clause parameters, emit error on illegal
    nesting.
    (visit_loops_in_gang_single_region): Emit warning on conditionally
    executed code with a gang clause.
    (make_loops_gang_single): New function.
    (decompose_kernels_region_body): Separate out gang/worker/vector clauses
    for separate handling; add call to make_loops_gang_single.
    * omp-offload.c (oacc_loop_auto_partitions): Add and propagate
    is_oacc_gang_single parameter.
    (oacc_loop_partition): Likewise.
    (execute_oacc_device_lower): Adjust call to oacc_loop_partition.
---
 gcc/ChangeLog.openacc  |  18 ++
 gcc/omp-oacc-kernels.c | 375 +++++++++++++++++++++++++++++++++++++----
 gcc/omp-offload.c      |  22 ++-
 3 files changed, 377 insertions(+), 38 deletions(-)

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index 75eea0b1ef3..f3fcbc88831 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,21 @@
+2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
+	    Gergö Barany  <gergo@codesourcery.com>
+
+	* omp-oacc-kernels.c (add_parent_or_loop_num_clause): New function.
+	(adjust_nested_loop_clauses): Likewise.
+	(transform_kernels_loop_clauses, make_gang_parallel_loop_region):
+	Add worker and vector clause parameters, emit error on illegal
+	nesting.
+	(visit_loops_in_gang_single_region): Emit warning on conditionally
+	executed code with a gang clause.
+	(make_loops_gang_single): New function.
+	(decompose_kernels_region_body): Separate out gang/worker/vector clauses
+	for separate handling; add call to make_loops_gang_single.
+	* omp-offload.c (oacc_loop_auto_partitions): Add and propagate
+	is_oacc_gang_single parameter.
+	(oacc_loop_partition): Likewise.
+	(execute_oacc_device_lower): Adjust call to oacc_loop_partition.
+
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 	    Gergö Barany  <gergo@codesourcery.com>
 
diff --git a/gcc/omp-oacc-kernels.c b/gcc/omp-oacc-kernels.c
index 80a82fa6e1d..c334502972c 100644
--- a/gcc/omp-oacc-kernels.c
+++ b/gcc/omp-oacc-kernels.c
@@ -59,7 +59,14 @@ along with GCC; see the file COPYING3.  If not see
    - Any sequences of other code (non-loops, non-OpenACC loops) are wrapped
      in new "gang-single" parallel regions: Worker/vector annotations are
      copied from the original kernels region if present, but num_gangs is
-     explicitly set to 1.  */
+     explicitly set to 1.
+   - Both points above only apply at the topmost level in the region, i.e.,
+     the transformation does not introduce new parallel regions inside
+     nested statement bodies.  In particular, this means that a
+     gang-parallelizable loop inside an if statement is "gang-serialized" by
+     the transformation.
+     The transformation visits loops inside such new gang-single-regions and
+     removes and warns about any gang annotations.  */
 
 /* Helper function for decompose_kernels_region_body.  If STMT contains a
    "top-level" OMP_FOR statement, returns a pointer to that statement;
@@ -122,6 +129,67 @@ top_level_omp_for_in_stmt (gimple *stmt)
   return NULL;
 }
 
+/* Helper function for make_loops_gang_single for walking the tree.  If the
+   statement indicated by GSI_P is an OpenACC for loop with a gang clause,
+   issue a warning and remove the clause.  */
+
+static tree
+visit_loops_in_gang_single_region (gimple_stmt_iterator *gsi_p,
+                                   bool *handled_ops_p,
+                                   struct walk_stmt_info *)
+{
+  gimple *stmt = gsi_stmt (*gsi_p);
+  tree clauses = NULL, prev_clause = NULL;
+  *handled_ops_p = false;
+
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_OMP_FOR:
+      clauses = gimple_omp_for_clauses (stmt);
+      for (tree clause = clauses; clause; clause = OMP_CLAUSE_CHAIN (clause))
+        {
+          if (OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_GANG)
+            {
+              /* It makes no sense to have a gang clause in a gang-single
+                 region, so remove it and warn.  */
+              warning_at (gimple_location (stmt), 0,
+                          "conditionally executed loop in kernels region"
+                          " will be executed in a single gang;"
+                          " ignoring %<gang%> clause");
+              if (prev_clause != NULL)
+                OMP_CLAUSE_CHAIN (prev_clause) = OMP_CLAUSE_CHAIN (clause);
+              else
+                clauses = OMP_CLAUSE_CHAIN (clause);
+
+              break;
+            }
+          prev_clause = clause;
+        }
+      gimple_omp_for_set_clauses (stmt, clauses);
+      /* No need to recurse into nested statements; no loop nested inside
+         this loop can be gang-partitioned.  */
+      *handled_ops_p = true;
+      break;
+
+    default:
+      break;
+    }
+
+  return NULL;
+}
+
+/* Visit all nested OpenACC loops in the statement indicated by GSI.  This
+   statement is expected to be inside a gang-single region.  Issue a warning
+   for any loops inside it that have gang clauses and remove the clauses.  */
+
+static void
+make_loops_gang_single (gimple_stmt_iterator gsi)
+{
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  walk_gimple_stmt (&gsi, visit_loops_in_gang_single_region, NULL, &wi);
+}
+
 /* Construct a "gang-single" OpenACC parallel region at LOC containing the
    STMTS.  The newly created region is annotated with CLAUSES, which must
    not contain a num_gangs clause, and an additional "num_gangs(1)" clause
@@ -150,45 +218,248 @@ make_gang_single_region (location_t loc, gimple_seq stmts, tree clauses)
   return single_region;
 }
 
+/* Helper function for make_gang_parallel_loop_region.  Adds a num_gangs
+   (num_workers, vector_length) clause to the given CLAUSES, either the one
+   from the parent region (PARENT_CLAUSE) or a new one based on the loop's
+   own LOOP_CLAUSE ("gang(num: N)" or similar for workers or vectors) with
+   the given CLAUSE_CODE.  Does nothing if neither PARENT_CLAUSE nor
+   LOOP_CLAUSE exist.  Returns the new clauses.  */
+
+static tree
+add_parent_or_loop_num_clause (tree parent_clause, tree loop_clause,
+                               omp_clause_code clause_code, tree clauses)
+{
+  if (parent_clause != NULL)
+    {
+      tree num_clause = unshare_expr (parent_clause);
+      OMP_CLAUSE_CHAIN (num_clause) = clauses;
+      clauses = num_clause;
+    }
+  else if (loop_clause != NULL)
+    {
+      /* The kernels region does not have a "num_gangs" clause, but the loop
+         itself had a "gang(num: N)" clause.  Honor it by adding a
+         "num_gangs(N)" clause on the parallel region.  */
+      tree num = OMP_CLAUSE_OPERAND (loop_clause, 0);
+      tree new_num_clause
+        = build_omp_clause (OMP_CLAUSE_LOCATION (loop_clause), clause_code);
+      OMP_CLAUSE_OPERAND (new_num_clause, 0) = num;
+      OMP_CLAUSE_CHAIN (new_num_clause) = clauses;
+      clauses = new_num_clause;
+    }
+  return clauses;
+}
+
+/* Helper for make_gang_parallel_loop_region, looking for "worker(num: N)"
+   or "vector(length: N)" clauses in nested loops.  Removes the numeric
+   argument, transferring it to the enclosing parallel region (via
+   WI->INFO).  If numeric arguments within the same loop nest conflict,
+   emits an error.
+
+   This function also decides whether to add an auto clause on each of these
+   nested loops.  It adds an auto clause unless there is already an
+   independent/seq/auto clause or a gang/worker/vector annotation.  */
+
+static tree
+adjust_nested_loop_clauses (gimple_stmt_iterator *gsi_p, bool *,
+                            struct walk_stmt_info *wi)
+{
+  tree **clauses = (tree **) wi->info;
+  tree *gang_num_clause = clauses[GOMP_DIM_GANG];
+  tree *worker_num_clause = clauses[GOMP_DIM_WORKER];
+  tree *vector_length_clause = clauses[GOMP_DIM_VECTOR];
+  gimple *stmt = gsi_stmt (*gsi_p);
+
+  if (gimple_code (stmt) == GIMPLE_OMP_FOR)
+    {
+      bool add_auto_clause = true;
+      tree loop_clauses = gimple_omp_for_clauses (stmt);
+      tree loop_clause = loop_clauses;
+      for (; loop_clause; loop_clause = OMP_CLAUSE_CHAIN (loop_clause))
+        {
+          tree *outer_clause_ptr = NULL;
+          switch (OMP_CLAUSE_CODE (loop_clause))
+            {
+              case OMP_CLAUSE_GANG:
+                outer_clause_ptr = gang_num_clause;
+                break;
+              case OMP_CLAUSE_WORKER:
+                outer_clause_ptr = worker_num_clause;
+                break;
+              case OMP_CLAUSE_VECTOR:
+                outer_clause_ptr = vector_length_clause;
+                break;
+              case OMP_CLAUSE_INDEPENDENT:
+              case OMP_CLAUSE_SEQ:
+              case OMP_CLAUSE_AUTO:
+                add_auto_clause = false;
+              default:
+                break;
+            }
+          if (outer_clause_ptr != NULL)
+            {
+              if (OMP_CLAUSE_OPERAND (loop_clause, 0) != NULL
+                  && *outer_clause_ptr == NULL)
+                {
+                  /* Transfer the clause to the enclosing parallel region
+                     and remove the numerical argument from the loop.  */
+                  *outer_clause_ptr = unshare_expr (loop_clause);
+                  OMP_CLAUSE_OPERAND (loop_clause, 0) = NULL;
+                }
+              else if (OMP_CLAUSE_OPERAND (loop_clause, 0) != NULL &&
+                       OMP_CLAUSE_OPERAND (*outer_clause_ptr, 0) != NULL)
+                {
+                  /* See if both of these are the same constant.  If they
+                     aren't, emit an error.  */
+                  tree old_op = OMP_CLAUSE_OPERAND (*outer_clause_ptr, 0);
+                  tree new_op = OMP_CLAUSE_OPERAND (loop_clause, 0);
+                  if (!(cst_and_fits_in_hwi (old_op) &&
+                        cst_and_fits_in_hwi (new_op) &&
+                        int_cst_value (old_op) == int_cst_value (new_op)))
+                    {
+                      const char *clause_name
+                        = omp_clause_code_name[OMP_CLAUSE_CODE (loop_clause)];
+                      error_at (gimple_location (stmt),
+                                "cannot honor conflicting %qs annotation",
+                                clause_name);
+                      inform (OMP_CLAUSE_LOCATION (*outer_clause_ptr),
+                              "location of the previous annotation "
+                              "in the same loop nest");
+                    }
+                  OMP_CLAUSE_OPERAND (loop_clause, 0) = NULL;
+                }
+            }
+        }
+      if (add_auto_clause)
+        {
+          tree auto_clause
+            = build_omp_clause (gimple_location (stmt), OMP_CLAUSE_AUTO);
+          OMP_CLAUSE_CHAIN (auto_clause) = loop_clauses;
+          gimple_omp_for_set_clauses (stmt, auto_clause);
+        }
+    }
+
+  return NULL;
+}
+
 /* Helper for make_region_loop_nest.  Transform OpenACC 'kernels'/'loop'
    construct clauses into OpenACC 'parallel'/'loop' construct ones.  */
 
 static tree
 transform_kernels_loop_clauses (gimple *omp_for,
 				tree num_gangs_clause,
+				tree num_workers_clause,
+				tree vector_length_clause,
 				tree clauses)
 {
   /* If this loop in a kernels region does not have an explicit
      "independent", "seq", or "auto" clause, we must give it an explicit
-     "auto" clause. */
+     "auto" clause.
+     We also check for "gang(num: N)" clauses.  These must not appear in
+     kernels regions that have their own "num_gangs" clause.  Otherwise, they
+     must be converted and put on the region; similarly for workers and
+     vectors.  */
   bool add_auto_clause = true;
+  tree loop_gang_clause = NULL, loop_worker_clause = NULL,
+       loop_vector_clause = NULL;
   tree loop_clauses = gimple_omp_for_clauses (omp_for);
-  for (tree c = loop_clauses; c; c = OMP_CLAUSE_CHAIN (c))
+  for (tree loop_clause = loop_clauses;
+       loop_clause;
+       loop_clause = OMP_CLAUSE_CHAIN (loop_clause))
     {
-      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_AUTO
-          || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_INDEPENDENT
-          || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SEQ)
-        {
-          add_auto_clause = false;
-          break;
+      /* Look for gang, worker, and vector clauses.  */
+      bool found_num_clause = false;
+      tree *clause_ptr, clause_to_check;
+      switch (OMP_CLAUSE_CODE (loop_clause))
+         {
+          case OMP_CLAUSE_GANG:
+            found_num_clause = true;
+            clause_ptr = &loop_gang_clause;
+            clause_to_check = num_gangs_clause;
+            break;
+          case OMP_CLAUSE_WORKER:
+            found_num_clause = true;
+            clause_ptr = &loop_worker_clause;
+            clause_to_check = num_workers_clause;
+            break;
+          case OMP_CLAUSE_VECTOR:
+            found_num_clause = true;
+            clause_ptr = &loop_vector_clause;
+            clause_to_check = vector_length_clause;
+            break;
+          case OMP_CLAUSE_INDEPENDENT:
+          case OMP_CLAUSE_SEQ:
+          case OMP_CLAUSE_AUTO:
+            add_auto_clause = false;
+          default:
+            break;
         }
-    }
+      if (found_num_clause && OMP_CLAUSE_OPERAND (loop_clause, 0) != NULL)
+        {
+          if (clause_to_check)
+            {
+              const char *clause_name
+                = omp_clause_code_name[OMP_CLAUSE_CODE (loop_clause)];
+              const char *parent_clause_name
+                = omp_clause_code_name[OMP_CLAUSE_CODE (clause_to_check)];
+              error_at (OMP_CLAUSE_LOCATION (loop_clause),
+                        "argument not permitted on %qs clause"
+                        " in OpenACC %<kernels%> region with a %qs clause",
+                        clause_name, parent_clause_name);
+              inform (OMP_CLAUSE_LOCATION (clause_to_check),
+                      "location of OpenACC %<kernels%> region");
+            }
+          /* Copy the gang(N)/worker(N)/vector(N) clause to the enclosing
+             parallel region.  */
+          *clause_ptr = unshare_expr (loop_clause);
+          OMP_CLAUSE_CHAIN (*clause_ptr) = NULL;
+          /* Leave a gang/worker/vector clause on the loop, but without a
+             numeric argument.  */
+          OMP_CLAUSE_OPERAND (loop_clause, 0) = NULL;
+         }
+     }
   if (add_auto_clause)
     {
       tree auto_clause = build_omp_clause (gimple_location (omp_for),
                                            OMP_CLAUSE_AUTO);
       OMP_CLAUSE_CHAIN (auto_clause) = loop_clauses;
-      gimple_omp_for_set_clauses (omp_for, auto_clause);
-    }
-
-  /* If the kernels region had a num_gangs clause, add that to this new
-     parallel region.  */
-  if (num_gangs_clause != NULL)
-    {
-      tree parallel_num_gangs_clause = unshare_expr (num_gangs_clause);
-      OMP_CLAUSE_CHAIN (parallel_num_gangs_clause) = clauses;
-      clauses = parallel_num_gangs_clause;
+      loop_clauses = auto_clause;
     }
+  gimple_omp_for_set_clauses (omp_for, loop_clauses);
+  /* We must also recurse into the loop; it might contain nested loops
+     having their own "worker(num: W)" or "vector(length: V)" annotations.
+     Turn these into worker/vector annotations on the parallel region.  */
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  tree *num_clauses[GOMP_DIM_MAX]
+    = { [GOMP_DIM_GANG] = &loop_gang_clause,
+        [GOMP_DIM_WORKER] = &loop_worker_clause,
+        [GOMP_DIM_VECTOR] = &loop_vector_clause };
+  wi.info = num_clauses;
+  gimple *body = gimple_omp_body (omp_for);
+  walk_gimple_seq (body, adjust_nested_loop_clauses, NULL, &wi);
+  /* Check if there were conflicting numbers of workers or vector lanes.  */
+  if (loop_gang_clause != NULL &&
+      OMP_CLAUSE_OPERAND (loop_gang_clause, 0) == NULL)
+    loop_gang_clause = NULL;
+  if (loop_worker_clause != NULL &&
+      OMP_CLAUSE_OPERAND (loop_worker_clause, 0) == NULL)
+    loop_worker_clause = NULL;
+  if (loop_vector_clause != NULL &&
+      OMP_CLAUSE_OPERAND (loop_vector_clause, 0) == NULL)
+    loop_vector_clause = NULL;
+
+  /* If the kernels region had num_gangs, num_workers, vector_length clauses,
+     add these to this new parallel region.  */
+  clauses
+    = add_parent_or_loop_num_clause (num_gangs_clause, loop_gang_clause,
+				     OMP_CLAUSE_NUM_GANGS, clauses);
+  clauses
+    = add_parent_or_loop_num_clause (num_workers_clause, loop_worker_clause,
+				     OMP_CLAUSE_NUM_WORKERS, clauses);
+  clauses
+    = add_parent_or_loop_num_clause (vector_length_clause, loop_vector_clause,
+				     OMP_CLAUSE_VECTOR_LENGTH, clauses);
 
   return clauses;
 }
@@ -197,18 +468,33 @@ transform_kernels_loop_clauses (gimple *omp_for,
    STMT, which must be identical to, or a bind containing, the loop OMP_FOR
    with OpenACC loop annotations.
 
-   The newly created region is annotated with the optional NUM_GANGS_CLAUSE
-   as well as the other CLAUSES, which must not contain a num_gangs clause.  */
+   The NUM_GANGS_CLAUSE, NUM_WORKERS_CLAUSE, and VECTOR_LENGTH_CLAUSE are
+   optional clauses from the original kernels region and must not be
+   contained in the other CLAUSES.  The newly created region is annotated
+   with the optional NUM_GANGS_CLAUSE as well as the other CLAUSES.  If there
+   is no NUM_GANGS_CLAUSE but the loop has a "gang(num: N)" clause, that is
+   converted to a "num_gangs(N)" clause on the new region, and similarly for
+   workers and vectors.
+
+   The outermost loop gets an auto clause unless there already is an
+   independent/seq/auto clause or a gang/worker/vector annotation.  Nested
+   loops inside OMP_FOR are treated similarly by the
+   adjust_nested_loop_clauses function.  */
 
 static gimple *
 make_gang_parallel_loop_region (gimple *omp_for, gimple *stmt,
-                                tree num_gangs_clause, tree clauses)
+                                tree num_gangs_clause,
+                                tree num_workers_clause,
+                                tree vector_length_clause,
+                                tree clauses)
 {
   /* This correctly unshares the entire clause chain rooted here.  */
   clauses = unshare_expr (clauses);
 
   clauses = transform_kernels_loop_clauses (omp_for,
 					    num_gangs_clause,
+					    num_workers_clause,
+					    vector_length_clause,
 					    clauses);
 
   /* Now build the parallel region containing this loop.  */
@@ -596,23 +882,43 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
   location_t loc = gimple_location (kernels_region);
 
   /* The kernels clauses will be propagated to the child clauses unmodified,
-     except that that num_gangs clause will only be added to loop regions.
-     The other regions are "gang-single" and get an explicit num_gangs(1)
-     clause.  So separate out the num_gangs clause here.  */
-  tree num_gangs_clause = NULL, prev_clause = NULL;
+     except that the num_gangs, num_workers, and vector_length clauses will
+     only be added to loop regions.  The other regions are "gang-single" and
+     get an explicit num_gangs(1) clause.  So separate out the num_gangs,
+     num_workers, and vector_length clauses here.  */
+  tree num_gangs_clause = NULL, num_workers_clause = NULL,
+       vector_length_clause = NULL;
+  tree prev_clause = NULL, next_clause = NULL;
   tree parallel_clauses = kernels_clauses;
-  for (tree c = parallel_clauses; c; c = OMP_CLAUSE_CHAIN (c))
+  for (tree c = parallel_clauses; c; c = next_clause)
     {
-      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_NUM_GANGS)
+      /* Preserve this here, as we might NULL it later.  */
+      next_clause = OMP_CLAUSE_CHAIN (c);
+
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_NUM_GANGS
+          || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_NUM_WORKERS
+          || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_VECTOR_LENGTH)
         {
           /* Cut this clause out of the chain.  */
-          num_gangs_clause = c;
           if (prev_clause != NULL)
             OMP_CLAUSE_CHAIN (prev_clause) = OMP_CLAUSE_CHAIN (c);
           else
             kernels_clauses = OMP_CLAUSE_CHAIN (c);
-          OMP_CLAUSE_CHAIN (num_gangs_clause) = NULL;
-          break;
+          OMP_CLAUSE_CHAIN (c) = NULL;
+          switch (OMP_CLAUSE_CODE (c))
+            {
+              case OMP_CLAUSE_NUM_GANGS:
+                num_gangs_clause = c;
+                break;
+              case OMP_CLAUSE_NUM_WORKERS:
+                num_workers_clause = c;
+                break;
+              case OMP_CLAUSE_VECTOR_LENGTH:
+                vector_length_clause = c;
+                break;
+              default:
+                gcc_unreachable ();
+            }
         }
       else
         prev_clause = c;
@@ -735,6 +1041,8 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
           gimple *parallel_region
             = make_gang_parallel_loop_region (omp_for, stmt,
                                               num_gangs_clause,
+                                              num_workers_clause,
+                                              vector_length_clause,
                                               kernels_clauses);
           gimple_seq_add_stmt (&region_body, parallel_region);
         }
@@ -752,6 +1060,9 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
                 && DECL_ARTIFICIAL (gimple_assign_lhs (stmt)));
           if (!is_simple_assignment)
             only_simple_assignments = false;
+          /* Remove and issue warnings about gang clauses on any OpenACC
+             loops nested inside this sequentially executed statement.  */
+          make_loops_gang_single (gsi);
         }
     }
 
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 2d265c22c3c..a8dc57a92c6 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1339,7 +1339,7 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask)
 
 static unsigned
 oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask,
-			   bool outer_assign)
+			   bool outer_assign, bool is_oacc_gang_single)
 {
   bool assign = (loop->flags & OLF_AUTO) && (loop->flags & OLF_INDEPENDENT);
   bool noisy = true;
@@ -1357,6 +1357,10 @@ oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask,
 	 non-innermost available level.  */
       unsigned this_mask = GOMP_DIM_MASK (GOMP_DIM_GANG);
 
+      /* Gang partitioning is not available in a gang-single region.  */
+      if (is_oacc_gang_single)
+        this_mask = GOMP_DIM_MASK (GOMP_DIM_WORKER);
+
       /* Orphan reductions cannot have gang partitioning.  */
       if ((loop->flags & OLF_REDUCTION)
 	  && oacc_get_fn_attrib (current_function_decl)
@@ -1394,7 +1398,8 @@ oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask,
     {
       unsigned tmp_mask = outer_mask | loop->mask | loop->e_mask;
       loop->inner = oacc_loop_auto_partitions (loop->child, tmp_mask,
-					       outer_assign | assign);
+					       outer_assign | assign,
+					       is_oacc_gang_single);
     }
 
   if (assign && (!loop->mask || (tiling && !loop->e_mask) || !outer_assign))
@@ -1455,7 +1460,8 @@ oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask,
 
   if (loop->sibling)
     inner_mask |= oacc_loop_auto_partitions (loop->sibling,
-					     outer_mask, outer_assign);
+					     outer_mask, outer_assign,
+					     is_oacc_gang_single);
 
   inner_mask |= loop->inner | loop->mask | loop->e_mask;
 
@@ -1466,14 +1472,16 @@ oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask,
    axes.  Return mask of partitioning.  */
 
 static unsigned
-oacc_loop_partition (oacc_loop *loop, unsigned outer_mask)
+oacc_loop_partition (oacc_loop *loop, unsigned outer_mask,
+                     bool is_oacc_gang_single)
 {
   unsigned mask_all = oacc_loop_fixed_partitions (loop, outer_mask);
 
   if (mask_all & GOMP_DIM_MASK (GOMP_DIM_MAX))
     {
       mask_all ^= GOMP_DIM_MASK (GOMP_DIM_MAX);
-      mask_all |= oacc_loop_auto_partitions (loop, outer_mask, false);
+      mask_all |= oacc_loop_auto_partitions (loop, outer_mask, false,
+                                             is_oacc_gang_single);
     }
   return mask_all;
 }
@@ -1652,7 +1660,9 @@ execute_oacc_device_lower ()
     }
 
   unsigned outer_mask = fn_level >= 0 ? GOMP_DIM_MASK (fn_level) - 1 : 0;
-  unsigned used_mask = oacc_loop_partition (loops, outer_mask);
+  unsigned used_mask = oacc_loop_partition (loops, outer_mask,
+                                            is_oacc_parallel_kernels_gang_single);
+
   /* OpenACC kernels constructs are special: they currently don't use the
      generic oacc_loop infrastructure and attribute/dimension processing.  */
   if (is_oacc_kernels && is_oacc_kernels_parallelized)
-- 
2.17.1


[-- Attachment #8: 0007-Launch-kernels-asynchronously-in-OpenACC-kernels-reg.patch --]
[-- Type: text/x-diff, Size: 9342 bytes --]

From 5f30851a2d4c1a6e631e5b7870f8533811e50875 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gerg=C3=B6=20Barany?= <gergo@codesourcery.com>
Date: Mon, 21 Jan 2019 12:50:14 -0800
Subject: [PATCH 7/9] Launch kernels asynchronously in OpenACC kernels regions

Kernels regions are decomposed into one or more smaller regions that are to
be executed in sequence. With this patch, all of these regions are launched
asynchronously, and a wait directive is added after them. This means that
the host only waits once for the kernels to complete, not once per kernel.
If the original kernels region was marked async, that asynchronous behavior
is preserved, and no wait is added.
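
For illustration only (not part of the patch), the effect can be sketched at
the source level with explicit OpenACC directives. This is a conceptual
equivalent under stated assumptions, not the compiler's internal
representation: the function name scale_then_shift, the arrays, and the use
of the default async queue are all made up for this example. It assumes
n > 0. Built without -fopenacc, the pragmas are ignored and the code runs
sequentially.

```c
#include <stddef.h>

/* Hypothetical example: roughly what a kernels region containing
   "loop; statement; loop" might look like after decomposition.  */
void
scale_then_shift (const int *a, int *b, int *c, size_t n)
{
#pragma acc data copyin(a[0:n]) create(b[0:n]) copyout(c[0:n])
  {
    /* First loop: a parallelized region, launched asynchronously.  */
    #pragma acc parallel loop async present(a[0:n], b[0:n])
    for (size_t i = 0; i < n; ++i)
      b[i] = 2 * a[i];

    /* Sequential code between the loops: a gang-single region, also
       launched asynchronously on the same queue.  */
    #pragma acc parallel num_gangs(1) async present(b[0:n])
    {
      b[0] = 0;
    }

    /* Second loop: another parallelized region.  */
    #pragma acc parallel loop async present(b[0:n], c[0:n])
    for (size_t i = 0; i < n; ++i)
      c[i] = b[i] + 1;

    /* The host waits once for all three launches, not once per launch.  */
    #pragma acc wait
  }
}
```

All three regions use the same async queue, so they still execute in order
relative to each other; only the host-side synchronization is deferred to
the single wait at the end.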

    gcc/
    * omp-oacc-kernels.c (add_async_clauses_and_wait): New function...
    (decompose_kernels_region_body): ... called from here.

    gcc/testsuite/
    * c-c++-common/goacc/kernels-conversion.c: Test automatically generated
    async clauses.
    * gfortran.dg/goacc/kernels-conversion.f95: Likewise.
---
 gcc/ChangeLog.openacc                         |  6 ++
 gcc/omp-oacc-kernels.c                        | 56 ++++++++++++++++++-
 gcc/testsuite/ChangeLog.openacc               |  7 +++
 .../c-c++-common/goacc/kernels-conversion.c   |  5 ++
 .../gfortran.dg/goacc/kernels-conversion.f95  |  5 ++
 5 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index f3fcbc88831..307922a29c0 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,9 @@
+2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
+	    Gergö Barany  <gergo@codesourcery.com>
+
+	* omp-oacc-kernels.c (add_async_clauses_and_wait): New function...
+	(decompose_kernels_region_body): ... called from here.
+
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 	    Gergö Barany  <gergo@codesourcery.com>
 
diff --git a/gcc/omp-oacc-kernels.c b/gcc/omp-oacc-kernels.c
index c334502972c..f8553f77708 100644
--- a/gcc/omp-oacc-kernels.c
+++ b/gcc/omp-oacc-kernels.c
@@ -66,7 +66,13 @@ along with GCC; see the file COPYING3.  If not see
      gang-parallelizable loop inside an if statement is "gang-serialized" by
      the transformation.
      The transformation visits loops inside such new gang-single-regions and
-     removes and warns about any gang annotations.  */
+     removes and warns about any gang annotations.
+   - In order to make the host wait only once for the whole region instead
+     of once per kernel launch, the new parallel and serial regions are
+     annotated async.  Unless the original kernels region was marked async,
+     the entire region ends with a wait construct.  If the original kernels
+     region was marked async, the generated async statements use the async
+     queue the kernels region was annotated with (possibly implicitly).  */
 
 /* Helper function for decompose_kernels_region_body.  If STMT contains a
    "top-level" OMP_FOR statement, returns a pointer to that statement;
@@ -671,6 +677,38 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
   return body;
 }
 
+/* Helper function of decompose_kernels_region_body.  The statements in
+   REGION_BODY are expected to be decomposed parallel regions; add an
+   "async" clause to each.  Also add a "wait" pragma at the end of the
+   sequence.  */
+
+static void
+add_async_clauses_and_wait (location_t loc, gimple_seq *region_body)
+{
+  tree default_async_queue
+    = build_int_cst (integer_type_node, GOMP_ASYNC_NOVAL);
+  for (gimple_stmt_iterator gsi = gsi_start (*region_body);
+       !gsi_end_p (gsi);
+       gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      tree target_clauses = gimple_omp_target_clauses (stmt);
+      tree new_async_clause = build_omp_clause (loc, OMP_CLAUSE_ASYNC);
+      OMP_CLAUSE_OPERAND (new_async_clause, 0) = default_async_queue;
+      OMP_CLAUSE_CHAIN (new_async_clause) = target_clauses;
+      target_clauses = new_async_clause;
+      gimple_omp_target_set_clauses (as_a <gomp_target *> (stmt),
+                                     target_clauses);
+    }
+  /* A "#pragma acc wait" is just a call to GOACC_wait (acc_async_sync, 0).  */
+  tree wait_fn = builtin_decl_explicit (BUILT_IN_GOACC_WAIT);
+  tree sync_arg = build_int_cst (integer_type_node, GOMP_ASYNC_SYNC);
+  gimple *wait_call = gimple_build_call (wait_fn, 2,
+                                         sync_arg, integer_zero_node);
+  gimple_set_location (wait_call, loc);
+  gimple_seq_add_stmt (region_body, wait_call);
+}
+
 /* Auxiliary analysis of the body of a kernels region, to determine for each
    OpenACC loop whether it is control-dependent (i.e., not necessarily
    executed every time the kernels region is entered) or not.
@@ -885,10 +923,12 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
      except that the num_gangs, num_workers, and vector_length clauses will
      only be added to loop regions.  The other regions are "gang-single" and
      get an explicit num_gangs(1) clause.  So separate out the num_gangs,
-     num_workers, and vector_length clauses here.  */
+     num_workers, and vector_length clauses here.
+     Also check for the presence of an async clause but do not remove it
+     from the kernels clauses.  */
   tree num_gangs_clause = NULL, num_workers_clause = NULL,
        vector_length_clause = NULL;
-  tree prev_clause = NULL, next_clause = NULL;
+  tree prev_clause = NULL, next_clause = NULL, async_clause = NULL;
   tree parallel_clauses = kernels_clauses;
   for (tree c = parallel_clauses; c; c = next_clause)
     {
@@ -922,6 +962,8 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
         }
       else
         prev_clause = c;
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_ASYNC)
+        async_clause = c;
     }
 
   gimple *kernels_body = gimple_omp_body (kernels_region);
@@ -1085,6 +1127,14 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
       gimple_seq_add_stmt (&region_body, single_region);
     }
 
+  /* We want to launch these kernels asynchronously.  If the original
+     kernels region had an async clause, this is done automatically because
+     that async clause was copied to the individual regions we created.
+     Otherwise, add an async clause to each newly created region, as well as
+     a wait directive at the end.  */
+  if (async_clause == NULL)
+    add_async_clauses_and_wait (loc, &region_body);
+
   tree kernels_locals = gimple_bind_vars (as_a <gbind *> (kernels_body));
   gimple *body = gimple_build_bind (kernels_locals, region_body,
                                     make_node (BLOCK));
diff --git a/gcc/testsuite/ChangeLog.openacc b/gcc/testsuite/ChangeLog.openacc
index 7091f321ffd..84d345fdecc 100644
--- a/gcc/testsuite/ChangeLog.openacc
+++ b/gcc/testsuite/ChangeLog.openacc
@@ -1,3 +1,10 @@
+2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
+	    Gergö Barany  <gergo@codesourcery.com>
+
+	* c-c++-common/goacc/kernels-conversion.c: Test automatically generated
+	async clauses.
+	* gfortran.dg/goacc/kernels-conversion.f95: Likewise.
+
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 	    Gergö Barany  <gergo@codesourcery.com>
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
index ed4d6429c65..3e52ec4f16f 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
@@ -49,5 +49,10 @@ main (void)
 /* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 3 "convert_oacc_kernels" } } */
 /* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 2 "convert_oacc_kernels" } } */
 
+/* Each of the parallel regions is async, and there is a final call to
+   __builtin_GOACC_wait.  */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels.* async\(-1\)" 5 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "__builtin_GOACC_wait" 1 "convert_oacc_kernels" } } */
+
 /* Check that the original kernels region is removed.  */
 /* { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
index f89e46b4d3b..559916c2325 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
@@ -47,5 +47,10 @@ end program main
 ! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 3 "convert_oacc_kernels" } }
 ! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 2 "convert_oacc_kernels" } }
 
+! Each of the parallel regions is async, and there is a final call to
+! __builtin_GOACC_wait.
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels.* async\(-1\)" 5 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "__builtin_GOACC_wait" 1 "convert_oacc_kernels" } }
+
 ! Check that the original kernels region is removed.
 ! { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } }
-- 
2.17.1


[-- Attachment #9: 0008-New-OpenACC-kernels-region-decompose-algorithm.patch --]
[-- Type: text/x-diff, Size: 39630 bytes --]

From d008a745628e5fc568e34dd959597c48e36e126b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 24 Jan 2019 08:40:03 -0800
Subject: [PATCH 8/9] New OpenACC kernels region decompose algorithm

Previously, OpenACC kernels region bodies were decomposed into a sequence of
alternating gang-single and gang-parallel "parallel" regions. The new
algorithm in this patch introduces a third possibility: Loops that look like
they might benefit from the parloops pass are converted into old "kernels"
regions, exposing them to the parloops pass later on. This has the benefit
that loops that cannot be parallelized are not offloaded to the GPU.
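
As an illustration only, the three-way decision can be modelled in plain C.
This is a simplified sketch, not the code in omp-oacc-kernels.c: all names
are hypothetical, and the statements of a region are assumed to be given as
a flat, already flattened list, whereas the pass itself walks nested GIMPLE
statements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical region kinds, standing in for the three region codes.  */
enum model_region_kind
{
  MODEL_GANG_SINGLE,   /* "parallel" region run by a single gang.  */
  MODEL_PARALLELIZED,  /* "parallel" region with a partitioned loop nest.  */
  MODEL_PARLOOPS       /* Old-style "kernels" region left to parloops.  */
};

/* Hypothetical statement kinds that matter for the decision.  */
enum model_stmt_kind
{
  MODEL_STMT_PLAIN,             /* No loop, no control flow.  */
  MODEL_STMT_LOOP_INDEPENDENT,  /* Loop with 'independent' clause.  */
  MODEL_STMT_LOOP_SEQ,          /* Loop with 'seq' clause.  */
  MODEL_STMT_LOOP_AUTO,         /* Loop with explicit or implicit 'auto'.  */
  MODEL_STMT_CONTROL_FLOW       /* Conditionals, gotos, and similar.  */
};

/* Start from the caller's optimistic default and switch to the "parloops"
   kind as soon as an 'auto' loop or a control flow statement is seen.
   Loops marked 'independent' or 'seq' keep the default.  */
static enum model_region_kind
model_adjust_region_kind (const enum model_stmt_kind *stmts, size_t n,
                          enum model_region_kind optimistic_default)
{
  assert (stmts != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    switch (stmts[i])
      {
      case MODEL_STMT_LOOP_AUTO:
      case MODEL_STMT_CONTROL_FLOW:
        /* Final decision for this region.  */
        return MODEL_PARLOOPS;
      default:
        /* Keep going.  */
        break;
      }
  return optimistic_default;
}
```

A statement sequence would start from MODEL_GANG_SINGLE and a loop nest from
MODEL_PARALLELIZED, mirroring the two optimistic defaults in this patch.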

	gcc/
	* omp-oacc-kernels.c (adjust_region_code_walk_stmt_fn)
	(adjust_region_code): New functions.
	(make_loops_gang_single): Update.
	(make_gang_single_region): Rename to...
	(make_region_seq): ... this, and update.
	(make_gang_parallel_loop_region): Rename to...
	(make_region_loop_nest): ... this, and update.
	(is_unconditional_oacc_for_loop): Remove stmt parameter and check.
	(decompose_kernels_region_body): Update.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-conversion.c: Adjust test.
	* gfortran.dg/goacc/kernels-conversion.f95: Likewise.
	* c-c++-common/goacc/kernels-decompose-1.c: New file.
	* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: New
	file.
---
 gcc/ChangeLog.openacc                         |  13 +
 gcc/omp-oacc-kernels.c                        | 287 +++++++++++++++---
 gcc/testsuite/ChangeLog.openacc               |   8 +
 .../c-c++-common/goacc/kernels-conversion.c   |  19 +-
 .../c-c++-common/goacc/kernels-decompose-1.c  | 123 ++++++++
 .../gfortran.dg/goacc/kernels-conversion.f95  |  22 +-
 .../gfortran.dg/goacc/kernels-decompose-1.f95 | 132 ++++++++
 libgomp/ChangeLog.openacc                     |   3 +
 .../kernels-decompose-1.c                     |  30 ++
 9 files changed, 571 insertions(+), 66 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index 307922a29c0..3ef97adef47 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,16 @@
+2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
+	    Gergö Barany  <gergo@codesourcery.com>
+
+	* omp-oacc-kernels.c (adjust_region_code_walk_stmt_fn)
+	(adjust_region_code): New functions.
+	(make_loops_gang_single): Update.
+	(make_gang_single_region): Rename to...
+	(make_region_seq): ... this, and update.
+	(make_gang_parallel_loop_region): Rename to...
+	(make_region_loop_nest): ... this, and update.
+	(is_unconditional_oacc_for_loop): Remove stmt parameter and check.
+	(decompose_kernels_region_body): Update.
+
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 	    Gergö Barany  <gergo@codesourcery.com>
 
diff --git a/gcc/omp-oacc-kernels.c b/gcc/omp-oacc-kernels.c
index f8553f77708..a8860c98e11 100644
--- a/gcc/omp-oacc-kernels.c
+++ b/gcc/omp-oacc-kernels.c
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-iterator.h"
 #include "gimple-walk.h"
 #include "gomp-constants.h"
+#include "omp-general.h"
 
 /* This is a preprocessing pass to be run immediately before lower_omp.  It
    will convert OpenACC "kernels" regions into sequences of "parallel"
@@ -135,6 +136,95 @@ top_level_omp_for_in_stmt (gimple *stmt)
   return NULL;
 }
 
+/* Helper for adjust_region_code: evaluate the statement at GSI_P.  */
+
+static tree
+adjust_region_code_walk_stmt_fn (gimple_stmt_iterator *gsi_p,
+				 bool *handled_ops_p,
+				 struct walk_stmt_info *wi)
+{
+  int *region_code = (int *) wi->info;
+
+  gimple *stmt = gsi_stmt (*gsi_p);
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_OMP_FOR:
+      {
+	tree clauses = gimple_omp_for_clauses (stmt);
+	if (omp_find_clause (clauses, OMP_CLAUSE_INDEPENDENT))
+	  {
+	    /* Explicit 'independent' clause.  */
+	    /* Keep going; recurse into loop body.  */
+	    break;
+	  }
+	else if (omp_find_clause (clauses, OMP_CLAUSE_SEQ))
+	  {
+	    /* Explicit 'seq' clause.  */
+	    /* We'll "parallelize" if at some level a loop construct has been
+	       marked up by the user as unparallelizable ('seq' clause; we'll
+	       respect that in the later processing).  Given that the user has
+	       explicitly marked it up, this loop construct cannot be
+	       performance-critical (and we thus don't have to "avoid
+	       offloading"), and in this case it's also fine to "parallelize"
+	       instead of "gang-single", because any outer or inner loops may
+	       still exploit the available parallelism.  */
+	    /* Keep going; recurse into loop body.  */
+	    break;
+	  }
+	else
+	  {
+	    /* Explicit or implicit 'auto' clause.  */
+	    /* The user would like this loop analyzed ('auto' clause) and
+	       typically parallelized, but the compiler logic to analyze it
+	       is not yet available here, so we can't parallelize it.
+	       Executing it unparallelized would very likely cause a
+	       performance problem, so forward the whole loop nest to
+	       "parloops" for analysis instead.  */
+	    *region_code = GF_OMP_TARGET_KIND_OACC_KERNELS;
+	    /* Terminate: final decision for this region.  */
+	    *handled_ops_p = true;
+	    return integer_zero_node;
+	  }
+	gcc_unreachable ();
+      }
+
+    case GIMPLE_COND:
+    case GIMPLE_GOTO:
+    case GIMPLE_SWITCH:
+    case GIMPLE_ASM:
+    case GIMPLE_TRANSACTION:
+    case GIMPLE_RETURN:
+      /* Statement that might constitute some looping/control flow pattern.  */
+      /* The user would like this code analyzed (implicit inside a 'kernels'
+	 region) and typically parallelized, but the compiler logic to
+	 analyze it is not yet available here, so we can't parallelize it.
+	 Executing it unparallelized would very likely cause a performance
+	 problem, so forward the whole thing to "parloops" for analysis
+	 instead.  */
+      *region_code = GF_OMP_TARGET_KIND_OACC_KERNELS;
+      /* Terminate: final decision for this region.  */
+      *handled_ops_p = true;
+      return integer_zero_node;
+
+    default:
+      /* Keep going.  */
+      break;
+    }
+
+  return NULL;
+}
+
+/* Adjust the REGION_CODE for the region in GS.  */
+
+static void
+adjust_region_code (gimple_seq gs, int *region_code)
+{
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  wi.info = region_code;
+  walk_gimple_seq (gs, adjust_region_code_walk_stmt_fn, NULL, &wi);
+}
+
 /* Helper function for make_loops_gang_single for walking the tree.  If the
    statement indicated by GSI_P is an OpenACC for loop with a gang clause,
    issue a warning and remove the clause.  */
@@ -174,6 +264,7 @@ visit_loops_in_gang_single_region (gimple_stmt_iterator *gsi_p,
       gimple_omp_for_set_clauses (stmt, clauses);
       /* No need to recurse into nested statements; no loop nested inside
          this loop can be gang-partitioned.  */
+      sorry ("'gang' loop in \"gang-single\" region");
       *handled_ops_p = true;
       break;
 
@@ -184,16 +275,16 @@ visit_loops_in_gang_single_region (gimple_stmt_iterator *gsi_p,
   return NULL;
 }
 
-/* Visit all nested OpenACC loops in the statement indicated by GSI.  This
+/* Visit all nested OpenACC loops in the sequence indicated by GS.  This
    statement is expected to be inside a gang-single region.  Issue a warning
    for any loops inside it that have gang clauses and remove the clauses.  */
 
 static void
-make_loops_gang_single (gimple_stmt_iterator gsi)
+make_loops_gang_single (gimple_seq gs)
 {
   struct walk_stmt_info wi;
   memset (&wi, 0, sizeof (wi));
-  walk_gimple_stmt (&gsi, visit_loops_in_gang_single_region, NULL, &wi);
+  walk_gimple_seq (gs, visit_loops_in_gang_single_region, NULL, &wi);
 }
 
 /* Construct a "gang-single" OpenACC parallel region at LOC containing the
@@ -202,21 +293,73 @@ make_loops_gang_single (gimple_stmt_iterator gsi)
    to force gang-single execution.  */
 
 static gimple *
-make_gang_single_region (location_t loc, gimple_seq stmts, tree clauses)
+make_region_seq (location_t loc, gimple_seq stmts,
+		 tree num_gangs_clause,
+		 tree num_workers_clause,
+		 tree vector_length_clause,
+		 tree clauses)
 {
   /* This correctly unshares the entire clause chain rooted here.  */
   clauses = unshare_expr (clauses);
-  /* Make a num_gangs(1) clause.  */
-  tree gang_single_clause = build_omp_clause (loc, OMP_CLAUSE_NUM_GANGS);
-  OMP_CLAUSE_OPERAND (gang_single_clause, 0) = integer_one_node;
-  OMP_CLAUSE_CHAIN (gang_single_clause) = clauses;
+
+  location_t loc_stmts_first = gimple_location (gimple_seq_first (stmts));
+
+  /* Figure out the region code for this region.  */
+  /* Optimistic default: assume "setup code", no looping; thus not
+     performance-critical.  */
+  int region_code = GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE;
+  adjust_region_code (stmts, &region_code);
+
+  if (region_code == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE)
+    {
+      dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc_stmts_first,
+		       "beginning \"gang-single\" region in OpenACC 'kernels'"
+		       " construct\n");
+
+      /* Make a num_gangs(1) clause.  */
+      tree gang_single_clause = build_omp_clause (loc, OMP_CLAUSE_NUM_GANGS);
+      OMP_CLAUSE_OPERAND (gang_single_clause, 0) = integer_one_node;
+      OMP_CLAUSE_CHAIN (gang_single_clause) = clauses;
+      clauses = gang_single_clause;
+
+      /* Remove and issue warnings about gang clauses on any OpenACC
+	 loops nested inside this sequentially executed statement.  */
+      make_loops_gang_single (stmts);
+    }
+  else if (region_code == GF_OMP_TARGET_KIND_OACC_KERNELS)
+    {
+      dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc_stmts_first,
+		       "beginning \"parloops\" region in OpenACC 'kernels'"
+		       " construct\n");
+
+      /* As we're transforming a "GF_OMP_TARGET_KIND_OACC_KERNELS" into another
+	 "GF_OMP_TARGET_KIND_OACC_KERNELS", this isn't doing any of the clauses
+	 mangling that "make_region_loop_nest" is doing.  */
+      /* Re-assemble the clauses stripped off earlier.  */
+      if (num_gangs_clause != NULL)
+	{
+	  tree c = unshare_expr (num_gangs_clause);
+	  OMP_CLAUSE_CHAIN (c) = clauses;
+	  clauses = c;
+	}
+      if (num_workers_clause != NULL)
+	{
+	  tree c = unshare_expr (num_workers_clause);
+	  OMP_CLAUSE_CHAIN (c) = clauses;
+	  clauses = c;
+	}
+      if (vector_length_clause != NULL)
+	{
+	  tree c = unshare_expr (vector_length_clause);
+	  OMP_CLAUSE_CHAIN (c) = clauses;
+	  clauses = c;
+	}
+    }
+  else
+    gcc_unreachable ();
 
   /* Build the gang-single region.  */
-  gimple *single_region
-    = gimple_build_omp_target (
-        NULL,
-        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE,
-        gang_single_clause);
+  gimple *single_region = gimple_build_omp_target (NULL, region_code, clauses);
   gimple_set_location (single_region, loc);
   gbind *single_body = gimple_build_bind (NULL, stmts, make_node (BLOCK));
   gimple_omp_set_body (single_region, single_body);
@@ -224,7 +367,7 @@ make_gang_single_region (location_t loc, gimple_seq stmts, tree clauses)
   return single_region;
 }
 
-/* Helper function for make_gang_parallel_loop_region.  Adds a num_gangs
+/* Helper function for make_region_loop_nest.  Adds a num_gangs
    (num_workers, vector_length) clause to the given CLAUSES, either the one
    from the parent region (PARENT_CLAUSE) or a new one based on the loop's
    own LOOP_CLAUSE ("gang(num: N)" or similar for workers or vectors) with
@@ -256,7 +399,7 @@ add_parent_or_loop_num_clause (tree parent_clause, tree loop_clause,
   return clauses;
 }
 
-/* Helper for make_gang_parallel_loop_region, looking for "worker(num: N)"
+/* Helper for make_region_loop_nest, looking for "worker(num: N)"
    or "vector(length: N)" clauses in nested loops.  Removes the numeric
    argument, transferring it to the enclosing parallel region (via
    WI->INFO).  If numeric arguments within the same loop nest conflict,
@@ -488,32 +631,63 @@ transform_kernels_loop_clauses (gimple *omp_for,
    adjust_nested_loop_clauses function.  */
 
 static gimple *
-make_gang_parallel_loop_region (gimple *omp_for, gimple *stmt,
-                                tree num_gangs_clause,
-                                tree num_workers_clause,
-                                tree vector_length_clause,
-                                tree clauses)
+make_region_loop_nest (gimple *omp_for, gimple_seq stmts,
+		       tree num_gangs_clause,
+		       tree num_workers_clause,
+		       tree vector_length_clause,
+		       tree clauses)
 {
   /* This correctly unshares the entire clause chain rooted here.  */
   clauses = unshare_expr (clauses);
 
-  clauses = transform_kernels_loop_clauses (omp_for,
-					    num_gangs_clause,
-					    num_workers_clause,
-					    vector_length_clause,
-					    clauses);
+  /* Figure out the region code for this region.  */
+  /* Optimistic default: assume that the loop nest is parallelizable
+     (essentially, no GIMPLE_OMP_FOR with (explicit or implicit) 'auto' clause,
+     and no un-annotated loops).  */
+  int region_code = GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED;
+  adjust_region_code (stmts, &region_code);
+
+  if (region_code == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED)
+    {
+      dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, gimple_location (omp_for),
+		       "parallelized loop nest in OpenACC 'kernels'"
+		       " construct\n");
+
+      clauses = transform_kernels_loop_clauses (omp_for,
+						num_gangs_clause,
+						num_workers_clause,
+						vector_length_clause,
+						clauses);
+    }
+  else if (region_code == GF_OMP_TARGET_KIND_OACC_KERNELS)
+    {
+      dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, gimple_location (omp_for),
+		       "forwarded loop nest in OpenACC 'kernels' construct to"
+		       " \"parloops\" for analysis\n");
+
+      /* We're transforming one "GF_OMP_TARGET_KIND_OACC_KERNELS" into another
+	 "GF_OMP_TARGET_KIND_OACC_KERNELS", so don't have to
+	 "transform_kernels_loop_clauses".  */
+      /* Re-assemble the clauses stripped off earlier.  */
+      clauses
+	= add_parent_or_loop_num_clause (num_gangs_clause, NULL,
+					 OMP_CLAUSE_NUM_GANGS, clauses);
+      clauses
+	= add_parent_or_loop_num_clause (num_workers_clause, NULL,
+					 OMP_CLAUSE_NUM_WORKERS, clauses);
+      clauses
+	= add_parent_or_loop_num_clause (vector_length_clause, NULL,
+					 OMP_CLAUSE_VECTOR_LENGTH, clauses);
+    }
+  else
+    gcc_unreachable ();
 
   /* Now build the parallel region containing this loop.  */
-  gimple_seq parallel_body = NULL;
-  gimple_seq_add_stmt (&parallel_body, stmt);
   gimple *parallel_body_bind
-    = gimple_build_bind (NULL, parallel_body, make_node (BLOCK));
+    = gimple_build_bind (NULL, stmts, make_node (BLOCK));
   gimple *parallel_region
-    = gimple_build_omp_target (
-        parallel_body_bind,
-        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED,
-        clauses);
-  gimple_set_location (parallel_region, gimple_location (stmt));
+    = gimple_build_omp_target (parallel_body_bind, region_code, clauses);
+  gimple_set_location (parallel_region, gimple_location (omp_for));
 
   return parallel_region;
 }
@@ -739,9 +913,9 @@ class control_flow_regions
        information for the statement sequence SEQ.  */
     control_flow_regions (gimple_seq seq);
 
-    /* Return true if the STMT with the given index IDX in the analyzed
+    /* Return true if the statement with the given index IDX in the analyzed
        statement sequence is an unconditionally executed OpenACC loop.  */
-    bool is_unconditional_oacc_for_loop (gimple *stmt, size_t idx);
+    bool is_unconditional_oacc_for_loop (size_t idx);
 
   private:
     /* Find the region representative for the statement identified by index
@@ -772,11 +946,8 @@ control_flow_regions::control_flow_regions (gimple_seq seq)
 }
 
 bool
-control_flow_regions::is_unconditional_oacc_for_loop (gimple *stmt, size_t idx)
+control_flow_regions::is_unconditional_oacc_for_loop (size_t idx)
 {
-  if (top_level_omp_for_in_stmt (stmt) == NULL)
-    /* Not an OpenACC for loop.  */
-    return false;
   if (idx == 0 || idx == representatives.length () - 1)
     /* The first or last statement in the kernels region.  This means that
        there is no room before or after it for a jump or a label.  Thus
@@ -912,7 +1083,7 @@ control_flow_regions::compute_regions (gimple_seq seq)
 }
 
 /* Decompose the body of the KERNELS_REGION, which was originally annotated
-   with the KERNELS_CLAUSES, into a series of parallel regions.  */
+   with the KERNELS_CLAUSES, into a series of regions.  */
 
 static gimple *
 decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
@@ -1052,17 +1223,24 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
 
       gimple *stmt = gsi_stmt (gsi);
       gimple *omp_for = top_level_omp_for_in_stmt (stmt);
+      bool is_unconditional_oacc_for_loop = false;
+      if (omp_for != NULL)
+	is_unconditional_oacc_for_loop
+	  = cf_regions.is_unconditional_oacc_for_loop (idx);
       if (omp_for != NULL
-          && cf_regions.is_unconditional_oacc_for_loop (stmt, idx))
+          && is_unconditional_oacc_for_loop)
         {
-          /* This is an OMP for statement, put it into a parallel region.
+          /* This is an OMP for statement, put it into a separate region.
              But first, construct a gang-single region containing any
              complex sequential statements we may have seen.  */
           if (gang_single_seq != NULL && !only_simple_assignments)
             {
               gimple *single_region
-                = make_gang_single_region (loc, gang_single_seq,
-                                           kernels_clauses);
+                = make_region_seq (loc, gang_single_seq,
+				   num_gangs_clause,
+				   num_workers_clause,
+				   vector_length_clause,
+				   kernels_clauses);
               gimple_seq_add_stmt (&region_body, single_region);
             }
           else if (gang_single_seq != NULL && only_simple_assignments)
@@ -1080,8 +1258,10 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
           gang_single_seq = NULL;
           only_simple_assignments = true;
 
+	  gimple_seq parallel_seq = NULL;
+	  gimple_seq_add_stmt (&parallel_seq, stmt);
           gimple *parallel_region
-            = make_gang_parallel_loop_region (omp_for, stmt,
+	    = make_region_loop_nest (omp_for, parallel_seq,
                                               num_gangs_clause,
                                               num_workers_clause,
                                               vector_length_clause,
@@ -1090,6 +1270,14 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
         }
       else
         {
+	  if (omp_for != NULL)
+	    {
+	      gcc_checking_assert (!is_unconditional_oacc_for_loop);
+	      dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, gimple_location (omp_for),
+			       "unparallelized loop nest in OpenACC 'kernels'"
+			       " region: it's executed conditionally\n");
+	    }
+
           /* This is not an unconditional OMP for statement, so it will be
              put into a gang-single region.  */
           gimple_seq_add_stmt (&gang_single_seq, stmt);
@@ -1102,9 +1290,6 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
                 && DECL_ARTIFICIAL (gimple_assign_lhs (stmt)));
           if (!is_simple_assignment)
             only_simple_assignments = false;
-          /* Remove and issue warnings about gang clauses on any OpenACC
-             loops nested inside this sequentially executed statement.  */
-          make_loops_gang_single (gsi);
         }
     }
 
@@ -1123,7 +1308,11 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
   if (gang_single_seq != NULL)
     {
       gimple *single_region
-        = make_gang_single_region (loc, gang_single_seq, kernels_clauses);
+        = make_region_seq (loc, gang_single_seq,
+			   num_gangs_clause,
+			   num_workers_clause,
+			   vector_length_clause,
+			   kernels_clauses);
       gimple_seq_add_stmt (&region_body, single_region);
     }
 
diff --git a/gcc/testsuite/ChangeLog.openacc b/gcc/testsuite/ChangeLog.openacc
index 84d345fdecc..3b4a9c8370d 100644
--- a/gcc/testsuite/ChangeLog.openacc
+++ b/gcc/testsuite/ChangeLog.openacc
@@ -1,3 +1,11 @@
+2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
+	    Gergö Barany  <gergo@codesourcery.com>
+
+	* c-c++-common/goacc/kernels-conversion.c: Adjust test.
+	* gfortran.dg/goacc/kernels-conversion.f95: Likewise.
+	* c-c++-common/goacc/kernels-decompose-1.c: New file.
+	* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
+
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 	    Gergö Barany  <gergo@codesourcery.com>
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
index 3e52ec4f16f..ea7eec997fb 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
@@ -12,19 +12,22 @@ main (void)
   unsigned int sum = 1;
 
 #pragma acc kernels copyin(a[0:N]) copy(sum)
-  /* { dg-bogus "region contains gang partitoned code but is not gang partitioned" "gang partitioned" { xfail *-*-* } .-1 } */
   {
+    /* converted to "oacc_kernels" */
     #pragma acc loop
     for (i = 0; i < N; ++i)
       sum += a[i];
 
+    /* converted to "oacc_parallel_kernels_gang_single" */
     sum++;
     a[0]++;
 
-    #pragma acc loop
+    /* converted to "oacc_parallel_kernels_parallelized" */
+    #pragma acc loop independent
     for (i = 0; i < N; ++i)
       sum += a[i];
 
+    /* converted to "oacc_kernels" */
     if (sum > 10)
       { 
         #pragma acc loop
@@ -32,7 +35,8 @@ main (void)
           sum += a[i];
       }
 
-    #pragma acc loop
+    /* converted to "oacc_kernels" */
+    #pragma acc loop auto
     for (i = 0; i < N; ++i)
       sum += a[i];
   }
@@ -44,10 +48,11 @@ main (void)
    parallel regions.  */ 
 /* { dg-final { scan-tree-dump-times "oacc_data_kernels" 1 "convert_oacc_kernels" } } */
 
-/* The three unconditional loop regions are parallelized, the sequential
-   part in between and the conditional loop are made gang-single.  */
-/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 3 "convert_oacc_kernels" } } */
-/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 2 "convert_oacc_kernels" } } */
+/* As noted in the comments above, we get one gang-single serial region; one
+   parallelized loop region; and three "old-style" kernels regions.  */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 1 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 1 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "oacc_kernels" 3 "convert_oacc_kernels" } } */
 
 /* Each of the parallel regions is async, and there is a final call to
    __builtin_GOACC_wait.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
new file mode 100644
index 00000000000..b5d58c37200
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
@@ -0,0 +1,123 @@
+/* Test OpenACC 'kernels' construct decomposition.  */
+
+/* { dg-additional-options "-fopenacc-kernels=split" } */
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+/* { dg-additional-options "-O2" } for "parloops".  */
+
+/* See also "../../gfortran.dg/goacc/kernels-decompose-1.f95".  */
+
+#pragma acc routine gang
+extern int
+f_g (int);
+
+#pragma acc routine worker
+extern int
+f_w (int);
+
+#pragma acc routine vector
+extern int
+f_v (int);
+
+#pragma acc routine seq
+extern int
+f_s (int);
+
+int
+main ()
+{
+  int x, y, z;
+#define N 10
+  int a[N], b[N], c[N];
+
+#pragma acc kernels
+  {
+    x = 0; /* { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+    y = x < 10;
+    z = x++;
+    ;
+  }
+
+#pragma acc kernels
+  for (int i = 0; i < N; i++) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+    a[i] = 0;
+
+#pragma acc kernels loop
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (int i = 0; i < N; i++)
+    b[i] = a[N - i - 1];
+
+#pragma acc kernels
+  {
+#pragma acc loop
+    /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+    for (int i = 0; i < N; i++)
+      b[i] = a[N - i - 1];
+
+#pragma acc loop
+    /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+    for (int i = 0; i < N; i++)
+      c[i] = a[i] * b[i];
+
+    a[z] = 0; /* { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+
+#pragma acc loop
+    /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+    for (int i = 0; i < N; i++)
+      c[i] += a[i];
+
+#pragma acc loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+    /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+    for (int i = 0 + 1; i < N; i++)
+      c[i] += c[i - 1];
+  }
+
+#pragma acc kernels
+  {
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang loop parallelism" } */
+    /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+    for (int i = 0; i < N; ++i)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC worker loop parallelism" } */
+      for (int j = 0; j < N; ++j)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+	 /* { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 } */
+	for (int k = 0; k < N; ++k)
+	  a[(i + j + k) % N]
+	    = b[j]
+	    + f_v (c[k]); /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+
+    //TODO Should the following turn into "gang-single" instead of "parloops"?
+    //TODO The problem is that the first STMT is "if (y <= 4) goto <D.2547>; else goto <D.2548>;", thus "parloops".
+    if (y < 5) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+#pragma acc loop independent /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" } */
+      for (int j = 0; j < N; ++j)
+	b[j] = f_w (c[j]);
+  }
+
+#pragma acc kernels /* { dg-warning "region contains gang partitoned code but is not gang partitioned" } */
+  {
+    /* { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" "" { target *-*-* } .+1 } */
+    y = f_g (a[5]); /* { dg-message "note: assigned OpenACC gang worker vector loop parallelism" } */
+
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang loop parallelism" } */
+    /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+    for (int j = 0; j < N; ++j)
+      b[j] = y + f_w (c[j]); /* { dg-message "note: assigned OpenACC worker vector loop parallelism" } */
+  }
+
+#pragma acc kernels
+  {
+    y = 3; /* { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang worker loop parallelism" } */
+    /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+    for (int j = 0; j < N; ++j)
+      b[j] = y + f_v (c[j]); /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+
+    z = 2; /* { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+  }
+
+#pragma acc kernels /* { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+  ;
+
+  return 0;
+}
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
index 559916c2325..6604727cf13 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
@@ -9,19 +9,23 @@ program main
 
   !$acc kernels copyin(a(1:N)) copy(sum)
 
+  ! converted to "oacc_kernels"
   !$acc loop
   do i = 1, N
     sum = sum + a(i)
   end do
 
+  ! converted to "oacc_parallel_kernels_gang_single"
   sum = sum + 1
   a(1) = a(1) + 1
 
-  !$acc loop
+  ! converted to "oacc_parallel_kernels_parallelized"
+  !$acc loop independent
   do i = 1, N
     sum = sum + a(i)
   end do
 
+  ! converted to "oacc_kernels"
   if (sum .gt. 10) then
     !$acc loop
     do i = 1, N
@@ -29,8 +33,8 @@ program main
     end do
   end if
 
-  !$acc loop
-  ! { dg-bogus "region contains gang partitoned code but is not gang partitioned" "gang partitioned" { xfail *-*-* } .-1 }
+  ! converted to "oacc_kernels"
+  !$acc loop auto
   do i = 1, N
     sum = sum + a(i)
   end do
@@ -42,15 +46,13 @@ end program main
 ! parallel regions.
 ! { dg-final { scan-tree-dump-times "oacc_data_kernels" 1 "convert_oacc_kernels" } }
 
-! The three unconditional loop regions are parallelized, the sequential part
-! in between and the conditional loop are made gang-single.
-! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 3 "convert_oacc_kernels" } }
-! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 2 "convert_oacc_kernels" } }
+! As noted in the comments above, we get one gang-single serial region; one
+! parallelized loop region; and three "old-style" kernel regions.
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 1 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 1 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "oacc_kernels" 3 "convert_oacc_kernels" } }
 
 ! Each of the parallel regions is async, and there is a final call to
 ! __builtin_GOACC_wait.
 ! { dg-final { scan-tree-dump-times "oacc_parallel_kernels.* async\(-1\)" 5 "convert_oacc_kernels" } }
 ! { dg-final { scan-tree-dump-times "__builtin_GOACC_wait" 1 "convert_oacc_kernels" } }
-
-! Check that the original kernels region is removed.
-! { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
new file mode 100644
index 00000000000..520bf034ac6
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
@@ -0,0 +1,132 @@
+! Test OpenACC 'kernels' construct decomposition.
+
+! { dg-additional-options "-fopenacc-kernels=split" }
+! { dg-additional-options "-fopt-info-optimized-omp" }
+! { dg-additional-options "-O2" } for "parloops".
+
+! See also "../../c-c++-common/goacc/kernels-decompose-1.c".
+
+program main
+  implicit none
+
+  integer, external :: f_g
+  !$acc routine (f_g) gang
+  integer, external :: f_w
+  !$acc routine (f_w) worker
+  integer, external :: f_v
+  !$acc routine (f_v) vector
+  integer, external :: f_s
+  !$acc routine (f_s) seq
+
+  integer :: i, j, k
+  integer :: x, y, z
+  logical :: y_l
+  integer, parameter :: N = 10
+  integer :: a(N), b(N), c(N)
+
+  !$acc kernels
+  x = 0 ! { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" }
+  y = 0
+  y_l = x < 10
+  z = x
+  x = x + 1
+  ;
+  !$acc end kernels
+
+  !$acc kernels ! { dg-message "note: assigned OpenACC gang loop parallelism" }
+  do i = 1, N ! { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" }
+     a(i) = 0
+  end do
+  !$acc end kernels
+
+  !$acc kernels loop ! { dg-message "note: assigned OpenACC gang loop parallelism" }
+  ! { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 }
+  do i = 1, N
+     b(i) = a(N - i + 1)
+  end do
+
+  !$acc kernels
+  !$acc loop ! { dg-message "note: assigned OpenACC gang loop parallelism" }
+  ! { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 }
+  do i = 1, N
+     b(i) = a(N - i + 1)
+  end do
+
+  !$acc loop ! { dg-message "note: assigned OpenACC gang loop parallelism" }
+  ! { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 }
+  do i = 1, N
+     c(i) = a(i) * b(i)
+  end do
+
+  a(z) = 0 ! { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" }
+
+  !$acc loop ! { dg-message "note: assigned OpenACC gang loop parallelism" }
+  ! { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 }
+  do i = 1, N
+     c(i) = c(i) + a(i)
+  end do
+
+  !$acc loop seq ! { dg-message "note: assigned OpenACC seq loop parallelism" }
+  ! { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 }
+  do i = 1 + 1, N
+     c(i) = c(i) + c(i - 1)
+  end do
+  !$acc end kernels
+
+  !$acc kernels ! { dg-bogus "note: assigned OpenACC seq loop parallelism" "TODO" { xfail *-*-* } }
+  !$acc loop independent ! { dg-message "note: assigned OpenACC gang loop parallelism" }
+  ! { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 }
+  do i = 1, N
+     !$acc loop independent ! { dg-message "note: assigned OpenACC worker loop parallelism" }
+     do j = 1, N
+        !$acc loop independent ! { dg-message "note: assigned OpenACC seq loop parallelism" "TODO" { xfail *-*-* } }
+        ! { dg-warning "insufficient partitioning available to parallelize loop" "TODO" { xfail *-*-* } .-1 }
+        ! { dg-bogus "note: assigned OpenACC vector loop parallelism" "TODO" { xfail *-*-* } .-2 }
+        do k = 1, N
+           a(1 + mod(i + j + k, N)) &
+                = b(j) &
+                + f_v (c(k)) ! { dg-message "note: assigned OpenACC vector loop parallelism" "TODO" { xfail *-*-* } .-1 }
+        end do
+     end do
+  end do
+
+  !TODO Should the following turn into "gang-single" instead of "parloops"?
+  !TODO The problem is that the first STMT is "if (y <= 4) goto <D.2547>; else goto <D.2548>;", thus "parloops".
+  if (y < 5) then ! { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" }
+     !$acc loop independent ! { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" }
+     do j = 1, N
+        b(j) = f_w (c(j))
+     end do
+  end if
+  !$acc end kernels
+
+  !$acc kernels
+  !TODO This refers to the "gang-single" "f_g" call.
+  ! { dg-warning "region contains gang partitoned code but is not gang partitioned" "TODO" { xfail *-*-* } .-2 }
+  ! { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" "" { target *-*-* } .+1 }
+  y = f_g (a(5)) ! { dg-message "note: assigned OpenACC gang worker vector loop parallelism" "TODO" { xfail *-*-* } }
+
+  !$acc loop independent ! { dg-message "note: assigned OpenACC gang loop parallelism" "TODO" { xfail *-*-* } }
+  ! { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 }
+  ! { dg-bogus "note: assigned OpenACC gang vector loop parallelism" "TODO" { xfail *-*-* } .-2 }
+  do j = 1, N
+     b(j) = y + f_w (c(j)) ! { dg-message "note: assigned OpenACC worker vector loop parallelism" "TODO" { xfail *-*-* } }
+  end do
+  !$acc end kernels
+
+  !$acc kernels
+  y = 3 ! { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" }
+
+  !$acc loop independent ! { dg-message "note: assigned OpenACC gang worker loop parallelism" "TODO" { xfail *-*-* } }
+  ! { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 }
+  ! { dg-bogus "note: assigned OpenACC gang vector loop parallelism" "TODO" { xfail *-*-* } .-2 }
+  do j = 1, N
+     b(j) = y + f_v (c(j)) ! { dg-message "note: assigned OpenACC vector loop parallelism" "TODO" { xfail *-*-* } }
+  end do
+
+  z = 2 ! { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" }
+  !$acc end kernels
+
+  !$acc kernels ! { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" }
+  !$acc end kernels
+end program main
diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index f2ff2ee32d2..27ce3434e49 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,5 +1,8 @@
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: New
+	file.
+
 	* testsuite/libgomp.oacc-fortran/initialize_kernels_loops.f90:
 	Update.
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
new file mode 100644
index 00000000000..601e543495e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
@@ -0,0 +1,30 @@
+/* { dg-additional-options "-fopenacc-kernels=split" } */
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+#undef NDEBUG
+#include <assert.h>
+
+int main()
+{
+  int a = 0;
+#define N 123
+  int b[N] = { 0 };
+
+#pragma acc kernels
+  {
+    int c = 234; /* { dg-warning "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+
+#pragma acc loop independent gang /* { dg-warning "note: assigned OpenACC gang loop parallelism" } */
+    /* { dg-warning "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } 17 } */
+    for (int i = 0; i < N; ++i)
+      b[i] = c;
+
+    a = c; /* { dg-warning "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+  }
+
+  for (int i = 0; i < N; ++i)
+    assert (b[i] == 234);
+  assert (a == 234);
+
+  return 0;
+}
-- 
2.17.1


[-- Attachment #10: 0009-Make-new-OpenACC-kernels-conversion-the-default-adju.patch --]
[-- Type: text/x-diff, Size: 103526 bytes --]

From 53f61640a510d13537709239d523c283881f0755 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 23 Jan 2019 02:40:08 -0800
Subject: [PATCH 9/9] Make new OpenACC kernels conversion the default; adjust
 and add tests

	gcc/c-family/
	* c.opt (fopenacc-kernels): Default to "split".
	gcc/fortran/
	* lang.opt (fopenacc-kernels): Default to "split".
	gcc/
	* doc/invoke.texi (-fopenacc-kernels): Update.
	gcc/testsuite/
	* c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c:
	New file.
	* c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c:
	Likewise.
	* c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c:
	Likewise.
	* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Likewise.
	* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c:
	Likewise.
	* c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c:
	Likewise.
	* c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c:
	Likewise.
	* c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c:
	Likewise.
	* c-c++-common/goacc/note-parallelism-kernels-loop-auto.c:
	Likewise.
	* c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c:
	Likewise.
	* c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise.
	* c-c++-common/goacc/classify-kernels-unparallelized.c: Update.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/classify-parallel.c: Likewise.
	* c-c++-common/goacc/classify-routine.c: Likewise.
	* c-c++-common/goacc/dtype-1.c: Likewise.
	* c-c++-common/goacc/if-clause-2.c: Likewise.
	* c-c++-common/goacc/kernels-conversion.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-1.c: Likewise.
	* c-c++-common/goacc/loop-2-kernels.c: Likewise.
	* c-c++-common/goacc/note-parallelism.c: Likewise.
	* c-c++-common/goacc/routine-1.c: Likewise.
	* c-c++-common/goacc/uninit-dim-clause.c: Likewise.
	* gfortran.dg/goacc/dtype-1.f95: Likewise.
	* gfortran.dg/goacc/kernels-conversion.f95: Likewise.
	* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c:
	Update.
	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/avoid-offloading-1.f: Likewise.
	* testsuite/libgomp.oacc-fortran/avoid-offloading-2.f: Likewise.
	* testsuite/libgomp.oacc-fortran/avoid-offloading-3.f: Likewise.
	* testsuite/libgomp.oacc-fortran/initialize_kernels_loops.f90:
	Likewise.
---
 gcc/ChangeLog.openacc                         |   4 +
 gcc/c-family/ChangeLog.openacc                |   4 +
 gcc/c-family/c.opt                            |   2 +-
 gcc/doc/invoke.texi                           |   2 +-
 gcc/fortran/ChangeLog.openacc                 |   4 +
 gcc/fortran/lang.opt                          |   2 +-
 gcc/testsuite/ChangeLog.openacc               |  40 ++++
 .../goacc/classify-kernels-unparallelized.c   |   7 +-
 .../c-c++-common/goacc/classify-kernels.c     |   2 +-
 .../c-c++-common/goacc/classify-parallel.c    |   2 +-
 .../c-c++-common/goacc/classify-routine.c     |   2 +-
 gcc/testsuite/c-c++-common/goacc/dtype-1.c    |   8 +-
 .../c-c++-common/goacc/if-clause-2.c          |   1 -
 .../c-c++-common/goacc/kernels-conversion.c   |  10 +-
 .../c-c++-common/goacc/kernels-decompose-1.c  |   1 -
 .../c-c++-common/goacc/loop-2-kernels.c       |  14 +-
 ...kernels-conditional-loop-independent_seq.c | 129 +++++++++++
 .../note-parallelism-1-kernels-loop-auto.c    | 126 +++++++++++
 ...rallelism-1-kernels-loop-independent_seq.c | 126 +++++++++++
 .../goacc/note-parallelism-1-kernels-loops.c  |  47 ++++
 ...note-parallelism-1-kernels-straight-line.c |  82 +++++++
 ...e-parallelism-combined-kernels-loop-auto.c | 121 +++++++++++
 ...sm-combined-kernels-loop-independent_seq.c | 121 +++++++++++
 ...kernels-conditional-loop-independent_seq.c | 204 ++++++++++++++++++
 .../note-parallelism-kernels-loop-auto.c      | 138 ++++++++++++
 ...parallelism-kernels-loop-independent_seq.c | 138 ++++++++++++
 .../goacc/note-parallelism-kernels-loops.c    |  50 +++++
 .../c-c++-common/goacc/note-parallelism.c     |   3 +-
 gcc/testsuite/c-c++-common/goacc/routine-1.c  |   2 +-
 .../c-c++-common/goacc/uninit-dim-clause.c    |   6 +-
 gcc/testsuite/gfortran.dg/goacc/dtype-1.f95   |   8 +-
 .../gfortran.dg/goacc/kernels-conversion.f95  |   7 +-
 .../gfortran.dg/goacc/kernels-decompose-1.f95 |   1 -
 .../gfortran.dg/goacc/kernels-tree.f95        |   1 -
 libgomp/ChangeLog.openacc                     |  16 ++
 .../acc_prof-kernels-1.c                      |  17 +-
 .../avoid-offloading-1.c                      |  18 +-
 .../avoid-offloading-2.c                      |  17 +-
 .../avoid-offloading-3.c                      |  14 +-
 .../kernels-decompose-1.c                     |   3 +-
 .../libgomp.oacc-fortran/avoid-offloading-1.f |  18 +-
 .../libgomp.oacc-fortran/avoid-offloading-2.f |  18 +-
 .../libgomp.oacc-fortran/avoid-offloading-3.f |  15 +-
 .../initialize_kernels_loops.f90              |   1 -
 44 files changed, 1494 insertions(+), 58 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-auto.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops.c
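
For illustration only (not part of this patch): a minimal sketch of the
kind of 'kernels' construct that the now-default "split" handling
decomposes, modelled on libgomp.oacc-c-c++-common/kernels-decompose-1.c.
The function names 'decompose_example' and 'check_decompose_example' are
hypothetical, and the 'copyout' clause is an assumption added here because
the array is passed in as a pointer.  Going by the expectations in that
test case, compiling with "-fopenacc -fopt-info-optimized-omp" on this
branch should report a "gang-single" region for each straight-line part
and a parallelized loop nest for the 'independent' loop; the exact
diagnostics may differ.

```c
#include <assert.h>

#define N 123

/* Hypothetical helper: fills B[0..N-1] with a constant computed inside the
   'kernels' construct, and returns the value assigned after the loop.  */
int
decompose_example (int *b)
{
  int a = 0;

#pragma acc kernels copyout (b[0:N])
  {
    /* Straight-line code before the loop: expected to become a
       "gang-single" region.  */
    int c = 234;

    /* Loop marked 'independent': expected to become a parallelized
       region.  */
#pragma acc loop independent gang
    for (int i = 0; i < N; ++i)
      b[i] = c;

    /* Straight-line code after the loop: expected to become a second
       "gang-single" region.  */
    a = c;
  }

  return a;
}

/* Hypothetical self-check: the result has to be the same as for serial
   execution, however the construct gets decomposed.  */
void
check_decompose_example (void)
{
  int b[N] = { 0 };
  int a = decompose_example (b);

  for (int i = 0; i < N; ++i)
    assert (b[i] == 234);
  assert (a == 234);
}
```

Calling 'check_decompose_example' from 'main' exercises the sketch.
Without "-fopenacc" the pragmas are ignored and the code runs serially,
so the same assertions hold in both cases.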

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index 3ef97adef47..433653b2b38 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,7 @@
+2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* doc/invoke.texi (-fopenacc-kernels): Update.
+
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 	    Gergö Barany  <gergo@codesourcery.com>
 
diff --git a/gcc/c-family/ChangeLog.openacc b/gcc/c-family/ChangeLog.openacc
index 5b60c3a0dee..39d51c53808 100644
--- a/gcc/c-family/ChangeLog.openacc
+++ b/gcc/c-family/ChangeLog.openacc
@@ -1,3 +1,7 @@
+2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* c.opt (fopenacc-kernels): Default to "split".
+
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* c.opt (fopenacc-kernels): New flag.
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 12f8f55c50f..7b7a90e938a 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1618,7 +1618,7 @@ C ObjC C++ ObjC++ LTO Joined Var(flag_openacc_dims)
 Specify default OpenACC compute dimensions.
 
 fopenacc-kernels=
-C ObjC C++ ObjC++ RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS)
+C ObjC C++ ObjC++ RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_SPLIT)
 -fopenacc-kernels=[split|parloops]	Configure OpenACC 'kernels' constructs handling.
 
 Enum
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3bbeb8c6839..569c02d1e96 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2165,9 +2165,9 @@ Configure OpenACC 'kernels' constructs handling.
 With @option{-fopenacc-kernels=split}, OpenACC 'kernels' constructs
 are split into a sequence of compute constructs, each then handled
 individually.
+This is the default.
 With @option{-fopenacc-kernels=parloops}, the whole OpenACC
 'kernels' constructs is handled by the @samp{parloops} pass.
-This is the default.
 
 @item -fopenmp
 @opindex fopenmp
diff --git a/gcc/fortran/ChangeLog.openacc b/gcc/fortran/ChangeLog.openacc
index acb2177f22f..2715d7aa195 100644
--- a/gcc/fortran/ChangeLog.openacc
+++ b/gcc/fortran/ChangeLog.openacc
@@ -1,3 +1,7 @@
+2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* lang.opt (fopenacc-kernels): Default to "split".
+
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* lang.opt (fopenacc-kernels): New flag.
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index b3c9cdb425f..19199aeccbc 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -643,7 +643,7 @@ Fortran LTO Joined Var(flag_openacc_dims)
 ; Documented in C
 
 fopenacc-kernels=
-Fortran RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS)
+Fortran RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_SPLIT)
 ; Documented in C
 
 fopenmp
diff --git a/gcc/testsuite/ChangeLog.openacc b/gcc/testsuite/ChangeLog.openacc
index 3b4a9c8370d..5dd6d7d656c 100644
--- a/gcc/testsuite/ChangeLog.openacc
+++ b/gcc/testsuite/ChangeLog.openacc
@@ -1,3 +1,43 @@
+2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
+	    Gergö Barany  <gergo@codesourcery.com>
+
+	* c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c:
+	New file.
+	* c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c:
+	Likewise.
+	* c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c:
+	Likewise.
+	* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Likewise.
+	* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c:
+	Likewise.
+	* c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c:
+	Likewise.
+	* c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c:
+	Likewise.
+	* c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c:
+	Likewise.
+	* c-c++-common/goacc/note-parallelism-kernels-loop-auto.c:
+	Likewise.
+	* c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c:
+	Likewise.
+	* c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise.
+	* c-c++-common/goacc/classify-kernels-unparallelized.c: Update.
+	* c-c++-common/goacc/classify-kernels.c: Likewise.
+	* c-c++-common/goacc/classify-parallel.c: Likewise.
+	* c-c++-common/goacc/classify-routine.c: Likewise.
+	* c-c++-common/goacc/dtype-1.c: Likewise.
+	* c-c++-common/goacc/if-clause-2.c: Likewise.
+	* c-c++-common/goacc/kernels-conversion.c: Likewise.
+	* c-c++-common/goacc/kernels-decompose-1.c: Likewise.
+	* c-c++-common/goacc/loop-2-kernels.c: Likewise.
+	* c-c++-common/goacc/note-parallelism.c: Likewise.
+	* c-c++-common/goacc/routine-1.c: Likewise.
+	* c-c++-common/goacc/uninit-dim-clause.c: Likewise.
+	* gfortran.dg/goacc/dtype-1.f95: Likewise.
+	* gfortran.dg/goacc/kernels-conversion.f95: Likewise.
+	* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
+	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
+
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 	    Gergö Barany  <gergo@codesourcery.com>
 
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
index 64467774037..61caa2df236 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
@@ -1,5 +1,5 @@
 /* Check offloaded function's attributes and classification for unparallelized
-   OpenACC kernels.  */
+   OpenACC 'kernels'.  */
 
 /* { dg-additional-options "-O2" }
    { dg-additional-options "-fopt-info-optimized-omp" }
@@ -13,14 +13,15 @@ extern unsigned int *__restrict a;
 extern unsigned int *__restrict b;
 extern unsigned int *__restrict c;
 
-/* An "extern"al mapping of loop iterations/array indices makes the loop
-   unparallelizable.  */
 extern unsigned int f (unsigned int);
+#pragma acc routine (f) seq
 
 void KERNELS ()
 {
 #pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N]) /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
   for (unsigned int i = 0; i < N; i++)
+    /* An "extern"al mapping of loop iterations/array indices makes the loop
+       unparallelizable.  */
     c[i] = a[f (i)] + b[f (i)];
 }
 
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
index c59a65e1d0f..eb78c8f5e40 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
@@ -1,5 +1,5 @@
 /* Check offloaded function's attributes and classification for OpenACC
-   kernels.  */
+   'kernels'.  */
 
 /* { dg-additional-options "-O2" }
    { dg-additional-options "-fopt-info-optimized-omp" }
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-parallel.c b/gcc/testsuite/c-c++-common/goacc/classify-parallel.c
index b345c225aea..baa0bce004d 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-parallel.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-parallel.c
@@ -1,5 +1,5 @@
 /* Check offloaded function's attributes and classification for OpenACC
-   parallel.  */
+   'parallel'.  */
 
 /* { dg-additional-options "-O2" }
    { dg-additional-options "-fopt-info-optimized-omp" }
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-routine.c b/gcc/testsuite/c-c++-common/goacc/classify-routine.c
index 5ca2ec9c603..094c9a760fa 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-routine.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-routine.c
@@ -1,5 +1,5 @@
 /* Check offloaded function's attributes and classification for OpenACC
-   routine.  */
+   'routine'.  */
 
 /* { dg-additional-options "-O2" }
    { dg-additional-options "-fopt-info-optimized-omp" }
diff --git a/gcc/testsuite/c-c++-common/goacc/dtype-1.c b/gcc/testsuite/c-c++-common/goacc/dtype-1.c
index 6dd6ebd8ae1..ae3de574dfc 100644
--- a/gcc/testsuite/c-c++-common/goacc/dtype-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/dtype-1.c
@@ -96,11 +96,13 @@ test ()
 
 /* { dg-final { scan-tree-dump-times "oacc_parallel device_type\\(\\*\\) \\\[ wait\\(10\\) vector_length\\(10\\) num_workers\\(10\\) num_gangs\\(10\\) async\\(10\\) \\\] device_type\\(nvidia\\) \\\[ wait\\(3\\) vector_length\\(128\\) num_workers\\(300\\) num_gangs\\(300\\) async\\(3\\) \\\] wait\\(1\\) vector_length\\(1\\) num_workers\\(1\\) num_gangs\\(1\\) async\\(1\\)" 1 "omplower" } } */
 
-/* { dg-final { scan-tree-dump-times "oacc_kernels device_type\\(nvidia\\) \\\[ wait\\(-1\\) async\\(-1\\) \\\]" 1 "omplower" } } */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single async\\(-1\\) num_gangs\\(1\\) device_type\\(nvidia\\) \\\[ wait\\(-1\\) async\\(-1\\) \\\]" 1 "omplower" } } */
 
-/* { dg-final { scan-tree-dump-times "oacc_kernels device_type\\(nvidia\\) \\\[ wait\\(1\\) async\\(1\\) \\\] wait\\(-1\\) async\\(-1\\)" 1 "omplower" } } */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single num_gangs\\(1\\) device_type\\(nvidia\\) \\\[ wait\\(1\\) async\\(1\\) \\\] wait\\(-1\\) async\\(-1\\)" 1 "omplower" } } */
 
-/* { dg-final { scan-tree-dump-times "oacc_kernels device_type\\(\\*\\) \\\[ wait\\(0\\) async\\(0\\) \\\] device_type\\(nvidia\\) \\\[ wait\\(2\\) async\\(2\\) \\\] wait\\(-1\\) async\\(-1\\)" 1 "omplower" } } */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single num_gangs\\(1\\) device_type\\(\\*\\) \\\[ wait\\(0\\) async\\(0\\) \\\] device_type\\(nvidia\\) \\\[ wait\\(2\\) async\\(2\\) \\\] wait\\(-1\\) async\\(-1\\)" 1 "omplower" } } */
+
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single num_gangs\\(1\\) device_type\\(\\*\\) \\\[ wait\\(0\\) async\\(0\\) \\\] device_type\\(nvidia_ptx\\) \\\[ wait\\(1\\) async\\(1\\) \\\] wait\\(-1\\) async\\(-1\\)" 1 "omplower" } } */
 
 /* { dg-final { scan-tree-dump-times "acc loop device_type\\(nvidia\\) \\\[ tile\\(1\\) gang \\\] private\\(i1\\.0\\) private\\(i1\\)" 1 "omplower" } } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/if-clause-2.c b/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
index e17b5dd1107..9920b4fd175 100644
--- a/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
@@ -1,4 +1,3 @@
-/* { dg-additional-options "-fopenacc-kernels=split" } */
 /* { dg-additional-options "-fdump-tree-convert_oacc_kernels" } */
 
 void
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
index ea7eec997fb..8cb63f00444 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
@@ -1,4 +1,3 @@
-/* { dg-additional-options "-fopenacc-kernels=split" } */
 /* { dg-additional-options "-fdump-tree-convert_oacc_kernels" } */
 
 #define N 1024
@@ -52,12 +51,11 @@ main (void)
    parallelized loop region; and three "old-style" kernel regions. */
 /* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 1 "convert_oacc_kernels" } } */
 /* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 1 "convert_oacc_kernels" } } */
-/* { dg-final { scan-tree-dump-times "oacc_kernels" 3 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "oacc_kernels " 3 "convert_oacc_kernels" } } */
 
 /* Each of the parallel regions is async, and there is a final call to
    __builtin_GOACC_wait.  */
-/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels.* async\(-1\)" 5 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "oacc_kernels async\\(-1\\)" 3 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single async\\(-1\\)" 1 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized async\\(-1\\)" 1 "convert_oacc_kernels" } } */
 /* { dg-final { scan-tree-dump-times "__builtin_GOACC_wait" 1 "convert_oacc_kernels" } } */
-
-/* Check that the original kernels region is removed.  */
-/* { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
index b5d58c37200..7255d692e34 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
@@ -1,6 +1,5 @@
 /* Test OpenACC 'kernels' construct decomposition.  */
 
-/* { dg-additional-options "-fopenacc-kernels=split" } */
 /* { dg-additional-options "-fopt-info-optimized-omp" } */
 /* { dg-additional-options "-O2" } for "parloops".  */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c b/gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
index 2608c128b43..8b5c5e75c35 100644
--- a/gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
+++ b/gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
@@ -35,7 +35,7 @@ void K(void)
 	for (j = 0; j < 10; j++)
 	  { }
       }
-#pragma acc loop seq gang // { dg-error "'seq' overrides" }
+#pragma acc loop seq gang // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
     for (i = 0; i < 10; i++)
       { }
 
@@ -61,7 +61,7 @@ void K(void)
 	for (j = 0; j < 10; j++)
 	  { }
       }
-#pragma acc loop seq worker // { dg-error "'seq' overrides" }
+#pragma acc loop seq worker // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
     for (i = 0; i < 10; i++)
       { }
 #pragma acc loop gang worker
@@ -90,7 +90,7 @@ void K(void)
 	for (j = 1; j < 10; j++)
 	  { }
       }
-#pragma acc loop seq vector // { dg-error "'seq' overrides" }
+#pragma acc loop seq vector // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
     for (i = 0; i < 10; i++)
       { }
 #pragma acc loop gang vector
@@ -103,7 +103,7 @@ void K(void)
 #pragma acc loop auto
     for (i = 0; i < 10; i++)
       { }
-#pragma acc loop seq auto // { dg-error "'seq' overrides" }
+#pragma acc loop seq auto // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
     for (i = 0; i < 10; i++)
       { }
 #pragma acc loop gang auto
@@ -145,7 +145,7 @@ void K(void)
 #pragma acc kernels loop worker(num:5)
   for (i = 0; i < 10; i++)
     { }
-#pragma acc kernels loop seq worker // { dg-error "'seq' overrides" }
+#pragma acc kernels loop seq worker // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
   for (i = 0; i < 10; i++)
     { }
 #pragma acc kernels loop gang worker
@@ -161,7 +161,7 @@ void K(void)
 #pragma acc kernels loop vector(length:5)
   for (i = 0; i < 10; i++)
     { }
-#pragma acc kernels loop seq vector // { dg-error "'seq' overrides" }
+#pragma acc kernels loop seq vector // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
   for (i = 0; i < 10; i++)
     { }
 #pragma acc kernels loop gang vector
@@ -174,7 +174,7 @@ void K(void)
 #pragma acc kernels loop auto
   for (i = 0; i < 10; i++)
     { }
-#pragma acc kernels loop seq auto // { dg-error "'seq' overrides" }
+#pragma acc kernels loop seq auto // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
   for (i = 0; i < 10; i++)
     { }
 #pragma acc kernels loop gang auto
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c
new file mode 100644
index 00000000000..a81d3559daf
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c
@@ -0,0 +1,129 @@
+/* Test the output of "-fopt-info-optimized-omp" for an OpenACC 'kernels'
+   construct containing conditionally executed 'loop' constructs with
+   'independent' or 'seq' clauses.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+extern int c;
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+  if (c) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop seq
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent gang
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent worker
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent vector
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent gang vector
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent gang worker
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent worker vector
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent gang worker vector
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent gang
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent worker
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent vector
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop independent
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent
+    for (y = 0; y < 10; y++)
+      ;
+
+#pragma acc loop independent
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop seq
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop independent
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop seq
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop independent
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent
+    for (y = 0; y < 10; y++)
+#pragma acc loop seq
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop seq
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent
+    for (y = 0; y < 10; y++)
+#pragma acc loop seq
+      for (z = 0; z < 10; z++)
+	;
+ }
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c
new file mode 100644
index 00000000000..22ac5399a9d
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c
@@ -0,0 +1,126 @@
+/* Test the output of "-fopt-info-optimized-omp" for an OpenACC 'kernels'
+   construct containing 'loop' constructs with explicit or implicit 'auto'
+   clause.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels
+ /* Strangely indented to keep this similar to other test cases.  */
+ {
+#pragma acc loop /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop auto gang /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop auto worker /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop auto vector /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop auto gang vector /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop auto gang worker /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop auto worker vector /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop auto gang worker vector /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop auto gang /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto worker
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto vector
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop auto /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop auto /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+      ;
+
+#pragma acc loop auto /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop auto /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop auto /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop
+      for (z = 0; z < 10; z++)
+	;
+ }
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c
new file mode 100644
index 00000000000..a436cd3f007
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c
@@ -0,0 +1,126 @@
+/* Test the output of "-fopt-info-optimized-omp" for an OpenACC 'kernels'
+   construct containing 'loop' constructs with 'independent' or 'seq'
+   clauses.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels
+ /* Strangely indented to keep this similar to other test cases.  */
+ {
+#pragma acc loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent gang /* { dg-message "note: assigned OpenACC gang loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent worker /* { dg-message "note: assigned OpenACC worker loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent vector /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent gang vector /* { dg-message "note: assigned OpenACC gang vector loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent gang worker /* { dg-message "note: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent worker vector /* { dg-message "note: assigned OpenACC worker vector loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent gang worker vector /* { dg-message "note: assigned OpenACC gang worker vector loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent gang /* { dg-message "note: assigned OpenACC gang loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent worker /* { dg-message "note: assigned OpenACC worker loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent vector /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang vector loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+    for (y = 0; y < 10; y++)
+      ;
+
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC worker loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang worker loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang vector loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+ }
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loops.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loops.c
new file mode 100644
index 00000000000..e8b994b5be0
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loops.c
@@ -0,0 +1,47 @@
+/* Test the output of "-fopt-info-optimized-omp" for an OpenACC 'kernels'
+   construct containing loops.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ {
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+    ;
+
+  for (x = 0; x < 10; x++)
+    ;
+
+  for (x = 0; x < 10; x++)
+    for (y = 0; y < 10; y++)
+      for (z = 0; z < 10; z++)
+	;
+
+  for (x = 0; x < 10; x++)
+    ;
+
+  for (x = 0; x < 10; x++)
+    for (y = 0; y < 10; y++)
+      ;
+
+  for (x = 0; x < 10; x++)
+    for (y = 0; y < 10; y++)
+      for (z = 0; z < 10; z++)
+	;
+
+  for (x = 0; x < 10; x++)
+    for (y = 0; y < 10; y++)
+      for (z = 0; z < 10; z++)
+	;
+ }
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c
new file mode 100644
index 00000000000..8e40f6217be
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c
@@ -0,0 +1,82 @@
+/* Test the output of "-fopt-info-optimized-omp" for an OpenACC 'kernels'
+   construct containing straight-line code.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+#pragma acc routine gang
+extern int
+f_g (int);
+
+#pragma acc routine worker
+extern int
+f_w (int);
+
+#pragma acc routine vector
+extern int
+f_v (int);
+
+#pragma acc routine seq
+extern int
+f_s (int);
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels /* { dg-warning "region contains gang partitoned code but is not gang partitioned" } */
+  {
+    x = 0; /* { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+    y = x < 10;
+    z = x++;
+    ;
+
+    y = 0;
+    z = y < 10;
+    x -= f_g (y++); /* { dg-message "note: assigned OpenACC gang worker vector loop parallelism" } */
+    ;
+
+    x = f_w (0); /* { dg-message "note: assigned OpenACC worker vector loop parallelism" } */
+    z = f_v (x < 10); /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+    y -= f_s (x++); /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+    ;
+
+    x = 0;
+    y = x < 10;
+    z = (x++);
+    y = 0;
+    x = y < 10;
+    z += (y++);
+    ;
+
+    x = 0;
+    y += f_s (x < 10); /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+    x++;
+    y = 0;
+    y += f_v (y < 10); /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+    y++;
+    z = 0;
+    y += f_w (z < 10); /* { dg-message "note: assigned OpenACC worker vector loop parallelism" } */
+    z++;
+    ;
+
+    x = 0;
+    y *= f_g ( /* { dg-message "note: assigned OpenACC gang worker vector loop parallelism" } */
+	      f_w (x < 10) /* { dg-message "note: assigned OpenACC worker vector loop parallelism" } */
+	      + f_g (x < 10) /* { dg-message "note: assigned OpenACC gang worker vector loop parallelism" } */
+	      );
+    x++;
+    y = 0;
+    y *= y < 10;
+    y++;
+    z = 0;
+    y *= z < 10;
+    z++;
+    ;
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c
new file mode 100644
index 00000000000..0254036d7af
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c
@@ -0,0 +1,121 @@
+/* Test the output of "-fopt-info-optimized-omp" for combined OpenACC 'kernels
+   loop' constructs with explicit or implicit 'auto' clause.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels loop /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop auto gang /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop auto worker /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop auto vector /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop auto gang vector /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop auto gang worker /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop auto worker vector /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop auto gang worker vector /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop auto gang /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto worker
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto vector
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels loop auto /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop auto /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+      ;
+
+#pragma acc kernels loop auto /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels loop /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels loop auto /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels loop auto /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels loop /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop
+      for (z = 0; z < 10; z++)
+	;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c
new file mode 100644
index 00000000000..83602a9414d
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c
@@ -0,0 +1,121 @@
+/* Test the output of "-fopt-info-optimized-omp" for combined OpenACC 'kernels
+   loop' constructs with 'independent' or 'seq' clauses.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop independent gang /* { dg-message "note: assigned OpenACC gang loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop independent worker /* { dg-message "note: assigned OpenACC worker loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop independent vector /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop independent gang vector /* { dg-message "note: assigned OpenACC gang vector loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop independent gang worker /* { dg-message "note: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop independent worker vector /* { dg-message "note: assigned OpenACC worker vector loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop independent gang worker vector /* { dg-message "note: assigned OpenACC gang worker vector loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop independent gang /* { dg-message "note: assigned OpenACC gang loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent worker /* { dg-message "note: assigned OpenACC worker loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent vector /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels loop independent /* { dg-message "note: assigned OpenACC gang vector loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop independent /* { dg-message "note: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+    for (y = 0; y < 10; y++)
+      ;
+
+#pragma acc kernels loop independent /* { dg-message "note: assigned OpenACC gang loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC worker loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang worker loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels loop independent /* { dg-message "note: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels loop independent /* { dg-message "note: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang vector loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c
new file mode 100644
index 00000000000..e12e0fdae52
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c
@@ -0,0 +1,204 @@
+/* Test the output of "-fopt-info-optimized-omp" for OpenACC 'kernels'
+   constructs containing conditionally executed 'loop' constructs with
+   'independent' or 'seq' clauses.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+extern int c;
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop seq
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+ }
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent gang
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+ }
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent worker
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+ }
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent vector
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+ }
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent gang vector
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+ }
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent gang worker
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+ }
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent worker vector
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+ }
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent gang worker vector
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+ }
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent gang
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent worker
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent vector
+      for (z = 0; z < 10; z++)
+	;
+ }
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+ }
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent
+    for (y = 0; y < 10; y++)
+      ;
+ }
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent
+      for (z = 0; z < 10; z++)
+	;
+ }
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop seq
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent
+      for (z = 0; z < 10; z++)
+	;
+ }
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop seq
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent
+      for (z = 0; z < 10; z++)
+	;
+ }
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent
+    for (y = 0; y < 10; y++)
+#pragma acc loop seq
+      for (z = 0; z < 10; z++)
+	;
+ }
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop seq
+  /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent
+    for (y = 0; y < 10; y++)
+#pragma acc loop seq
+      for (z = 0; z < 10; z++)
+	;
+ }
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-auto.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-auto.c
new file mode 100644
index 00000000000..d52b2e860c2
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-auto.c
@@ -0,0 +1,138 @@
+/* Test the output of "-fopt-info-optimized-omp" for OpenACC 'kernels'
+   constructs containing 'loop' constructs with explicit or implicit 'auto'
+   clause.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels
+#pragma acc loop /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop auto gang /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop auto worker /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop auto vector /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop auto gang vector /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop auto gang worker /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop auto worker vector /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop auto gang worker vector /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop auto gang /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto worker
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto vector
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels
+#pragma acc loop auto /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop auto /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+      ;
+
+#pragma acc kernels
+#pragma acc loop auto /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels
+#pragma acc loop /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels
+#pragma acc loop auto /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels
+#pragma acc loop auto /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels
+#pragma acc loop /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop
+      for (z = 0; z < 10; z++)
+	;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c
new file mode 100644
index 00000000000..661f7122a2c
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c
@@ -0,0 +1,138 @@
+/* Test the output of "-fopt-info-optimized-omp" for OpenACC 'kernels'
+   constructs containing 'loop' constructs with 'independent' or 'seq'
+   clauses.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels
+#pragma acc loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop independent gang /* { dg-message "note: assigned OpenACC gang loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop independent worker /* { dg-message "note: assigned OpenACC worker loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop independent vector /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop independent gang vector /* { dg-message "note: assigned OpenACC gang vector loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop independent gang worker /* { dg-message "note: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop independent worker vector /* { dg-message "note: assigned OpenACC worker vector loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop independent gang worker vector /* { dg-message "note: assigned OpenACC gang worker vector loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop independent gang /* { dg-message "note: assigned OpenACC gang loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent worker /* { dg-message "note: assigned OpenACC worker loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent vector /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang vector loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+    for (y = 0; y < 10; y++)
+      ;
+
+#pragma acc kernels
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC worker loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels
+#pragma acc loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang worker loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels
+#pragma acc loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang vector loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops.c
new file mode 100644
index 00000000000..7587d9d2962
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops.c
@@ -0,0 +1,50 @@
+/* Test the output of "-fopt-info-optimized-omp" for an OpenACC 'kernels'
+   construct containing loops.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+    ;
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+    ;
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+    for (y = 0; y < 10; y++)
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+    ;
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+    for (y = 0; y < 10; y++)
+      ;
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+    for (y = 0; y < 10; y++)
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+    for (y = 0; y < 10; y++)
+      for (z = 0; z < 10; z++)
+	;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism.c
index 7548fb72d14..c514fae2574 100644
--- a/gcc/testsuite/c-c++-common/goacc/note-parallelism.c
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism.c
@@ -1,4 +1,5 @@
-/* Test the output of "-fopt-info-optimized-omp".  */
+/* Test the output of "-fopt-info-optimized-omp" for OpenACC 'parallel'
+   constructs.  */
 
 /* { dg-additional-options "-fopt-info-optimized-omp" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/routine-1.c b/gcc/testsuite/c-c++-common/goacc/routine-1.c
index 738957501b9..7cf5506e41a 100644
--- a/gcc/testsuite/c-c++-common/goacc/routine-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/routine-1.c
@@ -91,7 +91,7 @@ extern void nohost (void);
 
 int main ()
 {
-#pragma acc kernels num_gangs (32) num_workers (32) vector_length (32)
+#pragma acc kernels num_gangs (32) num_workers (32) vector_length (32) /* { dg-warning "region contains gang partitoned code but is not gang partitioned" } */
   {
     gang ();
     worker ();
diff --git a/gcc/testsuite/c-c++-common/goacc/uninit-dim-clause.c b/gcc/testsuite/c-c++-common/goacc/uninit-dim-clause.c
index 72aacd70f79..6bc35d77765 100644
--- a/gcc/testsuite/c-c++-common/goacc/uninit-dim-clause.c
+++ b/gcc/testsuite/c-c++-common/goacc/uninit-dim-clause.c
@@ -21,12 +21,12 @@ void acc_kernels()
 {
   int i, j, k;
 
-  #pragma acc kernels num_gangs(i) /* { dg-warning "is used uninitialized in this function" } */
+  #pragma acc kernels num_gangs(i) /* { dg-warning "is used uninitialized in this function" "TODO" { xfail *-*-* } } */
   ;
 
-  #pragma acc kernels num_workers(j) /* { dg-warning "is used uninitialized in this function" } */
+  #pragma acc kernels num_workers(j) /* { dg-warning "is used uninitialized in this function" "TODO" { xfail *-*-* } } */
   ;
 
-  #pragma acc kernels vector_length(k) /* { dg-warning "is used uninitialized in this function" } */
+  #pragma acc kernels vector_length(k) /* { dg-warning "is used uninitialized in this function" "TODO" { xfail *-*-* } } */
   ;
 }
diff --git a/gcc/testsuite/gfortran.dg/goacc/dtype-1.f95 b/gcc/testsuite/gfortran.dg/goacc/dtype-1.f95
index 460922a35df..b716d450abf 100644
--- a/gcc/testsuite/gfortran.dg/goacc/dtype-1.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/dtype-1.f95
@@ -175,13 +175,13 @@ end subroutine sr5b
 
 ! { dg-final { scan-tree-dump-times "oacc_parallel device_type\\(\\*\\) \\\[ async\\(10\\) wait\\(10\\) num_gangs\\(10\\) num_workers\\(10\\) vector_length\\(10\\) \\\] device_type\\(nvidia_ptx\\) \\\[ async\\(3\\) wait\\(3\\) num_gangs\\(300\\) num_workers\\(300\\) vector_length\\(128\\) \\\] async\\(1\\) wait\\(1\\) num_gangs\\(1\\) num_workers\\(1\\) vector_length\\(1\\)" 1 "omplower" } }
 
-! { dg-final { scan-tree-dump-times "oacc_kernels device_type\\(nvidia\\) \\\[ async\\(-1\\) wait\\(-1\\) \\\]" 1 "omplower" } }
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single async\\(-1\\) num_gangs\\(1\\) device_type\\(nvidia\\) \\\[ async\\(-1\\) wait\\(-1\\) \\\]" 1 "omplower" } }
 
-! { dg-final { scan-tree-dump-times "oacc_kernels device_type\\(nvidia\\) \\\[ async\\(1\\) wait\\(1\\) \\\]" 1 "omplower" } }
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single num_gangs\\(1\\) device_type\\(nvidia\\) \\\[ async\\(1\\) wait\\(1\\) \\\]" 1 "omplower" } }
 
-! { dg-final { scan-tree-dump-times "oacc_kernels device_type\\(\\*\\) \\\[ async\\(0\\) wait\\(0\\) \\\] device_type\\(nvidia\\) \\\[ async\\(2\\) wait\\(2\\) \\\]" 1 "omplower" } }
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single num_gangs\\(1\\) device_type\\(\\*\\) \\\[ async\\(0\\) wait\\(0\\) \\\] device_type\\(nvidia\\) \\\[ async\\(2\\) wait\\(2\\) \\\]" 1 "omplower" } }
 
-! { dg-final { scan-tree-dump-times "oacc_kernels device_type\\(\\*\\) \\\[ async\\(0\\) wait\\(0\\) \\\] device_type\\(nvidia_ptx\\) \\\[ async\\(1\\) wait\\(1\\) \\\] async\\(-1\\) wait\\(-1\\)" 1 "omplower" } }
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single num_gangs\\(1\\) device_type\\(\\*\\) \\\[ async\\(0\\) wait\\(0\\) \\\] device_type\\(nvidia_ptx\\) \\\[ async\\(1\\) wait\\(1\\) \\\] async\\(-1\\) wait\\(-1\\)" 1 "omplower" } }
 
 ! { dg-final { scan-tree-dump-times "acc loop device_type\\(nvidia\\) \\\[ tile\\(1\\) gang \\\] private\\(i1\\) private\\(i1\\.1\\)" 1 "omplower" } }
 
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
index 6604727cf13..4672d15572e 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
@@ -1,4 +1,3 @@
-! { dg-additional-options "-fopenacc-kernels=split" }
 ! { dg-additional-options "-fdump-tree-convert_oacc_kernels" }
 
 program main
@@ -50,9 +49,11 @@ end program main
 ! parallelized loop region; and three "old-style" kernel regions.
 ! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 1 "convert_oacc_kernels" } }
 ! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 1 "convert_oacc_kernels" } }
-! { dg-final { scan-tree-dump-times "oacc_kernels" 3 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "oacc_kernels " 3 "convert_oacc_kernels" } }
 
 ! Each of the parallel regions is async, and there is a final call to
 ! __builtin_GOACC_wait.
-! { dg-final { scan-tree-dump-times "oacc_parallel_kernels.* async\(-1\)" 5 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "oacc_kernels async\\(-1\\)" 3 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single async\\(-1\\)" 1 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized async\\(-1\\)" 1 "convert_oacc_kernels" } }
 ! { dg-final { scan-tree-dump-times "__builtin_GOACC_wait" 1 "convert_oacc_kernels" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
index 520bf034ac6..8173c3651e1 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
@@ -1,6 +1,5 @@
 ! Test OpenACC 'kernels' construct decomposition.
 
-! { dg-additional-options "-fopenacc-kernels=split" }
 ! { dg-additional-options "-fopt-info-optimized-omp" }
 ! { dg-additional-options "-O2" } for "parloops".
 
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
index b83ca2d8f06..bc9bebac969 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
@@ -1,6 +1,5 @@
 ! { dg-do compile } 
 ! { dg-additional-options "-fdump-tree-original" } 
-! { dg-additional-options "-fopenacc-kernels=split" }
 ! { dg-additional-options "-fdump-tree-convert_oacc_kernels" }
 
 program test
diff --git a/libgomp/ChangeLog.openacc b/libgomp/ChangeLog.openacc
index 27ce3434e49..96908d1a305 100644
--- a/libgomp/ChangeLog.openacc
+++ b/libgomp/ChangeLog.openacc
@@ -1,5 +1,21 @@
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c:
+	Update.
+	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/avoid-offloading-1.f: Likewise.
+	* testsuite/libgomp.oacc-fortran/avoid-offloading-2.f: Likewise.
+	* testsuite/libgomp.oacc-fortran/avoid-offloading-3.f: Likewise.
+	* testsuite/libgomp.oacc-fortran/initialize_kernels_loops.f90:
+	Likewise.
+
 	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: New
 	file.
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
index 24b5718ced6..1a5b5fbaa6d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
@@ -34,6 +34,7 @@ static int state = -1;
 static acc_device_t acc_device_type;
 static int acc_device_num;
 static int num_gangs, num_workers, vector_length;
+static int async;
 
 void cb_enqueue_launch_start (acc_prof_info *prof_info, acc_event_info *event_info, acc_api_info *api_info)
 {
@@ -50,7 +51,7 @@ void cb_enqueue_launch_start (acc_prof_info *prof_info, acc_event_info *event_in
   assert (prof_info->device_type == acc_device_type);
   assert (prof_info->device_number == acc_device_num);
   assert (prof_info->thread_id == -1);
-  assert (prof_info->async == acc_async_sync);
+  assert (prof_info->async == async);
   assert (prof_info->async_queue == prof_info->async);
   assert (prof_info->src_file == NULL);
   assert (prof_info->func_name == NULL);
@@ -142,8 +143,10 @@ int main()
   acc_device_num = acc_get_device_num (acc_device_type);
   assert (state == 0);
 
-  /* Parallelism dimensions: compiler/runtime decides.  */
   STATE_OP (state, = 0);
+  /* Implicit async.  */
+  async = acc_async_noval;
+  /* Parallelism dimensions: compiler/runtime decides.  */
   num_gangs = num_workers = vector_length = 0;
   {
 #define N 100
@@ -172,8 +175,10 @@ int main()
 #undef N
   }
 
-  /* Parallelism dimensions: literal.  */
   STATE_OP (state, = 0);
+  /* Explicit async: without argument.  */
+  async = acc_async_noval;
+  /* Parallelism dimensions: literal.  */
   num_gangs = 30;
   num_workers = 3;
   vector_length = 5;
@@ -181,6 +186,7 @@ int main()
 #define N 100
     int x[N];
 #pragma acc kernels \
+  async \
   num_gangs (30) num_workers (3) vector_length (5)
     /* { dg-prune-output "using vector_length \\(32\\), ignoring 5" } */
     {
@@ -206,8 +212,10 @@ int main()
 #undef N
   }
 
-  /* Parallelism dimensions: variable.  */
   STATE_OP (state, = 0);
+  /* Explicit async: variable.  */
+  async = 123;
+  /* Parallelism dimensions: variable.  */
   num_gangs = 22;
   num_workers = 5;
   vector_length = 7;
@@ -215,6 +223,7 @@ int main()
 #define N 100
     int x[N];
 #pragma acc kernels \
+  async (async) \
   num_gangs (num_gangs) num_workers (num_workers) vector_length (vector_length)
     /* { dg-prune-output "using vector_length \\(32\\), ignoring runtime setting" } */
     {
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
index 72b9ce0ce02..fb57faef054 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-1.c
@@ -1,5 +1,7 @@
 /* Test that the compiler decides to "avoid offloading".  */
 
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
 #include <openacc.h>
 
 int main(void)
@@ -7,8 +9,20 @@ int main(void)
   int x, y;
 
 #pragma acc data copyout(x, y)
-#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target { openacc_nvidia_accel_selected && opt_levels_2_plus } } } */
-  *((volatile int *) &x) = 33, y = acc_on_device (acc_device_host);
+#pragma acc kernels
+  *((volatile int *) &x) = 33, y = acc_on_device (acc_device_host); /* { dg-warning "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+
+  /* The following will trigger "avoid offloading".  */
+#pragma acc kernels
+  {
+#pragma acc loop auto /* { dg-warning "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" } */
+    /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target { openacc_nvidia_accel_selected && opt_levels_2_plus } } 18 } */
+    /* { dg-warning "note: assigned OpenACC seq loop parallelism" "" { target *-*-* } 18 } */
+    for (int i = 0; i < x; ++i)
+      if (x == 0)
+	x = 1;
+      ;
+  }
 
   if (x != 33)
     __builtin_abort();
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c
index 9e05d84d792..00912a3d006 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-2.c
@@ -1,6 +1,8 @@
 /* Test that a user can override the compiler's "avoid offloading"
    decision.  */
 
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
 #include <openacc.h>
 
 int main(void)
@@ -19,8 +21,19 @@ int main(void)
   int x, y;
 
 #pragma acc data copyout(x, y)
-#pragma acc kernels /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target { openacc_nvidia_accel_selected && opt_levels_2_plus } } } */
-  *((volatile int *) &x) = 33, y = acc_on_device (acc_device_host);
+#pragma acc kernels
+  *((volatile int *) &x) = 33, y = acc_on_device (acc_device_host); /* { dg-warning "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+
+  /* The following will trigger "avoid offloading".  */
+#pragma acc kernels
+  {
+#pragma acc loop auto /* { dg-warning "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" } */
+    /* { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target { openacc_nvidia_accel_selected && opt_levels_2_plus } } 30 } */
+    /* { dg-warning "note: assigned OpenACC seq loop parallelism" "" { target *-*-* } 30 } */
+    for (int i = 0; i < x; ++i)
+      if (x == 0)
+	x = 1;
+  }
 
   if (x != 33)
     __builtin_abort();
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c
index f186482026e..1bb6b103b4e 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/avoid-offloading-3.c
@@ -1,6 +1,8 @@
 /* Test that a user can override the compiler's "avoid offloading"
    decision.  */
 
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
 /* Override the compiler's "avoid offloading" decision.
    { dg-additional-options "-foffload-force" } */
 
@@ -12,7 +14,17 @@ int main(void)
 
 #pragma acc data copyout(x, y)
 #pragma acc kernels
-  *((volatile int *) &x) = 33, y = acc_on_device (acc_device_host);
+  *((volatile int *) &x) = 33, y = acc_on_device (acc_device_host); /* { dg-warning "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+
+  /* The following would trigger "avoid offloading".  */
+#pragma acc kernels
+  {
+#pragma acc loop auto /* { dg-warning "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" } */
+    /* { dg-warning "note: assigned OpenACC seq loop parallelism" "" { target *-*-* } 22 } */
+    for (int i = 0; i < x; ++i)
+      if (x == 0)
+	x = 1;
+  }
 
   if (x != 33)
     __builtin_abort();
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
index 601e543495e..bf58a139d3a 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
@@ -1,4 +1,3 @@
-/* { dg-additional-options "-fopenacc-kernels=split" } */
 /* { dg-additional-options "-fopt-info-optimized-omp" } */
 
 #undef NDEBUG
@@ -15,7 +14,7 @@ int main()
     int c = 234; /* { dg-warning "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
 
 #pragma acc loop independent gang /* { dg-warning "note: assigned OpenACC gang loop parallelism" } */
-    /* { dg-warning "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } 17 } */
+    /* { dg-warning "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } 16 } */
     for (int i = 0; i < N; ++i)
       b[i] = c;
 
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f b/libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f
index fb14be19d8d..275c22f90a6 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f
+++ b/libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-1.f
@@ -2,6 +2,7 @@
 
 ! { dg-do run }
 ! { dg-additional-options "-cpp" }
+! { dg-additional-options "-fopt-info-optimized-omp" }
 ! As __OPTIMIZE__ is defined for -O1 and higher, we don't have an (easy) way to
 ! distinguish -O1 (where we will offload) from -O2 (where we won't offload), so
 ! for -O1 testing, we expect to abort.
@@ -10,16 +11,29 @@
       IMPLICIT NONE
       INCLUDE "openacc_lib.h"
 
+      INTEGER :: I
       INTEGER, VOLATILE :: X
       LOGICAL :: Y
 
 !$ACC DATA COPYOUT(X, Y)
-!$ACC KERNELS ! { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target { openacc_nvidia_accel_selected && opt_levels_2_plus } } }
-      X = 33
+!$ACC KERNELS
+      X = 33 ! { dg-warning "note: beginning .gang-single. region in OpenACC .kernels. construct" }
       Y = ACC_ON_DEVICE (ACC_DEVICE_HOST);
 !$ACC END KERNELS
 !$ACC END DATA
 
+      ! The following will trigger "avoid offloading".
+!$ACC KERNELS
+!$ACC LOOP AUTO ! { dg-warning "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" }
+! { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target { openacc_nvidia_accel_selected && opt_levels_2_plus } } 27 }
+! { dg-warning "note: assigned OpenACC seq loop parallelism" "" { target *-*-* } 27 }
+      DO I = 1, X
+         IF (X .EQ. 0) THEN
+            X = 1
+         END IF
+      END DO
+!$ACC END KERNELS
+
       IF (X .NE. 33) CALL ABORT
 #if defined ACC_DEVICE_TYPE_nvidia
 # if !defined __OPTIMIZE__
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f b/libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f
index 5a064618e51..a83e1739885 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f
+++ b/libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-2.f
@@ -2,11 +2,13 @@
 
 ! { dg-do run }
 ! { dg-additional-options "-cpp" }
+! { dg-additional-options "-fopt-info-optimized-omp" }
 
       IMPLICIT NONE
       INCLUDE "openacc_lib.h"
 
       INTEGER :: D
+      INTEGER :: I
       INTEGER, VOLATILE :: X
       LOGICAL :: Y
 
@@ -21,12 +23,24 @@
       CALL ACC_INIT (D)
 
 !$ACC DATA COPYOUT(X, Y)
-!$ACC KERNELS ! { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target { openacc_nvidia_accel_selected && opt_levels_2_plus } } }
-      X = 33
+!$ACC KERNELS
+      X = 33 ! { dg-warning "note: beginning .gang-single. region in OpenACC .kernels. construct" }
       Y = ACC_ON_DEVICE (ACC_DEVICE_HOST)
 !$ACC END KERNELS
 !$ACC END DATA
 
+      ! The following will trigger "avoid offloading".
+!$ACC KERNELS
+!$ACC LOOP AUTO ! { dg-warning "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" }
+! { dg-warning "OpenACC kernels construct will be executed sequentially; will by default avoid offloading to prevent data copy penalty" "" { target { openacc_nvidia_accel_selected && opt_levels_2_plus } } 34 }
+! { dg-warning "note: assigned OpenACC seq loop parallelism" "" { target *-*-* } 34 }
+      DO I = 1, X
+         IF (X .EQ. 0) THEN
+            X = 1
+         END IF
+      END DO
+!$ACC END KERNELS
+
       IF (X .NE. 33) CALL ABORT
 #if defined ACC_DEVICE_TYPE_nvidia
       IF (Y) CALL ABORT
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f b/libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f
index 1c09f83b099..f101186ca19 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f
+++ b/libgomp/testsuite/libgomp.oacc-fortran/avoid-offloading-3.f
@@ -2,6 +2,7 @@
 
 ! { dg-do run }
 ! { dg-additional-options "-cpp" }
+! { dg-additional-options "-fopt-info-optimized-omp" }
 !     Override the compiler's "avoid offloading" decision.
 ! { dg-additional-options "-foffload-force" }
 
@@ -9,16 +10,28 @@
       INCLUDE "openacc_lib.h"
 
       INTEGER :: D
+      INTEGER :: I
       INTEGER, VOLATILE :: X
       LOGICAL :: Y
 
 !$ACC DATA COPYOUT(X, Y)
 !$ACC KERNELS
-      X = 33
+      X = 33 ! { dg-warning "note: beginning .gang-single. region in OpenACC .kernels. construct" }
       Y = ACC_ON_DEVICE (ACC_DEVICE_HOST)
 !$ACC END KERNELS
 !$ACC END DATA
 
+      ! The following would trigger "avoid offloading".
+!$ACC KERNELS
+!$ACC LOOP AUTO ! { dg-warning "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" }
+! { dg-warning "note: assigned OpenACC seq loop parallelism" "" { target *-*-* } 26 }
+      DO I = 1, X
+         IF (X .EQ. 0) THEN
+            X = 1
+         END IF
+      END DO
+!$ACC END KERNELS
+
       IF (X .NE. 33) CALL ABORT
 #if defined ACC_DEVICE_TYPE_nvidia
       IF (Y) CALL ABORT
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/initialize_kernels_loops.f90 b/libgomp/testsuite/libgomp.oacc-fortran/initialize_kernels_loops.f90
index 35e909f8278..e8b4f3abc05 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/initialize_kernels_loops.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/initialize_kernels_loops.f90
@@ -1,5 +1,4 @@
 ! { dg-do run }
-! { dg-additional-options "-fopenacc-kernels=split" }
 ! { dg-additional-options "-fopt-info-optimized-omp" }
 
 subroutine kernel(lo, hi, a, b, c)
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [og8] OpenACC 'kernels' construct changes: splitting of the construct into several regions
  2019-02-01  0:00 [og8] OpenACC 'kernels' construct changes: splitting of the construct into several regions Thomas Schwinge
@ 2019-02-01 19:48 ` Thomas Schwinge
  2019-07-17 21:03 ` [PATCH 00/10, OpenACC] Rework handling of OpenACC kernels regions Kwok Cheung Yeung
  2020-11-13 22:22 ` Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs (was: [og8] OpenACC 'kernels' construct changes: splitting of the construct into several regions) Thomas Schwinge
  2 siblings, 0 replies; 33+ messages in thread
From: Thomas Schwinge @ 2019-02-01 19:48 UTC (permalink / raw)
  To: gcc-patches; +Cc: Gergö Barany

[-- Attachment #1: Type: text/plain, Size: 1101 bytes --]

Hi!

On Fri, 01 Feb 2019 00:59:30 +0100, I wrote:
> From c7713be32fc5eace2b1e9c20447da849d23f6076 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?Gerg=C3=B6=20Barany?= <gergo@codesourcery.com>
> Date: Wed, 23 Jan 2019 22:11:11 -0800
> Subject: [PATCH 6/9] Adjust parallelism of loops in gang-single parts of
>  OpenACC kernels regions

>  transform_kernels_loop_clauses (gimple *omp_for,

> +  struct walk_stmt_info wi;
> +  memset (&wi, 0, sizeof (wi));
> +  tree *num_clauses[GOMP_DIM_MAX]
> +    = { [GOMP_DIM_GANG] = &loop_gang_clause,
> +        [GOMP_DIM_WORKER] = &loop_worker_clause,
> +        [GOMP_DIM_VECTOR] = &loop_vector_clause };
> +  wi.info = num_clauses;
> +  gimple *body = gimple_omp_body (omp_for);
> +  walk_gimple_seq (body, adjust_nested_loop_clauses, NULL, &wi);

It makes sense to me, but not to GCC 4.6 ;-) -- pushed to
openacc-gcc-8-branch the attached commit
5885db6f8466e13ddfab046bae3149a992a30926 'Adjust parallelism of loops in
gang-single parts of OpenACC kernels regions: "struct
adjust_nested_loop_clauses_wi_info"'.
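
For illustration only, here is a minimal stand-alone sketch of the pattern the
follow-up commit switches to: pass a small struct with named members through
the untyped 'info' pointer instead of an array filled in with designated
initializers, which C++ does not provide for arrays.  All names ending in
'_sketch' are hypothetical stand-ins and are not GCC's own types or functions.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for 'struct walk_stmt_info'; only the untyped
// 'info' member matters here.
struct walk_info_sketch { void *info; };

// Named members replace the array indexed by GOMP_DIM_*.
struct clause_ptrs_sketch
{
  int *gang_ptr;
  int *worker_ptr;
  int *vector_ptr;
};

enum dim_sketch { DIM_GANG, DIM_WORKER, DIM_VECTOR };

// Callback side: recover the typed struct from the untyped pointer and
// store VALUE through the pointer selected by DIM.
static void
record_clause_sketch (walk_info_sketch *wi, dim_sketch dim, int value)
{
  clause_ptrs_sketch *ptrs = static_cast<clause_ptrs_sketch *> (wi->info);
  switch (dim)
    {
    case DIM_GANG: *ptrs->gang_ptr = value; break;
    case DIM_WORKER: *ptrs->worker_ptr = value; break;
    case DIM_VECTOR: *ptrs->vector_ptr = value; break;
    }
}

// Caller side: plain member assignments, no designated initializers.
static int
walk_sketch ()
{
  int gang = 0, worker = 0, vector = 0;
  walk_info_sketch wi;
  std::memset (&wi, 0, sizeof (wi));
  clause_ptrs_sketch ptrs;
  ptrs.gang_ptr = &gang;
  ptrs.worker_ptr = &worker;
  ptrs.vector_ptr = &vector;
  wi.info = &ptrs;
  record_clause_sketch (&wi, DIM_WORKER, 4);
  record_clause_sketch (&wi, DIM_VECTOR, 32);
  // Encode the three results into one number for easy checking.
  return gang * 10000 + worker * 100 + vector;
}
```

Named members also remove the need for a cast from 'void *' to 'tree **' in
the callback, at the cost of one extra type declaration.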


Grüße
 Thomas



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Adjust-parallelism-of-loops-in-gang-single-parts-of-.patch --]
[-- Type: text/x-diff, Size: 4604 bytes --]

From 5885db6f8466e13ddfab046bae3149a992a30926 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Fri, 1 Feb 2019 18:12:05 +0100
Subject: [PATCH] Adjust parallelism of loops in gang-single parts of OpenACC
 kernels regions: "struct adjust_nested_loop_clauses_wi_info"

The current code apparently is too freaky at least for GCC 4.6:

    [...]/gcc/omp-oacc-kernels.c: In function 'tree_node* transform_kernels_loop_clauses(gimple*, tree, tree, tree, tree)':
    [...]/gcc/omp-oacc-kernels.c:584:10: error: expected identifier before numeric constant
    [...]/gcc/omp-oacc-kernels.c: In lambda function:
    [...]/gcc/omp-oacc-kernels.c:584:25: error: expected '{' before '=' token
    [...]/gcc/omp-oacc-kernels.c: In function 'tree_node* transform_kernels_loop_clauses(gimple*, tree, tree, tree, tree)':
    [...]/gcc/omp-oacc-kernels.c:584:25: warning: lambda expressions only available with -std=c++0x or -std=gnu++0x [enabled by default]
    [...]/gcc/omp-oacc-kernels.c:584:28: error: no match for 'operator=' in '{} = & loop_gang_clause'
    [...]

	gcc/
	* omp-oacc-kernels.c (struct adjust_nested_loop_clauses_wi_info): New.
	(adjust_nested_loop_clauses, transform_kernels_loop_clauses): Use it.
---
 gcc/ChangeLog.openacc  |  5 +++++
 gcc/omp-oacc-kernels.c | 29 +++++++++++++++++------------
 2 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/gcc/ChangeLog.openacc b/gcc/ChangeLog.openacc
index 433653b2b38..a3472637729 100644
--- a/gcc/ChangeLog.openacc
+++ b/gcc/ChangeLog.openacc
@@ -1,3 +1,8 @@
+2019-02-01  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* omp-oacc-kernels.c (struct adjust_nested_loop_clauses_wi_info): New.
+	(adjust_nested_loop_clauses, transform_kernels_loop_clauses): Use it.
+
 2019-01-31  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* doc/invoke.texi (-fopenacc-kernels): Update.
diff --git a/gcc/omp-oacc-kernels.c b/gcc/omp-oacc-kernels.c
index a8860c98e11..d1db4924b1c 100644
--- a/gcc/omp-oacc-kernels.c
+++ b/gcc/omp-oacc-kernels.c
@@ -409,14 +409,19 @@ add_parent_or_loop_num_clause (tree parent_clause, tree loop_clause,
    nested loops.  It adds an auto clause unless there is already an
    independent/seq/auto clause or a gang/worker/vector annotation.  */
 
+struct adjust_nested_loop_clauses_wi_info
+{
+  tree *loop_gang_clause_ptr;
+  tree *loop_worker_clause_ptr;
+  tree *loop_vector_clause_ptr;
+};
+
 static tree
 adjust_nested_loop_clauses (gimple_stmt_iterator *gsi_p, bool *,
                             struct walk_stmt_info *wi)
 {
-  tree **clauses = (tree **) wi->info;
-  tree *gang_num_clause = clauses[GOMP_DIM_GANG];
-  tree *worker_num_clause = clauses[GOMP_DIM_WORKER];
-  tree *vector_length_clause = clauses[GOMP_DIM_VECTOR];
+  struct adjust_nested_loop_clauses_wi_info *wi_info
+    = (struct adjust_nested_loop_clauses_wi_info *) wi->info;
   gimple *stmt = gsi_stmt (*gsi_p);
 
   if (gimple_code (stmt) == GIMPLE_OMP_FOR)
@@ -430,13 +435,13 @@ adjust_nested_loop_clauses (gimple_stmt_iterator *gsi_p, bool *,
           switch (OMP_CLAUSE_CODE (loop_clause))
             {
               case OMP_CLAUSE_GANG:
-                outer_clause_ptr = gang_num_clause;
+                outer_clause_ptr = wi_info->loop_gang_clause_ptr;
                 break;
               case OMP_CLAUSE_WORKER:
-                outer_clause_ptr = worker_num_clause;
+                outer_clause_ptr = wi_info->loop_worker_clause_ptr;
                 break;
               case OMP_CLAUSE_VECTOR:
-                outer_clause_ptr = vector_length_clause;
+                outer_clause_ptr = wi_info->loop_vector_clause_ptr;
                 break;
               case OMP_CLAUSE_INDEPENDENT:
               case OMP_CLAUSE_SEQ:
@@ -580,11 +585,11 @@ transform_kernels_loop_clauses (gimple *omp_for,
      Turn these into worker/vector annotations on the parallel region.  */
   struct walk_stmt_info wi;
   memset (&wi, 0, sizeof (wi));
-  tree *num_clauses[GOMP_DIM_MAX]
-    = { [GOMP_DIM_GANG] = &loop_gang_clause,
-        [GOMP_DIM_WORKER] = &loop_worker_clause,
-        [GOMP_DIM_VECTOR] = &loop_vector_clause };
-  wi.info = num_clauses;
+  struct adjust_nested_loop_clauses_wi_info wi_info;
+  wi_info.loop_gang_clause_ptr = &loop_gang_clause;
+  wi_info.loop_worker_clause_ptr = &loop_worker_clause;
+  wi_info.loop_vector_clause_ptr = &loop_vector_clause;
+  wi.info = &wi_info;
   gimple *body = gimple_omp_body (omp_for);
   walk_gimple_seq (body, adjust_nested_loop_clauses, NULL, &wi);
   /* Check if there were conflicting numbers of workers or vector lanes.  */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 00/10, OpenACC] Rework handling of OpenACC kernels regions
@ 2019-07-17 21:03 ` Kwok Cheung Yeung
  2019-07-17 21:04   ` [PATCH 01/10, OpenACC] Use "-fopenacc-kernels=parloops" to document "parloops" test cases Kwok Cheung Yeung
                     ` (10 more replies)
  0 siblings, 11 replies; 33+ messages in thread
From: Kwok Cheung Yeung @ 2019-07-17 21:03 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek; +Cc: Thomas Schwinge

This series of patches reworks the way that OpenACC kernels regions are 
processed by GCC. Instead of relying on the parloops pass for 
auto-parallelisation of the kernel region, the contents of the region are 
transformed into a sequence of offloaded regions, which are then processed 
individually.
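
As a rough source-level sketch of what this means, the comments below label
the parts of a 'kernels' region with the wording of the
"-fopt-info-optimized-omp" notes that the updated test cases expect.  The
function name and array sizes are made up for illustration, and the exact
split the compiler performs for this input is an assumption, not verified
output.

```cpp
#include <cassert>

#define N 8

// Hypothetical example function.  Without '-fopenacc' the pragmas are
// ignored and everything runs sequentially on the host.
static int
kernels_sketch ()
{
  int a[N], b[N];
#pragma acc kernels
  {
    // Straight-line code: expected note "beginning 'gang-single' region
    // in OpenACC 'kernels' construct".
    int c = 234;

    // Explicitly annotated loop: expected note "parallelized loop nest
    // in OpenACC 'kernels' construct".
#pragma acc loop independent gang
    for (int i = 0; i < N; ++i)
      b[i] = c;

    // 'auto' loop: expected note "forwarded loop nest in OpenACC
    // 'kernels' construct to 'parloops' for analysis".
#pragma acc loop auto
    for (int i = 0; i < N; ++i)
      a[i] = b[i] + 1;
  }

  int sum = 0;
  for (int i = 0; i < N; ++i)
    sum += a[i];
  return sum;
}
```

Each labelled part would then become its own offloaded region, executed in
source order.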

Tested on an x86_64 host, with offloading to an Nvidia Tesla K20c card.

Okay for trunk?

Thanks

Kwok

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 01/10, OpenACC] Use "-fopenacc-kernels=parloops" to document "parloops" test cases
  2019-07-17 21:03 ` [PATCH 00/10, OpenACC] Rework handling of OpenACC kernels regions Kwok Cheung Yeung
@ 2019-07-17 21:04   ` Kwok Cheung Yeung
  2019-07-17 21:05   ` [PATCH 02/10, OpenACC] Add OpenACC target kinds for decomposed kernels regions Kwok Cheung Yeung
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: Kwok Cheung Yeung @ 2019-07-17 21:04 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek; +Cc: Thomas Schwinge

This patch introduces a new option "-fopenacc-kernels" to control how OpenACC 
'kernels' constructs are processed. The current behaviour corresponds to 
'-fopenacc-kernels=parloops'.
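
A minimal sketch of how a test case can pin the existing behaviour with this
option, in the style of the c-c++-common tests touched by the patch.  The
function name and loop body are hypothetical; only the option spelling and
the use of "-O2" for "parloops" are taken from this thread.

```cpp
/* { dg-additional-options "-fopenacc-kernels=parloops" } */
/* { dg-additional-options "-O2" } */

#include <cassert>

#define N 16

// Hypothetical test body: an unannotated loop inside 'kernels', left to
// the "parloops" pass to analyse.  Without '-fopenacc' the pragma is
// ignored and the loop runs on the host.
static unsigned
parloops_sketch ()
{
  unsigned c[N];
#pragma acc kernels copyout (c)
  {
    for (unsigned i = 0; i < N; ++i)
      c[i] = i * 2;
  }

  unsigned sum = 0;
  for (unsigned i = 0; i < N; ++i)
    sum += c[i];
  return sum;
}
```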

2019-07-16  Thomas Schwinge  <thomas@codesourcery.com>

	gcc/
	* flag-types.h (enum openacc_kernels): New type.

	gcc/c-family/
	* c.opt (fopenacc-kernels): New flag.

	gcc/fortran/
	* lang.opt (fopenacc-kernels): New flag.

	gcc/testsuite/
	* c-c++-common/goacc/kernels-1.c: Add
	"-fopenacc-kernels=parloops".
	* c-c++-common/goacc/kernels-alias-2.c: Likewise.
	* c-c++-common/goacc/kernels-alias-3.c: Likewise.
	* c-c++-common/goacc/kernels-alias-4.c: Likewise.
	* c-c++-common/goacc/kernels-alias-5.c: Likewise.
	* c-c++-common/goacc/kernels-alias-6.c: Likewise.
	* c-c++-common/goacc/kernels-alias-7.c: Likewise.
	* c-c++-common/goacc/kernels-alias-8.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta-2.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta-3.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta-4.c: Likewise.
	* c-c++-common/goacc/kernels-alias-ipa-pta.c: Likewise.
	* c-c++-common/goacc/kernels-alias.c: Likewise.
	* c-c++-common/goacc/kernels-counter-var-redundant-load.c:
	Likewise.
	* c-c++-common/goacc/kernels-counter-vars-function-scope.c:
	Likewise.
	* c-c++-common/goacc/kernels-double-reduction-n.c: Likewise.
	* c-c++-common/goacc/kernels-double-reduction.c: Likewise.
	* c-c++-common/goacc/kernels-loop-2.c: Likewise.
	* c-c++-common/goacc/kernels-loop-3.c: Likewise.
	* c-c++-common/goacc/kernels-loop-data-2.c: Likewise.
	* c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: Likewise.
	* c-c++-common/goacc/kernels-loop-data-enter-exit.c: Likewise.
	* c-c++-common/goacc/kernels-loop-data-update.c: Likewise.
	* c-c++-common/goacc/kernels-loop-data.c: Likewise.
	* c-c++-common/goacc/kernels-loop-g.c: Likewise.
	* c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
	* c-c++-common/goacc/kernels-loop-n.c: Likewise.
	* c-c++-common/goacc/kernels-loop-nest.c: Likewise.
	* c-c++-common/goacc/kernels-loop.c: Likewise.
	* c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
	* c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c:
	Likewise.
	* c-c++-common/goacc/kernels-reduction.c: Likewise.
	* gfortran.dg/goacc/kernels-alias-2.f95: Likewise.
	* gfortran.dg/goacc/kernels-alias-3.f95: Likewise.
	* gfortran.dg/goacc/kernels-alias-4.f95: Likewise.
	* gfortran.dg/goacc/kernels-alias.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-2.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-data.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-n.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop.f95: Likewise.
	* gfortran.dg/goacc/kernels-loops-adjacent.f95: Likewise.
	* gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95:
	Likewise.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Add
	"-fopenacc-kernels=parloops".
	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-empty.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit-2.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-update.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-parallel-loop-data-enter-exit.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-loop-data.f95: Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-loop.f95: Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95:
	Likewise.
	* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90:
	Likewise.
---
  gcc/c-family/c.opt                                               | 9 +++++++++
  gcc/flag-types.h                                                 | 6 ++++++
  gcc/fortran/lang.opt                                             | 3 +++
  gcc/testsuite/c-c++-common/goacc/kernels-1.c                     | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-alias-2.c               | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-alias-3.c               | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-alias-4.c               | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-alias-5.c               | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-alias-6.c               | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-alias-7.c               | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c               | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-2.c       | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-3.c       | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-4.c       | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta.c         | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-alias.c                 | 2 ++
  .../c-c++-common/goacc/kernels-counter-var-redundant-load.c      | 2 ++
  .../c-c++-common/goacc/kernels-counter-vars-function-scope.c     | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c    | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c      | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c                | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c                | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c           | 2 ++
  .../c-c++-common/goacc/kernels-loop-data-enter-exit-2.c          | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c  | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c      | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c             | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c                | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c     | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c                | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c             | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-loop.c                  | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c       | 2 ++
  .../c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c   | 2 ++
  gcc/testsuite/c-c++-common/goacc/kernels-reduction.c             | 2 ++
  gcc/testsuite/gfortran.dg/goacc/kernels-alias-2.f95              | 2 ++
  gcc/testsuite/gfortran.dg/goacc/kernels-alias-3.f95              | 2 ++
  gcc/testsuite/gfortran.dg/goacc/kernels-alias-4.f95              | 2 ++
  gcc/testsuite/gfortran.dg/goacc/kernels-alias.f95                | 2 ++
  gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95               | 2 ++
  gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95          | 2 ++
  .../gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95         | 2 ++
  gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95 | 2 ++
  gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95     | 2 ++
  gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95            | 2 ++
  gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95           | 2 ++
  gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95               | 2 ++
  gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95                 | 2 ++
  gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95       | 2 ++
  .../gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95  | 2 ++
  libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c        | 3 +++
  .../libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c          | 2 ++
  .../libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c          | 2 ++
  .../testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c  | 2 ++
  libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c      | 3 +++
  libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c     | 3 +++
  libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c     | 3 +++
  .../testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c | 3 +++
  .../testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c | 3 +++
  .../testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c | 3 +++
  .../testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c | 3 +++
  .../testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c | 3 +++
  .../testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c   | 3 +++
  .../testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c  | 3 +++
  .../testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-2.c    | 3 +++
  .../libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit-2.c   | 3 +++
  .../libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit.c     | 3 +++
  .../libgomp.oacc-c-c++-common/kernels-loop-data-update.c         | 3 +++
  libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data.c  | 3 +++
  libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c     | 2 ++
  .../libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c        | 3 +++
  libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c     | 3 +++
  libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c  | 3 +++
  libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c       | 3 +++
  .../kernels-parallel-loop-data-enter-exit.c                      | 3 +++
  .../testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c    | 3 +++
  libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c  | 3 +++
  libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-2.f95        | 2 ++
  libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95   | 2 ++
  .../libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95      | 2 ++
  .../libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95        | 2 ++
  .../testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95  | 2 ++
  libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data.f95     | 2 ++
  libgomp/testsuite/libgomp.oacc-fortran/kernels-loop.f95          | 2 ++
  .../kernels-parallel-loop-data-enter-exit.f95                    | 2 ++
  libgomp/testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90   | 2 ++
  86 files changed, 207 insertions(+)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 4c8b002..4bdacb6 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1688,6 +1688,15 @@ fopenacc-dim=
  C ObjC C++ ObjC++ LTO Joined Var(flag_openacc_dims)
  Specify default OpenACC compute dimensions.

+fopenacc-kernels=
+C ObjC C++ ObjC++ RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS) Undocumented
+
+Enum
+Name(openacc_kernels) Type(enum openacc_kernels)
+
+EnumValue
+Enum(openacc_kernels) String(parloops) Value(OPENACC_KERNELS_PARLOOPS)
+
  fopenmp
  C ObjC C++ ObjC++ LTO Var(flag_openmp)
  Enable OpenMP (implies -frecursive in Fortran).
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index a210328..24a80858 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -354,4 +354,10 @@ enum cf_protection_level
    CF_FULL = CF_BRANCH | CF_RETURN,
    CF_SET = 1 << 2
  };
+
+/* OpenACC 'kernels' constructs handling.  */
+enum openacc_kernels
+{
+  OPENACC_KERNELS_PARLOOPS
+};
  #endif /* ! GCC_FLAG_TYPES_H */
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index 88674cb..73e88fd 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -662,6 +662,9 @@ fopenacc-dim=
  Fortran LTO Joined Var(flag_openacc_dims)
  ; Documented in C

+fopenacc-kernels=
+Fortran RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS) Undocumented
+
  fopenmp
  Fortran LTO
  ; Documented in C
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-1.c
index 016abbd..ba3169a 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-1.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-fopt-info-optimized-omp" } */

  int
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-2.c
index 7576a64..57f1e08 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-2.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-ealias-all" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-3.c
index 2934f12..fa8afe2 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-3.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-3.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-ealias-all" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-4.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-4.c
index f6ee5b5..e5c264a 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-4.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-4.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-ealias-all" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-5.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-5.c
index 74425fb..9fb3189 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-5.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-5.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-ealias-all" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-6.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-6.c
index 908e1ca..015bded 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-6.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-6.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-ealias-all" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-7.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-7.c
index 923d002..0c828cc 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-7.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-7.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-ealias-all" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c
index 69200cc..902e71d 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-ealias-all" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-2.c
index f16d698..7a57477c 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-2.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fipa-pta -fdump-tree-optimized" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-3.c
index 1ea0e73..879141d 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-3.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-3.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fipa-pta -fdump-tree-optimized" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-4.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-4.c
index 20b21dc..41f90158 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-4.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta-4.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fipa-pta -fdump-tree-optimized" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta.c
index 969b466..e587f96 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias-ipa-pta.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fipa-pta -fdump-tree-optimized" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-alias.c b/gcc/testsuite/c-c++-common/goacc/kernels-alias.c
index e8ff018..bbc0508 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-alias.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-alias.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-ealias-all" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c b/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c
index 0304254..dbb04d3 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-counter-var-redundant-load.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-dom3" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c b/gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
index c475333..f40de67 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
  /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c b/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
index 8f7f415..d456925 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction-n.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fopt-info-optimized-omp" } */
  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c b/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
index c11d36f..caab7c8 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-double-reduction.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fopt-info-optimized-omp" } */
  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
index acef6a1..238956c 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
  /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
index 75e2bb7..2bbb071 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
  /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c
index 7180021..e7830b6 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
  /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c
index 0c9f833..b5c2670 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
  /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c
index 0bd21b6..84f92a9 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
  /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c
index dd5a841..dbdce45 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
  /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c
index a658182..23f4e22 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
  /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
index 73b469d..2cbd6be 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-g" } */
  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
index 5592623..28480ae 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
  /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
index e86be1b..26bc3e0 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
  /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
index 2b0e186..b3fdde1 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
  /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop.c
index 9619d53..d0423a8 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
  /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c b/gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
index 69539b2..15a8d37 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
  /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c b/gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c
index 81b0fee..457a79a 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
  /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c b/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
index 5921b88..7603988 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-reduction.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-O2" } */
  /* { dg-additional-options "-fdump-tree-parloops1-all" } */
  /* { dg-additional-options "-fdump-tree-optimized" } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-alias-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-alias-2.f95
index 6a9f241..f365896 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-alias-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-alias-2.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
  ! { dg-additional-options "-O2" }
  ! { dg-additional-options "-fdump-tree-ealias-all" }

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-alias-3.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-alias-3.f95
index 07dc8d6..cd5e539 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-alias-3.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-alias-3.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
  ! { dg-additional-options "-O2" }
  ! { dg-additional-options "-fdump-tree-ealias-all" }

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-alias-4.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-alias-4.f95
index 415eb96..0cda06e 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-alias-4.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-alias-4.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
  ! { dg-additional-options "-O2" }
  ! { dg-additional-options "-fdump-tree-ealias-all" }

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-alias.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-alias.f95
index 62f9a71..d8dcd37 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-alias.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-alias.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
  ! { dg-additional-options "-O2" }
  ! { dg-additional-options "-fdump-tree-ealias-all" }

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
index ef53324..59001e4 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
  ! { dg-additional-options "-O2" }
  ! { dg-additional-options "-fdump-tree-parloops1-all" }
  ! { dg-additional-options "-fdump-tree-optimized" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
index 2f1dcd6..b6f50cb 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
  ! { dg-additional-options "-O2" }
  ! { dg-additional-options "-fdump-tree-parloops1-all" }
  ! { dg-additional-options "-fdump-tree-optimized" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
index 447e85d6..779073a 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
  ! { dg-additional-options "-O2" }
  ! { dg-additional-options "-fdump-tree-parloops1-all" }
  ! { dg-additional-options "-fdump-tree-optimized" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
index 4edb288..30ae2cb 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
  ! { dg-additional-options "-O2" }
  ! { dg-additional-options "-fdump-tree-parloops1-all" }
  ! { dg-additional-options "-fdump-tree-optimized" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
index fc113e1..b68945a 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
  ! { dg-additional-options "-O2" }
  ! { dg-additional-options "-fdump-tree-parloops1-all" }
  ! { dg-additional-options "-fdump-tree-optimized" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
index 94522f5..f5c6688 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
  ! { dg-additional-options "-O2" }
  ! { dg-additional-options "-fdump-tree-parloops1-all" }
  ! { dg-additional-options "-fdump-tree-optimized" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
index a3ad591..18509b2 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-inner.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
  ! { dg-additional-options "-O2" }
  ! { dg-additional-options "-fopt-info-optimized-omp" }

diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95
index b9c4aea..4c43b11 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
  ! { dg-additional-options "-O2" }
  ! { dg-additional-options "-fdump-tree-parloops1-all" }
  ! { dg-additional-options "-fdump-tree-optimized" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
index 6dc7b2e..4da7040 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
  ! { dg-additional-options "-O2" }
  ! { dg-additional-options "-fdump-tree-parloops1-all" }
  ! { dg-additional-options "-fdump-tree-optimized" }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95
index fb92da8..a83ff95 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loops-adjacent.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
  ! { dg-additional-options "-O2" }

  program main
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95
index 48c20b9..1586e64 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95
@@ -1,3 +1,5 @@
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.
  ! { dg-additional-options "-O2" }
  ! { dg-additional-options "-fdump-tree-parloops1-all" }
  ! { dg-additional-options "-fdump-tree-optimized" }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
index 0f51bad..e24381d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
@@ -1,5 +1,8 @@
  /* Verify OpenACC 'declare' with VLAs.  */

+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <assert.h>


diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c
index e8d65df..e40d5ab 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-fipa-pta" } */

  #include <stdlib.h>
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c
index dd8ca87..e2cf3d7 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-fipa-pta" } */

  #include <stdlib.h>
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c
index 50e7dc1..5f89d48 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-fipa-pta" } */

  #include <stdlib.h>
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c
index a68a7cd..8ee0da7 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-empty.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  int
  main (void)
  {
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
index b840888..4aeeed1 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define N (1024 * 512)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c
index 31114ac..9cbace1 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define N (1024 * 512)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
index d36592f..cae8439 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define N 32
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
index e622971..d53e393 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define N 32
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
index c731278..7435c85 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define N 32
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
index 67dcce2..32f47fb 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define N 32
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
index b8b5dde..a310de7 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define N 32
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
index 9d9308a..e00177e 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define N 32
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
index 997d6c7..15e2dc1 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define N 100
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-2.c
index 607c350..337ad91 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-2.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define N (1024 * 512)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit-2.c
index 8b9dd5f..214dd7e 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit-2.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define N (1024 * 512)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit.c
index 5d5da6f..7d097da 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define N (1024 * 512)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-update.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-update.c
index c111c8f..661cb28 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-update.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-update.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define N (1024 * 512)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data.c
index 947bcda..2f4f699 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-data.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define N (1024 * 512)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
index 88258be..e5a556b 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
  /* { dg-additional-options "-g" } */

  #include "kernels-loop.c"
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
index 147ebb5..eeb318e 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define N ((1024 * 512) + 1)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
index 9a3eaca..eeccc1d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define N ((1024 * 512) + 1)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c
index 28c725a..c59c47e 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define N 1000
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
index 355123c..36eabb9 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define N (1024 * 512)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-parallel-loop-data-enter-exit.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-parallel-loop-data-enter-exit.c
index ebcc6e1..fe30318 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-parallel-loop-data-enter-exit.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-parallel-loop-data-enter-exit.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define N (1024 * 512)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c
index 95f1b77..3d02d53 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c
@@ -1,6 +1,9 @@
  /* Verify that a simple, explicit acc loop reduction works inside
   a kernels region.  */

+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define N 100
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c
index 8647a94..e67340c 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+   specifically testing "parloops" handling.  */
+
  #include <stdlib.h>

  #define n 10000
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-2.f95 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-2.f95
index 4b69e81..deb9070 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-2.f95
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-2.f95
@@ -1,4 +1,6 @@
  ! { dg-do run }
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.

  program main
    implicit none
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95
index 4008743..c718584 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95
@@ -1,4 +1,6 @@
  ! { dg-do run }
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.

  program main
    implicit none
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95
index 11ae17c..ade470a 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95
@@ -1,4 +1,6 @@
  ! { dg-do run }
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.

  program main
    implicit none
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95
index 4fdb862..8eb5a83 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95
@@ -1,4 +1,6 @@
  ! { dg-do run }
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.

  program main
    implicit none
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95
index 4bee0e1..3352b76 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95
@@ -1,4 +1,6 @@
  ! { dg-do run }
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.

  program main
    implicit none
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data.f95 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data.f95
index 307e433..6697eaf 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data.f95
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop-data.f95
@@ -1,4 +1,6 @@
  ! { dg-do run }
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.

  program main
    implicit none
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop.f95 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop.f95
index 0090f43..8ae247f 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop.f95
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-loop.f95
@@ -1,4 +1,6 @@
  ! { dg-do run }
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.

  program main
    implicit none
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95
index fe1088c..18daccc 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95
@@ -1,4 +1,6 @@
  ! { dg-do run }
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.

  program main
    implicit none
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90
index c7a52ed..fe6986d 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90
@@ -1,6 +1,8 @@
  ! Test a simple acc loop reduction inside a kernels region.

  ! { dg-do run }
+! { dg-additional-options "-fopenacc-kernels=parloops" } as this is
+! specifically testing "parloops" handling.

  program reduction
    integer, parameter     :: n = 20
-- 
2.8.1


* [PATCH 02/10, OpenACC] Add OpenACC target kinds for decomposed kernels regions
  2019-07-17 21:03 ` [PATCH 00/10, OpenACC] Rework handling of OpenACC kernels regions Kwok Cheung Yeung
  2019-07-17 21:04   ` [PATCH 01/10, OpenACC] Use "-fopenacc-kernels=parloops" to document "parloops" test cases Kwok Cheung Yeung
@ 2019-07-17 21:05   ` Kwok Cheung Yeung
  2019-07-18  9:28     ` Jakub Jelinek
  2019-07-17 21:06   ` [PATCH 03/10, OpenACC] Separate OpenACC kernels regions in data and parallel parts Kwok Cheung Yeung
                     ` (8 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: Kwok Cheung Yeung @ 2019-07-17 21:05 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek; +Cc: Thomas Schwinge

This patch is in preparation for changes that will cut up OpenACC kernels 
regions into individual parts. For the new sub-regions that will be generated, 
this adds the following new kinds of OpenACC regions for internal use:

- GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED for parts of kernels 
regions to be executed in gang-redundant mode
- GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE for parts of kernels 
regions to be executed in gang-single mode
- GF_OMP_TARGET_KIND_OACC_DATA_KERNELS for data regions generated around the 
body of a kernels region
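
To illustrate how the three new kinds relate to the existing ones, here is a
minimal, self-contained sketch. It is not the GCC code: the enum, its shortened
names, and the kind_* helper functions are hypothetical stand-ins, and the enum
values are not GCC's. The classification itself follows the cases this patch
adds to is_gimple_omp_offloaded, expand_omp_target, and
was_originally_oacc_kernels.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the OpenACC subset of GCC's target kinds.
   Names are shortened and the numeric values are not GCC's.  */
enum oacc_kind
{
  OACC_PARALLEL,
  OACC_KERNELS,
  OACC_DATA,
  OACC_HOST_DATA,
  OACC_PARALLEL_KERNELS_PARALLELIZED,
  OACC_PARALLEL_KERNELS_GANG_SINGLE,
  OACC_DATA_KERNELS
};

/* Modeled on the is_gimple_omp_offloaded cases: the two new compute
   kinds are offloaded, the new data kind is not.  */
bool
kind_is_offloaded (enum oacc_kind kind)
{
  switch (kind)
    {
    case OACC_PARALLEL:
    case OACC_KERNELS:
    case OACC_PARALLEL_KERNELS_PARALLELIZED:
    case OACC_PARALLEL_KERNELS_GANG_SINGLE:
      return true;
    default:
      return false;
    }
}

/* Modeled on the 'data_region = true' cases in expand_omp_target.  */
bool
kind_is_data_region (enum oacc_kind kind)
{
  switch (kind)
    {
    case OACC_DATA:
    case OACC_HOST_DATA:
    case OACC_DATA_KERNELS:
      return true;
    default:
      return false;
    }
}

/* Modeled on was_originally_oacc_kernels: true only for the three kinds
   generated when a kernels region is decomposed.  */
bool
kind_was_originally_kernels (enum oacc_kind kind)
{
  switch (kind)
    {
    case OACC_PARALLEL_KERNELS_PARALLELIZED:
    case OACC_PARALLEL_KERNELS_GANG_SINGLE:
    case OACC_DATA_KERNELS:
      return true;
    default:
      return false;
    }
}
```

With these definitions, OACC_DATA_KERNELS is a data region that originates
from a kernels construct but is not offloaded, whereas an undecomposed
OACC_KERNELS region is offloaded and does not count as "originally kernels".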

2019-07-16  Thomas Schwinge  <thomas@codesourcery.com>

	gcc/
	* gimple.h (enum gf_mask): Add new target kinds
	GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED,
	GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE, and
	GF_OMP_TARGET_KIND_OACC_DATA_KERNELS.
	(is_gimple_omp_oacc): Handle new target kinds.
	(is_gimple_omp_offloaded): Likewise.
	* gimple-pretty-print.c (dump_gimple_omp_target): Likewise.
	* omp-expand.c (expand_omp_target): Likewise.
	(build_omp_regions_1): Likewise.
	(omp_make_gimple_edges): Likewise.
	* omp-low.c (is_oacc_parallel): Likewise.
	(was_originally_oacc_kernels): New function.
	(scan_omp_for): Update check for illegal nesting.
	(check_omp_nesting_restrictions): Handle new target kinds.
	(lower_oacc_reductions): Likewise.
	(lower_omp_target): Likewise.
	* omp-offload.c (execute_oacc_device_lower): Likewise.
---
  gcc/gimple-pretty-print.c |  9 +++++++++
  gcc/gimple.h              | 14 +++++++++++++
  gcc/omp-expand.c          | 34 ++++++++++++++++++++++++++++----
  gcc/omp-low.c             | 50 ++++++++++++++++++++++++++++++++++++++++++-----
  gcc/omp-offload.c         | 20 +++++++++++++++++++
  5 files changed, 118 insertions(+), 9 deletions(-)

diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index ce339ee..cf4d0e0 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -1691,6 +1691,15 @@ dump_gimple_omp_target (pretty_printer *buffer, gomp_target *gs,
      case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
        kind = " oacc_host_data";
        break;
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+      kind = " oacc_parallel_kernels_parallelized";
+      break;
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+      kind = " oacc_parallel_kernels_gang_single";
+      break;
+    case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
+      kind = " oacc_data_kernels";
+      break;
      default:
        gcc_unreachable ();
      }
diff --git a/gcc/gimple.h b/gcc/gimple.h
index 47070e7..d8423be 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -184,6 +184,15 @@ enum gf_mask {
      GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA = 9,
      GF_OMP_TARGET_KIND_OACC_DECLARE = 10,
      GF_OMP_TARGET_KIND_OACC_HOST_DATA = 11,
+    /* A GF_OMP_TARGET_KIND_OACC_PARALLEL that originates from a 'kernels'
+       construct, parallelized.  */
+    GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED = 12,
+    /* A GF_OMP_TARGET_KIND_OACC_PARALLEL that originates from a 'kernels'
+       construct, "gang-single".  */
+    GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE = 13,
+    /* A GF_OMP_TARGET_KIND_OACC_DATA that originates from a 'kernels'
+       construct.  */
+    GF_OMP_TARGET_KIND_OACC_DATA_KERNELS = 14,
      GF_OMP_TEAMS_GRID_PHONY	= 1 << 0,
      GF_OMP_TEAMS_HOST		= 1 << 1,

@@ -6479,6 +6488,9 @@ is_gimple_omp_oacc (const gimple *stmt)
  	case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
  	case GF_OMP_TARGET_KIND_OACC_DECLARE:
  	case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
+	case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+	case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+	case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
  	  return true;
  	default:
  	  return false;
@@ -6503,6 +6515,8 @@ is_gimple_omp_offloaded (const gimple *stmt)
  	case GF_OMP_TARGET_KIND_REGION:
  	case GF_OMP_TARGET_KIND_OACC_PARALLEL:
  	case GF_OMP_TARGET_KIND_OACC_KERNELS:
+	case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+	case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
  	  return true;
  	default:
  	  return false;
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index c007ec1..7e4d5a8 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -7914,6 +7914,8 @@ expand_omp_target (struct omp_region *region)
      case GF_OMP_TARGET_KIND_ENTER_DATA:
      case GF_OMP_TARGET_KIND_EXIT_DATA:
      case GF_OMP_TARGET_KIND_OACC_PARALLEL:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
      case GF_OMP_TARGET_KIND_OACC_KERNELS:
      case GF_OMP_TARGET_KIND_OACC_UPDATE:
      case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
@@ -7923,6 +7925,7 @@ expand_omp_target (struct omp_region *region)
      case GF_OMP_TARGET_KIND_DATA:
      case GF_OMP_TARGET_KIND_OACC_DATA:
      case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
+    case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
        data_region = true;
        break;
      default:
@@ -7945,16 +7948,30 @@ expand_omp_target (struct omp_region *region)
    entry_bb = region->entry;
    exit_bb = region->exit;

-  if (gimple_omp_target_kind (entry_stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS)
+  /* Further down, all OpenACC compute constructs will be mapped to
+     BUILT_IN_GOACC_PARALLEL, and to distinguish between them, we now attach
+     attributes.  */
+  switch (gimple_omp_target_kind (entry_stmt))
      {
+    case GF_OMP_TARGET_KIND_OACC_KERNELS:
        mark_loops_in_oacc_kernels_region (region->entry, region->exit);

-      /* Further down, both OpenACC kernels and OpenACC parallel constructs
-	 will be mappted to BUILT_IN_GOACC_PARALLEL, and to distinguish the
-	 two, there is an "oacc kernels" attribute set for OpenACC kernels.  */
        DECL_ATTRIBUTES (child_fn)
  	= tree_cons (get_identifier ("oacc kernels"),
  		     NULL_TREE, DECL_ATTRIBUTES (child_fn));
+      break;
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+      DECL_ATTRIBUTES (child_fn)
+	= tree_cons (get_identifier ("oacc parallel_kernels_parallelized"),
+		     NULL_TREE, DECL_ATTRIBUTES (child_fn));
+      break;
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+      DECL_ATTRIBUTES (child_fn)
+	= tree_cons (get_identifier ("oacc parallel_kernels_gang_single"),
+		     NULL_TREE, DECL_ATTRIBUTES (child_fn));
+      break;
+    default:
+      break;
      }

    if (offloaded)
@@ -8159,10 +8176,13 @@ expand_omp_target (struct omp_region *region)
        break;
      case GF_OMP_TARGET_KIND_OACC_KERNELS:
      case GF_OMP_TARGET_KIND_OACC_PARALLEL:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
        start_ix = BUILT_IN_GOACC_PARALLEL;
        break;
      case GF_OMP_TARGET_KIND_OACC_DATA:
      case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
+    case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
        start_ix = BUILT_IN_GOACC_DATA_START;
        break;
      case GF_OMP_TARGET_KIND_OACC_UPDATE:
@@ -8916,6 +8936,9 @@ build_omp_regions_1 (basic_block bb, struct omp_region *parent,
  		case GF_OMP_TARGET_KIND_OACC_KERNELS:
  		case GF_OMP_TARGET_KIND_OACC_DATA:
  		case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
+		case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+		case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+		case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
  		  break;
  		case GF_OMP_TARGET_KIND_UPDATE:
  		case GF_OMP_TARGET_KIND_ENTER_DATA:
@@ -9170,6 +9193,9 @@ omp_make_gimple_edges (basic_block bb, struct omp_region **region,
  	case GF_OMP_TARGET_KIND_OACC_KERNELS:
  	case GF_OMP_TARGET_KIND_OACC_DATA:
  	case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
+	case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+	case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+	case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
  	  break;
  	case GF_OMP_TARGET_KIND_UPDATE:
  	case GF_OMP_TARGET_KIND_ENTER_DATA:
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index a855c5b..623da18 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -178,8 +178,12 @@ is_oacc_parallel (omp_context *ctx)
  {
    enum gimple_code outer_type = gimple_code (ctx->stmt);
    return ((outer_type == GIMPLE_OMP_TARGET)
-	  && (gimple_omp_target_kind (ctx->stmt)
-	      == GF_OMP_TARGET_KIND_OACC_PARALLEL));
+	  && ((gimple_omp_target_kind (ctx->stmt)
+	       == GF_OMP_TARGET_KIND_OACC_PARALLEL)
+	      || (gimple_omp_target_kind (ctx->stmt)
+		  == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED)
+	      || (gimple_omp_target_kind (ctx->stmt)
+		  == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE)));
  }

  /* Return true if CTX corresponds to an oacc kernels region.  */
@@ -193,6 +197,22 @@ is_oacc_kernels (omp_context *ctx)
  	      == GF_OMP_TARGET_KIND_OACC_KERNELS));
  }

+/* Return true if CTX corresponds to an oacc region that was generated from
+   an original kernels region that has been lowered to parallel regions.  */
+
+static bool
+was_originally_oacc_kernels (omp_context *ctx)
+{
+  enum gimple_code outer_type = gimple_code (ctx->stmt);
+  return ((outer_type == GIMPLE_OMP_TARGET)
+	  && ((gimple_omp_target_kind (ctx->stmt)
+	       == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED)
+	      || (gimple_omp_target_kind (ctx->stmt)
+		  == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE)
+	      || (gimple_omp_target_kind (ctx->stmt)
+		  == GF_OMP_TARGET_KIND_OACC_DATA_KERNELS)));
+}
+
  /* If DECL is the artificial dummy VAR_DECL created for non-static
     data member privatization, return the underlying "this" parameter,
     otherwise return NULL.  */
@@ -2319,7 +2339,8 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
      {
        omp_context *tgt = enclosing_target_ctx (outer_ctx);

-      if (!tgt || is_oacc_parallel (tgt))
+      if (!tgt || (is_oacc_parallel (tgt)
+                    && !was_originally_oacc_kernels (tgt)))
  	for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
  	  {
  	    char const *check = NULL;
@@ -2752,6 +2773,8 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
  		  {
  		  case GF_OMP_TARGET_KIND_OACC_PARALLEL:
  		  case GF_OMP_TARGET_KIND_OACC_KERNELS:
+		  case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+		  case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
  		    ok = true;
  		    break;

@@ -3207,6 +3230,11 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
  	    case GF_OMP_TARGET_KIND_OACC_DECLARE: stmt_name = "declare"; break;
  	    case GF_OMP_TARGET_KIND_OACC_HOST_DATA: stmt_name = "host_data";
  	      break;
+	    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+	    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+	    case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
+	      /* These three cases arise from kernels conversion.  */
+	      stmt_name = "kernels"; break;
  	    default: gcc_unreachable ();
  	    }
  	  switch (gimple_omp_target_kind (ctx->stmt))
@@ -3220,6 +3248,11 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
  	    case GF_OMP_TARGET_KIND_OACC_DATA: ctx_stmt_name = "data"; break;
  	    case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
  	      ctx_stmt_name = "host_data"; break;
+	    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+	    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+	    case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
+	      /* These three cases arise from kernels conversion.  */
+	      ctx_stmt_name = "kernels"; break;
  	    default: gcc_unreachable ();
  	    }

@@ -6375,8 +6408,12 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
  		    break;

  		  case GIMPLE_OMP_TARGET:
-		    if (gimple_omp_target_kind (probe->stmt)
-			!= GF_OMP_TARGET_KIND_OACC_PARALLEL)
+		    if ((gimple_omp_target_kind (probe->stmt)
+			 != GF_OMP_TARGET_KIND_OACC_PARALLEL)
+			&& (gimple_omp_target_kind (probe->stmt)
+			    != GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED)
+			&& (gimple_omp_target_kind (probe->stmt)
+			    != GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE))
  		      goto do_lookup;

  		    cls = gimple_omp_target_clauses (probe->stmt);
@@ -11027,11 +11064,14 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
      case GF_OMP_TARGET_KIND_OACC_UPDATE:
      case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
      case GF_OMP_TARGET_KIND_OACC_DECLARE:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
        data_region = false;
        break;
      case GF_OMP_TARGET_KIND_DATA:
      case GF_OMP_TARGET_KIND_OACC_DATA:
      case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
+    case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
        data_region = true;
        break;
      default:
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index da788d9..4ebfa83 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1525,6 +1525,20 @@ execute_oacc_device_lower ()
    bool is_oacc_kernels_parallelized
      = (lookup_attribute ("oacc kernels parallelized",
  			 DECL_ATTRIBUTES (current_function_decl)) != NULL);
+  if (is_oacc_kernels_parallelized)
+    gcc_checking_assert (is_oacc_kernels);
+  bool is_oacc_parallel_kernels_parallelized
+    = (lookup_attribute ("oacc parallel_kernels_parallelized",
+			 DECL_ATTRIBUTES (current_function_decl)) != NULL);
+  if (is_oacc_parallel_kernels_parallelized)
+    gcc_checking_assert (!is_oacc_kernels);
+  bool is_oacc_parallel_kernels_gang_single
+    = (lookup_attribute ("oacc parallel_kernels_gang_single",
+			 DECL_ATTRIBUTES (current_function_decl)) != NULL);
+  if (is_oacc_parallel_kernels_gang_single)
+    gcc_checking_assert (!is_oacc_kernels);
+  gcc_checking_assert (!(is_oacc_parallel_kernels_parallelized
+			 && is_oacc_parallel_kernels_gang_single));

    /* Unparallelized OpenACC kernels constructs must get launched as 1 x 1 x 1
       kernels, so remove the parallelism dimensions function attributes
@@ -1548,6 +1562,12 @@ execute_oacc_device_lower ()
  	fprintf (dump_file, "Function is %s OpenACC kernels offload\n",
  		 (is_oacc_kernels_parallelized
  		  ? "parallelized" : "unparallelized"));
+      else if (is_oacc_parallel_kernels_parallelized)
+	fprintf (dump_file, "Function is %s OpenACC kernels offload\n",
+		 "parallel_kernels_parallelized");
+      else if (is_oacc_parallel_kernels_gang_single)
+	fprintf (dump_file, "Function is %s OpenACC kernels offload\n",
+		 "parallel_kernels_gang_single");
        else
  	fprintf (dump_file, "Function is OpenACC parallel offload\n");
      }
-- 
2.8.1

* [PATCH 03/10, OpenACC] Separate OpenACC kernels regions in data and parallel parts
  2019-07-17 21:03 ` [PATCH 00/10, OpenACC] Rework handling of OpenACC kernels regions Kwok Cheung Yeung
  2019-07-17 21:04   ` [PATCH 01/10, OpenACC] Use "-fopenacc-kernels=parloops" to document "parloops" test cases Kwok Cheung Yeung
  2019-07-17 21:05   ` [PATCH 02/10, OpenACC] Add OpenACC target kinds for decomposed kernels regions Kwok Cheung Yeung
@ 2019-07-17 21:06   ` Kwok Cheung Yeung
  2019-07-17 21:11   ` [PATCH 04/10, OpenACC] Turn OpenACC kernels regions into a sequence of parallel regions Kwok Cheung Yeung
                     ` (7 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: Kwok Cheung Yeung @ 2019-07-17 21:06 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek; +Cc: Thomas Schwinge

In the future, kernels regions will be transformed into data regions containing 
a sequence of serial and parallel offloaded regions. This first patch sets up a 
new pass that is responsible for this transformation, and in a first step 
constructs the new data region containing a parallel region with the original 
kernels region's body.
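
As a rough illustration only (this is not part of the patch, and the pass
itself works on GIMPLE rather than on source code), the effect of this first
step can be pictured at the source level. The sketch below assumes a simple
summation loop; the function names are made up for the example, and the
num_gangs(1) clause is added here only to keep the hand-written variant
well-defined, the pass in this patch does not add it. The mapped data moves to
the enclosing data region, and the inner region then only expects it to be
present.

```c
#define N 1024

unsigned int a[N];

/* Original form: a single 'kernels' construct carrying the data clauses.  */
unsigned int
sum_with_kernels (void)
{
  unsigned int sum = 1;
#pragma acc kernels copyin(a[0:N]) copy(sum)
  {
    for (int i = 0; i < N; ++i)
      sum += a[i];
  }
  return sum;
}

/* Hand-written analogue of the converted form: the data clauses sit on an
   enclosing data region, and the body runs in a parallel region that only
   requires the data to be present.  num_gangs(1) is an assumption of this
   sketch, not something the patch emits.  */
unsigned int
sum_with_data_and_parallel (void)
{
  unsigned int sum = 1;
#pragma acc data copyin(a[0:N]) copy(sum)
  {
#pragma acc parallel num_gangs(1) present(a[0:N]) present(sum)
    {
      for (int i = 0; i < N; ++i)
        sum += a[i];
    }
  }
  return sum;
}
```

Both functions should compute the same value; without -fopenacc the pragmas
are ignored and the code runs sequentially on the host.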

2019-07-16  Gergö Barany  <gergo@codesourcery.com>

	gcc/
	* Makefile.in: Add...
	* omp-oacc-kernels.c: ... this new file for the kernels conversion
	pass.
	* flag-types.h (enum openacc_kernels): Add "split" style.  Adjust
	all users.
	* doc/invoke.texi (-fopenacc-kernels): Update.
	* passes.def: Add pass_convert_oacc_kernels to pipeline.
	* tree-pass.h (make_pass_convert_oacc_kernels): Add declaration.

	gcc/c-family/
	* c.opt (fopenacc-kernels): Document.  Add 'split' option.

	gcc/fortran/
	* lang.opt (fopenacc-kernels): Document.

	gcc/testsuite/
	* c-c++-common/goacc/kernels-conversion.c: New test.
	* gfortran.dg/goacc/kernels-conversion.f95: Likewise.
	* c-c++-common/goacc/if-clause-2.c: Update.
	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
---
  gcc/Makefile.in                                    |   2 +
  gcc/c-family/c.opt                                 |   6 +-
  gcc/doc/invoke.texi                                |  13 +-
  gcc/flag-types.h                                   |   1 +
  gcc/fortran/lang.opt                               |   3 +-
  gcc/omp-oacc-kernels.c                             | 245 +++++++++++++++++++++
  gcc/passes.def                                     |   1 +
  gcc/testsuite/c-c++-common/goacc/if-clause-2.c     |   7 +
  .../c-c++-common/goacc/kernels-conversion.c        |  36 +++
  .../gfortran.dg/goacc/kernels-conversion.f95       |  33 +++
  gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95   |   6 +
  gcc/tree-pass.h                                    |   1 +
  12 files changed, 351 insertions(+), 3 deletions(-)
  create mode 100644 gcc/omp-oacc-kernels.c
  create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
  create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 597dc01..82537f6 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1432,6 +1432,7 @@ OBJS = \
  	omp-general.o \
  	omp-grid.o \
  	omp-low.o \
+	omp-oacc-kernels.o \
  	omp-simd-clone.o \
  	opt-problem.o \
  	optabs.o \
@@ -2560,6 +2561,7 @@ GTFILES = $(CPPLIB_H) $(srcdir)/input.h $(srcdir)/coretypes.h \
    $(srcdir)/omp-offload.c \
    $(srcdir)/omp-expand.c \
    $(srcdir)/omp-low.c \
+  $(srcdir)/omp-oacc-kernels.c \
    $(srcdir)/targhooks.c $(out_file) $(srcdir)/passes.c $(srcdir)/cgraphunit.c \
    $(srcdir)/cgraphclones.c \
    $(srcdir)/tree-phinodes.c \
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 4bdacb6..a193875 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1689,12 +1689,16 @@ C ObjC C++ ObjC++ LTO Joined Var(flag_openacc_dims)
  Specify default OpenACC compute dimensions.

  fopenacc-kernels=
-C ObjC C++ ObjC++ RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS) Undocumented
+C ObjC C++ ObjC++ RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS)
+-fopenacc-kernels=[split|parloops]	Configure OpenACC 'kernels' constructs handling.

  Enum
  Name(openacc_kernels) Type(enum openacc_kernels)

  EnumValue
+Enum(openacc_kernels) String(split) Value(OPENACC_KERNELS_SPLIT)
+
+EnumValue
  Enum(openacc_kernels) String(parloops) Value(OPENACC_KERNELS_PARLOOPS)

  fopenmp
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 0c20cb6..ec98ab6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -198,7 +198,7 @@ in the following sections.
  -aux-info @var{filename}  -fallow-parameterless-variadic-functions @gol
  -fno-asm  -fno-builtin  -fno-builtin-@var{function}  -fgimple@gol
  -fhosted  -ffreestanding @gol
--fopenacc  -fopenacc-dim=@var{geom} @gol
+-fopenacc  -fopenacc-dim=@var{geom}  -fopenacc-kernels=@var{style} @gol
  -fopenmp  -fopenmp-simd @gol
  -fms-extensions  -fplan9-extensions  -fsso-struct=@var{endianness} @gol
  -fallow-single-precision  -fcond-mismatch  -flax-vector-conversions @gol
@@ -2193,6 +2193,17 @@ not explicitly specify.  The @var{geom} value is a triple of
  ':'-separated sizes, in order 'gang', 'worker' and, 'vector'.  A size
  can be omitted, to use a target-specific default value.

+@item -fopenacc-kernels=@var{style}
+@opindex fopenacc-kernels
+@cindex OpenACC accelerator programming
+Configure OpenACC 'kernels' constructs handling.
+With @option{-fopenacc-kernels=split}, OpenACC 'kernels' constructs
+are split into a sequence of compute constructs, each then handled
+individually.
+With @option{-fopenacc-kernels=parloops}, the whole OpenACC
+'kernels' construct is handled by the @samp{parloops} pass.
+This is the default.
+
  @item -fopenmp
  @opindex fopenmp
  @cindex OpenMP parallel
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 24a80858..ce32607 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -358,6 +358,7 @@ enum cf_protection_level
  /* OpenACC 'kernels' constructs handling.  */
  enum openacc_kernels
  {
+  OPENACC_KERNELS_SPLIT,
    OPENACC_KERNELS_PARLOOPS
  };
  #endif /* ! GCC_FLAG_TYPES_H */
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index 73e88fd..e7e277a 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -663,7 +663,8 @@ Fortran LTO Joined Var(flag_openacc_dims)
  ; Documented in C

  fopenacc-kernels=
-Fortran RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS) Undocumented
+Fortran RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS)
+; Documented in C

  fopenmp
  Fortran LTO
diff --git a/gcc/omp-oacc-kernels.c b/gcc/omp-oacc-kernels.c
new file mode 100644
index 0000000..d180377
--- /dev/null
+++ b/gcc/omp-oacc-kernels.c
@@ -0,0 +1,245 @@
+/* Transformation pass for OpenACC kernels regions.  Converts a kernels
+   region into a series of smaller parallel regions.  There is a parallel
+   region for each parallelizable loop nest, as well as a "gang-single"
+   parallel region for each non-parallelizable piece of code.
+
+   Contributed by Gergö Barany <gergo@codesourcery.com> and
+                  Thomas Schwinge <thomas@codesourcery.com>
+
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "target.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "cgraph.h"
+#include "fold-const.h"
+#include "gimplify.h"
+#include "gimple-iterator.h"
+#include "gimple-walk.h"
+#include "gomp-constants.h"
+
+/* This is a preprocessing pass to be run immediately before lower_omp.  It
+   will convert OpenACC "kernels" regions into sequences of "parallel"
+   regions.
+   For now, the translation is as follows:
+   - The entire kernels region is turned into a data region with clauses
+     taken from the kernels region.  New "create" clauses are added for all
+     variables declared at the top level in the kernels region.  */
+
+/* Transform KERNELS_REGION, which is an OpenACC kernels region, into a data
+   region containing a parallel region with the original region's body.  */
+
+static gimple *
+transform_kernels_region (gimple *kernels_region)
+{
+  gcc_checking_assert (gimple_omp_target_kind (kernels_region)
+                        == GF_OMP_TARGET_KIND_OACC_KERNELS);
+
+  /* Collect the kernels region's data clauses and create the new data
+     region with those clauses.  */
+  tree kernels_clauses = gimple_omp_target_clauses (kernels_region);
+  tree data_clauses = NULL;
+  for (tree c = kernels_clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      /* Certain map clauses are copied to the enclosing data region.  Any
+         non-data clause remains on the kernels region.  */
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP)
+        {
+          tree decl = OMP_CLAUSE_DECL (c);
+          HOST_WIDE_INT kind = OMP_CLAUSE_MAP_KIND (c);
+          switch (kind)
+            {
+            default:
+              if (kind == GOMP_MAP_ALLOC
+                  && integer_zerop (OMP_CLAUSE_SIZE (c)))
+                /* ??? This is an alloc clause for mapping a pointer whose
+                   target is already mapped.  We leave these on the inner
+                   parallel regions because moving them to the outer data
+                   region causes runtime errors.  */
+                break;
+
+              /* For non-artificial variables, and for non-declaration
+                 expressions like A[0:n], copy the clause to the data
+                 region.  */
+              if ((DECL_P (decl) && !DECL_ARTIFICIAL (decl))
+                  || !DECL_P (decl))
+                {
+                  tree new_clause = build_omp_clause (OMP_CLAUSE_LOCATION (c),
+                                                      OMP_CLAUSE_MAP);
+                  OMP_CLAUSE_SET_MAP_KIND (new_clause, kind);
+                  /* This must be unshared here to avoid "incorrect sharing
+                     of tree nodes" errors from verify_gimple.  */
+                  OMP_CLAUSE_DECL (new_clause) = unshare_expr (decl);
+                  OMP_CLAUSE_SIZE (new_clause) = OMP_CLAUSE_SIZE (c);
+                  OMP_CLAUSE_CHAIN (new_clause) = data_clauses;
+                  data_clauses = new_clause;
+
+                  /* Now that this data is mapped, the inner data clause on
+                     the kernels region can become a present clause.  */
+                  OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_FORCE_PRESENT);
+                }
+              break;
+
+            case GOMP_MAP_POINTER:
+            case GOMP_MAP_TO_PSET:
+            case GOMP_MAP_FORCE_TOFROM:
+            case GOMP_MAP_FIRSTPRIVATE_POINTER:
+            case GOMP_MAP_FIRSTPRIVATE_REFERENCE:
+              /* ??? Copying these map kinds leads to internal compiler
+                 errors in later passes.  */
+              break;
+            }
+        }
+      else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_IF)
+        {
+          /* If there is an if clause, it must also be present on the
+             enclosing data region.  Temporarily remove the if clause's
+             chain to avoid copying it.  */
+          tree saved_chain = OMP_CLAUSE_CHAIN (c);
+          OMP_CLAUSE_CHAIN (c) = NULL;
+          tree new_if_clause = unshare_expr (c);
+          OMP_CLAUSE_CHAIN (c) = saved_chain;
+          OMP_CLAUSE_CHAIN (new_if_clause) = data_clauses;
+          data_clauses = new_if_clause;
+        }
+    }
+  /* Restore the original order of the clauses.  */
+  data_clauses = nreverse (data_clauses);
+
+  gimple *data_region
+    = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_OACC_DATA_KERNELS,
+                               data_clauses);
+  gimple_set_location (data_region, gimple_location (kernels_region));
+
+  /* For now, just construct a new parallel region inside the data region.  */
+  gimple *inner_region
+    = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_OACC_PARALLEL,
+                               kernels_clauses);
+  gimple_set_location (inner_region, gimple_location (kernels_region));
+  gimple_omp_set_body (inner_region, gimple_omp_body (kernels_region));
+
+  gbind *bind = gimple_build_bind (NULL, NULL, NULL);
+  gimple_bind_add_stmt (bind, inner_region);
+
+  /* Put the transformed pieces together.  The entire body of the region is
+     wrapped in a try-finally statement that calls __builtin_GOACC_data_end
+     for cleanup.  */
+  tree data_end_fn = builtin_decl_explicit (BUILT_IN_GOACC_DATA_END);
+  gimple *call = gimple_build_call (data_end_fn, 0);
+  gimple_seq cleanup = NULL;
+  gimple_seq_add_stmt (&cleanup, call);
+  gimple *try_stmt = gimple_build_try (bind, cleanup, GIMPLE_TRY_FINALLY);
+  gimple_omp_set_body (data_region, try_stmt);
+
+  return data_region;
+}
+
+/* Helper function of convert_oacc_kernels for walking the tree, calling
+   transform_kernels_region on each kernels region found.  */
+
+static tree
+scan_kernels (gimple_stmt_iterator *gsi_p, bool *handled_ops_p,
+              struct walk_stmt_info *)
+{
+  gimple *stmt = gsi_stmt (*gsi_p);
+  *handled_ops_p = false;
+
+  int kind;
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_OMP_TARGET:
+      kind = gimple_omp_target_kind (stmt);
+      if (kind == GF_OMP_TARGET_KIND_OACC_KERNELS)
+        {
+          gimple *new_region = transform_kernels_region (stmt);
+          gsi_replace (gsi_p, new_region, false);
+          *handled_ops_p = true;
+        }
+      break;
+
+    default:
+      break;
+    }
+
+  return NULL;
+}
+
+/* Find and transform OpenACC kernels regions in the current function.  */
+
+static unsigned int
+convert_oacc_kernels (void)
+{
+  struct walk_stmt_info wi;
+  gimple_seq body = gimple_body (current_function_decl);
+
+  memset (&wi, 0, sizeof (wi));
+  walk_gimple_seq_mod (&body, scan_kernels, NULL, &wi);
+
+  gimple_set_body (current_function_decl, body);
+
+  return 0;
+}
+
+namespace {
+
+const pass_data pass_data_convert_oacc_kernels =
+{
+  GIMPLE_PASS, /* type */
+  "convert_oacc_kernels", /* name */
+  OPTGROUP_OMP, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_gimple_any, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_convert_oacc_kernels : public gimple_opt_pass
+{
+public:
+  pass_convert_oacc_kernels (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_convert_oacc_kernels, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+  {
+    return (flag_openacc
+	    && flag_openacc_kernels == OPENACC_KERNELS_SPLIT);
+  }
+  virtual unsigned int execute (function *)
+  {
+    return convert_oacc_kernels ();
+  }
+
+}; // class pass_convert_oacc_kernels
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_convert_oacc_kernels (gcc::context *ctxt)
+{
+  return new pass_convert_oacc_kernels (ctxt);
+}
diff --git a/gcc/passes.def b/gcc/passes.def
index 1a7fd14..7cee52b 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
    NEXT_PASS (pass_warn_unused_result);
    NEXT_PASS (pass_diagnose_omp_blocks);
    NEXT_PASS (pass_diagnose_tm_blocks);
+  NEXT_PASS (pass_convert_oacc_kernels);
    NEXT_PASS (pass_lower_omp);
    NEXT_PASS (pass_lower_cf);
    NEXT_PASS (pass_lower_tm);
diff --git a/gcc/testsuite/c-c++-common/goacc/if-clause-2.c b/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
index 5ab8459..e17b5dd 100644
--- a/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=split" } */
+/* { dg-additional-options "-fdump-tree-convert_oacc_kernels" } */
+
  void
  f (short c)
  {
@@ -9,3 +12,7 @@ f (short c)
    ;
  #pragma acc update device(c) if(c)
  }
+
+/* Verify that the 'if' clause gets duplicated.
+   { dg-final { scan-tree-dump-times "#pragma omp target oacc_data_kernels if\\(" 1 "convert_oacc_kernels" } }
+   { dg-final { scan-tree-dump-times "#pragma omp target oacc_parallel_kernels_gang_single .* if\\(" 1 "convert_oacc_kernels" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
new file mode 100644
index 0000000..c75db37
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
@@ -0,0 +1,36 @@
+/* { dg-additional-options "-fopenacc-kernels=split" } */
+/* { dg-additional-options "-fdump-tree-convert_oacc_kernels" } */
+
+#define N 1024
+
+unsigned int a[N];
+
+int
+main (void)
+{
+  int i;
+  unsigned int sum = 1;
+
+#pragma acc kernels copyin(a[0:N]) copy(sum)
+  {
+    #pragma acc loop
+    for (i = 0; i < N; ++i)
+      sum += a[i];
+
+    sum++;
+
+    #pragma acc loop
+    for (i = 0; i < N; ++i)
+      sum += a[i];
+  }
+
+  return 0;
+}
+
+/* Check that the kernels region is split into a data region and an enclosed
+   parallel region.  */
+/* { dg-final { scan-tree-dump-times "oacc_data_kernels" 1 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "oacc_parallel" 1 "convert_oacc_kernels" } } */
+
+/* Check that the original kernels region is removed.  */
+/* { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
new file mode 100644
index 0000000..8c66330
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
@@ -0,0 +1,33 @@
+! { dg-additional-options "-fopenacc-kernels=split" }
+! { dg-additional-options "-fdump-tree-convert_oacc_kernels" }
+
+program main
+  implicit none
+  integer, parameter         :: N = 1024
+  integer, dimension (1:N)   :: a
+  integer                    :: i, sum
+
+  !$acc kernels copyin(a(1:N)) copy(sum)
+
+  !$acc loop
+  do i = 1, N
+    sum = sum + a(i)
+  end do
+
+  sum = sum + 1
+
+  !$acc loop
+  do i = 1, N
+    sum = sum + a(i)
+  end do
+
+  !$acc end kernels
+end program main
+
+! Check that the kernels region is split into a data region and an enclosed
+! parallel region.
+! { dg-final { scan-tree-dump-times "oacc_data_kernels" 1 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "oacc_parallel" 1 "convert_oacc_kernels" } }
+
+! Check that the original kernels region is removed.
+! { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
index a70f1e7..b83ca2d 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
@@ -1,5 +1,7 @@
  ! { dg-do compile }
  ! { dg-additional-options "-fdump-tree-original" }
+! { dg-additional-options "-fopenacc-kernels=split" }
+! { dg-additional-options "-fdump-tree-convert_oacc_kernels" }

  program test
    implicit none
@@ -33,3 +35,7 @@ end program test
  ! { dg-final { scan-tree-dump-times "map\\(alloc:t\\)" 1 "original" } }

  ! { dg-final { scan-tree-dump-times "map\\(force_deviceptr:u\\)" 1 "original" } }
+
+! Verify that the 'if' clause gets duplicated.
+! { dg-final { scan-tree-dump-times "#pragma omp target oacc_data_kernels if\\(" 1 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "#pragma omp target oacc_parallel_kernels_gang_single .* if\\(" 1 "convert_oacc_kernels" } }
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 1c8df3d..5fd8c2c 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -412,6 +412,7 @@ extern gimple_opt_pass *make_pass_lower_switch_O0 (gcc::context *ctxt);
  extern gimple_opt_pass *make_pass_lower_vector (gcc::context *ctxt);
  extern gimple_opt_pass *make_pass_lower_vector_ssa (gcc::context *ctxt);
  extern gimple_opt_pass *make_pass_lower_omp (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_convert_oacc_kernels (gcc::context *ctxt);
  extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
  extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
  extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
-- 
2.8.1

* [PATCH 04/10, OpenACC] Turn OpenACC kernels regions into a sequence of parallel regions
  2019-07-17 21:03 ` [PATCH 00/10, OpenACC] Rework handling of OpenACC kernels regions Kwok Cheung Yeung
                     ` (2 preceding siblings ...)
  2019-07-17 21:06   ` [PATCH 03/10, OpenACC] Separate OpenACC kernels regions in data and parallel parts Kwok Cheung Yeung
@ 2019-07-17 21:11   ` Kwok Cheung Yeung
  2019-07-18 10:09     ` Jakub Jelinek
  2019-07-17 21:12   ` [PATCH 05/10, OpenACC] Handle conditional execution of loops in OpenACC kernels regions Kwok Cheung Yeung
                     ` (6 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: Kwok Cheung Yeung @ 2019-07-17 21:11 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek; +Cc: Thomas Schwinge

This patch decomposes each OpenACC kernels region into a sequence of
parallel regions. Each OpenACC loop nest turns into its own region; any code 
between such loop nests is gathered up into a region as well. The loop regions 
can be distributed across gangs if the original kernels region had a num_gangs 
clause, while the other regions are executed in "gang-single" mode. The implied 
default "auto" clause on kernels loops is made explicit unless there is a 
conflicting clause.
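
To make the partitioning rule above concrete, here is a small toy model (not
code from the patch). It assumes the kernels body has already been flattened
into a list of statements, each classified either as an OpenACC loop nest or
as other code. The enum, struct and function names are hypothetical; the real
pass walks GIMPLE sequences and also has to deal with binds, clauses and
local variables, none of which is modelled here.

```c
#include <stddef.h>

/* Hypothetical statement classification; the pass itself inspects GIMPLE.  */
enum stmt_kind { STMT_OTHER, STMT_ACC_LOOP };

/* Hypothetical region kinds, named after the two new target kinds.  */
enum region_kind { REGION_GANG_SINGLE, REGION_PARALLELIZED };

struct region
{
  enum region_kind kind;
  size_t first;  /* Index of the first statement in the region.  */
  size_t count;  /* Number of statements in the region.  */
};

/* Partition STMTS[0..N) into regions: every loop nest becomes its own
   region, and each maximal run of other statements becomes one gang-single
   region.  OUT must have room for N entries.  Returns the region count.  */
size_t
decompose (const enum stmt_kind *stmts, size_t n, struct region *out)
{
  size_t nregions = 0;
  size_t i = 0;

  while (i < n)
    {
      if (stmts[i] == STMT_ACC_LOOP)
        {
          out[nregions++] = (struct region) { REGION_PARALLELIZED, i, 1 };
          i++;
        }
      else
        {
          /* Gather up all code up to the next loop nest.  */
          size_t start = i;
          while (i < n && stmts[i] != STMT_ACC_LOOP)
            i++;
          out[nregions++]
            = (struct region) { REGION_GANG_SINGLE, start, i - start };
        }
    }

  return nregions;
}
```

For a body shaped like the kernels-conversion.c test (loop, a statement in
between, loop), this model yields three regions: parallelized, gang-single,
parallelized.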

2019-07-16  Gergö Barany  <gergo@codesourcery.com>

	gcc/
	* omp-oacc-kernels.c (top_level_omp_for_in_stmt): New function.
	(make_gang_single_region): Likewise.
	(transform_kernels_loop_clauses, make_gang_parallel_loop_region):
	Likewise.
	(flatten_binds): Likewise.
	(make_data_region_try_statement): Likewise.
	(maybe_build_inner_data_region): Likewise.
	(decompose_kernels_region_body): Likewise.
	(transform_kernels_region): Delegate to decompose_kernels_region_body
	and make_data_region_try_statement.

	gcc/testsuite/
	* c-c++-common/goacc/kernels-conversion.c: Test for a gang-single
	region.
	* gfortran.dg/goacc/kernels-conversion.f95: Likewise.
---
  gcc/omp-oacc-kernels.c                             | 558 ++++++++++++++++++++-
  .../c-c++-common/goacc/kernels-conversion.c        |  11 +-
  .../gfortran.dg/goacc/kernels-conversion.f95       |  11 +-
  3 files changed, 557 insertions(+), 23 deletions(-)

diff --git a/gcc/omp-oacc-kernels.c b/gcc/omp-oacc-kernels.c
index d180377..6e08366 100644
--- a/gcc/omp-oacc-kernels.c
+++ b/gcc/omp-oacc-kernels.c
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "backend.h"
  #include "target.h"
  #include "tree.h"
+#include "cp/cp-tree.h"
  #include "gimple.h"
  #include "tree-pass.h"
  #include "cgraph.h"
@@ -45,16 +46,548 @@ along with GCC; see the file COPYING3.  If not see
     For now, the translation is as follows:
     - The entire kernels region is turned into a data region with clauses
       taken from the kernels region.  New "create" clauses are added for all
-     variables declared at the top level in the kernels region.  */
+     variables declared at the top level in the kernels region.
+   - Any loop annotated with an OpenACC loop directive is wrapped in a new
+     parallel region.  Gang/worker/vector annotations are copied from the
+     original kernels region if present.
+     * Loops without an explicit "independent" or "seq" annotation get an
+       "auto" annotation; other annotations are preserved on the loop or
+       moved to the new surrounding parallel region.  Which annotations are
+       moved is determined by the constraints in the OpenACC spec; for
+       example, loops in the kernels region may have a gang clause, but
+       such annotations must now be moved to the new parallel region.
+   - Any sequences of other code (non-loops, non-OpenACC loops) are wrapped
+     in new "gang-single" parallel regions: Worker/vector annotations are
+     copied from the original kernels region if present, but num_gangs is
+     explicitly set to 1.  */
+
+/* Helper function for decompose_kernels_region_body.  If STMT contains a
+   "top-level" OMP_FOR statement, returns a pointer to that statement;
+   returns NULL otherwise.
+
+   A "top-level" OMP_FOR statement is one that is possibly accompanied by
+   small snippets of setup code.  Specifically, this function accepts an
+   OMP_FOR possibly wrapped in a singleton bind and a singleton try
+   statement to allow for a local loop variable, but not an OMP_FOR
+   statement nested in any other constructs.  Alternatively, it accepts a
+   non-singleton bind containing only assignments and then an OMP_FOR
+   statement at the very end.  The former style can be generated by the C
+   frontend, the latter by the Fortran frontend.  */
+
+static gimple *
+top_level_omp_for_in_stmt (gimple *stmt)
+{
+  if (gimple_code (stmt) == GIMPLE_OMP_FOR)
+    return stmt;
+
+  if (gimple_code (stmt) == GIMPLE_BIND)
+    {
+      gimple_seq body = gimple_bind_body (as_a <gbind *> (stmt));
+      if (gimple_seq_singleton_p (body))
+        {
+          /* Accept an OMP_FOR statement, or a try statement containing only
+             a single OMP_FOR.  */
+          gimple *maybe_for_or_try = gimple_seq_first_stmt (body);
+          if (gimple_code (maybe_for_or_try) == GIMPLE_OMP_FOR)
+            return maybe_for_or_try;
+          else if (gimple_code (maybe_for_or_try) == GIMPLE_TRY)
+            {
+              gimple_seq try_body = gimple_try_eval (maybe_for_or_try);
+              if (!gimple_seq_singleton_p (try_body))
+                return NULL;
+              gimple *maybe_omp_for_stmt = gimple_seq_first_stmt (try_body);
+              if (gimple_code (maybe_omp_for_stmt) == GIMPLE_OMP_FOR)
+                return maybe_omp_for_stmt;
+            }
+        }
+      else
+        {
+          gimple_stmt_iterator gsi;
+          /* Accept only a block of optional assignments followed by an
+             OMP_FOR at the end.  No other kinds of statements allowed.  */
+          for (gsi = gsi_start (body); !gsi_end_p (gsi); gsi_next (&gsi))
+            {
+              gimple *body_stmt = gsi_stmt (gsi);
+              if (gimple_code (body_stmt) == GIMPLE_ASSIGN)
+                continue;
+              else if (gimple_code (body_stmt) == GIMPLE_OMP_FOR
+                        && gsi_one_before_end_p (gsi))
+                return body_stmt;
+              else
+                return NULL;
+            }
+        }
+    }
+
+  return NULL;
+}
+
+/* Construct a "gang-single" OpenACC parallel region at LOC containing the
+   STMTS.  The newly created region is annotated with CLAUSES, which must
+   not contain a num_gangs clause, and an additional "num_gangs(1)" clause
+   to force gang-single execution.  */
+
+static gimple *
+make_gang_single_region (location_t loc, gimple_seq stmts, tree clauses)
+{
+  /* This correctly unshares the entire clause chain rooted here.  */
+  clauses = unshare_expr (clauses);
+  /* Make a num_gangs(1) clause.  */
+  tree gang_single_clause = build_omp_clause (loc, OMP_CLAUSE_NUM_GANGS);
+  OMP_CLAUSE_OPERAND (gang_single_clause, 0) = integer_one_node;
+  OMP_CLAUSE_CHAIN (gang_single_clause) = clauses;
+
+  /* Build the gang-single region.  */
+  gimple *single_region
+    = gimple_build_omp_target (
+        NULL,
+        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE,
+        gang_single_clause);
+  gimple_set_location (single_region, loc);
+  gbind *single_body = gimple_build_bind (NULL, stmts, make_node (BLOCK));
+  gimple_omp_set_body (single_region, single_body);
+
+  return single_region;
+}
+
+/* Helper for make_region_loop_nest.  Transform OpenACC 'kernels'/'loop'
+   construct clauses into OpenACC 'parallel'/'loop' construct ones.  */
+
+static tree
+transform_kernels_loop_clauses (gimple *omp_for,
+				tree num_gangs_clause,
+				tree clauses)
+{
+  /* If this loop in a kernels region does not have an explicit
+     "independent", "seq", or "auto" clause, we must give it an explicit
+     "auto" clause.  */
+  bool add_auto_clause = true;
+  tree loop_clauses = gimple_omp_for_clauses (omp_for);
+  for (tree c = loop_clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_AUTO
+          || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_INDEPENDENT
+          || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SEQ)
+        {
+          add_auto_clause = false;
+          break;
+        }
+    }
+  if (add_auto_clause)
+    {
+      tree auto_clause = build_omp_clause (gimple_location (omp_for),
+                                           OMP_CLAUSE_AUTO);
+      OMP_CLAUSE_CHAIN (auto_clause) = loop_clauses;
+      gimple_omp_for_set_clauses (omp_for, auto_clause);
+    }
+
+  /* If the kernels region had a num_gangs clause, add that to this new
+     parallel region.  */
+  if (num_gangs_clause != NULL)
+    {
+      tree parallel_num_gangs_clause = unshare_expr (num_gangs_clause);
+      OMP_CLAUSE_CHAIN (parallel_num_gangs_clause) = clauses;
+      clauses = parallel_num_gangs_clause;
+    }
+
+  return clauses;
+}
+
+/* Construct a possibly gang-parallel OpenACC parallel region containing the
+   STMT, which must be identical to, or a bind containing, the loop OMP_FOR
+   with OpenACC loop annotations.
+
+   The newly created region is annotated with the optional NUM_GANGS_CLAUSE
+   as well as the other CLAUSES, which must not contain a num_gangs clause.  */
+
+static gimple *
+make_gang_parallel_loop_region (gimple *omp_for, gimple *stmt,
+                                tree num_gangs_clause, tree clauses)
+{
+  /* This correctly unshares the entire clause chain rooted here.  */
+  clauses = unshare_expr (clauses);
+
+  clauses = transform_kernels_loop_clauses (omp_for,
+					    num_gangs_clause,
+					    clauses);
+
+  /* Now build the parallel region containing this loop.  */
+  gimple_seq parallel_body = NULL;
+  gimple_seq_add_stmt (&parallel_body, stmt);
+  gimple *parallel_body_bind
+    = gimple_build_bind (NULL, parallel_body, make_node (BLOCK));
+  gimple *parallel_region
+    = gimple_build_omp_target (
+        parallel_body_bind,
+        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED,
+        clauses);
+  gimple_set_location (parallel_region, gimple_location (stmt));
+
+  return parallel_region;
+}
+
+/* Eliminate any binds directly inside BIND by adding their statements to
+   BIND (i.e., modifying it in place), excluding binds that hold only an
+   OMP_FOR loop and associated setup/cleanup code.  Recurse into binds but
+   not other statements.  Return a chain of the local variables of eliminated
+   binds, i.e., the local variables found in nested binds.  If
+   INCLUDE_TOPLEVEL_VARS is true, this also includes the variables belonging
+   to BIND itself.  */
+
+static tree
+flatten_binds (gbind *bind, bool include_toplevel_vars = false)
+{
+  tree vars = NULL, last_var = NULL;
+
+  if (include_toplevel_vars)
+    {
+      vars = gimple_bind_vars (bind);
+      last_var = vars;
+    }
+
+  gimple_seq new_body = NULL;
+  gimple_seq body_sequence = gimple_bind_body (bind);
+  gimple_stmt_iterator gsi, gsi_n;
+  for (gsi = gsi_start (body_sequence); !gsi_end_p (gsi); gsi = gsi_n)
+    {
+      /* Advance the iterator here because otherwise it would be invalidated
+         by moving statements below.  */
+      gsi_n = gsi;
+      gsi_next (&gsi_n);
+
+      gimple *stmt = gsi_stmt (gsi);
+      /* Flatten bind statements, except the ones that contain only an
+         OpenACC for loop.  */
+      if (gimple_code (stmt) == GIMPLE_BIND
+          && !top_level_omp_for_in_stmt (stmt))
+        {
+          gbind *inner_bind = as_a <gbind *> (stmt);
+          /* Flatten recursively, and collect all variables.  */
+          tree inner_vars = flatten_binds (inner_bind, true);
+          gimple_seq inner_sequence = gimple_bind_body (inner_bind);
+          gcc_assert (gimple_code (inner_sequence) != GIMPLE_BIND
+                      || top_level_omp_for_in_stmt (inner_sequence));
+          gimple_seq_add_seq (&new_body, inner_sequence);
+          /* Find the last variable; we will append others to it.  */
+          while (last_var != NULL && TREE_CHAIN (last_var) != NULL)
+            last_var = TREE_CHAIN (last_var);
+          if (last_var != NULL)
+            {
+              TREE_CHAIN (last_var) = inner_vars;
+              last_var = inner_vars;
+            }
+          else
+            {
+              vars = inner_vars;
+              last_var = vars;
+            }
+        }
+      else
+        gimple_seq_add_stmt (&new_body, stmt);
+    }
+
+  /* Put the possibly transformed body back into the bind.  */
+  gimple_bind_set_body (bind, new_body);
+  return vars;
+}
+
+/* Helper function for places where we construct data regions.  Wraps the BODY
+   inside a try-finally construct at LOC that calls __builtin_GOACC_data_end
+   in its cleanup block.  Returns this try statement.  */
+
+static gimple *
+make_data_region_try_statement (location_t loc, gimple *body)
+{
+  tree data_end_fn = builtin_decl_explicit (BUILT_IN_GOACC_DATA_END);
+  gimple *call = gimple_build_call (data_end_fn, 0);
+  gimple_seq cleanup = NULL;
+  gimple_seq_add_stmt (&cleanup, call);
+  gimple *try_stmt = gimple_build_try (body, cleanup, GIMPLE_TRY_FINALLY);
+  gimple_set_location (body, loc);
+  return try_stmt;
+}
+
+/* If INNER_BIND_VARS holds variables, build an OpenACC data region with
+   location LOC containing BODY and having "create(var)" clauses for each
+   variable.  If INNER_CLEANUP is present, add a try-finally statement with
+   this cleanup code in the finally block.  Return the new data region, or
+   the original BODY if no data region was needed.  */
+
+static gimple *
+maybe_build_inner_data_region (location_t loc, gimple *body,
+                               tree inner_bind_vars, gimple *inner_cleanup)
+{
+  /* Build data "create(var)" clauses for these local variables.
+     Below we will add these to a data region enclosing the entire body
+     of the decomposed kernels region.  */
+  tree prev_mapped_var = NULL, next = NULL, artificial_vars = NULL,
+       inner_data_clauses = NULL;
+  for (tree v = inner_bind_vars; v; v = next)
+    {
+      next = TREE_CHAIN (v);
+      if (DECL_ARTIFICIAL (v)
+          || TREE_CODE (v) == CONST_DECL
+          || (DECL_LANG_SPECIFIC (current_function_decl)
+              && DECL_TEMPLATE_INSTANTIATION (current_function_decl)))
+        {
+          /* If this is an artificial temporary, it need not be mapped.  We
+             move its declaration into the bind inside the data region.
+             Also avoid mapping variables if we are inside a template
+             instantiation; the code does not contain all the copies to
+             temporaries that would make this legal.  */
+          TREE_CHAIN (v) = artificial_vars;
+          artificial_vars = v;
+          if (prev_mapped_var != NULL)
+            TREE_CHAIN (prev_mapped_var) = next;
+          else
+            inner_bind_vars = next;
+        }
+      else
+        {
+          /* Otherwise, build the map clause.  */
+          tree new_clause = build_omp_clause (loc, OMP_CLAUSE_MAP);
+          OMP_CLAUSE_SET_MAP_KIND (new_clause, GOMP_MAP_ALLOC);
+          OMP_CLAUSE_DECL (new_clause) = v;
+          OMP_CLAUSE_SIZE (new_clause) = DECL_SIZE_UNIT (v);
+          OMP_CLAUSE_CHAIN (new_clause) = inner_data_clauses;
+          inner_data_clauses = new_clause;
+
+          prev_mapped_var = v;
+        }
+    }
+
+  if (artificial_vars)
+    body = gimple_build_bind (artificial_vars, body, make_node (BLOCK));
+
+  /* If we determined above that there are variables that need to be created
+     on the device, construct a data region for them and wrap the body
+     inside that.  */
+  if (inner_data_clauses != NULL)
+    {
+      gcc_assert (inner_bind_vars != NULL);
+      gimple *inner_data_region
+        = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_OACC_DATA_KERNELS,
+                                   inner_data_clauses);
+      gimple_set_location (inner_data_region, loc);
+      /* Make sure __builtin_GOACC_data_end is called at the end.  */
+      gimple *try_stmt = make_data_region_try_statement (loc, body);
+      gimple_omp_set_body (inner_data_region, try_stmt);
+      gimple *bind_body;
+      if (inner_cleanup != NULL)
+          /* Clobber all the inner variables that need to be clobbered.  */
+          bind_body = gimple_build_try (inner_data_region, inner_cleanup,
+                                        GIMPLE_TRY_FINALLY);
+      else
+          bind_body = inner_data_region;
+      body = gimple_build_bind (inner_bind_vars, bind_body, make_node (BLOCK));
+    }
+
+  return body;
+}
+
+/* Decompose the body of the KERNELS_REGION, which was originally annotated
+   with the KERNELS_CLAUSES, into a series of parallel regions.  */
+
+static gimple *
+decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
+{
+  location_t loc = gimple_location (kernels_region);
+
+  /* The kernels clauses will be propagated to the child clauses unmodified,
+     except that the num_gangs clause will only be added to loop regions.
+     The other regions are "gang-single" and get an explicit num_gangs(1)
+     clause.  So separate out the num_gangs clause here.  */
+  tree num_gangs_clause = NULL, prev_clause = NULL;
+  tree parallel_clauses = kernels_clauses;
+  for (tree c = parallel_clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_NUM_GANGS)
+        {
+          /* Cut this clause out of the chain.  */
+          num_gangs_clause = c;
+          if (prev_clause != NULL)
+            OMP_CLAUSE_CHAIN (prev_clause) = OMP_CLAUSE_CHAIN (c);
+          else
+            kernels_clauses = OMP_CLAUSE_CHAIN (c);
+          OMP_CLAUSE_CHAIN (num_gangs_clause) = NULL;
+          break;
+        }
+      else
+        prev_clause = c;
+    }
+
+  gimple *kernels_body = gimple_omp_body (kernels_region);
+  gbind *kernels_bind = as_a <gbind *> (kernels_body);
+
+  /* The body of the region may contain other nested binds declaring inner
+     local variables.  Collapse all these binds into one to ensure that we
+     have a single sequence of statements to iterate over; also, collect all
+     inner variables.  */
+  tree inner_bind_vars = flatten_binds (kernels_bind);
+  gimple_seq body_sequence = gimple_bind_body (kernels_bind);
+
+  /* All these inner variables will get allocated on the device (below, by
+     calling maybe_build_inner_data_region).  Here we create "present"
+     clauses for them and add these clauses to the list of clauses to be
+     attached to each inner parallel region.  */
+  tree present_clauses = kernels_clauses;
+  for (tree var = inner_bind_vars; var; var = TREE_CHAIN (var))
+    {
+      if (!DECL_ARTIFICIAL (var) && TREE_CODE (var) != CONST_DECL)
+        {
+          tree present_clause = build_omp_clause (loc, OMP_CLAUSE_MAP);
+          OMP_CLAUSE_SET_MAP_KIND (present_clause, GOMP_MAP_FORCE_PRESENT);
+          OMP_CLAUSE_DECL (present_clause) = var;
+          OMP_CLAUSE_SIZE (present_clause) = DECL_SIZE_UNIT (var);
+          OMP_CLAUSE_CHAIN (present_clause) = present_clauses;
+          present_clauses = present_clause;
+        }
+    }
+  kernels_clauses = present_clauses;
+
+  /* In addition to nested binds, the "real" body of the region may be
+     nested inside a try-finally block.  Find its cleanup block, which
+     contains code to clobber the local variables that must be clobbered.  */
+  gimple *inner_cleanup = NULL;
+  if (body_sequence != NULL && gimple_code (body_sequence) == GIMPLE_TRY)
+    {
+      if (gimple_seq_singleton_p (body_sequence))
+        {
+          /* The try statement is the only thing inside the bind.  */
+          inner_cleanup = gimple_try_cleanup (body_sequence);
+          body_sequence = gimple_try_eval (body_sequence);
+        }
+      else
+        {
+          /* The bind's body starts with a try statement, but it is followed
+             by other things.  */
+          gimple_stmt_iterator gsi = gsi_start (body_sequence);
+          gimple *try_stmt = gsi_stmt (gsi);
+          inner_cleanup = gimple_try_cleanup (try_stmt);
+          gimple *try_body = gimple_try_eval (try_stmt);
+
+          gsi_remove (&gsi, false);
+          /* Now gsi indicates the sequence of statements after the try
+             statement in the bind.  Append the statement in the try body and
+             the trailing statements from gsi.  */
+          gsi_insert_seq_before (&gsi, try_body, GSI_CONTINUE_LINKING);
+          body_sequence = gsi_stmt (gsi);
+        }
+    }
+
+  /* This sequence will collect all the top-level statements in the body of
+     the data region we are about to construct.  */
+  gimple_seq region_body = NULL;
+  /* This sequence will collect consecutive statements to be put into a
+     gang-single region.  */
+  gimple_seq gang_single_seq = NULL;
+  /* Flag recording whether the gang_single_seq only contains copies to
+     local variables.  These may be loop setup code that should not be
+     separated from the loop.  */
+  bool only_simple_assignments = true;
+
+  /* Iterate over the statements in the kernels region's body.  */
+  gimple_stmt_iterator gsi, gsi_n;
+  for (gsi = gsi_start (body_sequence); !gsi_end_p (gsi); gsi = gsi_n)
+    {
+      /* Advance the iterator here because otherwise it would be invalidated
+         by moving statements below.  */
+      gsi_n = gsi;
+      gsi_next (&gsi_n);
+
+      gimple *stmt = gsi_stmt (gsi);
+      gimple *omp_for = top_level_omp_for_in_stmt (stmt);
+      if (omp_for != NULL)
+        {
+          /* This is an OMP for statement, put it into a parallel region.
+             But first, construct a gang-single region containing any
+             complex sequential statements we may have seen.  */
+          if (gang_single_seq != NULL && !only_simple_assignments)
+            {
+              gimple *single_region
+                = make_gang_single_region (loc, gang_single_seq,
+                                           kernels_clauses);
+              gimple_seq_add_stmt (&region_body, single_region);
+            }
+          else if (gang_single_seq != NULL && only_simple_assignments)
+            {
+              /* There is a sequence of sequential statements preceding this
+                 loop, but they are all simple assignments.  This is
+                 probably setup code for the loop; in particular, Fortran DO
+                 loops are preceded by code to copy the loop limit variable
+                 to a temporary.  Group this code together with the loop
+                 itself.  */
+              gimple_seq_add_stmt (&gang_single_seq, stmt);
+              stmt = gimple_build_bind (NULL, gang_single_seq,
+                                        make_node (BLOCK));
+            }
+          gang_single_seq = NULL;
+          only_simple_assignments = true;
+
+          gimple *parallel_region
+            = make_gang_parallel_loop_region (omp_for, stmt,
+                                              num_gangs_clause,
+                                              kernels_clauses);
+          gimple_seq_add_stmt (&region_body, parallel_region);
+        }
+      else
+        {
+          /* This is not an OMP for statement, so it will be put into a
+             gang-single region.  */
+          gimple_seq_add_stmt (&gang_single_seq, stmt);
+          /* Is this a simple assignment? We call it simple if it is an
+             assignment to an artificial local variable.  This captures
+             Fortran loop setup code computing loop bounds and offsets.  */
+          bool is_simple_assignment
+            = (gimple_code (stmt) == GIMPLE_ASSIGN
+                && TREE_CODE (gimple_assign_lhs (stmt)) == VAR_DECL
+                && DECL_ARTIFICIAL (gimple_assign_lhs (stmt)));
+          if (!is_simple_assignment)
+            only_simple_assignments = false;
+        }
+    }
+
+  /* If we did not emit a new region, and are not going to emit one now
+     (that is, the original region was empty), prepare to emit a dummy so as
+     to preserve the original construct, which other processing (at least
+     test cases) depends on.  */
+  if (region_body == NULL && gang_single_seq == NULL)
+    {
+      gimple *stmt = gimple_build_nop ();
+      gimple_set_location (stmt, loc);
+      gimple_seq_add_stmt (&gang_single_seq, stmt);
+    }
+
+  /* Gather up any remaining gang-single statements.  */
+  if (gang_single_seq != NULL)
+    {
+      gimple *single_region
+        = make_gang_single_region (loc, gang_single_seq, kernels_clauses);
+      gimple_seq_add_stmt (&region_body, single_region);
+    }
+
+  tree kernels_locals = gimple_bind_vars (as_a <gbind *> (kernels_body));
+  gimple *body = gimple_build_bind (kernels_locals, region_body,
+                                    make_node (BLOCK));
+
+  /* If we found variables declared in nested scopes, build a data region to
+     map them to the device.  */
+  body = maybe_build_inner_data_region (loc, body, inner_bind_vars,
+                                        inner_cleanup);
+
+  return body;
+}

  /* Transform KERNELS_REGION, which is an OpenACC kernels region, into a data
-   region containing the original kernels region.  */
+   region containing the original kernels region's body cut up into a
+   sequence of parallel regions.  */

  static gimple *
  transform_kernels_region (gimple *kernels_region)
  {
    gcc_checking_assert (gimple_omp_target_kind (kernels_region)
                          == GF_OMP_TARGET_KIND_OACC_KERNELS);
+  location_t loc = gimple_location (kernels_region);

    /* Collect the kernels region's data clauses and create the new data
       region with those clauses.  */
@@ -130,26 +663,17 @@ transform_kernels_region (gimple *kernels_region)
    gimple *data_region
      = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_OACC_DATA_KERNELS,
                                 data_clauses);
-  gimple_set_location (data_region, gimple_location (kernels_region));
-
-  /* For now, just construct a new parallel region inside the data region.  */
-  gimple *inner_region
-    = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_OACC_PARALLEL,
-                               kernels_clauses);
-  gimple_set_location (inner_region, gimple_location (kernels_region));
-  gimple_omp_set_body (inner_region, gimple_omp_body (kernels_region));
+  gimple_set_location (data_region, loc);

-  gbind *bind = gimple_build_bind (NULL, NULL, NULL);
-  gimple_bind_add_stmt (bind, inner_region);
+  /* Transform the body of the kernels region into a sequence of parallel
+     regions.  */
+  gimple *body = decompose_kernels_region_body (kernels_region,
+                                                kernels_clauses);

    /* Put the transformed pieces together.  The entire body of the region is
       wrapped in a try-finally statement that calls __builtin_GOACC_data_end
       for cleanup.  */
-  tree data_end_fn = builtin_decl_explicit (BUILT_IN_GOACC_DATA_END);
-  gimple *call = gimple_build_call (data_end_fn, 0);
-  gimple_seq cleanup = NULL;
-  gimple_seq_add_stmt (&cleanup, call);
-  gimple *try_stmt = gimple_build_try (bind, cleanup, GIMPLE_TRY_FINALLY);
+  gimple *try_stmt = make_data_region_try_statement (loc, body);
    gimple_omp_set_body (data_region, try_stmt);

    return data_region;
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
index c75db37..ec5db02 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
@@ -18,6 +18,7 @@ main (void)
        sum += a[i];

      sum++;
+    a[0]++;

      #pragma acc loop
      for (i = 0; i < N; ++i)
@@ -27,10 +28,14 @@ main (void)
    return 0;
  }

-/* Check that the kernels region is split into a data region and an enclosed
-   parallel region.  */
+/* Check that the kernels region is split into a data region and enclosed
+   parallel regions.  */
  /* { dg-final { scan-tree-dump-times "oacc_data_kernels" 1 "convert_oacc_kernels" } } */
-/* { dg-final { scan-tree-dump-times "oacc_parallel" 1 "convert_oacc_kernels" } } */
+
+/* The two loop regions are parallelized, the sequential part in between is
+   made gang-single.  */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 2 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 1 "convert_oacc_kernels" } } */

  /* Check that the original kernels region is removed.  */
  /* { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
index 8c66330..4aba2b1 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
@@ -15,6 +15,7 @@ program main
    end do

    sum = sum + 1
+  a(1) = a(1) + 1

    !$acc loop
    do i = 1, N
@@ -24,10 +25,14 @@ program main
    !$acc end kernels
  end program main

-! Check that the kernels region is split into a data region and an enclosed
-! parallel region.
+! Check that the kernels region is split into a data region and enclosed
+! parallel regions.
  ! { dg-final { scan-tree-dump-times "oacc_data_kernels" 1 "convert_oacc_kernels" } }
-! { dg-final { scan-tree-dump-times "oacc_parallel" 1 "convert_oacc_kernels" } }
+
+! The two loop regions are parallelized, the sequential part in between is
+! made gang-single.
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 2 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 1 "convert_oacc_kernels" } }

  ! Check that the original kernels region is removed.
  ! { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } }
-- 
2.8.1

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 05/10, OpenACC] Handle conditional execution of loops in OpenACC kernels regions
  2019-07-17 21:03 ` [PATCH 00/10, OpenACC] Rework handling of OpenACC kernels regions Kwok Cheung Yeung
                     ` (3 preceding siblings ...)
  2019-07-17 21:11   ` [PATCH 04/10, OpenACC] Turn OpenACC kernels regions into a sequence of parallel regions Kwok Cheung Yeung
@ 2019-07-17 21:12   ` Kwok Cheung Yeung
  2019-07-17 21:13   ` [PATCH 06/10, OpenACC] Adjust parallelism of loops in gang-single parts of OpenACC " Kwok Cheung Yeung
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: Kwok Cheung Yeung @ 2019-07-17 21:12 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek; +Cc: Thomas Schwinge

Any OpenACC loop controlled by an if statement or a non-OpenACC loop must be 
executed in a gang-single region. Detecting such loops is not trivial as OpenACC 
kernels expansion is done on GIMPLE but before computation of the control flow 
graph. This patch adds an auxiliary analysis for determining whether a statement 
is inside a conditionally executed region (relative to the kernels region's entry).
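
For readers who want to see the idea in isolation, here is a minimal,
standalone sketch of the region analysis described above. It is not the code
from the patch: the Kind and Stmt types, the integer label ids, and the
Regions class are hypothetical stand-ins for GIMPLE statements and the
control_flow_regions class, and it assumes jumps only target labels within
the same statement sequence. Each loop gets a region of its own, its
successor starts a new region, and every jump is then united with the regions
of its target labels. A loop counts as unconditional when its nearest
non-loop neighbors end up in different regions.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified statement kinds; these are not GIMPLE codes.
enum class Kind { Other, Loop, Jump, Label };

struct Stmt {
  Kind kind;
  std::vector<int> labels;  // Label: its own id.  Jump: the target ids.
};

class Regions {
public:
  explicit Regions(const std::vector<Stmt> &seq) : seq_(seq) {
    std::unordered_map<int, std::size_t> label_idx;
    std::size_t current = 0;
    // Pass 1: a loop is a region of its own and its successor starts a new
    // region; every other statement joins the current region.
    for (std::size_t i = 0; i < seq_.size(); ++i) {
      if (seq_[i].kind == Kind::Loop) {
        rep_.push_back(i);
        current = i + 1;
      } else {
        rep_.push_back(current);
        if (seq_[i].kind == Kind::Label)
          for (int l : seq_[i].labels) label_idx[l] = i;
      }
    }
    // Pass 2: unite each jump's region with its target labels' regions.
    for (std::size_t i = 0; i < seq_.size(); ++i) {
      if (seq_[i].kind != Kind::Jump) continue;
      for (int l : seq_[i].labels) {
        auto it = label_idx.find(l);
        if (it != label_idx.end()) union_reps(i, it->second);
      }
    }
  }

  // True if statement IDX is a loop that no recorded jump crosses.
  bool is_unconditional_loop(std::size_t idx) {
    if (seq_[idx].kind != Kind::Loop) return false;
    // Nearest non-loop predecessor; if there is none, nothing can jump
    // over the loop from before it.
    std::size_t prev = idx;
    while (prev > 0 && seq_[prev].kind == Kind::Loop) --prev;
    if (seq_[prev].kind == Kind::Loop) return true;
    // Nearest non-loop successor, with the same reasoning.
    std::size_t succ = idx;
    while (succ < seq_.size() && seq_[succ].kind == Kind::Loop) ++succ;
    if (succ == seq_.size()) return true;
    return find_rep(prev) != find_rep(succ);
  }

private:
  // Find the root representative, compressing the path on the way.
  std::size_t find_rep(std::size_t i) {
    std::size_t rep = i;
    while (rep_[rep] != rep) rep = rep_[rep];
    while (rep_[i] != rep) {
      std::size_t next = rep_[i];
      rep_[i] = rep;
      i = next;
    }
    return rep;
  }

  void union_reps(std::size_t a, std::size_t b) {
    a = find_rep(a);
    b = find_rep(b);
    rep_[b] = a;
  }

  std::vector<Stmt> seq_;
  std::vector<std::size_t> rep_;
};
```

With this sketch, a sequence modelling "if (c) { loop }" (a jump to labels 1
and 2, label 1, the loop, label 2, a trailing statement) places the
statements before and after the loop in the same region, so the loop is
reported as conditional. A plain "stmt; loop; stmt" sequence reports the loop
as unconditional.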

2019-07-16  Gergö Barany  <gergo@codesourcery.com>

	gcc/
	* omp-oacc-kernels.c (control_flow_regions): New class.
	(control_flow_regions::control_flow_regions): New constructor.
	(control_flow_regions::is_unconditional_oacc_for_loop): New method.
	(control_flow_regions::find_rep): Likewise.
	(control_flow_regions::union_reps): Likewise.
	(control_flow_regions::compute_regions): Likewise.
	(decompose_kernels_region_body): Use test for conditional execution.

	gcc/testsuite/
	* c-c++-common/goacc/kernels-conversion.c: Add test for conditionally
	executed code.
	* gfortran.dg/goacc/kernels-conversion.f95: Likewise.
---
  gcc/omp-oacc-kernels.c                             | 216 ++++++++++++++++++++-
  .../c-c++-common/goacc/kernels-conversion.c        |  20 +-
  .../gfortran.dg/goacc/kernels-conversion.f95       |  21 +-
  3 files changed, 245 insertions(+), 12 deletions(-)

diff --git a/gcc/omp-oacc-kernels.c b/gcc/omp-oacc-kernels.c
index 6e08366..80a82fa 100644
--- a/gcc/omp-oacc-kernels.c
+++ b/gcc/omp-oacc-kernels.c
@@ -385,6 +385,208 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
    return body;
  }

+/* Auxiliary analysis of the body of a kernels region, to determine for each
+   OpenACC loop whether it is control-dependent (i.e., not necessarily
+   executed every time the kernels region is entered) or not.
+   We say that a loop is control-dependent if there is some cond, switch, or
+   goto statement that jumps over it, forwards or backwards.  For example,
+   if the loop is controlled by an if statement, then a jump to the true
+   block, the false block, or from one of those blocks to the control flow
+   join point will necessarily jump over the loop.
+   This analysis implements an ad-hoc union-find data structure classifying
+   statements into "control-flow regions" as follows: Most statements are in
+   the same region as their predecessor, except that each OpenACC loop is in
+   a region of its own, and each OpenACC loop's successor starts a new
+   region.  We then unite the regions of any statements linked by jumps,
+   placing any cond, switch, or goto statement in the same region as its
+   target label(s).
+   In the end, control dependence of OpenACC loops can be determined by
+   comparing their immediate predecessor and successor statements' regions.
+   A jump crosses the loop if and only if the predecessor and successor are
+   in the same region.  (If there is no predecessor or successor, the loop
+   is executed unconditionally.)
+   The methods in this class identify statements by their index in the
+   kernels region's body.  */
+
+class control_flow_regions
+{
+  public:
+    /* Initialize an instance and pre-compute the control-flow region
+       information for the statement sequence SEQ.  */
+    control_flow_regions (gimple_seq seq);
+
+    /* Return true if the STMT with the given index IDX in the analyzed
+       statement sequence is an unconditionally executed OpenACC loop.  */
+    bool is_unconditional_oacc_for_loop (gimple *stmt, size_t idx);
+
+  private:
+    /* Find the region representative for the statement identified by index
+       STMT_IDX.  */
+    size_t find_rep (size_t stmt_idx);
+
+    /* Union the regions containing the statements represented by
+       representatives A and B.  */
+    void union_reps (size_t a, size_t b);
+
+    /* Helper for the constructor.  Performs the actual computation of the
+       control-flow regions in the statement sequence SEQ.  */
+    void compute_regions (gimple_seq seq);
+
+    /* The mapping from statement indices to region representatives.  */
+    vec <size_t> representatives;
+
+    /* A cache mapping statement indices to a flag indicating whether the
+       statement is a top level OpenACC for loop.  */
+    vec <bool> omp_for_loops;
+};
+
+control_flow_regions::control_flow_regions (gimple_seq seq)
+{
+  representatives.create (1);
+  omp_for_loops.create (1);
+  compute_regions (seq);
+}
+
+bool
+control_flow_regions::is_unconditional_oacc_for_loop (gimple *stmt, size_t idx)
+{
+  if (top_level_omp_for_in_stmt (stmt) == NULL)
+    /* Not an OpenACC for loop.  */
+    return false;
+  if (idx == 0 || idx == representatives.length () - 1)
+    /* The first or last statement in the kernels region.  This means that
+       there is no room before or after it for a jump or a label.  Thus
+       there cannot be a jump across it, so it is unconditional.  */
+    return true;
+  /* Otherwise, the loop is unconditional if the statements before and after
+     it are in different control flow regions.  Scan forward and backward,
+     skipping over neighboring OpenACC for loops, to find these preceding
+     and following statements.  */
+  size_t prev_index = idx - 1;
+  while (prev_index > 0 && omp_for_loops [prev_index] == true)
+    prev_index--;
+  /* If all preceding statements are also OpenACC loops, all of these are
+     unconditional.  */
+  if (prev_index == 0)
+    return true;
+  size_t succ_index = idx + 1;
+  while (succ_index < omp_for_loops.length ()
+         && omp_for_loops [succ_index] == true)
+    succ_index++;
+  /* If all following statements are also OpenACC loops, all of these are
+     unconditional.  */
+  if (succ_index == omp_for_loops.length ())
+    return true;
+  return (find_rep (prev_index) != find_rep (succ_index));
+}
+
+size_t
+control_flow_regions::find_rep (size_t stmt_idx)
+{
+  size_t rep = stmt_idx, aux = stmt_idx;
+  /* Find the root representative of this statement.  */
+  while (representatives[rep] != rep)
+    rep = representatives[rep];
+  /* Compress the path from the original statement to the representative.  */
+  while (representatives[aux] != rep)
+    {
+      size_t tmp = representatives[aux];
+      representatives[aux] = rep;
+      aux = tmp;
+    }
+  return rep;
+}
+
+void
+control_flow_regions::union_reps (size_t a, size_t b)
+{
+  a = find_rep (a);
+  b = find_rep (b);
+  representatives[b] = a;
+}
+
+void
+control_flow_regions::compute_regions (gimple_seq seq)
+{
+  hash_map <gimple *, size_t> control_flow_reps;
+  hash_map <tree, size_t> label_reps;
+  size_t current_region = 0, idx = 0;
+
+  /* In a first pass, assign an initial region to each statement.  Except in
+     the case of OpenACC loops, each statement simply gets the same region
+     representative as its predecessor.  */
+  for (gimple_stmt_iterator gsi = gsi_start (seq);
+       !gsi_end_p (gsi);
+       gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      gimple *omp_for = top_level_omp_for_in_stmt (stmt);
+      omp_for_loops.safe_push (omp_for != NULL);
+      if (omp_for != NULL)
+        {
+          /* Assign a new region to this loop and to its successor.  */
+          current_region = idx;
+          representatives.safe_push (current_region);
+          current_region++;
+        }
+      else
+        {
+          representatives.safe_push (current_region);
+          /* Remember any jumps and labels for the second pass below.  */
+          if (gimple_code (stmt) == GIMPLE_COND
+              || gimple_code (stmt) == GIMPLE_SWITCH
+              || gimple_code (stmt) == GIMPLE_GOTO)
+            control_flow_reps.put (stmt, current_region);
+          else if (gimple_code (stmt) == GIMPLE_LABEL)
+            label_reps.put (gimple_label_label (as_a <glabel *> (stmt)),
+                            current_region);
+        }
+      idx++;
+    }
+  gcc_assert (representatives.length () == omp_for_loops.length ());
+
+  /* Revisit all the control flow statements and union the region of each
+     cond, switch, or goto statement with the target labels' regions.  */
+  for (hash_map <gimple *, size_t>::iterator it = control_flow_reps.begin ();
+       it != control_flow_reps.end ();
+       ++it)
+    {
+      gimple *stmt = (*it).first;
+      size_t stmt_rep = (*it).second;
+      switch (gimple_code (stmt))
+        {
+          tree label;
+          unsigned int n;
+
+        case GIMPLE_COND:
+          label = gimple_cond_true_label (as_a <gcond *> (stmt));
+          union_reps (stmt_rep, *label_reps.get (label));
+          label = gimple_cond_false_label (as_a <gcond *> (stmt));
+          union_reps (stmt_rep, *label_reps.get (label));
+          break;
+
+        case GIMPLE_SWITCH:
+          n = gimple_switch_num_labels (as_a <gswitch *> (stmt));
+          for (unsigned int i = 0; i < n; i++)
+            {
+              tree switch_case
+                = gimple_switch_label (as_a <gswitch *> (stmt), i);
+              label = CASE_LABEL (switch_case);
+              union_reps (stmt_rep, *label_reps.get (label));
+            }
+          break;
+
+        case GIMPLE_GOTO:
+          label = gimple_goto_dest (stmt);
+          union_reps (stmt_rep, *label_reps.get (label));
+          break;
+
+        default:
+          gcc_unreachable ();
+        }
+    }
+}
+
  /* Decompose the body of the KERNELS_REGION, which was originally annotated
     with the KERNELS_CLAUSES, into a series of parallel regions.  */

@@ -486,9 +688,14 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
       separated from the loop.  */
    bool only_simple_assignments = true;

+  /* Precompute the control flow region information to determine whether an
+     OpenACC loop is executed conditionally or unconditionally.  */
+  control_flow_regions cf_regions (body_sequence);
+
    /* Iterate over the statements in the kernels region's body.  */
+  size_t idx = 0;
    gimple_stmt_iterator gsi, gsi_n;
-  for (gsi = gsi_start (body_sequence); !gsi_end_p (gsi); gsi = gsi_n)
+  for (gsi = gsi_start (body_sequence); !gsi_end_p (gsi); gsi = gsi_n, idx++)
      {
        /* Advance the iterator here because otherwise it would be invalidated
           by moving statements below.  */
@@ -497,7 +704,8 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)

        gimple *stmt = gsi_stmt (gsi);
        gimple *omp_for = top_level_omp_for_in_stmt (stmt);
-      if (omp_for != NULL)
+      if (omp_for != NULL
+          && cf_regions.is_unconditional_oacc_for_loop (stmt, idx))
          {
            /* This is an OMP for statement, put it into a parallel region.
               But first, construct a gang-single region containing any
@@ -532,8 +740,8 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
          }
        else
          {
-          /* This is not an OMP for statement, so it will be put into a
-             gang-single region.  */
+          /* This is not an unconditional OMP for statement, so it will be
+             put into a gang-single region.  */
            gimple_seq_add_stmt (&gang_single_seq, stmt);
            /* Is this a simple assignment? We call it simple if it is an
               assignment to an artificial local variable.  This captures
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
index ec5db02..ed4d642 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
@@ -12,6 +12,7 @@ main (void)
    unsigned int sum = 1;

  #pragma acc kernels copyin(a[0:N]) copy(sum)
+  /* { dg-bogus "region contains gang partitoned code but is not gang partitioned" "gang partitioned" { xfail *-*-* } .-1 } */
    {
      #pragma acc loop
      for (i = 0; i < N; ++i)
@@ -23,6 +24,17 @@ main (void)
      #pragma acc loop
      for (i = 0; i < N; ++i)
        sum += a[i];
+
+    if (sum > 10)
+      {
+        #pragma acc loop
+        for (i = 0; i < N; ++i)
+          sum += a[i];
+      }
+
+    #pragma acc loop
+    for (i = 0; i < N; ++i)
+      sum += a[i];
    }

    return 0;
@@ -32,10 +44,10 @@ main (void)
     parallel regions.  */
  /* { dg-final { scan-tree-dump-times "oacc_data_kernels" 1 "convert_oacc_kernels" } } */

-/* The two loop regions are parallelized, the sequential part in between is
-   made gang-single.  */
-/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 2 "convert_oacc_kernels" } } */
-/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 1 "convert_oacc_kernels" } } */
+/* The three unconditional loop regions are parallelized, the sequential
+   part in between and the conditional loop are made gang-single.  */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 3 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 2 "convert_oacc_kernels" } } */

  /* Check that the original kernels region is removed.  */
  /* { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
index 4aba2b1..f89e46b 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
@@ -22,6 +22,19 @@ program main
      sum = sum + a(i)
    end do

+  if (sum .gt. 10) then
+    !$acc loop
+    do i = 1, N
+      sum = sum + a(i)
+    end do
+  end if
+
+  !$acc loop
+  ! { dg-bogus "region contains gang partitoned code but is not gang partitioned" "gang partitioned" { xfail *-*-* } .-1 }
+  do i = 1, N
+    sum = sum + a(i)
+  end do
+
    !$acc end kernels
  end program main

@@ -29,10 +42,10 @@ end program main
  ! parallel regions.
  ! { dg-final { scan-tree-dump-times "oacc_data_kernels" 1 "convert_oacc_kernels" } }

-! The two loop regions are parallelized, the sequential part in between is
-! made gang-single.
-! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 2 "convert_oacc_kernels" } }
-! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 1 "convert_oacc_kernels" } }
+! The three unconditional loop regions are parallelized, the sequential part
+! in between and the conditional loop are made gang-single.
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 3 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 2 "convert_oacc_kernels" } }

  ! Check that the original kernels region is removed.
  ! { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" }}
-- 
2.8.1
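
For readers following the region computation in the patch above, here is a
minimal standalone sketch of the union-find scheme behind find_rep and
union_reps.  It is an illustration under assumptions, not the GCC code:
region_sets and loop_is_conditional_example are hypothetical names, statements
are modeled as bare indices instead of GIMPLE, and the four-statement sequence
is invented to mirror "if (c) { loop }".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for control_flow_regions: no GIMPLE, just indices.
struct region_sets
{
  std::vector<std::size_t> representatives;

  // INITIAL holds one region representative per statement.  Every root
  // entry must refer to itself, as after the first pass of compute_regions.
  explicit region_sets (const std::vector<std::size_t> &initial)
    : representatives (initial)
  {
  }

  std::size_t find_rep (std::size_t idx)
  {
    assert (idx < representatives.size ());
    std::size_t rep = idx, aux = idx;
    // Walk up to the root representative.
    while (representatives[rep] != rep)
      rep = representatives[rep];
    // Compress the path so later lookups are shorter.
    while (representatives[aux] != rep)
      {
        std::size_t tmp = representatives[aux];
        representatives[aux] = rep;
        aux = tmp;
      }
    return rep;
  }

  void union_reps (std::size_t a, std::size_t b)
  {
    a = find_rep (a);
    b = find_rep (b);
    representatives[b] = a;
  }
};

// Invented sequence: 0 = cond, 1 = true label, 2 = OpenACC loop,
// 3 = false label.  Returns true if the loop at index 2 is conditional.
bool
loop_is_conditional_example ()
{
  // First pass: the loop gets its own region, its successor a fresh one.
  std::vector<std::size_t> initial = {0, 0, 2, 3};
  region_sets regions (initial);
  // Second pass: union the cond's region with both target labels' regions.
  regions.union_reps (0, 0);  // true label, already in region 0
  regions.union_reps (0, 3);  // false label, region 3
  // A jump crosses the loop iff its neighbors share a region.
  return regions.find_rep (1) == regions.find_rep (3);
}
```

In this model the jump from the cond to the false label merges the regions on
either side of the loop, which is the situation the patch treats as
"conditional"; without such a jump the neighbors keep distinct
representatives.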


* [PATCH 06/10, OpenACC] Adjust parallelism of loops in gang-single parts of OpenACC kernels regions
  2019-07-17 21:03 ` [PATCH 00/10, OpenACC] Rework handling of OpenACC kernels regions Kwok Cheung Yeung
                     ` (4 preceding siblings ...)
  2019-07-17 21:12   ` [PATCH 05/10, OpenACC] Handle conditional execution of loops in OpenACC, kernels regions Kwok Cheung Yeung
@ 2019-07-17 21:13   ` Kwok Cheung Yeung
  2019-08-05 22:17     ` Kwok Cheung Yeung
  2019-07-17 21:24   ` [PATCH 07/10, OpenACC] Launch kernels asynchronously in " Kwok Cheung Yeung
                     ` (4 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: Kwok Cheung Yeung @ 2019-07-17 21:13 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek; +Cc: Thomas Schwinge

Loops in gang-single parts of kernels regions cannot be executed in
gang-redundant mode.  If the user specified gang clauses on such loops, emit a
warning and remove these clauses.  Adjust automatic partitioning to exclude gang
partitioning in gang-single regions.
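
As a rough illustration of the user-visible case, here is a hypothetical
source fragment modeled on the kernels-conversion.c test.  The function name
and the explicit gang clause are assumptions added for the example.  Going by
the diagnostic text in this patch, the second loop would be expected to draw
the "ignoring 'gang' clause" warning because it sits inside an if statement.
A compiler that ignores the pragmas simply runs both loops sequentially.

```cpp
#include <cstddef>

// Hypothetical example; names are illustrative and not part of the patch.
unsigned int
sum_twice_if_large (const unsigned int *a, std::size_t n)
{
  unsigned int sum = 1;

#pragma acc kernels copyin(a[0:n]) copy(sum)
  {
    // Unconditional top-level loop: eligible for its own parallel region.
#pragma acc loop
    for (std::size_t i = 0; i < n; ++i)
      sum += a[i];

    if (sum > 10)
      {
        // Conditionally executed loop: it stays in a gang-single region,
        // so the gang clause cannot be honored there.
#pragma acc loop gang
        for (std::size_t i = 0; i < n; ++i)
          sum += a[i];
      }
  }
  return sum;
}
```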

2019-07-16  Gergö Barany  <gergo@codesourcery.com>

	gcc/
	* omp-oacc-kernels.c (visit_loops_in_gang_single_region): Emit warning on
	conditionally executed code with a gang clause.
	(make_loops_gang_single): New function.
	(add_parent_or_loop_num_clause): New function.
	(adjust_nested_loop_clauses_wi_info): New struct.
	(adjust_nested_loop_clauses): New function.
	(transform_kernels_loop_clauses): Add worker and vector clause parameters,
	emit error on illegal nesting.
	(make_gang_parallel_loop_region): Likewise.
	(decompose_kernels_region_body): Separate out gang/worker/vector clauses
	for separate handling; add call to make_loops_gang_single.
	* omp-offload.c (oacc_loop_auto_partitions): Add and propagate
	is_oacc_gang_single parameter.
	(oacc_loop_partition): Likewise.
	(execute_oacc_device_lower): Adjust call to oacc_loop_partition.
---
  gcc/omp-oacc-kernels.c | 380 ++++++++++++++++++++++++++++++++++++++++++++-----
  gcc/omp-offload.c      |  22 ++-
  2 files changed, 364 insertions(+), 38 deletions(-)

diff --git a/gcc/omp-oacc-kernels.c b/gcc/omp-oacc-kernels.c
index 80a82fa..11a960c 100644
--- a/gcc/omp-oacc-kernels.c
+++ b/gcc/omp-oacc-kernels.c
@@ -59,7 +59,14 @@ along with GCC; see the file COPYING3.  If not see
     - Any sequences of other code (non-loops, non-OpenACC loops) are wrapped
       in new "gang-single" parallel regions: Worker/vector annotations are
       copied from the original kernels region if present, but num_gangs is
-     explicitly set to 1.  */
+     explicitly set to 1.
+   - Both points above only apply at the topmost level in the region, i.e.,
+     the transformation does not introduce new parallel regions inside
+     nested statement bodies.  In particular, this means that a
+     gang-parallelizable loop inside an if statement is "gang-serialized" by
+     the transformation.
+     The transformation visits loops inside such new gang-single-regions and
+     removes and warns about any gang annotations.  */

  /* Helper function for decompose_kernels_region_body.  If STMT contains a
     "top-level" OMP_FOR statement, returns a pointer to that statement;
@@ -122,6 +129,67 @@ top_level_omp_for_in_stmt (gimple *stmt)
    return NULL;
  }

+/* Helper function for make_loops_gang_single for walking the tree.  If the
+   statement indicated by GSI_P is an OpenACC for loop with a gang clause,
+   issue a warning and remove the clause.  */
+
+static tree
+visit_loops_in_gang_single_region (gimple_stmt_iterator *gsi_p,
+                                   bool *handled_ops_p,
+                                   struct walk_stmt_info *)
+{
+  gimple *stmt = gsi_stmt (*gsi_p);
+  tree clauses = NULL, prev_clause = NULL;
+  *handled_ops_p = false;
+
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_OMP_FOR:
+      clauses = gimple_omp_for_clauses (stmt);
+      for (tree clause = clauses; clause; clause = OMP_CLAUSE_CHAIN (clause))
+        {
+          if (OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_GANG)
+            {
+              /* It makes no sense to have a gang clause in a gang-single
+                 region, so remove it and warn.  */
+              warning_at (gimple_location (stmt), 0,
+                          "conditionally executed loop in kernels region"
+                          " will be executed in a single gang;"
+                          " ignoring %<gang%> clause");
+              if (prev_clause != NULL)
+                OMP_CLAUSE_CHAIN (prev_clause) = OMP_CLAUSE_CHAIN (clause);
+              else
+                clauses = OMP_CLAUSE_CHAIN (clause);
+
+              break;
+            }
+          prev_clause = clause;
+        }
+      gimple_omp_for_set_clauses (stmt, clauses);
+      /* No need to recurse into nested statements; no loop nested inside
+         this loop can be gang-partitioned.  */
+      *handled_ops_p = true;
+      break;
+
+    default:
+      break;
+    }
+
+  return NULL;
+}
+
+/* Visit all nested OpenACC loops in the statement indicated by GSI.  This
+   statement is expected to be inside a gang-single region.  Issue a warning
+   for any loops inside it that have gang clauses and remove the clauses.  */
+
+static void
+make_loops_gang_single (gimple_stmt_iterator gsi)
+{
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  walk_gimple_stmt (&gsi, visit_loops_in_gang_single_region, NULL, &wi);
+}
+
  /* Construct a "gang-single" OpenACC parallel region at LOC containing the
     STMTS.  The newly created region is annotated with CLAUSES, which must
     not contain a num_gangs clause, and an additional "num_gangs(1)" clause
@@ -150,45 +218,253 @@ make_gang_single_region (location_t loc, gimple_seq stmts, tree clauses)
    return single_region;
  }

+/* Helper function for make_gang_parallel_loop_region.  Adds a num_gangs
+   (num_workers, vector_length) clause to the given CLAUSES, either the one
+   from the parent region (PARENT_CLAUSE) or a new one based on the loop's
+   own LOOP_CLAUSE ("gang(num: N)" or similar for workers or vectors) with
+   the given CLAUSE_CODE.  Does nothing if neither PARENT_CLAUSE nor
+   LOOP_CLAUSE exist.  Returns the new clauses.  */
+
+static tree
+add_parent_or_loop_num_clause (tree parent_clause, tree loop_clause,
+                               omp_clause_code clause_code, tree clauses)
+{
+  if (parent_clause != NULL)
+    {
+      tree num_clause = unshare_expr (parent_clause);
+      OMP_CLAUSE_CHAIN (num_clause) = clauses;
+      clauses = num_clause;
+    }
+  else if (loop_clause != NULL)
+    {
+      /* The kernels region does not have a "num_gangs" clause, but the loop
+         itself had a "gang(num: N)" clause.  Honor it by adding a
+         "num_gangs(N)" clause on the parallel region.  */
+      tree num = OMP_CLAUSE_OPERAND (loop_clause, 0);
+      tree new_num_clause
+        = build_omp_clause (OMP_CLAUSE_LOCATION (loop_clause), clause_code);
+      OMP_CLAUSE_OPERAND (new_num_clause, 0) = num;
+      OMP_CLAUSE_CHAIN (new_num_clause) = clauses;
+      clauses = new_num_clause;
+    }
+  return clauses;
+}
+
+/* Helper for make_gang_parallel_loop_region, looking for "worker(num: N)"
+   or "vector(length: N)" clauses in nested loops.  Removes the numeric
+   argument, transferring it to the enclosing parallel region (via
+   WI->INFO).  If numeric arguments within the same loop nest conflict,
+   emits an error.
+
+   This function also decides whether to add an auto clause on each of these
+   nested loops.  It adds an auto clause unless there is already an
+   independent/seq/auto clause or a gang/worker/vector annotation.  */
+
+struct adjust_nested_loop_clauses_wi_info
+{
+  tree *loop_gang_clause_ptr;
+  tree *loop_worker_clause_ptr;
+  tree *loop_vector_clause_ptr;
+};
+
+static tree
+adjust_nested_loop_clauses (gimple_stmt_iterator *gsi_p, bool *,
+                            struct walk_stmt_info *wi)
+{
+  struct adjust_nested_loop_clauses_wi_info *wi_info
+    = (struct adjust_nested_loop_clauses_wi_info *) wi->info;
+  gimple *stmt = gsi_stmt (*gsi_p);
+
+  if (gimple_code (stmt) == GIMPLE_OMP_FOR)
+    {
+      bool add_auto_clause = true;
+      tree loop_clauses = gimple_omp_for_clauses (stmt);
+      tree loop_clause = loop_clauses;
+      for (; loop_clause; loop_clause = OMP_CLAUSE_CHAIN (loop_clause))
+        {
+          tree *outer_clause_ptr = NULL;
+          switch (OMP_CLAUSE_CODE (loop_clause))
+            {
+              case OMP_CLAUSE_GANG:
+                outer_clause_ptr = wi_info->loop_gang_clause_ptr;
+                break;
+              case OMP_CLAUSE_WORKER:
+                outer_clause_ptr = wi_info->loop_worker_clause_ptr;
+                break;
+              case OMP_CLAUSE_VECTOR:
+                outer_clause_ptr = wi_info->loop_vector_clause_ptr;
+                break;
+              case OMP_CLAUSE_INDEPENDENT:
+              case OMP_CLAUSE_SEQ:
+              case OMP_CLAUSE_AUTO:
+                add_auto_clause = false;
+              default:
+                break;
+            }
+          if (outer_clause_ptr != NULL)
+            {
+              if (OMP_CLAUSE_OPERAND (loop_clause, 0) != NULL
+                  && *outer_clause_ptr == NULL)
+                {
+                  /* Transfer the clause to the enclosing parallel region
+                     and remove the numerical argument from the loop.  */
+                  *outer_clause_ptr = unshare_expr (loop_clause);
+                  OMP_CLAUSE_OPERAND (loop_clause, 0) = NULL;
+                }
+              else if (OMP_CLAUSE_OPERAND (loop_clause, 0) != NULL &&
+                       OMP_CLAUSE_OPERAND (*outer_clause_ptr, 0) != NULL)
+                {
+                  /* See if both of these are the same constant.  If they
+                     aren't, emit an error.  */
+                  tree old_op = OMP_CLAUSE_OPERAND (*outer_clause_ptr, 0);
+                  tree new_op = OMP_CLAUSE_OPERAND (loop_clause, 0);
+                  if (!(cst_and_fits_in_hwi (old_op) &&
+                        cst_and_fits_in_hwi (new_op) &&
+                        int_cst_value (old_op) == int_cst_value (new_op)))
+                    {
+                      const char *clause_name
+                        = omp_clause_code_name[OMP_CLAUSE_CODE (loop_clause)];
+                      error_at (gimple_location (stmt),
+                                "cannot honor conflicting %qs annotation",
+                                clause_name);
+                      inform (OMP_CLAUSE_LOCATION (*outer_clause_ptr),
+                              "location of the previous annotation "
+                              "in the same loop nest");
+                    }
+                  OMP_CLAUSE_OPERAND (loop_clause, 0) = NULL;
+                }
+            }
+        }
+      if (add_auto_clause)
+        {
+          tree auto_clause
+            = build_omp_clause (gimple_location (stmt), OMP_CLAUSE_AUTO);
+          OMP_CLAUSE_CHAIN (auto_clause) = loop_clauses;
+          gimple_omp_for_set_clauses (stmt, auto_clause);
+        }
+    }
+
+  return NULL;
+}
+
  /* Helper for make_region_loop_nest.  Transform OpenACC 'kernels'/'loop'
     construct clauses into OpenACC 'parallel'/'loop' construct ones.  */

  static tree
  transform_kernels_loop_clauses (gimple *omp_for,
  				tree num_gangs_clause,
+				tree num_workers_clause,
+				tree vector_length_clause,
  				tree clauses)
  {
    /* If this loop in a kernels region does not have an explicit
       "independent", "seq", or "auto" clause, we must give it an explicit
-     "auto" clause. */
+     "auto" clause.
+     We also check for "gang(num: N)" clauses.  These must not appear in
+     kernels regions that have their own "num_gangs" clause. Otherwise, they
+     must be converted and put on the region; similarly for workers and
+     vectors.  */
    bool add_auto_clause = true;
+  tree loop_gang_clause = NULL, loop_worker_clause = NULL,
+       loop_vector_clause = NULL;
    tree loop_clauses = gimple_omp_for_clauses (omp_for);
-  for (tree c = loop_clauses; c; c = OMP_CLAUSE_CHAIN (c))
+  for (tree loop_clause = loop_clauses;
+       loop_clause;
+       loop_clause = OMP_CLAUSE_CHAIN (loop_clause))
      {
-      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_AUTO
-          || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_INDEPENDENT
-          || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SEQ)
-        {
-          add_auto_clause = false;
-          break;
+      /* Look for gang, worker, and vector clauses.  */
+      bool found_num_clause = false;
+      tree *clause_ptr, clause_to_check;
+      switch (OMP_CLAUSE_CODE (loop_clause))
+         {
+          case OMP_CLAUSE_GANG:
+            found_num_clause = true;
+            clause_ptr = &loop_gang_clause;
+            clause_to_check = num_gangs_clause;
+            break;
+          case OMP_CLAUSE_WORKER:
+            found_num_clause = true;
+            clause_ptr = &loop_worker_clause;
+            clause_to_check = num_workers_clause;
+            break;
+          case OMP_CLAUSE_VECTOR:
+            found_num_clause = true;
+            clause_ptr = &loop_vector_clause;
+            clause_to_check = vector_length_clause;
+            break;
+          case OMP_CLAUSE_INDEPENDENT:
+          case OMP_CLAUSE_SEQ:
+          case OMP_CLAUSE_AUTO:
+            add_auto_clause = false;
+          default:
+            break;
          }
-    }
+      if (found_num_clause && OMP_CLAUSE_OPERAND (loop_clause, 0) != NULL)
+        {
+          if (clause_to_check)
+            {
+              const char *clause_name
+                = omp_clause_code_name[OMP_CLAUSE_CODE (loop_clause)];
+              const char *parent_clause_name
+                = omp_clause_code_name[OMP_CLAUSE_CODE (clause_to_check)];
+              error_at (OMP_CLAUSE_LOCATION (loop_clause),
+                        "argument not permitted on %qs clause"
+                        " in OpenACC %<kernels%> region with a %qs clause",
+                        clause_name, parent_clause_name);
+              inform (OMP_CLAUSE_LOCATION (clause_to_check),
+                      "location of OpenACC %<kernels%> region");
+            }
+          /* Copy the gang(N)/worker(N)/vector(N) clause to the enclosing
+             parallel region.  */
+          *clause_ptr = unshare_expr (loop_clause);
+          OMP_CLAUSE_CHAIN (*clause_ptr) = NULL;
+          /* Leave a gang/worker/vector clause on the loop, but without a
+             numeric argument.  */
+          OMP_CLAUSE_OPERAND (loop_clause, 0) = NULL;
+         }
+     }
    if (add_auto_clause)
      {
        tree auto_clause = build_omp_clause (gimple_location (omp_for),
                                             OMP_CLAUSE_AUTO);
        OMP_CLAUSE_CHAIN (auto_clause) = loop_clauses;
-      gimple_omp_for_set_clauses (omp_for, auto_clause);
-    }
-
-  /* If the kernels region had a num_gangs clause, add that to this new
-     parallel region.  */
-  if (num_gangs_clause != NULL)
-    {
-      tree parallel_num_gangs_clause = unshare_expr (num_gangs_clause);
-      OMP_CLAUSE_CHAIN (parallel_num_gangs_clause) = clauses;
-      clauses = parallel_num_gangs_clause;
+      loop_clauses = auto_clause;
      }
+  gimple_omp_for_set_clauses (omp_for, loop_clauses);
+  /* We must also recurse into the loop; it might contain nested loops
+     having their own "worker(num: W)" or "vector(length: V)" annotations.
+     Turn these into worker/vector annotations on the parallel region.  */
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  struct adjust_nested_loop_clauses_wi_info wi_info;
+  wi_info.loop_gang_clause_ptr = &loop_gang_clause;
+  wi_info.loop_worker_clause_ptr = &loop_worker_clause;
+  wi_info.loop_vector_clause_ptr = &loop_vector_clause;
+  wi.info = &wi_info;
+  gimple *body = gimple_omp_body (omp_for);
+  walk_gimple_seq (body, adjust_nested_loop_clauses, NULL, &wi);
+  /* Check if there were conflicting numbers of workers or vector lanes.  */
+  if (loop_gang_clause != NULL &&
+      OMP_CLAUSE_OPERAND (loop_gang_clause, 0) == NULL)
+    loop_gang_clause = NULL;
+  if (loop_worker_clause != NULL &&
+      OMP_CLAUSE_OPERAND (loop_worker_clause, 0) == NULL)
+    loop_worker_clause = NULL;
+  if (loop_vector_clause != NULL &&
+      OMP_CLAUSE_OPERAND (loop_vector_clause, 0) == NULL)
+    loop_vector_clause = NULL;
+
+  /* If the kernels region had num_gangs, num_workers, vector_length clauses,
+     add these to this new parallel region.  */
+  clauses
+    = add_parent_or_loop_num_clause (num_gangs_clause, loop_gang_clause,
+				     OMP_CLAUSE_NUM_GANGS, clauses);
+  clauses
+    = add_parent_or_loop_num_clause (num_workers_clause, loop_worker_clause,
+				     OMP_CLAUSE_NUM_WORKERS, clauses);
+  clauses
+    = add_parent_or_loop_num_clause (vector_length_clause, loop_vector_clause,
+				     OMP_CLAUSE_VECTOR_LENGTH, clauses);

    return clauses;
  }
@@ -197,18 +473,33 @@ transform_kernels_loop_clauses (gimple *omp_for,
     STMT, which must be identical to, or a bind containing, the loop OMP_FOR
     with OpenACC loop annotations.

-   The newly created region is annotated with the optional NUM_GANGS_CLAUSE
-   as well as the other CLAUSES, which must not contain a num_gangs clause.  */
+   The NUM_GANGS_CLAUSE, NUM_WORKERS_CLAUSE, and VECTOR_LENGTH_CLAUSE are
+   optional clauses from the original kernels region and must not be
+   contained in the other CLAUSES. The newly created region is annotated
+   with the optional NUM_GANGS_CLAUSE as well as the other CLAUSES. If there
+   is no NUM_GANGS_CLAUSE but the loop has a "gang(num: N)" clause, that is
+   converted to a "num_gangs(N)" clause on the new region, and similarly for
+   workers and vectors.
+
+   The outermost loop gets an auto clause unless there already is an
+   independent/seq/auto clause or a gang/worker/vector annotation.  Nested
+   loops inside OMP_FOR are treated similarly by the
+   adjust_nested_loop_clauses function.  */

  static gimple *
  make_gang_parallel_loop_region (gimple *omp_for, gimple *stmt,
-                                tree num_gangs_clause, tree clauses)
+                                tree num_gangs_clause,
+                                tree num_workers_clause,
+                                tree vector_length_clause,
+                                tree clauses)
  {
    /* This correctly unshares the entire clause chain rooted here.  */
    clauses = unshare_expr (clauses);

    clauses = transform_kernels_loop_clauses (omp_for,
  					    num_gangs_clause,
+					    num_workers_clause,
+					    vector_length_clause,
  					    clauses);

    /* Now build the parallel region containing this loop.  */
@@ -596,23 +887,43 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
    location_t loc = gimple_location (kernels_region);

    /* The kernels clauses will be propagated to the child clauses unmodified,
-     except that that num_gangs clause will only be added to loop regions.
-     The other regions are "gang-single" and get an explicit num_gangs(1)
-     clause.  So separate out the num_gangs clause here.  */
-  tree num_gangs_clause = NULL, prev_clause = NULL;
+     except that the num_gangs, num_workers, and vector_length clauses will
+     only be added to loop regions.  The other regions are "gang-single" and
+     get an explicit num_gangs(1) clause.  So separate out the num_gangs,
+     num_workers, and vector_length clauses here.  */
+  tree num_gangs_clause = NULL, num_workers_clause = NULL,
+       vector_length_clause = NULL;
+  tree prev_clause = NULL, next_clause = NULL;
    tree parallel_clauses = kernels_clauses;
-  for (tree c = parallel_clauses; c; c = OMP_CLAUSE_CHAIN (c))
+  for (tree c = parallel_clauses; c; c = next_clause)
      {
-      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_NUM_GANGS)
+      /* Preserve this here, as we might NULL it later.  */
+      next_clause = OMP_CLAUSE_CHAIN (c);
+
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_NUM_GANGS
+          || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_NUM_WORKERS
+          || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_VECTOR_LENGTH)
          {
            /* Cut this clause out of the chain.  */
-          num_gangs_clause = c;
            if (prev_clause != NULL)
              OMP_CLAUSE_CHAIN (prev_clause) = OMP_CLAUSE_CHAIN (c);
            else
              kernels_clauses = OMP_CLAUSE_CHAIN (c);
-          OMP_CLAUSE_CHAIN (num_gangs_clause) = NULL;
-          break;
+          OMP_CLAUSE_CHAIN (c) = NULL;
+          switch (OMP_CLAUSE_CODE (c))
+            {
+              case OMP_CLAUSE_NUM_GANGS:
+                num_gangs_clause = c;
+                break;
+              case OMP_CLAUSE_NUM_WORKERS:
+                num_workers_clause = c;
+                break;
+              case OMP_CLAUSE_VECTOR_LENGTH:
+                vector_length_clause = c;
+                break;
+              default:
+                gcc_unreachable ();
+            }
          }
        else
          prev_clause = c;
@@ -735,6 +1046,8 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
            gimple *parallel_region
              = make_gang_parallel_loop_region (omp_for, stmt,
                                                num_gangs_clause,
+                                              num_workers_clause,
+                                              vector_length_clause,
                                                kernels_clauses);
            gimple_seq_add_stmt (&region_body, parallel_region);
          }
@@ -752,6 +1065,9 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
                  && DECL_ARTIFICIAL (gimple_assign_lhs (stmt)));
            if (!is_simple_assignment)
              only_simple_assignments = false;
+          /* Remove and issue warnings about gang clauses on any OpenACC
+             loops nested inside this sequentially executed statement.  */
+          make_loops_gang_single (gsi);
          }
      }

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 4ebfa83..23d7455 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1310,7 +1310,7 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask)

  static unsigned
  oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask,
-			   bool outer_assign)
+			   bool outer_assign, bool is_oacc_gang_single)
  {
    bool assign = (loop->flags & OLF_AUTO) && (loop->flags & OLF_INDEPENDENT);
    bool noisy = true;
@@ -1328,6 +1328,10 @@ oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask,
  	 non-innermost available level.  */
        unsigned this_mask = GOMP_DIM_MASK (GOMP_DIM_GANG);

+      /* Gang partitioning is not available in a gang-single region.  */
+      if (is_oacc_gang_single)
+        this_mask = GOMP_DIM_MASK (GOMP_DIM_WORKER);
+
        /* Find the first outermost available partition. */
        while (this_mask <= outer_mask)
  	this_mask <<= 1;
@@ -1357,7 +1361,8 @@ oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask,
      {
        unsigned tmp_mask = outer_mask | loop->mask | loop->e_mask;
        loop->inner = oacc_loop_auto_partitions (loop->child, tmp_mask,
-					       outer_assign | assign);
+					       outer_assign | assign,
+					       is_oacc_gang_single);
      }

    if (assign && (!loop->mask || (tiling && !loop->e_mask) || !outer_assign))
@@ -1416,7 +1421,8 @@ oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask,

    if (loop->sibling)
      inner_mask |= oacc_loop_auto_partitions (loop->sibling,
-					     outer_mask, outer_assign);
+					     outer_mask, outer_assign,
+					     is_oacc_gang_single);

    inner_mask |= loop->inner | loop->mask | loop->e_mask;

@@ -1427,14 +1433,16 @@ oacc_loop_auto_partitions (oacc_loop *loop, unsigned outer_mask,
     axes.  Return mask of partitioning.  */

  static unsigned
-oacc_loop_partition (oacc_loop *loop, unsigned outer_mask)
+oacc_loop_partition (oacc_loop *loop, unsigned outer_mask,
+                     bool is_oacc_gang_single)
  {
    unsigned mask_all = oacc_loop_fixed_partitions (loop, outer_mask);

    if (mask_all & GOMP_DIM_MASK (GOMP_DIM_MAX))
      {
        mask_all ^= GOMP_DIM_MASK (GOMP_DIM_MAX);
-      mask_all |= oacc_loop_auto_partitions (loop, outer_mask, false);
+      mask_all |= oacc_loop_auto_partitions (loop, outer_mask, false,
+                                             is_oacc_gang_single);
      }
    return mask_all;
  }
@@ -1573,7 +1581,9 @@ execute_oacc_device_lower ()
      }

    unsigned outer_mask = fn_level >= 0 ? GOMP_DIM_MASK (fn_level) - 1 : 0;
-  unsigned used_mask = oacc_loop_partition (loops, outer_mask);
+  unsigned used_mask = oacc_loop_partition (loops, outer_mask,
+ is_oacc_parallel_kernels_gang_single);
+
    /* OpenACC kernels constructs are special: they currently don't use the
       generic oacc_loop infrastructure and attribute/dimension processing.  */
    if (is_oacc_kernels && is_oacc_kernels_parallelized)
-- 
2.8.1

* [PATCH 07/10, OpenACC] Launch kernels asynchronously in OpenACC kernels regions
  2019-07-17 21:03 ` [PATCH 00/10, OpenACC] Rework handling of OpenACC kernels regions Kwok Cheung Yeung
                     ` (5 preceding siblings ...)
  2019-07-17 21:13   ` [PATCH 06/10, OpenACC] Adjust parallelism of loops in gang-single parts of OpenACC " Kwok Cheung Yeung
@ 2019-07-17 21:24   ` Kwok Cheung Yeung
  2019-07-17 21:30   ` [PATCH 08/10, OpenACC] New OpenACC kernels region decompose algorithm Kwok Cheung Yeung
                     ` (3 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: Kwok Cheung Yeung @ 2019-07-17 21:24 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek; +Cc: Thomas Schwinge

Kernels regions are decomposed into one or more smaller regions that are to be 
executed in sequence. With this patch, all of these regions are launched 
asynchronously, and a wait directive is added after them. This means that the 
host only waits once for the kernels to complete, not once per kernel. If the 
original kernels region was marked async, that asynchronous behavior is 
preserved, and no wait is added.
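As a rough illustration of the effect described above, here is a hand-written C sketch of what a decomposed kernels region behaves like: each generated region carries an async clause, and a single wait follows the last one. This is not compiler output. The function name decomposed_kernels_sum, the array size, and the choice of clauses are assumptions made for the example.

```c
#define N 8

/* Hand-written stand-in for the decomposition of one "#pragma acc kernels"
   region (hypothetical example, not what the pass literally emits).  */
static int
decomposed_kernels_sum (void)
{
  int a[N];
  int sum = 0;

#pragma acc data create(a[0:N]) copy(sum)
  {
    /* First loop region, launched asynchronously.  */
#pragma acc parallel loop independent async
    for (int i = 0; i < N; ++i)
      a[i] = i;

    /* Gang-single region for the sequential code between the loops.  */
#pragma acc parallel num_gangs(1) async
    {
      a[0] += 1;
    }

    /* Second loop region, on the same async queue as the others.  */
#pragma acc parallel loop reduction(+:sum) async
    for (int i = 0; i < N; ++i)
      sum += a[i];

    /* The host waits once here instead of once per region.  */
#pragma acc wait
  }

  return sum;
}
```

All three regions use the same (default) async queue, so they still execute in order on the device. Built without -fopenacc, the pragmas are ignored and the function simply runs sequentially, returning 29.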

2019-07-16  Gergö Barany  <gergo@codesourcery.com>

	gcc/
	* omp-oacc-kernels.c (add_async_clauses_and_wait): New function...
	(decompose_kernels_region_body): ... called from here.

	gcc/testsuite/
	* c-c++-common/goacc/kernels-conversion.c: Test automatically generated
	async clauses.
	* gfortran.dg/goacc/kernels-conversion.f95: Likewise.
---
  gcc/omp-oacc-kernels.c                             | 56 ++++++++++++++++++++--
  .../c-c++-common/goacc/kernels-conversion.c        |  5 ++
  .../gfortran.dg/goacc/kernels-conversion.f95       |  5 ++
  3 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/gcc/omp-oacc-kernels.c b/gcc/omp-oacc-kernels.c
index 11a960c..0fae74a 100644
--- a/gcc/omp-oacc-kernels.c
+++ b/gcc/omp-oacc-kernels.c
@@ -66,7 +66,13 @@ along with GCC; see the file COPYING3.  If not see
       gang-parallelizable loop inside an if statement is "gang-serialized" by
       the transformation.
       The transformation visits loops inside such new gang-single-regions and
-     removes and warns about any gang annotations.  */
+     removes and warns about any gang annotations.
+   - In order to make the host wait only once for the whole region instead
+     of once per kernel launch, the new parallel and serial regions are
+     annotated async.  Unless the original kernels region was marked async,
+     the entire region ends with a wait construct.  If the original kernels
+     region was marked async, the generated async statements use the async
+     queue the kernels region was annotated with (possibly implicitly).  */

  /* Helper function for decompose_kernels_region_body.  If STMT contains a
     "top-level" OMP_FOR statement, returns a pointer to that statement;
@@ -676,6 +682,38 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
    return body;
  }

+/* Helper function of decompose_kernels_region_body.  The statements in
+   REGION_BODY are expected to be decomposed parallel regions; add an
+   "async" clause to each.  Also add a "wait" pragma at the end of the
+   sequence.  */
+
+static void
+add_async_clauses_and_wait (location_t loc, gimple_seq *region_body)
+{
+  tree default_async_queue
+    = build_int_cst (integer_type_node, GOMP_ASYNC_NOVAL);
+  for (gimple_stmt_iterator gsi = gsi_start (*region_body);
+       !gsi_end_p (gsi);
+       gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      tree target_clauses = gimple_omp_target_clauses (stmt);
+      tree new_async_clause = build_omp_clause (loc, OMP_CLAUSE_ASYNC);
+      OMP_CLAUSE_OPERAND (new_async_clause, 0) = default_async_queue;
+      OMP_CLAUSE_CHAIN (new_async_clause) = target_clauses;
+      target_clauses = new_async_clause;
+      gimple_omp_target_set_clauses (as_a <gomp_target *> (stmt),
+                                     target_clauses);
+    }
+  /* A "#pragma acc wait" is just a call to GOACC_wait (acc_async_sync, 0).  */
+  tree wait_fn = builtin_decl_explicit (BUILT_IN_GOACC_WAIT);
+  tree sync_arg = build_int_cst (integer_type_node, GOMP_ASYNC_SYNC);
+  gimple *wait_call = gimple_build_call (wait_fn, 2,
+                                         sync_arg, integer_zero_node);
+  gimple_set_location (wait_call, loc);
+  gimple_seq_add_stmt (region_body, wait_call);
+}
+
  /* Auxiliary analysis of the body of a kernels region, to determine for each
     OpenACC loop whether it is control-dependent (i.e., not necessarily
     executed every time the kernels region is entered) or not.
@@ -890,10 +928,12 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
       except that the num_gangs, num_workers, and vector_length clauses will
       only be added to loop regions.  The other regions are "gang-single" and
       get an explicit num_gangs(1) clause.  So separate out the num_gangs,
-     num_workers, and vector_length clauses here.  */
+     num_workers, and vector_length clauses here.
+     Also check for the presence of an async clause but do not remove it
+     from the kernels clauses.  */
    tree num_gangs_clause = NULL, num_workers_clause = NULL,
         vector_length_clause = NULL;
-  tree prev_clause = NULL, next_clause = NULL;
+  tree prev_clause = NULL, next_clause = NULL, async_clause = NULL;
    tree parallel_clauses = kernels_clauses;
    for (tree c = parallel_clauses; c; c = next_clause)
      {
@@ -927,6 +967,8 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
          }
        else
          prev_clause = c;
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_ASYNC)
+        async_clause = c;
      }

    gimple *kernels_body = gimple_omp_body (kernels_region);
@@ -1090,6 +1132,14 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
        gimple_seq_add_stmt (&region_body, single_region);
      }

+  /* We want to launch these kernels asynchronously.  If the original
+     kernels region had an async clause, this is done automatically because
+     that async clause was copied to the individual regions we created.
+     Otherwise, add an async clause to each newly created region, as well as
+     a wait directive at the end.  */
+  if (async_clause == NULL)
+    add_async_clauses_and_wait (loc, &region_body);
+
    tree kernels_locals = gimple_bind_vars (as_a <gbind *> (kernels_body));
    gimple *body = gimple_build_bind (kernels_locals, region_body,
                                      make_node (BLOCK));
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
index ed4d642..3e52ec4 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
@@ -49,5 +49,10 @@ main (void)
  /* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 3 "convert_oacc_kernels" } } */
  /* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 2 "convert_oacc_kernels" } } */

+/* Each of the parallel regions is async, and there is a final call to
+   __builtin_GOACC_wait.  */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels.* async\(-1\)" 5 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "__builtin_GOACC_wait" 1 "convert_oacc_kernels" } } */
+
  /* Check that the original kernels region is removed.  */
  /* { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
index f89e46b..559916c 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
@@ -47,5 +47,10 @@ end program main
  ! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 3 "convert_oacc_kernels" } }
  ! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 2 "convert_oacc_kernels" } }

+! Each of the parallel regions is async, and there is a final call to
+! __builtin_GOACC_wait.
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels.* async\(-1\)" 5 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "__builtin_GOACC_wait" 1 "convert_oacc_kernels" } }
+
  ! Check that the original kernels region is removed.
  ! { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } }
-- 
2.8.1

* [PATCH 08/10, OpenACC] New OpenACC kernels region decompose algorithm
  2019-07-17 21:03 ` [PATCH 00/10, OpenACC] Rework handling of OpenACC kernels regions Kwok Cheung Yeung
                     ` (6 preceding siblings ...)
  2019-07-17 21:24   ` [PATCH 07/10, OpenACC] Launch kernels asynchronously in " Kwok Cheung Yeung
@ 2019-07-17 21:30   ` Kwok Cheung Yeung
  2019-07-17 21:32   ` [PATCH 09/10, OpenACC] Avoid introducing 'create' mapping clauses for loop index variables in kernels regions Kwok Cheung Yeung
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: Kwok Cheung Yeung @ 2019-07-17 21:30 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek; +Cc: Thomas Schwinge

Previously, OpenACC kernels region bodies were decomposed into a sequence of 
alternating gang-single and gang-parallel "parallel" regions. The new algorithm 
in this patch introduces a third possibility: Loops that look like they might 
benefit from the parloops pass are converted into old "kernels" regions, 
exposing them to the parloops pass later on. This has the benefit that loops 
that cannot be parallelized are not offloaded to the GPU.
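The three-way decision can be pictured with a small, self-contained C model. It is a simplified sketch of the classification idea and not the code in the patch: the enum names, the flat statement array, and the function classify_region are all hypothetical stand-ins for the GIMPLE walk done by adjust_region_code.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the statements seen while
   walking a region body.  */
enum stmt_kind
{
  STMT_PLAIN,            /* assignment, call, ...  */
  STMT_CONTROL_FLOW,     /* cond, goto, switch, ...  */
  STMT_LOOP_INDEPENDENT, /* "acc loop independent"  */
  STMT_LOOP_SEQ,         /* "acc loop seq"  */
  STMT_LOOP_AUTO         /* "acc loop auto", or no clause at all  */
};

/* The three kinds of region the decomposition can produce.  */
enum region_kind
{
  REGION_GANG_SINGLE,  /* sequential code, num_gangs(1)  */
  REGION_PARALLELIZED, /* loop nest annotated by the user  */
  REGION_PARLOOPS      /* old-style "kernels", left for parloops  */
};

/* Start from OPTIMISTIC_DEFAULT and downgrade to REGION_PARLOOPS as soon
   as a statement is found that would need compiler analysis: control
   flow, or a loop with an explicit or implicit 'auto' clause.  */
static enum region_kind
classify_region (const enum stmt_kind *stmts, size_t n,
                 enum region_kind optimistic_default)
{
  for (size_t i = 0; i < n; ++i)
    switch (stmts[i])
      {
      case STMT_CONTROL_FLOW:
      case STMT_LOOP_AUTO:
        /* Final decision for this region; stop walking.  */
        return REGION_PARLOOPS;
      default:
        /* 'independent' and 'seq' loops and plain statements: keep going.  */
        break;
      }
  return optimistic_default;
}
```

In this model, a statement sequence would be classified with REGION_GANG_SINGLE as its default and a loop nest with REGION_PARALLELIZED, mirroring the two "optimistic default" comments in the patch.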

2019-07-16  Thomas Schwinge  <thomas@codesourcery.com>

	gcc/
	* omp-oacc-kernels.c (adjust_region_code_walk_stmt_fn)
	(adjust_region_code): New functions.
	(make_loops_gang_single): Update.
	(make_gang_single_region): Rename to...
	(make_region_seq): ... this, and update.
	(make_gang_parallel_loop_region): Rename to...
	(make_region_loop_nest): ... this, and update.
	(is_unconditional_oacc_for_loop): Remove stmt parameter and check.
	(decompose_kernels_region_body): Update.

	gcc/testsuite/
	* c-c++-common/goacc/kernels-conversion.c: Adjust test.
	* gfortran.dg/goacc/kernels-conversion.f95: Likewise.
	* c-c++-common/goacc/kernels-decompose-1.c: New file.
	* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: New
	file.
---
  gcc/omp-oacc-kernels.c                             | 293 +++++++++++++++++----
  .../c-c++-common/goacc/kernels-conversion.c        |  19 +-
  .../c-c++-common/goacc/kernels-decompose-1.c       | 123 +++++++++
  .../gfortran.dg/goacc/kernels-conversion.f95       |  22 +-
  .../gfortran.dg/goacc/kernels-decompose-1.f95      | 132 ++++++++++
  .../kernels-decompose-1.c                          |  30 +++
  6 files changed, 553 insertions(+), 66 deletions(-)
  create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
  create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
  create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c

diff --git a/gcc/omp-oacc-kernels.c b/gcc/omp-oacc-kernels.c
index 0fae74a..d65e6c6 100644
--- a/gcc/omp-oacc-kernels.c
+++ b/gcc/omp-oacc-kernels.c
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "gimple-iterator.h"
  #include "gimple-walk.h"
  #include "gomp-constants.h"
+#include "omp-general.h"

  /* This is a preprocessing pass to be run immediately before lower_omp.  It
     will convert OpenACC "kernels" regions into sequences of "parallel"
@@ -135,6 +136,95 @@ top_level_omp_for_in_stmt (gimple *stmt)
    return NULL;
  }

+/* Helper for adjust_region_code: evaluate the statement at GSI_P.  */
+
+static tree
+adjust_region_code_walk_stmt_fn (gimple_stmt_iterator *gsi_p,
+				 bool *handled_ops_p,
+				 struct walk_stmt_info *wi)
+{
+  int *region_code = (int *) wi->info;
+
+  gimple *stmt = gsi_stmt (*gsi_p);
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_OMP_FOR:
+      {
+	tree clauses = gimple_omp_for_clauses (stmt);
+	if (omp_find_clause (clauses, OMP_CLAUSE_INDEPENDENT))
+	  {
+	    /* Explicit 'independent' clause.  */
+	    /* Keep going; recurse into loop body.  */
+	    break;
+	  }
+	else if (omp_find_clause (clauses, OMP_CLAUSE_SEQ))
+	  {
+	    /* Explicit 'seq' clause.  */
+	    /* We'll "parallelize" if at some level a loop construct has been
+	       marked up by the user as unparallelizable ('seq' clause; we'll
+	       respect that in the later processing).  Given that the user has
+	       explicitly marked it up, this loop construct cannot be
+	       performance-critical (and we thus don't have to "avoid
+	       offloading"), and in this case it's also fine to "parallelize"
+	       instead of "gang-single", because any outer or inner loops may
+	       still exploit the available parallelism.  */
+	    /* Keep going; recurse into loop body.  */
+	    break;
+	  }
+	else
+	  {
+	    /* Explicit or implicit 'auto' clause.  */
+	    /* The user would like this loop analyzed ('auto' clause) and
+	       typically parallelized, but we don't have available yet the
+	       compiler logic to analyze this, so can't parallelize it here, so
+	       we'd very likely be running into a performance problem if we
+	       were to execute this unparallelized, thus forward the whole loop
+	       nest to "parloops".  */
+	    *region_code = GF_OMP_TARGET_KIND_OACC_KERNELS;
+	    /* Terminate: final decision for this region.  */
+	    *handled_ops_p = true;
+	    return integer_zero_node;
+	  }
+	gcc_unreachable ();
+      }
+
+    case GIMPLE_COND:
+    case GIMPLE_GOTO:
+    case GIMPLE_SWITCH:
+    case GIMPLE_ASM:
+    case GIMPLE_TRANSACTION:
+    case GIMPLE_RETURN:
+      /* Statement that might constitute some looping/control flow pattern.  */
+      /* The user would like this code analyzed (implicit inside a 'kernels'
+	 region) and typically parallelized, but we don't have available yet
+	 the compiler logic to analyze this, so can't parallelize it here, so
+	 we'd very likely be running into a performance problem if we were to
+	 execute this unparallelized, thus forward the whole thing to
+	 "parloops".  */
+      *region_code = GF_OMP_TARGET_KIND_OACC_KERNELS;
+      /* Terminate: final decision for this region.  */
+      *handled_ops_p = true;
+      return integer_zero_node;
+
+    default:
+      /* Keep going.  */
+      break;
+    }
+
+  return NULL;
+}
+
+/* Adjust the REGION_CODE for the region in GS.  */
+
+static void
+adjust_region_code (gimple_seq gs, int *region_code)
+{
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  wi.info = region_code;
+  walk_gimple_seq (gs, adjust_region_code_walk_stmt_fn, NULL, &wi);
+}
+
  /* Helper function for make_loops_gang_single for walking the tree. If the
     statement indicated by GSI_P is an OpenACC for loop with a gang clause,
     issue a warning and remove the clause.  */
@@ -174,6 +264,7 @@ visit_loops_in_gang_single_region (gimple_stmt_iterator *gsi_p,
        gimple_omp_for_set_clauses (stmt, clauses);
        /* No need to recurse into nested statements; no loop nested inside
           this loop can be gang-partitioned.  */
+      sorry ("'gang' loop in \"gang-single\" region");
        *handled_ops_p = true;
        break;

@@ -184,16 +275,16 @@ visit_loops_in_gang_single_region (gimple_stmt_iterator *gsi_p,
    return NULL;
  }

-/* Visit all nested OpenACC loops in the statement indicated by GSI.  This
+/* Visit all nested OpenACC loops in the sequence indicated by GS.  This
     statement is expected to be inside a gang-single region.  Issue a warning
     for any loops inside it that have gang clauses and remove the clauses.  */

  static void
-make_loops_gang_single (gimple_stmt_iterator gsi)
+make_loops_gang_single (gimple_seq gs)
  {
    struct walk_stmt_info wi;
    memset (&wi, 0, sizeof (wi));
-  walk_gimple_stmt (&gsi, visit_loops_in_gang_single_region, NULL, &wi);
+  walk_gimple_seq (gs, visit_loops_in_gang_single_region, NULL, &wi);
  }

  /* Construct a "gang-single" OpenACC parallel region at LOC containing the
@@ -202,21 +293,75 @@ make_loops_gang_single (gimple_stmt_iterator gsi)
     to force gang-single execution.  */

  static gimple *
-make_gang_single_region (location_t loc, gimple_seq stmts, tree clauses)
+make_region_seq (location_t loc, gimple_seq stmts,
+		 tree num_gangs_clause,
+		 tree num_workers_clause,
+		 tree vector_length_clause,
+		 tree clauses)
  {
    /* This correctly unshares the entire clause chain rooted here.  */
    clauses = unshare_expr (clauses);
-  /* Make a num_gangs(1) clause.  */
-  tree gang_single_clause = build_omp_clause (loc, OMP_CLAUSE_NUM_GANGS);
-  OMP_CLAUSE_OPERAND (gang_single_clause, 0) = integer_one_node;
-  OMP_CLAUSE_CHAIN (gang_single_clause) = clauses;
+
+  dump_user_location_t loc_stmts_first = gimple_seq_first (stmts);
+
+  /* Figure out the region code for this region.  */
+  /* Optimistic default: assume "setup code", no looping; thus not
+     performance-critical.  */
+  int region_code = GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE;
+  adjust_region_code (stmts, &region_code);
+
+  if (region_code == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc_stmts_first,
+			 "beginning \"gang-single\" region in OpenACC"
+			 " 'kernels' construct\n");
+
+      /* Make a num_gangs(1) clause.  */
+      tree gang_single_clause = build_omp_clause (loc, OMP_CLAUSE_NUM_GANGS);
+      OMP_CLAUSE_OPERAND (gang_single_clause, 0) = integer_one_node;
+      OMP_CLAUSE_CHAIN (gang_single_clause) = clauses;
+      clauses = gang_single_clause;
+
+      /* Remove and issue warnings about gang clauses on any OpenACC
+	 loops nested inside this sequentially executed statement.  */
+      make_loops_gang_single (stmts);
+    }
+  else if (region_code == GF_OMP_TARGET_KIND_OACC_KERNELS)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc_stmts_first,
+			 "beginning \"parloops\" region in OpenACC"
+			 " 'kernels' construct\n");
+
+      /* As we're transforming a "GF_OMP_TARGET_KIND_OACC_KERNELS" into another
+	 "GF_OMP_TARGET_KIND_OACC_KERNELS", this isn't doing any of the clauses
+	 mangling that "make_region_loop_nest" is doing.  */
+      /* Re-assemble the clauses stripped off earlier.  */
+      if (num_gangs_clause != NULL)
+	{
+	  tree c = unshare_expr (num_gangs_clause);
+	  OMP_CLAUSE_CHAIN (c) = clauses;
+	  clauses = c;
+	}
+      if (num_workers_clause != NULL)
+	{
+	  tree c = unshare_expr (num_workers_clause);
+	  OMP_CLAUSE_CHAIN (c) = clauses;
+	  clauses = c;
+	}
+      if (vector_length_clause != NULL)
+	{
+	  tree c = unshare_expr (vector_length_clause);
+	  OMP_CLAUSE_CHAIN (c) = clauses;
+	  clauses = c;
+	}
+    }
+  else
+    gcc_unreachable ();

    /* Build the gang-single region.  */
-  gimple *single_region
-    = gimple_build_omp_target (
-        NULL,
-        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE,
-        gang_single_clause);
+  gimple *single_region = gimple_build_omp_target (NULL, region_code, clauses);
    gimple_set_location (single_region, loc);
    gbind *single_body = gimple_build_bind (NULL, stmts, make_node (BLOCK));
    gimple_omp_set_body (single_region, single_body);
@@ -224,7 +369,7 @@ make_gang_single_region (location_t loc, gimple_seq stmts, tree clauses)
    return single_region;
  }

-/* Helper function for make_gang_parallel_loop_region.  Adds a num_gangs
+/* Helper function for make_region_loop_nest.  Adds a num_gangs
     (num_workers, vector_length) clause to the given CLAUSES, either the one
     from the parent region (PARENT_CLAUSE) or a new one based on the loop's
     own LOOP_CLAUSE ("gang(num: N)" or similar for workers or vectors) with
@@ -256,7 +401,7 @@ add_parent_or_loop_num_clause (tree parent_clause, tree loop_clause,
    return clauses;
  }

-/* Helper for make_gang_parallel_loop_region, looking for "worker(num: N)"
+/* Helper for make_region_loop_nest, looking for "worker(num: N)"
     or "vector(length: N)" clauses in nested loops.  Removes the numeric
     argument, transferring it to the enclosing parallel region (via
     WI->INFO).  If numeric arguments within the same loop nest conflict,
@@ -493,32 +638,65 @@ transform_kernels_loop_clauses (gimple *omp_for,
     adjust_nested_loop_clauses function.  */

  static gimple *
-make_gang_parallel_loop_region (gimple *omp_for, gimple *stmt,
-                                tree num_gangs_clause,
-                                tree num_workers_clause,
-                                tree vector_length_clause,
-                                tree clauses)
+make_region_loop_nest (gimple *omp_for, gimple_seq stmts,
+		       tree num_gangs_clause,
+		       tree num_workers_clause,
+		       tree vector_length_clause,
+		       tree clauses)
  {
    /* This correctly unshares the entire clause chain rooted here.  */
    clauses = unshare_expr (clauses);

-  clauses = transform_kernels_loop_clauses (omp_for,
-					    num_gangs_clause,
-					    num_workers_clause,
-					    vector_length_clause,
-					    clauses);
+  /* Figure out the region code for this region.  */
+  /* Optimistic default: assume that the loop nest is parallelizable
+     (essentially, no GIMPLE_OMP_FOR with (explicit or implicit) 'auto' clause,
+     and no un-annotated loops).  */
+  int region_code = GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED;
+  adjust_region_code (stmts, &region_code);
+
+  if (region_code == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, omp_for,
+			 "parallelized loop nest in OpenACC 'kernels'"
+			 " construct\n");
+
+      clauses = transform_kernels_loop_clauses (omp_for,
+						num_gangs_clause,
+						num_workers_clause,
+						vector_length_clause,
+						clauses);
+    }
+  else if (region_code == GF_OMP_TARGET_KIND_OACC_KERNELS)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, omp_for,
+			 "forwarded loop nest in OpenACC 'kernels' construct"
+			 " to \"parloops\" for analysis\n");
+
+      /* We're transforming one "GF_OMP_TARGET_KIND_OACC_KERNELS" into another
+	 "GF_OMP_TARGET_KIND_OACC_KERNELS", so don't have to
+	 "transform_kernels_loop_clauses".  */
+      /* Re-assemble the clauses stripped off earlier.  */
+      clauses
+	= add_parent_or_loop_num_clause (num_gangs_clause, NULL,
+					 OMP_CLAUSE_NUM_GANGS, clauses);
+      clauses
+	= add_parent_or_loop_num_clause (num_workers_clause, NULL,
+					 OMP_CLAUSE_NUM_WORKERS, clauses);
+      clauses
+	= add_parent_or_loop_num_clause (vector_length_clause, NULL,
+					 OMP_CLAUSE_VECTOR_LENGTH, clauses);
+    }
+  else
+    gcc_unreachable ();

    /* Now build the parallel region containing this loop.  */
-  gimple_seq parallel_body = NULL;
-  gimple_seq_add_stmt (&parallel_body, stmt);
    gimple *parallel_body_bind
-    = gimple_build_bind (NULL, parallel_body, make_node (BLOCK));
+    = gimple_build_bind (NULL, stmts, make_node (BLOCK));
    gimple *parallel_region
-    = gimple_build_omp_target (
-        parallel_body_bind,
-        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED,
-        clauses);
-  gimple_set_location (parallel_region, gimple_location (stmt));
+    = gimple_build_omp_target (parallel_body_bind, region_code, clauses);
+  gimple_set_location (parallel_region, gimple_location (omp_for));

    return parallel_region;
  }
@@ -744,9 +922,9 @@ class control_flow_regions
         information for the statement sequence SEQ.  */
      control_flow_regions (gimple_seq seq);

-    /* Return true if the STMT with the given index IDX in the analyzed
+    /* Return true if the statement with the given index IDX in the analyzed
         statement sequence is an unconditionally executed OpenACC loop.  */
-    bool is_unconditional_oacc_for_loop (gimple *stmt, size_t idx);
+    bool is_unconditional_oacc_for_loop (size_t idx);

    private:
      /* Find the region representative for the statement identified by index
@@ -777,11 +955,8 @@ control_flow_regions::control_flow_regions (gimple_seq seq)
  }

  bool
-control_flow_regions::is_unconditional_oacc_for_loop (gimple *stmt, size_t idx)
+control_flow_regions::is_unconditional_oacc_for_loop (size_t idx)
  {
-  if (top_level_omp_for_in_stmt (stmt) == NULL)
-    /* Not an OpenACC for loop.  */
-    return false;
    if (idx == 0 || idx == representatives.length () - 1)
      /* The first or last statement in the kernels region.  This means that
         there is no room before or after it for a jump or a label.  Thus
@@ -917,7 +1092,7 @@ control_flow_regions::compute_regions (gimple_seq seq)
  }

  /* Decompose the body of the KERNELS_REGION, which was originally annotated
-   with the KERNELS_CLAUSES, into a series of parallel regions.  */
+   with the KERNELS_CLAUSES, into a series of regions.  */

  static gimple *
  decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
@@ -1057,17 +1232,24 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)

        gimple *stmt = gsi_stmt (gsi);
        gimple *omp_for = top_level_omp_for_in_stmt (stmt);
+      bool is_unconditional_oacc_for_loop = false;
+      if (omp_for != NULL)
+	is_unconditional_oacc_for_loop
+	  = cf_regions.is_unconditional_oacc_for_loop (idx);
        if (omp_for != NULL
-          && cf_regions.is_unconditional_oacc_for_loop (stmt, idx))
+          && is_unconditional_oacc_for_loop)
          {
-          /* This is an OMP for statement, put it into a parallel region.
+          /* This is an OMP for statement, put it into a separate region.
               But first, construct a gang-single region containing any
               complex sequential statements we may have seen.  */
            if (gang_single_seq != NULL && !only_simple_assignments)
              {
                gimple *single_region
-                = make_gang_single_region (loc, gang_single_seq,
-                                           kernels_clauses);
+                = make_region_seq (loc, gang_single_seq,
+				   num_gangs_clause,
+				   num_workers_clause,
+				   vector_length_clause,
+				   kernels_clauses);
                gimple_seq_add_stmt (&region_body, single_region);
              }
            else if (gang_single_seq != NULL && only_simple_assignments)
@@ -1085,8 +1267,10 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
            gang_single_seq = NULL;
            only_simple_assignments = true;

+	  gimple_seq parallel_seq = NULL;
+	  gimple_seq_add_stmt (&parallel_seq, stmt);
            gimple *parallel_region
-            = make_gang_parallel_loop_region (omp_for, stmt,
+	    = make_region_loop_nest (omp_for, parallel_seq,
                                                num_gangs_clause,
                                                num_workers_clause,
                                                vector_length_clause,
@@ -1095,6 +1279,16 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
          }
        else
          {
+	  if (omp_for != NULL)
+	    {
+	      gcc_checking_assert (!is_unconditional_oacc_for_loop);
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, omp_for,
+				 "unparallelized loop nest in OpenACC"
+				 " 'kernels' region: it's executed"
+				 " conditionally\n");
+	    }
+
            /* This is not an unconditional OMP for statement, so it will be
               put into a gang-single region.  */
            gimple_seq_add_stmt (&gang_single_seq, stmt);
@@ -1107,9 +1301,6 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
                  && DECL_ARTIFICIAL (gimple_assign_lhs (stmt)));
            if (!is_simple_assignment)
              only_simple_assignments = false;
-          /* Remove and issue warnings about gang clauses on any OpenACC
-             loops nested inside this sequentially executed statement.  */
-          make_loops_gang_single (gsi);
          }
      }

@@ -1128,7 +1319,11 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
    if (gang_single_seq != NULL)
      {
        gimple *single_region
-        = make_gang_single_region (loc, gang_single_seq, kernels_clauses);
+        = make_region_seq (loc, gang_single_seq,
+			   num_gangs_clause,
+			   num_workers_clause,
+			   vector_length_clause,
+			   kernels_clauses);
        gimple_seq_add_stmt (&region_body, single_region);
      }

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
index 3e52ec4..ea7eec9 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
@@ -12,19 +12,22 @@ main (void)
    unsigned int sum = 1;

  #pragma acc kernels copyin(a[0:N]) copy(sum)
-  /* { dg-bogus "region contains gang partitoned code but is not gang partitioned" "gang partitioned" { xfail *-*-* } .-1 } */
    {
+    /* converted to "oacc_kernels" */
      #pragma acc loop
      for (i = 0; i < N; ++i)
        sum += a[i];

+    /* converted to "oacc_parallel_kernels_gang_single" */
      sum++;
      a[0]++;

-    #pragma acc loop
+    /* converted to "oacc_parallel_kernels_parallelized" */
+    #pragma acc loop independent
      for (i = 0; i < N; ++i)
        sum += a[i];

+    /* converted to "oacc_kernels" */
      if (sum > 10)
        {
          #pragma acc loop
@@ -32,7 +35,8 @@ main (void)
            sum += a[i];
        }

-    #pragma acc loop
+    /* converted to "oacc_kernels" */
+    #pragma acc loop auto
      for (i = 0; i < N; ++i)
        sum += a[i];
    }
@@ -44,10 +48,11 @@ main (void)
     parallel regions.  */
  /* { dg-final { scan-tree-dump-times "oacc_data_kernels" 1 "convert_oacc_kernels" } } */

-/* The three unconditional loop regions are parallelized, the sequential
-   part in between and the conditional loop are made gang-single.  */
-/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 3 "convert_oacc_kernels" } } */
-/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 2 "convert_oacc_kernels" } } */
+/* As noted in the comments above, we get one gang-single serial region; one
+   parallelized loop region; and three "old-style" kernel regions. */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 1 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 1 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "oacc_kernels" 3 "convert_oacc_kernels" } } */

  /* Each of the parallel regions is async, and there is a final call to
     __builtin_GOACC_wait.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
new file mode 100644
index 0000000..b5d58c3
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
@@ -0,0 +1,123 @@
+/* Test OpenACC 'kernels' construct decomposition.  */
+
+/* { dg-additional-options "-fopenacc-kernels=split" } */
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+/* { dg-additional-options "-O2" } for "parloops".  */
+
+/* See also "../../gfortran.dg/goacc/kernels-decompose-1.f95".  */
+
+#pragma acc routine gang
+extern int
+f_g (int);
+
+#pragma acc routine worker
+extern int
+f_w (int);
+
+#pragma acc routine vector
+extern int
+f_v (int);
+
+#pragma acc routine seq
+extern int
+f_s (int);
+
+int
+main ()
+{
+  int x, y, z;
+#define N 10
+  int a[N], b[N], c[N];
+
+#pragma acc kernels
+  {
+    x = 0; /* { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+    y = x < 10;
+    z = x++;
+    ;
+  }
+
+#pragma acc kernels
+  for (int i = 0; i < N; i++) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+    a[i] = 0;
+
+#pragma acc kernels loop
+  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (int i = 0; i < N; i++)
+    b[i] = a[N - i - 1];
+
+#pragma acc kernels
+  {
+#pragma acc loop
+    /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+    for (int i = 0; i < N; i++)
+      b[i] = a[N - i - 1];
+
+#pragma acc loop
+    /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+    for (int i = 0; i < N; i++)
+      c[i] = a[i] * b[i];
+
+    a[z] = 0; /* { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+
+#pragma acc loop
+    /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+    for (int i = 0; i < N; i++)
+      c[i] += a[i];
+
+#pragma acc loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+    /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+    for (int i = 0 + 1; i < N; i++)
+      c[i] += c[i - 1];
+  }
+
+#pragma acc kernels
+  {
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang loop parallelism" } */
+    /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+    for (int i = 0; i < N; ++i)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC worker loop parallelism" } */
+      for (int j = 0; j < N; ++j)
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+	 /* { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 } */
+	for (int k = 0; k < N; ++k)
+	  a[(i + j + k) % N]
+	    = b[j]
+	    + f_v (c[k]); /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+
+    //TODO Should the following turn into "gang-single" instead of "parloops"?
+    //TODO The problem is that the first STMT is "if (y <= 4) goto <D.2547>; else goto <D.2548>;", thus "parloops".
+    if (y < 5) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+#pragma acc loop independent /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" } */
+      for (int j = 0; j < N; ++j)
+	b[j] = f_w (c[j]);
+  }
+
+#pragma acc kernels /* { dg-warning "region contains gang partitoned code but is not gang partitioned" } */
+  {
+    /* { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" "" { target *-*-* } .+1 } */
+    y = f_g (a[5]); /* { dg-message "note: assigned OpenACC gang worker vector loop parallelism" } */
+
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang loop parallelism" } */
+    /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+    for (int j = 0; j < N; ++j)
+      b[j] = y + f_w (c[j]); /* { dg-message "note: assigned OpenACC worker vector loop parallelism" } */
+  }
+
+#pragma acc kernels
+  {
+    y = 3; /* { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+
+#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang worker loop parallelism" } */
+    /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+    for (int j = 0; j < N; ++j)
+      b[j] = y + f_v (c[j]); /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+
+    z = 2; /* { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+  }
+
+#pragma acc kernels /* { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+  ;
+
+  return 0;
+}
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
index 559916c..6604727 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
@@ -9,19 +9,23 @@ program main

    !$acc kernels copyin(a(1:N)) copy(sum)

+  ! converted to "oacc_kernels"
    !$acc loop
    do i = 1, N
      sum = sum + a(i)
    end do

+  ! converted to "oacc_parallel_kernels_gang_single"
    sum = sum + 1
    a(1) = a(1) + 1

-  !$acc loop
+  ! converted to "oacc_parallel_kernels_parallelized"
+  !$acc loop independent
    do i = 1, N
      sum = sum + a(i)
    end do

+  ! converted to "oacc_kernels"
    if (sum .gt. 10) then
      !$acc loop
      do i = 1, N
@@ -29,8 +33,8 @@ program main
      end do
    end if

-  !$acc loop
-  ! { dg-bogus "region contains gang partitoned code but is not gang partitioned" "gang partitioned" { xfail *-*-* } .-1 }
+  ! converted to "oacc_kernels"
+  !$acc loop auto
    do i = 1, N
      sum = sum + a(i)
    end do
@@ -42,15 +46,13 @@ end program main
  ! parallel regions.
  ! { dg-final { scan-tree-dump-times "oacc_data_kernels" 1 "convert_oacc_kernels" } }

-! The three unconditional loop regions are parallelized, the sequential part
-! in between and the conditional loop are made gang-single.
-! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 3 "convert_oacc_kernels" } }
-! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 2 "convert_oacc_kernels" } }
+! As noted in the comments above, we get one gang-single serial region; one
+! parallelized loop region; and three "old-style" kernel regions.
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 1 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 1 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "oacc_kernels" 3 "convert_oacc_kernels" } }

  ! Each of the parallel regions is async, and there is a final call to
  ! __builtin_GOACC_wait.
  ! { dg-final { scan-tree-dump-times "oacc_parallel_kernels.* async\(-1\)" 5 "convert_oacc_kernels" } }
  ! { dg-final { scan-tree-dump-times "__builtin_GOACC_wait" 1 "convert_oacc_kernels" } }
-
-! Check that the original kernels region is removed.
-! { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
new file mode 100644
index 0000000..520bf03
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
@@ -0,0 +1,132 @@
+! Test OpenACC 'kernels' construct decomposition.
+
+! { dg-additional-options "-fopenacc-kernels=split" }
+! { dg-additional-options "-fopt-info-optimized-omp" }
+! { dg-additional-options "-O2" } for "parloops".
+
+! See also "../../c-c++-common/goacc/kernels-decompose-1.c".
+
+program main
+  implicit none
+
+  integer, external :: f_g
+  !$acc routine (f_g) gang
+  integer, external :: f_w
+  !$acc routine (f_w) worker
+  integer, external :: f_v
+  !$acc routine (f_v) vector
+  integer, external :: f_s
+  !$acc routine (f_s) seq
+
+  integer :: i, j, k
+  integer :: x, y, z
+  logical :: y_l
+  integer, parameter :: N = 10
+  integer :: a(N), b(N), c(N)
+
+  !$acc kernels
+  x = 0 ! { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" }
+  y = 0
+  y_l = x < 10
+  z = x
+  x = x + 1
+  ;
+  !$acc end kernels
+
+  !$acc kernels ! { dg-message "note: assigned OpenACC gang loop parallelism" }
+  do i = 1, N ! { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" }
+     a(i) = 0
+  end do
+  !$acc end kernels
+
+  !$acc kernels loop ! { dg-message "note: assigned OpenACC gang loop parallelism" }
+  ! { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 }
+  do i = 1, N
+     b(i) = a(N - i + 1)
+  end do
+
+  !$acc kernels
+  !$acc loop ! { dg-message "note: assigned OpenACC gang loop parallelism" }
+  ! { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 }
+  do i = 1, N
+     b(i) = a(N - i + 1)
+  end do
+
+  !$acc loop ! { dg-message "note: assigned OpenACC gang loop parallelism" }
+  ! { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 }
+  do i = 1, N
+     c(i) = a(i) * b(i)
+  end do
+
+  a(z) = 0 ! { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" }
+
+  !$acc loop ! { dg-message "note: assigned OpenACC gang loop parallelism" }
+  ! { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 }
+  do i = 1, N
+     c(i) = c(i) + a(i)
+  end do
+
+  !$acc loop seq ! { dg-message "note: assigned OpenACC seq loop parallelism" }
+  ! { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 }
+  do i = 1 + 1, N
+     c(i) = c(i) + c(i - 1)
+  end do
+  !$acc end kernels
+
+  !$acc kernels ! { dg-bogus "note: assigned OpenACC seq loop parallelism" "TODO" { xfail *-*-* } }
+  !$acc loop independent ! { dg-message "note: assigned OpenACC gang loop parallelism" }
+  ! { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 }
+  do i = 1, N
+     !$acc loop independent ! { dg-message "note: assigned OpenACC worker loop parallelism" }
+     do j = 1, N
+        !$acc loop independent ! { dg-message "note: assigned OpenACC seq loop parallelism" "TODO" { xfail *-*-* } }
+        ! { dg-warning "insufficient partitioning available to parallelize loop" "TODO" { xfail *-*-* } .-1 }
+        ! { dg-bogus "note: assigned OpenACC vector loop parallelism" "TODO" { xfail *-*-* } .-2 }
+        do k = 1, N
+           a(1 + mod(i + j + k, N)) &
+                = b(j) &
+                + f_v (c(k)) ! { dg-message "note: assigned OpenACC vector loop parallelism" "TODO" { xfail *-*-* } .-1 }
+        end do
+     end do
+  end do
+
+  !TODO Should the following turn into "gang-single" instead of "parloops"?
+  !TODO The problem is that the first STMT is "if (y <= 4) goto <D.2547>; else goto <D.2548>;", thus "parloops".
+  if (y < 5) then ! { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" }
+     !$acc loop independent ! { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" }
+     do j = 1, N
+        b(j) = f_w (c(j))
+     end do
+  end if
+  !$acc end kernels
+
+  !$acc kernels
+  !TODO This refers to the "gang-single" "f_g" call.
+  ! { dg-warning "region contains gang partitoned code but is not gang partitioned" "TODO" { xfail *-*-* } .-2 }
+  ! { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" "" { target *-*-* } .+1 }
+  y = f_g (a(5)) ! { dg-message "note: assigned OpenACC gang worker vector loop parallelism" "TODO" { xfail *-*-* } }
+
+  !$acc loop independent ! { dg-message "note: assigned OpenACC gang loop parallelism" "TODO" { xfail *-*-* } }
+  ! { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 }
+  ! { dg-bogus "note: assigned OpenACC gang vector loop parallelism" "TODO" { xfail *-*-* } .-2 }
+  do j = 1, N
+     b(j) = y + f_w (c(j)) ! { dg-message "note: assigned OpenACC worker vector loop parallelism" "TODO" { xfail *-*-* } }
+  end do
+  !$acc end kernels
+
+  !$acc kernels
+  y = 3 ! { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" }
+
+  !$acc loop independent ! { dg-message "note: assigned OpenACC gang worker loop parallelism" "TODO" { xfail *-*-* } }
+  ! { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 }
+  ! { dg-bogus "note: assigned OpenACC gang vector loop parallelism" "TODO" { xfail *-*-* } .-2 }
+  do j = 1, N
+     b(j) = y + f_v (c(j)) ! { dg-message "note: assigned OpenACC vector loop parallelism" "TODO" { xfail *-*-* } }
+  end do
+
+  z = 2 ! { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" }
+  !$acc end kernels
+
+  !$acc kernels ! { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" }
+  !$acc end kernels
+end program main
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
new file mode 100644
index 0000000..601e543
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
@@ -0,0 +1,30 @@
+/* { dg-additional-options "-fopenacc-kernels=split" } */
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+#undef NDEBUG
+#include <assert.h>
+
+int main()
+{
+  int a = 0;
+#define N 123
+  int b[N] = { 0 };
+
+#pragma acc kernels
+  {
+    int c = 234; /* { dg-warning "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+
+#pragma acc loop independent gang /* { dg-warning "note: assigned OpenACC gang loop parallelism" } */
+    /* { dg-warning "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } 17 } */
+    for (int i = 0; i < N; ++i)
+      b[i] = c;
+
+    a = c; /* { dg-warning "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+  }
+
+  for (int i = 0; i < N; ++i)
+    assert (b[i] == 234);
+  assert (a == 234);
+
+  return 0;
+}
-- 
2.8.1

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 09/10, OpenACC] Avoid introducing 'create' mapping clauses for loop index variables in kernels regions
  2019-07-17 21:03 ` [PATCH 00/10, OpenACC] Rework handling of OpenACC kernels regions Kwok Cheung Yeung
                     ` (7 preceding siblings ...)
  2019-07-17 21:30   ` [PATCH 08/10, OpenACC] New OpenACC kernels region decompose algorithm Kwok Cheung Yeung
@ 2019-07-17 21:32   ` Kwok Cheung Yeung
  2019-07-17 21:37   ` [PATCH 10/10, OpenACC] Make new OpenACC kernels conversion the default; adjust and add tests Kwok Cheung Yeung
  2019-07-18  9:24   ` [PATCH 00/10, OpenACC] Rework handling of OpenACC kernels regions Jakub Jelinek
  10 siblings, 0 replies; 33+ messages in thread
From: Kwok Cheung Yeung @ 2019-07-17 21:32 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek; +Cc: Thomas Schwinge, Julian Brown

This patch avoids adding CREATE mapping clauses for loop index variables. It 
also sets TREE_ADDRESSABLE on newly mapped declarations, which fixes an ICE that 
sometimes appears due to an assert firing in omp-low.c.
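
For illustration only (this is not code from the patch), the index-variable
search can be modelled on a toy statement tree.  Every toy_* name and type
below is hypothetical and merely stands in for the GIMPLE API; the sketch
assumes that loops and binds can nest arbitrarily, and it only counts the
variables that would still receive a "create" clause rather than building
any clause.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for GIMPLE statements; these types are hypothetical.  */
enum toy_kind { TOY_FOR, TOY_BIND, TOY_OTHER };

struct toy_stmt
{
  enum toy_kind kind;
  const char *index;            /* Loop index name, for TOY_FOR only.  */
  const struct toy_stmt *body;  /* Nested statements (TOY_FOR, TOY_BIND).  */
  size_t body_len;
};

#define TOY_MAX_VARS 16

struct toy_var_set
{
  const char *names[TOY_MAX_VARS];
  size_t count;
};

int
toy_set_contains (const struct toy_var_set *set, const char *name)
{
  for (size_t i = 0; i < set->count; i++)
    if (strcmp (set->names[i], name) == 0)
      return 1;
  return 0;
}

void
toy_set_add (struct toy_var_set *set, const char *name)
{
  if (toy_set_contains (set, name))
    return;
  assert (set->count < TOY_MAX_VARS);
  set->names[set->count++] = name;
}

/* Recursively record the index of every loop found in SEQ, descending
   into loop bodies and nested binds.  */
void
toy_find_index_vars (const struct toy_stmt *seq, size_t len,
		     struct toy_var_set *idx_vars)
{
  for (size_t i = 0; i < len; i++)
    {
      if (seq[i].kind == TOY_FOR)
	toy_set_add (idx_vars, seq[i].index);
      if (seq[i].kind != TOY_OTHER)
	toy_find_index_vars (seq[i].body, seq[i].body_len, idx_vars);
    }
}

/* Count the variables in VARS that are not loop indices, i.e. those that
   would still be given a "create" clause.  */
size_t
toy_count_create_clauses (const char *const *vars, size_t nvars,
			  const struct toy_var_set *idx_vars)
{
  size_t n = 0;
  for (size_t i = 0; i < nvars; i++)
    if (!toy_set_contains (idx_vars, vars[i]))
      n++;
  return n;
}
```

With a region declaring "c", "i", "j" and "tmp", where "i" and "j" are loop
indices (one of them nested inside a bind), only "c" and "tmp" remain
candidates for mapping.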

2019-07-16  Julian Brown  <julian@codesourcery.com>

	gcc/
	* omp-oacc-kernels.c (find_omp_for_index_vars_1,
	find_omp_for_index_vars): New functions.
	(maybe_build_inner_data_region): Add IDX_VARS argument. Don't add
	CREATE mapping clauses for loop index variables.  Set TREE_ADDRESSABLE
	flag on newly-mapped declarations as a side effect.
	(decompose_kernels_region_body): Call find_omp_for_index_vars.  Don't
	create PRESENT clause for loop index variables.  Pass index variable
	set to maybe_build_inner_data_region.
---
  gcc/omp-oacc-kernels.c | 58 ++++++++++++++++++++++++++++++++++++++++++++------
  1 file changed, 51 insertions(+), 7 deletions(-)

diff --git a/gcc/omp-oacc-kernels.c b/gcc/omp-oacc-kernels.c
index d65e6c6..2091385 100644
--- a/gcc/omp-oacc-kernels.c
+++ b/gcc/omp-oacc-kernels.c
@@ -766,6 +766,43 @@ flatten_binds (gbind *bind, bool include_toplevel_vars = false)
    return vars;
  }

+/* Recursively search BODY_SEQUENCE for 'for' loops, and record their loop
+   indices in IDX_VARS.  */
+
+static void
+find_omp_for_index_vars_1 (gimple_seq body_sequence, hash_set<tree> *idx_vars)
+{
+  gimple_stmt_iterator gsi;
+
+  for (gsi = gsi_start (body_sequence); !gsi_end_p (gsi); gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      gimple *for_stmt = top_level_omp_for_in_stmt (stmt);
+
+      if (for_stmt)
+        {
+	  tree idx = gimple_omp_for_index (for_stmt, 0);
+	  idx_vars->add (idx);
+	  find_omp_for_index_vars_1 (gimple_omp_body (for_stmt), idx_vars);
+	}
+      else if (gimple_code (stmt) == GIMPLE_BIND)
+	find_omp_for_index_vars_1 (gimple_bind_body (as_a <gbind *> (stmt)),
+				   idx_vars);
+    }
+}
+
+/* Find all loop index variables in a bind.  */
+
+static hash_set<tree>
+find_omp_for_index_vars (gbind *bind)
+{
+  hash_set<tree> idx_vars;
+
+  find_omp_for_index_vars_1 (gimple_bind_body (bind), &idx_vars);
+
+  return idx_vars;
+}
+
  /* Helper function for places where we construct data regions.  Wraps the BODY
     inside a try-finally construct at LOC that calls __builtin_GOACC_data_end
     in its cleanup block.  Returns this try statement.  */
@@ -784,13 +821,15 @@ make_data_region_try_statement (location_t loc, gimple *body)

  /* If INNER_BIND_VARS holds variables, build an OpenACC data region with
     location LOC containing BODY and having "create(var)" clauses for each
-   variable.  If INNER_CLEANUP is present, add a try-finally statement with
-   this cleanup code in the finally block.  Return the new data region, or
-   the original BODY if no data region was needed.  */
+   variable (such variables are also made addressable as a side effect).  If
+   INNER_CLEANUP is present, add a try-finally statement with this cleanup
+   code in the finally block.  Return the new data region, or the original
+   BODY if no data region was needed.  */

  static gimple *
  maybe_build_inner_data_region (location_t loc, gimple *body,
-                               tree inner_bind_vars, gimple *inner_cleanup)
+			       tree inner_bind_vars, gimple *inner_cleanup,
+			       hash_set<tree> *idx_vars)
  {
    /* Build data "create(var)" clauses for these local variables.
       Below we will add these to a data region enclosing the entire body
@@ -817,7 +856,7 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
            else
              inner_bind_vars = next;
          }
-      else
+      else if (!idx_vars->contains (v))
          {
            /* Otherwise, build the map clause.  */
            tree new_clause = build_omp_clause (loc, OMP_CLAUSE_MAP);
@@ -825,6 +864,7 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
            OMP_CLAUSE_DECL (new_clause) = v;
            OMP_CLAUSE_SIZE (new_clause) = DECL_SIZE_UNIT (v);
            OMP_CLAUSE_CHAIN (new_clause) = inner_data_clauses;
+	  TREE_ADDRESSABLE (v) = 1;
            inner_data_clauses = new_clause;

            prev_mapped_var = v;
@@ -1156,6 +1196,8 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
    tree inner_bind_vars = flatten_binds (kernels_bind);
    gimple_seq body_sequence = gimple_bind_body (kernels_bind);

+  hash_set<tree> idx_vars = find_omp_for_index_vars (kernels_bind);
+
    /* All these inner variables will get allocated on the device (below, by
       calling maybe_build_inner_data_region).  Here we create "present"
       clauses for them and add these clauses to the list of clauses to be
@@ -1163,7 +1205,9 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
    tree present_clauses = kernels_clauses;
    for (tree var = inner_bind_vars; var; var = TREE_CHAIN (var))
      {
-      if (!DECL_ARTIFICIAL (var) && TREE_CODE (var) != CONST_DECL)
+      if (!DECL_ARTIFICIAL (var)
+	  && TREE_CODE (var) != CONST_DECL
+	  && !idx_vars.contains (var))
          {
            tree present_clause = build_omp_clause (loc, OMP_CLAUSE_MAP);
            OMP_CLAUSE_SET_MAP_KIND (present_clause, GOMP_MAP_FORCE_PRESENT);
@@ -1342,7 +1386,7 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
    /* If we found variables declared in nested scopes, build a data region to
       map them to the device.  */
    body = maybe_build_inner_data_region (loc, body, inner_bind_vars,
-                                        inner_cleanup);
+                                        inner_cleanup, &idx_vars);

    return body;
  }
-- 
2.8.1

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 10/10, OpenACC] Make new OpenACC kernels conversion the default; adjust and add tests
  2019-07-17 21:03 ` [PATCH 00/10, OpenACC] Rework handling of OpenACC kernels regions Kwok Cheung Yeung
                     ` (8 preceding siblings ...)
  2019-07-17 21:32   ` [PATCH 09/10, OpenACC] Avoid introducing 'create' mapping clauses for loop index variables in kernels regions Kwok Cheung Yeung
@ 2019-07-17 21:37   ` Kwok Cheung Yeung
  2019-07-18  9:24   ` [PATCH 00/10, OpenACC] Rework handling of OpenACC kernels regions Jakub Jelinek
  10 siblings, 0 replies; 33+ messages in thread
From: Kwok Cheung Yeung @ 2019-07-17 21:37 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek; +Cc: Thomas Schwinge

This patch makes the new OpenACC 'kernels' conversion scheme
(-fopenacc-kernels=split) the default, and adjusts the tests accordingly.
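
As a rough sketch of the kind of code the "split" mode decomposes, here is a
small function modelled on the libgomp test kernels-decompose-1.c shown
earlier in this thread.  The name fill_with_constant and the data clauses
are assumptions made for this example.  The per-statement comments repeat
what that test's dg- directives expect and are not a guarantee for other
inputs.  When built without -fopenacc, the pragmas are ignored and the code
simply runs on the host.

```c
#include <assert.h>

#define N 123

/* Hypothetical helper: fill B[0..N-1] with a constant and return it.  */
int
fill_with_constant (int *b)
{
  int a = 0;

  assert (b != 0);

#pragma acc kernels copyout(b[0:N]) copy(a)
  {
    /* Straight-line code: expected to begin a "gang-single" region.  */
    int c = 234;

    /* Explicitly independent loop: expected to be parallelized.  */
#pragma acc loop independent gang
    for (int i = 0; i < N; ++i)
      b[i] = c;

    /* Straight-line code again: another "gang-single" region.  */
    a = c;
  }

  return a;
}
```

Adding -fopt-info-optimized-omp, as the tests do, should make the compiler
report which region kind each statement was assigned to.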

2019-07-16  Thomas Schwinge  <thomas@codesourcery.com>
             Kwok Cheung Yeung  <kcy@codesourcery.com>

	gcc/c-family/
	* c.opt (fopenacc-kernels): Default to "split".

	gcc/fortran/
	* lang.opt (fopenacc-kernels): Default to "split".

	gcc/
	* doc/invoke.texi (-fopenacc-kernels): Update.

	gcc/testsuite/
	* c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c:
	New file.
	* c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c:
	Likewise.
	* c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c:
	Likewise.
	* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Likewise.
	* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c:
	Likewise.
	* c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c:
	Likewise.
	* c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c:
	Likewise.
	* c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c:
	Likewise.
	* c-c++-common/goacc/note-parallelism-kernels-loop-auto.c:
	Likewise.
	* c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c:
	Likewise.
	* c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise.
	* c-c++-common/goacc/classify-kernels-unparallelized.c: Update.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/classify-parallel.c: Likewise.
	* c-c++-common/goacc/classify-routine.c: Likewise.
	* c-c++-common/goacc/if-clause-2.c: Likewise.
	* c-c++-common/goacc/kernels-conversion.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-1.c: Likewise.
	* c-c++-common/goacc/loop-2-kernels.c: Likewise.
	* c-c++-common/goacc/note-parallelism.c: Likewise.
	* c-c++-common/goacc/uninit-dim-clause.c: Likewise.
	* gfortran.dg/goacc/kernels-conversion.f95: Likewise.
	* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
	* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Likewise.
	* gfortran.dg/goacc/classify-kernels.f95: Likewise.
	* gfortran.dg/goacc/loop-2-kernels.f95: Likewise.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c:
	Update.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
---
  gcc/c-family/c.opt                                 |   2 +-
  gcc/doc/invoke.texi                                |   2 +-
  gcc/fortran/lang.opt                               |   2 +-
  .../goacc/classify-kernels-unparallelized.c        |   9 +-
  .../c-c++-common/goacc/classify-kernels.c          |   4 +-
  .../c-c++-common/goacc/classify-parallel.c         |   2 +-
  .../c-c++-common/goacc/classify-routine.c          |   2 +-
  gcc/testsuite/c-c++-common/goacc/if-clause-2.c     |   1 -
  .../c-c++-common/goacc/kernels-conversion.c        |  10 +-
  .../c-c++-common/goacc/kernels-decompose-1.c       |  69 ++++---
  gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c  |  14 +-
  ...sm-1-kernels-conditional-loop-independent_seq.c | 129 +++++++++++++
  .../goacc/note-parallelism-1-kernels-loop-auto.c   | 126 +++++++++++++
  ...te-parallelism-1-kernels-loop-independent_seq.c | 126 +++++++++++++
  .../goacc/note-parallelism-1-kernels-loops.c       |  47 +++++
  .../note-parallelism-1-kernels-straight-line.c     |  82 +++++++++
  .../note-parallelism-combined-kernels-loop-auto.c  | 121 ++++++++++++
  ...llelism-combined-kernels-loop-independent_seq.c | 121 ++++++++++++
  ...lism-kernels-conditional-loop-independent_seq.c | 204 +++++++++++++++++++++
  .../goacc/note-parallelism-kernels-loop-auto.c     | 138 ++++++++++++++
  ...note-parallelism-kernels-loop-independent_seq.c | 138 ++++++++++++++
  .../goacc/note-parallelism-kernels-loops.c         |  50 +++++
  .../c-c++-common/goacc/note-parallelism.c          |   3 +-
  .../c-c++-common/goacc/uninit-dim-clause.c         |   6 +-
  .../goacc/classify-kernels-unparallelized.f95      |   1 +
  .../gfortran.dg/goacc/classify-kernels.f95         |   1 +
  .../gfortran.dg/goacc/kernels-conversion.f95       |   7 +-
  .../gfortran.dg/goacc/kernels-decompose-1.f95      |  79 ++++----
  gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95   |   1 -
  gcc/testsuite/gfortran.dg/goacc/loop-2-kernels.f95 |  22 +--
  .../libgomp.oacc-c-c++-common/acc_prof-kernels-1.c |  17 +-
  .../kernels-decompose-1.c                          |   9 +-
  32 files changed, 1416 insertions(+), 129 deletions(-)
  create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c
  create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c
  create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c
  create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loops.c
  create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c
  create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c
  create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c
  create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c
  create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-auto.c
  create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c
  create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops.c

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index a193875..8efc5ea 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1689,7 +1689,7 @@ C ObjC C++ ObjC++ LTO Joined Var(flag_openacc_dims)
  Specify default OpenACC compute dimensions.

  fopenacc-kernels=
-C ObjC C++ ObjC++ RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS)
+C ObjC C++ ObjC++ RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_SPLIT)
  -fopenacc-kernels=[split|parloops]	Configure OpenACC 'kernels' constructs handling.

  Enum
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ec98ab6..ffde9a2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2200,9 +2200,9 @@ Configure OpenACC 'kernels' constructs handling.
  With @option{-fopenacc-kernels=split}, OpenACC 'kernels' constructs
  are split into a sequence of compute constructs, each then handled
  individually.
+This is the default.
  With @option{-fopenacc-kernels=parloops}, the whole OpenACC
  'kernels' constructs is handled by the @samp{parloops} pass.
-This is the default.

  @item -fopenmp
  @opindex fopenmp
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index e7e277a..c84b284 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -663,7 +663,7 @@ Fortran LTO Joined Var(flag_openacc_dims)
  ; Documented in C

  fopenacc-kernels=
-Fortran RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS)
+Fortran RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_SPLIT)
  ; Documented in C

  fopenmp
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
index d4c4b2c..443b207 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
@@ -1,5 +1,5 @@
  /* Check offloaded function's attributes and classification for unparallelized
-   OpenACC kernels.  */
+   OpenACC 'kernels'.  */

  /* { dg-additional-options "-O2" }
     { dg-additional-options "-fopt-info-optimized-omp" }
@@ -13,14 +13,15 @@ extern unsigned int *__restrict a;
  extern unsigned int *__restrict b;
  extern unsigned int *__restrict c;

-/* An "extern"al mapping of loop iterations/array indices makes the loop
-   unparallelizable.  */
  extern unsigned int f (unsigned int);
+#pragma acc routine (f) seq

  void KERNELS ()
  {
  #pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N]) /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
-  for (unsigned int i = 0; i < N; i++)
+  for (unsigned int i = 0; i < N; i++) /* { dg-message "optimized: beginning \"parloops\" region in OpenACC 'kernels' construct" } */
+    /* An "extern"al mapping of loop iterations/array indices makes the loop
+       unparallelizable.  */
      c[i] = a[f (i)] + b[f (i)];
  }

diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
index 16e9b9e..c154edf 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
@@ -1,5 +1,5 @@
  /* Check offloaded function's attributes and classification for OpenACC
-   kernels.  */
+   'kernels'.  */

  /* { dg-additional-options "-O2" }
     { dg-additional-options "-fopt-info-optimized-omp" }
@@ -16,7 +16,7 @@ extern unsigned int *__restrict c;
  void KERNELS ()
  {
  #pragma acc kernels copyin (a[0:N], b[0:N]) copyout (c[0:N]) /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
-  for (unsigned int i = 0; i < N; i++)
+  for (unsigned int i = 0; i < N; i++) /* { dg-message "optimized: beginning \"parloops\" region in OpenACC 'kernels' construct" } */
      c[i] = a[i] + b[i];
  }

diff --git a/gcc/testsuite/c-c++-common/goacc/classify-parallel.c b/gcc/testsuite/c-c++-common/goacc/classify-parallel.c
index 66a6d13..9c80efd 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-parallel.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-parallel.c
@@ -1,5 +1,5 @@
  /* Check offloaded function's attributes and classification for OpenACC
-   parallel.  */
+   'parallel'.  */

  /* { dg-additional-options "-O2" }
     { dg-additional-options "-fopt-info-optimized-omp" }
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-routine.c b/gcc/testsuite/c-c++-common/goacc/classify-routine.c
index 0b9ba6e..a4994b0 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-routine.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-routine.c
@@ -1,5 +1,5 @@
  /* Check offloaded function's attributes and classification for OpenACC
-   routine.  */
+   'routine'.  */

  /* { dg-additional-options "-O2" }
     { dg-additional-options "-fopt-info-optimized-omp" }
diff --git a/gcc/testsuite/c-c++-common/goacc/if-clause-2.c b/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
index e17b5dd..9920b4f 100644
--- a/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
@@ -1,4 +1,3 @@
-/* { dg-additional-options "-fopenacc-kernels=split" } */
  /* { dg-additional-options "-fdump-tree-convert_oacc_kernels" } */

  void
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
index ea7eec9..8cb63f0 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
@@ -1,4 +1,3 @@
-/* { dg-additional-options "-fopenacc-kernels=split" } */
  /* { dg-additional-options "-fdump-tree-convert_oacc_kernels" } */

  #define N 1024
@@ -52,12 +51,11 @@ main (void)
     parallelized loop region; and three "old-style" kernel regions. */
  /* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 1 "convert_oacc_kernels" } } */
  /* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 1 "convert_oacc_kernels" } } */
-/* { dg-final { scan-tree-dump-times "oacc_kernels" 3 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "oacc_kernels " 3 "convert_oacc_kernels" } } */

  /* Each of the parallel regions is async, and there is a final call to
     __builtin_GOACC_wait.  */
-/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels.* async\(-1\)" 5 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "oacc_kernels async\\(-1\\)" 3 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single async\\(-1\\)" 1 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized async\\(-1\\)" 1 "convert_oacc_kernels" } } */
  /* { dg-final { scan-tree-dump-times "__builtin_GOACC_wait" 1 "convert_oacc_kernels" } } */
-
-/* Check that the original kernels region is removed.  */
-/* { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
index b5d58c3..293ee42 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
@@ -1,6 +1,5 @@
  /* Test OpenACC 'kernels' construct decomposition.  */

-/* { dg-additional-options "-fopenacc-kernels=split" } */
  /* { dg-additional-options "-fopt-info-optimized-omp" } */
  /* { dg-additional-options "-O2" } for "parloops".  */

@@ -31,92 +30,92 @@ main ()

  #pragma acc kernels
    {
-    x = 0; /* { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+    x = 0; /* { dg-message "optimized: beginning .gang-single. region in OpenACC .kernels. construct" } */
      y = x < 10;
      z = x++;
      ;
    }

-#pragma acc kernels
-  for (int i = 0; i < N; i++) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+  for (int i = 0; i < N; i++) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
      a[i] = 0;

-#pragma acc kernels loop
-  /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc kernels loop /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
    for (int i = 0; i < N; i++)
      b[i] = a[N - i - 1];

  #pragma acc kernels
    {
-#pragma acc loop
-    /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+    /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
      for (int i = 0; i < N; i++)
        b[i] = a[N - i - 1];

-#pragma acc loop
-    /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+    /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
      for (int i = 0; i < N; i++)
        c[i] = a[i] * b[i];

-    a[z] = 0; /* { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+    a[z] = 0; /* { dg-message "optimized: beginning .gang-single. region in OpenACC .kernels. construct" } */

-#pragma acc loop
-    /* { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+#pragma acc loop /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+    /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
      for (int i = 0; i < N; i++)
        c[i] += a[i];

-#pragma acc loop seq /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
-    /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+#pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+    /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
      for (int i = 0 + 1; i < N; i++)
        c[i] += c[i - 1];
    }

-#pragma acc kernels
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC worker vector loop parallelism" } */
    {
-#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang loop parallelism" } */
-    /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+    /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
      for (int i = 0; i < N; ++i)
-#pragma acc loop independent /* { dg-message "note: assigned OpenACC worker loop parallelism" } */
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC worker loop parallelism" } */
        for (int j = 0; j < N; ++j)
-#pragma acc loop independent /* { dg-message "note: assigned OpenACC seq loop parallelism" } */
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
  	 /* { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 } */
  	for (int k = 0; k < N; ++k)
  	  a[(i + j + k) % N]
  	    = b[j]
-	    + f_v (c[k]); /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+	    + f_v (c[k]); /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */

      //TODO Should the following turn into "gang-single" instead of "parloops"?
      //TODO The problem is that the first STMT is "if (y <= 4) goto <D.2547>; else goto <D.2548>;", thus "parloops".
-    if (y < 5) /* { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" } */
-#pragma acc loop independent /* { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" } */
+    if (y < 5) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+#pragma acc loop independent /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" } */
        for (int j = 0; j < N; ++j)
  	b[j] = f_w (c[j]);
    }

-#pragma acc kernels /* { dg-warning "region contains gang partitoned code but is not gang partitioned" } */
+#pragma acc kernels
    {
-    /* { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" "" { target *-*-* } .+1 } */
-    y = f_g (a[5]); /* { dg-message "note: assigned OpenACC gang worker vector loop parallelism" } */
+    /* { dg-message "optimized: beginning .gang-single. region in OpenACC .kernels. construct" "" { target *-*-* } .+1 } */
+    y = f_g (a[5]); /* { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" } */

-#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang loop parallelism" } */
-    /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+    /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
      for (int j = 0; j < N; ++j)
-      b[j] = y + f_w (c[j]); /* { dg-message "note: assigned OpenACC worker vector loop parallelism" } */
+      b[j] = y + f_w (c[j]); /* { dg-message "optimized: assigned OpenACC worker vector loop parallelism" } */
    }

  #pragma acc kernels
    {
-    y = 3; /* { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+    y = 3; /* { dg-message "optimized: beginning .gang-single. region in OpenACC .kernels. construct" } */

-#pragma acc loop independent /* { dg-message "note: assigned OpenACC gang worker loop parallelism" } */
-    /* { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
+    /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
      for (int j = 0; j < N; ++j)
-      b[j] = y + f_v (c[j]); /* { dg-message "note: assigned OpenACC vector loop parallelism" } */
+      b[j] = y + f_v (c[j]); /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */

-    z = 2; /* { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+    z = 2; /* { dg-message "optimized: beginning .gang-single. region in OpenACC .kernels. construct" } */
    }

-#pragma acc kernels /* { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+#pragma acc kernels /* { dg-message "optimized: beginning .gang-single. region in OpenACC .kernels. construct" } */
    ;

    return 0;
diff --git a/gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c b/gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
index 0151508..c989222 100644
--- a/gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
+++ b/gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
@@ -37,7 +37,7 @@ void K(void)
  	for (j = 0; j < 10; j++)
  	  { }
        }
-#pragma acc loop seq gang // { dg-error "'seq' overrides" }
+#pragma acc loop seq gang // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
      for (i = 0; i < 10; i++)
        { }

@@ -63,7 +63,7 @@ void K(void)
  	for (j = 0; j < 10; j++)
  	  { }
        }
-#pragma acc loop seq worker // { dg-error "'seq' overrides" }
+#pragma acc loop seq worker // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
      for (i = 0; i < 10; i++)
        { }
  #pragma acc loop gang worker
@@ -92,7 +92,7 @@ void K(void)
  	for (j = 1; j < 10; j++)
  	  { }
        }
-#pragma acc loop seq vector // { dg-error "'seq' overrides" }
+#pragma acc loop seq vector // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
      for (i = 0; i < 10; i++)
        { }
  #pragma acc loop gang vector
@@ -105,7 +105,7 @@ void K(void)
  #pragma acc loop auto
      for (i = 0; i < 10; i++)
        { }
-#pragma acc loop seq auto // { dg-error "'seq' overrides" }
+#pragma acc loop seq auto // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
      for (i = 0; i < 10; i++)
        { }
  #pragma acc loop gang auto // { dg-error "'auto' conflicts" }
@@ -147,7 +147,7 @@ void K(void)
  #pragma acc kernels loop worker(num:5)
    for (i = 0; i < 10; i++)
      { }
-#pragma acc kernels loop seq worker // { dg-error "'seq' overrides" }
+#pragma acc kernels loop seq worker // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
    for (i = 0; i < 10; i++)
      { }
  #pragma acc kernels loop gang worker
@@ -163,7 +163,7 @@ void K(void)
  #pragma acc kernels loop vector(length:5)
    for (i = 0; i < 10; i++)
      { }
-#pragma acc kernels loop seq vector // { dg-error "'seq' overrides" }
+#pragma acc kernels loop seq vector // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
    for (i = 0; i < 10; i++)
      { }
  #pragma acc kernels loop gang vector
@@ -176,7 +176,7 @@ void K(void)
  #pragma acc kernels loop auto
    for (i = 0; i < 10; i++)
      { }
-#pragma acc kernels loop seq auto // { dg-error "'seq' overrides" }
+#pragma acc kernels loop seq auto // { dg-error "'seq' overrides" "TODO" { xfail *-*-* } }
    for (i = 0; i < 10; i++)
      { }
  #pragma acc kernels loop gang auto // { dg-error "'auto' conflicts" }
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c
new file mode 100644
index 0000000..c21273a
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c
@@ -0,0 +1,129 @@
+/* Test the output of "-fopt-info-optimized-omp" for an OpenACC 'kernels'
+   construct containing conditionally executed 'loop' constructs with
+   'independent' or 'seq' clauses.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+extern int c;
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+  if (c) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop seq
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent gang
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent worker
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent vector
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent gang vector
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent gang worker
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent worker vector
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent gang worker vector
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent gang
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent worker
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent vector
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop independent
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent
+    for (y = 0; y < 10; y++)
+      ;
+
+#pragma acc loop independent
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop seq
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop independent
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop seq
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop independent
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent
+    for (y = 0; y < 10; y++)
+#pragma acc loop seq
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop seq
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent
+    for (y = 0; y < 10; y++)
+#pragma acc loop seq
+      for (z = 0; z < 10; z++)
+	;
+ }
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c
new file mode 100644
index 0000000..eedc472
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c
@@ -0,0 +1,126 @@
+/* Test the output of "-fopt-info-optimized-omp" for an OpenACC 'kernels'
+   construct containing 'loop' constructs with explicit or implicit 'auto'
+   clause.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels
+ /* Strangely indented to keep this similar to other test cases.  */
+ {
+#pragma acc loop /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop gang /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop worker /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop vector /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop gang vector /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop gang worker /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop worker vector /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop gang worker vector /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop gang /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop worker
+    for (y = 0; y < 10; y++)
+#pragma acc loop vector
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+      ;
+
+#pragma acc loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop
+      for (z = 0; z < 10; z++)
+	;
+ }
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c
new file mode 100644
index 0000000..dad1bdb
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c
@@ -0,0 +1,126 @@
+/* Test the output of "-fopt-info-optimized-omp" for an OpenACC 'kernels'
+   construct containing 'loop' constructs with 'independent' or 'seq'
+   clauses.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels
+ /* Strangely indented to keep this similar to other test cases.  */
+ {
+#pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent gang /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent worker /* { dg-message "optimized: assigned OpenACC worker loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent vector /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent gang vector /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent gang worker /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent worker vector /* { dg-message "optimized: assigned OpenACC worker vector loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent gang worker vector /* { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent gang /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent worker /* { dg-message "optimized: assigned OpenACC worker loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent vector /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+    for (y = 0; y < 10; y++)
+      ;
+
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC worker loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+ }
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loops.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loops.c
new file mode 100644
index 0000000..336be88
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loops.c
@@ -0,0 +1,47 @@
+/* Test the output of "-fopt-info-optimized-omp" for an OpenACC 'kernels'
+   construct containing loops.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ {
+  for (x = 0; x < 10; x++) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+    ;
+
+  for (x = 0; x < 10; x++)
+    ;
+
+  for (x = 0; x < 10; x++)
+    for (y = 0; y < 10; y++)
+      for (z = 0; z < 10; z++)
+	;
+
+  for (x = 0; x < 10; x++)
+    ;
+
+  for (x = 0; x < 10; x++)
+    for (y = 0; y < 10; y++)
+      ;
+
+  for (x = 0; x < 10; x++)
+    for (y = 0; y < 10; y++)
+      for (z = 0; z < 10; z++)
+	;
+
+  for (x = 0; x < 10; x++)
+    for (y = 0; y < 10; y++)
+      for (z = 0; z < 10; z++)
+	;
+ }
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c
new file mode 100644
index 0000000..07a1e32
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c
@@ -0,0 +1,82 @@
+/* Test the output of "-fopt-info-optimized-omp" for an OpenACC 'kernels'
+   construct containing straight-line code.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+#pragma acc routine gang
+extern int
+f_g (int);
+
+#pragma acc routine worker
+extern int
+f_w (int);
+
+#pragma acc routine vector
+extern int
+f_v (int);
+
+#pragma acc routine seq
+extern int
+f_s (int);
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels
+  {
+    x = 0; /* { dg-message "optimized: beginning .gang-single. region in OpenACC .kernels. construct" } */
+    y = x < 10;
+    z = x++;
+    ;
+
+    y = 0;
+    z = y < 10;
+    x -= f_g (y++); /* { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" } */
+    ;
+
+    x = f_w (0); /* { dg-message "optimized: assigned OpenACC worker vector loop parallelism" } */
+    z = f_v (x < 10); /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+    y -= f_s (x++); /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+    ;
+
+    x = 0;
+    y = x < 10;
+    z = (x++);
+    y = 0;
+    x = y < 10;
+    z += (y++);
+    ;
+
+    x = 0;
+    y += f_s (x < 10); /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+    x++;
+    y = 0;
+    y += f_v (y < 10); /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+    y++;
+    z = 0;
+    y += f_w (z < 10); /* { dg-message "optimized: assigned OpenACC worker vector loop parallelism" } */
+    z++;
+    ;
+
+    x = 0;
+    y *= f_g ( /* { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" } */
+	      f_w (x < 10) /* { dg-message "optimized: assigned OpenACC worker vector loop parallelism" } */
+	      + f_g (x < 10) /* { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" } */
+	      );
+    x++;
+    y = 0;
+    y *= y < 10;
+    y++;
+    z = 0;
+    y *= z < 10;
+    z++;
+    ;
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c
new file mode 100644
index 0000000..2241901
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c
@@ -0,0 +1,121 @@
+/* Test the output of "-fopt-info-optimized-omp" for combined OpenACC 'kernels
+   loop' constructs with explicit or implicit 'auto' clause.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels loop /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop gang /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop worker /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop vector /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop gang vector /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop gang worker /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop worker vector /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop gang worker vector /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop gang /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop worker
+    for (y = 0; y < 10; y++)
+#pragma acc loop vector
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+      ;
+
+#pragma acc kernels loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels loop /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels loop /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop
+      for (z = 0; z < 10; z++)
+	;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c
new file mode 100644
index 0000000..b743636
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c
@@ -0,0 +1,121 @@
+/* Test the output of "-fopt-info-optimized-omp" for combined OpenACC 'kernels
+   loop' constructs with 'independent' or 'seq' clauses.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop independent gang /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop independent worker /* { dg-message "optimized: assigned OpenACC worker loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop independent vector /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop independent gang vector /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop independent gang worker /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop independent worker vector /* { dg-message "optimized: assigned OpenACC worker vector loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop independent gang worker vector /* { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop independent gang /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent worker /* { dg-message "optimized: assigned OpenACC worker loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent vector /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels loop independent /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+    for (y = 0; y < 10; y++)
+      ;
+
+#pragma acc kernels loop independent /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC worker loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c
new file mode 100644
index 0000000..21d31c4
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c
@@ -0,0 +1,204 @@
+/* Test the output of "-fopt-info-optimized-omp" for OpenACC 'kernels'
+   constructs containing conditionally executed 'loop' constructs with
+   'independent' or 'seq' clauses.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+extern int c;
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop seq
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+ }
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent gang
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+ }
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent worker
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+ }
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent vector
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+ }
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent gang vector
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+ }
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent gang worker
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+ }
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent worker vector
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+ }
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent gang worker vector
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+ }
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent gang
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent worker
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent vector
+      for (z = 0; z < 10; z++)
+	;
+ }
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+ }
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent
+    for (y = 0; y < 10; y++)
+      ;
+ }
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent
+      for (z = 0; z < 10; z++)
+	;
+ }
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop seq
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent
+      for (z = 0; z < 10; z++)
+	;
+ }
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop seq
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent
+      for (z = 0; z < 10; z++)
+	;
+ }
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop independent
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent
+    for (y = 0; y < 10; y++)
+#pragma acc loop seq
+      for (z = 0; z < 10; z++)
+	;
+ }
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+ /* Strangely indented to keep this similar to other test cases.  */
+ if (c) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+ {
+#pragma acc loop seq
+  /* { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent
+    for (y = 0; y < 10; y++)
+#pragma acc loop seq
+      for (z = 0; z < 10; z++)
+	;
+ }
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-auto.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-auto.c
new file mode 100644
index 0000000..02b9064
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-auto.c
@@ -0,0 +1,138 @@
+/* Test the output of "-fopt-info-optimized-omp" for OpenACC 'kernels'
+   constructs containing 'loop' constructs with explicit or implicit 'auto'
+   clause.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels
+#pragma acc loop /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop gang /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop worker /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop vector /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop gang vector /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop gang worker /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop worker vector /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop gang worker vector /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop gang /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop worker
+    for (y = 0; y < 10; y++)
+#pragma acc loop vector
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels
+#pragma acc loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+      ;
+
+#pragma acc kernels
+#pragma acc loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels
+#pragma acc loop /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels
+#pragma acc loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop
+    for (y = 0; y < 10; y++)
+#pragma acc loop auto
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels
+#pragma acc loop auto /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels
+#pragma acc loop /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop auto
+    for (y = 0; y < 10; y++)
+#pragma acc loop
+      for (z = 0; z < 10; z++)
+	;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c
new file mode 100644
index 0000000..6824d70
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c
@@ -0,0 +1,138 @@
+/* Test the output of "-fopt-info-optimized-omp" for OpenACC 'kernels'
+   constructs containing 'loop' constructs with 'independent' or 'seq'
+   clauses.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels
+#pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop independent gang /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop independent worker /* { dg-message "optimized: assigned OpenACC worker loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop independent vector /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop independent gang vector /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop independent gang worker /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop independent worker vector /* { dg-message "optimized: assigned OpenACC worker vector loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop independent gang worker vector /* { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop independent gang /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent worker /* { dg-message "optimized: assigned OpenACC worker loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent vector /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+    ;
+
+#pragma acc kernels
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+    for (y = 0; y < 10; y++)
+      ;
+
+#pragma acc kernels
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC worker loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels
+#pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang worker loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC vector loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels
+#pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  /* { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 } */
+  for (x = 0; x < 10; x++)
+#pragma acc loop independent /* { dg-message "optimized: assigned OpenACC gang vector loop parallelism" } */
+    for (y = 0; y < 10; y++)
+#pragma acc loop seq /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+      for (z = 0; z < 10; z++)
+	;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops.c
new file mode 100644
index 0000000..365464b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops.c
@@ -0,0 +1,50 @@
+/* Test the output of "-fopt-info-optimized-omp" for an OpenACC 'kernels'
+   construct containing loops.  */
+
+/* { dg-additional-options "-fopt-info-optimized-omp" } */
+
+//TODO update accordingly
+/* See also "../../gfortran.dg/goacc/note-parallelism.f90".  */
+
+int
+main ()
+{
+  int x, y, z;
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+    ;
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+    ;
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+    for (y = 0; y < 10; y++)
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+    ;
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+    for (y = 0; y < 10; y++)
+      ;
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+    for (y = 0; y < 10; y++)
+      for (z = 0; z < 10; z++)
+	;
+
+#pragma acc kernels /* { dg-message "optimized: assigned OpenACC seq loop parallelism" } */
+  for (x = 0; x < 10; x++) /* { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" } */
+    for (y = 0; y < 10; y++)
+      for (z = 0; z < 10; z++)
+	;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/note-parallelism.c b/gcc/testsuite/c-c++-common/goacc/note-parallelism.c
index 735df7d..2b49a8b 100644
--- a/gcc/testsuite/c-c++-common/goacc/note-parallelism.c
+++ b/gcc/testsuite/c-c++-common/goacc/note-parallelism.c
@@ -1,4 +1,5 @@
-/* Test the output of "-fopt-info-optimized-omp".  */
+/* Test the output of "-fopt-info-optimized-omp" for OpenACC 'parallel'
+   constructs.  */

  /* { dg-additional-options "-fopt-info-optimized-omp" } */

diff --git a/gcc/testsuite/c-c++-common/goacc/uninit-dim-clause.c b/gcc/testsuite/c-c++-common/goacc/uninit-dim-clause.c
index 9f11196..f00daa7 100644
--- a/gcc/testsuite/c-c++-common/goacc/uninit-dim-clause.c
+++ b/gcc/testsuite/c-c++-common/goacc/uninit-dim-clause.c
@@ -18,12 +18,12 @@ void acc_kernels()
  {
    int i, j, k;

-  #pragma acc kernels num_gangs(i) /* { dg-warning "is used uninitialized in this function" } */
+  #pragma acc kernels num_gangs(i) /* { dg-warning "is used uninitialized in this function" "TODO" { xfail *-*-* } } */
    ;

-  #pragma acc kernels num_workers(j) /* { dg-warning "is used uninitialized in this function" } */
+  #pragma acc kernels num_workers(j) /* { dg-warning "is used uninitialized in this function" "TODO" { xfail *-*-* } } */
    ;

-  #pragma acc kernels vector_length(k) /* { dg-warning "is used uninitialized in this function" } */
+  #pragma acc kernels vector_length(k) /* { dg-warning "is used uninitialized in this function" "TODO" { xfail *-*-* } } */
    ;
  }
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
index 0877242..27ba39b 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
@@ -21,6 +21,7 @@ program main

    !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
    do i = 0, n - 1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+  ! { dg-message "optimized: beginning \"parloops\" region in OpenACC 'kernels' construct" "" { target *-*-* } .-1 }
       c(i) = a(f (i)) + b(f (i))
    end do
    !$acc end kernels
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
index f2c4736..68d0512 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
@@ -17,6 +17,7 @@ program main

    !$acc kernels copyin (a(0:n-1), b(0:n-1)) copyout (c(0:n-1))
    do i = 0, n - 1 ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
+  ! { dg-message "optimized: beginning \"parloops\" region in OpenACC 'kernels' construct" "" { target *-*-* } .-1 }
       c(i) = a(i) + b(i)
    end do
    !$acc end kernels
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
index 6604727..4672d15 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
@@ -1,4 +1,3 @@
-! { dg-additional-options "-fopenacc-kernels=split" }
  ! { dg-additional-options "-fdump-tree-convert_oacc_kernels" }

  program main
@@ -50,9 +49,11 @@ end program main
  ! parallelized loop region; and three "old-style" kernel regions.
  ! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 1 "convert_oacc_kernels" } }
  ! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 1 "convert_oacc_kernels" } }
-! { dg-final { scan-tree-dump-times "oacc_kernels" 3 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "oacc_kernels " 3 "convert_oacc_kernels" } }

  ! Each of the parallel regions is async, and there is a final call to
  ! __builtin_GOACC_wait.
-! { dg-final { scan-tree-dump-times "oacc_parallel_kernels.* async\(-1\)" 5 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "oacc_kernels async\\(-1\\)" 3 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single async\\(-1\\)" 1 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized async\\(-1\\)" 1 "convert_oacc_kernels" } }
  ! { dg-final { scan-tree-dump-times "__builtin_GOACC_wait" 1 "convert_oacc_kernels" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
index 520bf03..b2956d7 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
@@ -1,6 +1,5 @@
  ! Test OpenACC 'kernels' construct decomposition.

-! { dg-additional-options "-fopenacc-kernels=split" }
  ! { dg-additional-options "-fopt-info-optimized-omp" }
  ! { dg-additional-options "-O2" } for "parloops".

@@ -25,7 +24,7 @@ program main
    integer :: a(N), b(N), c(N)

    !$acc kernels
-  x = 0 ! { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" }
+  x = 0 ! { dg-message "optimized: beginning .gang-single. region in OpenACC .kernels. construct" }
    y = 0
    y_l = x < 10
    z = x
@@ -33,67 +32,67 @@ program main
    ;
    !$acc end kernels

-  !$acc kernels ! { dg-message "note: assigned OpenACC gang loop parallelism" }
-  do i = 1, N ! { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" }
+  !$acc kernels ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
+  do i = 1, N ! { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" }
       a(i) = 0
    end do
    !$acc end kernels

-  !$acc kernels loop ! { dg-message "note: assigned OpenACC gang loop parallelism" }
-  ! { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 }
+  !$acc kernels loop ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+  ! { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 }
    do i = 1, N
       b(i) = a(N - i + 1)
    end do

    !$acc kernels
-  !$acc loop ! { dg-message "note: assigned OpenACC gang loop parallelism" }
-  ! { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 }
+  !$acc loop ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+  ! { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 }
    do i = 1, N
       b(i) = a(N - i + 1)
    end do

-  !$acc loop ! { dg-message "note: assigned OpenACC gang loop parallelism" }
-  ! { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 }
+  !$acc loop ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+  ! { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 }
    do i = 1, N
       c(i) = a(i) * b(i)
    end do

-  a(z) = 0 ! { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" }
+  a(z) = 0 ! { dg-message "optimized: beginning .gang-single. region in OpenACC .kernels. construct" }

-  !$acc loop ! { dg-message "note: assigned OpenACC gang loop parallelism" }
-  ! { dg-message "note: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 }
+  !$acc loop ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+  ! { dg-message "optimized: forwarded loop nest in OpenACC .kernels. construct to .parloops. for analysis" "" { target *-*-* } .-1 }
    do i = 1, N
       c(i) = c(i) + a(i)
    end do

-  !$acc loop seq ! { dg-message "note: assigned OpenACC seq loop parallelism" }
-  ! { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 }
+  !$acc loop seq ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+  ! { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 }
    do i = 1 + 1, N
       c(i) = c(i) + c(i - 1)
    end do
    !$acc end kernels

-  !$acc kernels ! { dg-bogus "note: assigned OpenACC seq loop parallelism" "TODO" { xfail *-*-* } }
-  !$acc loop independent ! { dg-message "note: assigned OpenACC gang loop parallelism" }
-  ! { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 }
+  !$acc kernels ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+  !$acc loop independent ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
+  ! { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 }
    do i = 1, N
-     !$acc loop independent ! { dg-message "note: assigned OpenACC worker loop parallelism" }
+     !$acc loop independent ! { dg-message "optimized: assigned OpenACC worker loop parallelism" }
       do j = 1, N
-        !$acc loop independent ! { dg-message "note: assigned OpenACC seq loop parallelism" "TODO" { xfail *-*-* } }
-        ! { dg-warning "insufficient partitioning available to parallelize loop" "TODO" { xfail *-*-* } .-1 }
-        ! { dg-bogus "note: assigned OpenACC vector loop parallelism" "TODO" { xfail *-*-* } .-2 }
+        !$acc loop independent ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+        ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+        ! { dg-bogus "optimized: assigned OpenACC vector loop parallelism" }
          do k = 1, N
             a(1 + mod(i + j + k, N)) &
                  = b(j) &
-                + f_v (c(k)) ! { dg-message "note: assigned OpenACC vector loop parallelism" "TODO" { xfail *-*-* } .-1 }
+                + f_v (c(k)) ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
          end do
       end do
    end do

    !TODO Should the following turn into "gang-single" instead of "parloops"?
    !TODO The problem is that the first STMT is "if (y <= 4) goto <D.2547>; else goto <D.2548>;", thus "parloops".
-  if (y < 5) then ! { dg-message "note: beginning .parloops. region in OpenACC .kernels. construct" }
-     !$acc loop independent ! { dg-message "note: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" }
+  if (y < 5) then ! { dg-message "optimized: beginning .parloops. region in OpenACC .kernels. construct" }
+     !$acc loop independent ! { dg-message "optimized: unparallelized loop nest in OpenACC .kernels. region: it's executed conditionally" }
       do j = 1, N
          b(j) = f_w (c(j))
       end do
@@ -101,32 +100,30 @@ program main
    !$acc end kernels

    !$acc kernels
-  !TODO This refers to the "gang-single" "f_g" call.
-  ! { dg-warning "region contains gang partitoned code but is not gang partitioned" "TODO" { xfail *-*-* } .-2 }
-  ! { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" "" { target *-*-* } .+1 }
-  y = f_g (a(5)) ! { dg-message "note: assigned OpenACC gang worker vector loop parallelism" "TODO" { xfail *-*-* } }
-
-  !$acc loop independent ! { dg-message "note: assigned OpenACC gang loop parallelism" "TODO" { xfail *-*-* } }
-  ! { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 }
-  ! { dg-bogus "note: assigned OpenACC gang vector loop parallelism" "TODO" { xfail *-*-* } .-2 }
+  ! { dg-message "optimized: beginning .gang-single. region in OpenACC .kernels. construct" "" { target *-*-* } .+1 }
+  y = f_g (a(5)) ! { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" }
+
+  !$acc loop independent ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
+  ! { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 }
+  ! { dg-bogus "optimized: assigned OpenACC gang vector loop parallelism" "" { target *-*-* } .-2 }
    do j = 1, N
-     b(j) = y + f_w (c(j)) ! { dg-message "note: assigned OpenACC worker vector loop parallelism" "TODO" { xfail *-*-* } }
+     b(j) = y + f_w (c(j)) ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
    end do
    !$acc end kernels

    !$acc kernels
-  y = 3 ! { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" }
+  y = 3 ! { dg-message "optimized: beginning .gang-single. region in OpenACC .kernels. construct" }

-  !$acc loop independent ! { dg-message "note: assigned OpenACC gang worker loop parallelism" "TODO" { xfail *-*-* } }
-  ! { dg-message "note: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 }
-  ! { dg-bogus "note: assigned OpenACC gang vector loop parallelism" "TODO" { xfail *-*-* } .-2 }
+  !$acc loop independent ! { dg-message "optimized: assigned OpenACC gang worker loop parallelism" }
+  ! { dg-message "optimized: parallelized loop nest in OpenACC .kernels. construct" "" { target *-*-* } .-1 }
+  ! { dg-bogus "optimized: assigned OpenACC gang vector loop parallelism" "" { target *-*-* } .-2 }
    do j = 1, N
-     b(j) = y + f_v (c(j)) ! { dg-message "note: assigned OpenACC vector loop parallelism" "TODO" { xfail *-*-* } }
+     b(j) = y + f_v (c(j)) ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
    end do

-  z = 2 ! { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" }
+  z = 2 ! { dg-message "optimized: beginning .gang-single. region in OpenACC .kernels. construct" }
    !$acc end kernels

-  !$acc kernels ! { dg-message "note: beginning .gang-single. region in OpenACC .kernels. construct" }
+  !$acc kernels ! { dg-message "optimized: beginning .gang-single. region in OpenACC .kernels. construct" }
    !$acc end kernels
  end program main
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
index b83ca2d..bc9beba 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
@@ -1,6 +1,5 @@
  ! { dg-do compile }
  ! { dg-additional-options "-fdump-tree-original" }
-! { dg-additional-options "-fopenacc-kernels=split" }
  ! { dg-additional-options "-fdump-tree-convert_oacc_kernels" }

  program test
diff --git a/gcc/testsuite/gfortran.dg/goacc/loop-2-kernels.f95 b/gcc/testsuite/gfortran.dg/goacc/loop-2-kernels.f95
index 874c62d..f77bb23 100644
--- a/gcc/testsuite/gfortran.dg/goacc/loop-2-kernels.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/loop-2-kernels.f95
@@ -35,7 +35,7 @@ program test
        DO j = 1,10
        ENDDO
      ENDDO
-    !$acc loop seq gang ! { dg-error "'seq' overrides other OpenACC loop specifiers" }
+    !$acc loop seq gang ! { dg-error "'seq' overrides other OpenACC loop specifiers" "TODO" { xfail *-*-* } }
      DO i = 1,10
      ENDDO

@@ -56,11 +56,11 @@ program test
        !$acc loop worker ! { dg-error "inner loop uses same OpenACC parallelism as containing loop" }
        DO j = 1,10
        ENDDO
-      !$acc loop gang ! { dg-error "" "TODO" { xfail *-*-* } }
+      !$acc loop gang
        DO j = 1,10
        ENDDO
      ENDDO
-    !$acc loop seq worker ! { dg-error "'seq' overrides other OpenACC loop specifiers" }
+    !$acc loop seq worker ! { dg-error "'seq' overrides other OpenACC loop specifiers" "TODO" { xfail *-*-* } }
      DO i = 1,10
      ENDDO
      !$acc loop gang worker
@@ -81,14 +81,14 @@ program test
        !$acc loop vector ! { dg-error "inner loop uses same OpenACC parallelism as containing loop" }
        DO j = 1,10
        ENDDO
-      !$acc loop worker ! { dg-error "" "TODO" { xfail *-*-* } }
+      !$acc loop worker
        DO j = 1,10
        ENDDO
-      !$acc loop gang ! { dg-error "" "TODO" { xfail *-*-* } }
+      !$acc loop gang
        DO j = 1,10
        ENDDO
      ENDDO
-    !$acc loop seq vector ! { dg-error "'seq' overrides other OpenACC loop specifiers" }
+    !$acc loop seq vector ! { dg-error "'seq' overrides other OpenACC loop specifiers" "TODO" { xfail *-*-* } }
      DO i = 1,10
      ENDDO
      !$acc loop gang vector
@@ -101,7 +101,7 @@ program test
      !$acc loop auto
      DO i = 1,10
      ENDDO
-    !$acc loop seq auto ! { dg-error "'seq' overrides other OpenACC loop specifiers" }
+    !$acc loop seq auto ! { dg-error "'seq' overrides other OpenACC loop specifiers" "TODO" { xfail *-*-* } }
      DO i = 1,10
      ENDDO
      !$acc loop gang auto ! { dg-error "'auto' conflicts with other OpenACC loop specifiers" }
@@ -133,7 +133,7 @@ program test
    !$acc kernels loop gang(static:*)
    DO i = 1,10
    ENDDO
-  !$acc kernels loop seq gang ! { dg-error "'seq' overrides other OpenACC loop specifiers" }
+  !$acc kernels loop seq gang ! { dg-error "'seq' overrides other OpenACC loop specifiers" "TODO" { xfail *-*-* } }
    DO i = 1,10
    ENDDO

@@ -146,7 +146,7 @@ program test
    !$acc kernels loop worker(num:5)
    DO i = 1,10
    ENDDO
-  !$acc kernels loop seq worker ! { dg-error "'seq' overrides other OpenACC loop specifiers" }
+  !$acc kernels loop seq worker ! { dg-error "'seq' overrides other OpenACC loop specifiers" "TODO" { xfail *-*-* } }
    DO i = 1,10
    ENDDO
    !$acc kernels loop gang worker
@@ -162,7 +162,7 @@ program test
    !$acc kernels loop vector(length:5)
    DO i = 1,10
    ENDDO
-  !$acc kernels loop seq vector ! { dg-error "'seq' overrides other OpenACC loop specifiers" }
+  !$acc kernels loop seq vector ! { dg-error "'seq' overrides other OpenACC loop specifiers" "TODO" { xfail *-*-* } }
    DO i = 1,10
    ENDDO
    !$acc kernels loop gang vector
@@ -175,7 +175,7 @@ program test
    !$acc kernels loop auto
    DO i = 1,10
    ENDDO
-  !$acc kernels loop seq auto ! { dg-error "'seq' overrides other OpenACC loop specifiers" }
+  !$acc kernels loop seq auto ! { dg-error "'seq' overrides other OpenACC loop specifiers" "TODO" { xfail *-*-* } }
    DO i = 1,10
    ENDDO
    !$acc kernels loop gang auto ! { dg-error "'auto' conflicts with other OpenACC loop specifiers" }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
index 7cfc364..fd339d2 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
@@ -41,6 +41,7 @@ static int state = -1;
  static acc_device_t acc_device_type;
  static int acc_device_num;
  static int num_gangs, num_workers, vector_length;
+static int async;


  static void cb_enqueue_launch_start (acc_prof_info *prof_info, acc_event_info *event_info, acc_api_info *api_info)
@@ -58,7 +59,7 @@ static void cb_enqueue_launch_start (acc_prof_info *prof_info, acc_event_info *e
    assert (prof_info->device_type == acc_device_type);
    assert (prof_info->device_number == acc_device_num);
    assert (prof_info->thread_id == -1);
-  assert (prof_info->async == acc_async_sync);
+  assert (prof_info->async == async);
    assert (prof_info->async_queue == prof_info->async);
    assert (prof_info->src_file == NULL);
    assert (prof_info->func_name == NULL);
@@ -154,8 +155,10 @@ int main()
    acc_device_num = acc_get_device_num (acc_device_type);
    assert (state == 0);

-  /* Parallelism dimensions: compiler/runtime decides.  */
    STATE_OP (state, = 0);
+  /* Implicit async.  */
+  async = acc_async_noval;
+  /* Parallelism dimensions: compiler/runtime decides.  */
    num_gangs = num_workers = vector_length = 0;
    {
  #define N 100
@@ -175,8 +178,10 @@ int main()
  #undef N
    }

-  /* Parallelism dimensions: literal.  */
    STATE_OP (state, = 0);
+  /* Explicit async: without argument.  */
+  async = acc_async_noval;
+  /* Parallelism dimensions: literal.  */
    num_gangs = 30;
    num_workers = 3;
    vector_length = 5;
@@ -184,6 +189,7 @@ int main()
  #define N 100
      int x[N];
  #pragma acc kernels \
+  async \
    num_gangs (30) num_workers (3) vector_length (5)
      /* { dg-prune-output "using vector_length \\(32\\), ignoring 5" } */
      {
@@ -200,8 +206,10 @@ int main()
  #undef N
    }

-  /* Parallelism dimensions: variable.  */
    STATE_OP (state, = 0);
+  /* Explicit async: variable.  */
+  async = 123;
+  /* Parallelism dimensions: variable.  */
    num_gangs = 22;
    num_workers = 5;
    vector_length = 7;
@@ -209,6 +217,7 @@ int main()
  #define N 100
      int x[N];
  #pragma acc kernels \
+  async (async) \
    num_gangs (num_gangs) num_workers (num_workers) vector_length (vector_length)
      /* { dg-prune-output "using vector_length \\(32\\), ignoring runtime setting" } */
      {
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
index 601e543..45d786d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
@@ -1,4 +1,3 @@
-/* { dg-additional-options "-fopenacc-kernels=split" } */
  /* { dg-additional-options "-fopt-info-optimized-omp" } */

  #undef NDEBUG
@@ -12,14 +11,14 @@ int main()

  #pragma acc kernels
    {
-    int c = 234; /* { dg-warning "note: beginning .gang-single. region in 
OpenACC .kernels. construct" } */
+    int c = 234; /* { dg-warning "optimized: beginning .gang-single. region in 
OpenACC .kernels. construct" } */

-#pragma acc loop independent gang /* { dg-warning "note: assigned OpenACC gang 
loop parallelism" } */
-    /* { dg-warning "note: parallelized loop nest in OpenACC .kernels. 
construct" "" { target *-*-* } 17 } */
+#pragma acc loop independent gang /* { dg-warning "optimized: assigned OpenACC 
gang loop parallelism" } */
+    /* { dg-warning "optimized: parallelized loop nest in OpenACC .kernels. 
construct" "" { target *-*-* } 16 } */
      for (int i = 0; i < N; ++i)
        b[i] = c;

-    a = c; /* { dg-warning "note: beginning .gang-single. region in OpenACC .kernels. construct" } */
+    a = c; /* { dg-warning "optimized: beginning .gang-single. region in OpenACC .kernels. construct" } */
    }

    for (int i = 0; i < N; ++i)
-- 
2.8.1

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 00/10, OpenACC] Rework handling of OpenACC kernels regions
  2019-07-17 21:03 ` [PATCH 00/10, OpenACC] Rework handling of OpenACC kernels regions Kwok Cheung Yeung
                     ` (9 preceding siblings ...)
  2019-07-17 21:37   ` [PATCH 10/10, OpenACC] Make new OpenACC kernels conversion the default; adjust and add tests Kwok Cheung Yeung
@ 2019-07-18  9:24   ` Jakub Jelinek
  10 siblings, 0 replies; 33+ messages in thread
From: Jakub Jelinek @ 2019-07-18  9:24 UTC (permalink / raw)
  To: Kwok Cheung Yeung; +Cc: gcc-patches, Thomas Schwinge

On Wed, Jul 17, 2019 at 10:02:18PM +0100, Kwok Cheung Yeung wrote:
> This series of patches reworks the way that OpenACC kernels regions are
> processed by GCC. Instead of relying on the parloops pass for
> auto-parallelisation of the kernel region, the contents of the region are
> transformed into a sequence of offloaded regions, which are then processed
> individually.
> 
> Tested on an x86_64 host, with offloading to a Nvidia Tesla K20c card.

So, what is the state of this series?  Has Thomas reviewed it and acked it
from the OpenACC side?  Which particular patches do you want me to look at
regarding the OpenMP vs. OpenACC interaction?

	Jakub

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/10, OpenACC] Add OpenACC target kinds for decomposed kernels regions
  2019-07-17 21:05   ` [PATCH 02/10, OpenACC] Add OpenACC target kinds for decomposed kernels regions Kwok Cheung Yeung
@ 2019-07-18  9:28     ` Jakub Jelinek
  2019-08-05 22:31       ` Kwok Cheung Yeung
  0 siblings, 1 reply; 33+ messages in thread
From: Jakub Jelinek @ 2019-07-18  9:28 UTC (permalink / raw)
  To: Kwok Cheung Yeung; +Cc: gcc-patches, Thomas Schwinge

On Wed, Jul 17, 2019 at 10:04:10PM +0100, Kwok Cheung Yeung wrote:
> @@ -2319,7 +2339,8 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
>      {
>        omp_context *tgt = enclosing_target_ctx (outer_ctx);
> 
> -      if (!tgt || is_oacc_parallel (tgt))
> +      if (!tgt || (is_oacc_parallel (tgt)
> +                    && !was_originally_oacc_kernels (tgt)))
>  	for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
>  	  {
>  	    char const *check = NULL;

Please watch the formatting; the above doesn't use tabs where it should.
Have you run the series through contrib/check_GNU_style.sh?

Otherwise, no concerns about this particular patch, assuming Thomas is ok
with it.

	Jakub

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/10, OpenACC] Turn OpenACC kernels regions into a sequence of, parallel regions
  2019-07-17 21:11   ` [PATCH 04/10, OpenACC] Turn OpenACC kernels regions into a sequence of, parallel regions Kwok Cheung Yeung
@ 2019-07-18 10:09     ` Jakub Jelinek
  2019-08-05 21:58       ` Kwok Cheung Yeung
  0 siblings, 1 reply; 33+ messages in thread
From: Jakub Jelinek @ 2019-07-18 10:09 UTC (permalink / raw)
  To: Kwok Cheung Yeung; +Cc: gcc-patches, Thomas Schwinge

On Wed, Jul 17, 2019 at 10:06:07PM +0100, Kwok Cheung Yeung wrote:
> --- a/gcc/omp-oacc-kernels.c
> +++ b/gcc/omp-oacc-kernels.c
> @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "backend.h"
>  #include "target.h"
>  #include "tree.h"
> +#include "cp/cp-tree.h"

No, you certainly don't want to do this.  Use langhooks if needed, though
that can be only for stuff done before IPA.  After IPA, because of LTO FE, you
must not rely on anything that is not in the IL generically.

> +  /* Build the gang-single region.  */
> +  gimple *single_region
> +    = gimple_build_omp_target (
> +        NULL,
> +        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE,
> +        gang_single_clause);

Formatting, both lack of tab uses, and ( at the end of line is very ugly.
Either try to use shorter GF_OMP_TARGET_KIND names, or say set a local
variable to that value and use that as an argument to the function to make
it shorter and more readable.

	Jakub

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/10, OpenACC] Turn OpenACC kernels regions into a sequence of, parallel regions
  2019-07-18 10:09     ` Jakub Jelinek
@ 2019-08-05 21:58       ` Kwok Cheung Yeung
  2020-11-13 22:33         ` In 'gcc/omp-oacc-kernels-decompose.cc', use langhook instead of accessing language-specific decl information (was: [PATCH 04/10, OpenACC] Turn OpenACC kernels regions into a sequence of, parallel regions) Thomas Schwinge
  0 siblings, 1 reply; 33+ messages in thread
From: Kwok Cheung Yeung @ 2019-08-05 21:58 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Thomas Schwinge

On 18/07/2019 10:30 am, Jakub Jelinek wrote:
> On Wed, Jul 17, 2019 at 10:06:07PM +0100, Kwok Cheung Yeung wrote:
>> --- a/gcc/omp-oacc-kernels.c
>> +++ b/gcc/omp-oacc-kernels.c
>> @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
>>   #include "backend.h"
>>   #include "target.h"
>>   #include "tree.h"
>> +#include "cp/cp-tree.h"
> 
> No, you certainly don't want to do this.  Use langhooks if needed, though
> that can be only for stuff done before IPA.  After IPA, because of LTO FE, you
> must not rely on anything that is not in the IL generically.
> 

I have modified the patch to use the get_generic_function_decl langhook 
to determine whether current_function_decl is an instantiation of a 
template (in this case, we don't care what the generic decl is - just 
whether the function decl has one).
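
To illustrate the idea outside of GCC, here is a minimal standalone sketch
of why going through a hook keeps the middle end language-independent.  The
types and the two hook implementations below are hypothetical stand-ins,
not GCC's real 'tree' or 'lang_hooks' definitions; only the hook name
get_generic_function_decl is taken from the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for a function declaration.  'generic' points to
// the template this function was instantiated from, if any.
struct decl
{
  const decl *generic;
};

// Hypothetical stand-in for the table of hooks each front end fills in.
struct lang_hooks_for_decls
{
  // Returns the generic (template) declaration, or nullptr if none.
  const decl *(*get_generic_function_decl) (const decl *);
};

// What a front end without templates might install.
static const decl *
no_generic (const decl *)
{
  return nullptr;
}

// What a C++-like front end might install.
static const decl *
cxx_generic (const decl *fn)
{
  return fn->generic;
}

// Language-independent code: it never looks at front-end data directly,
// it only asks the hook whether a generic decl exists.
static bool
generic_inst_p (const lang_hooks_for_decls &hooks, const decl *fn)
{
  return hooks.get_generic_function_decl (fn) != nullptr;
}
```

With this shape, the caller needs no front-end header: only the hook table
is visible to it, which is the property the review comment asks for.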

>> +  /* Build the gang-single region.  */
>> +  gimple *single_region
>> +    = gimple_build_omp_target (
>> +        NULL,
>> +        GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE,
>> +        gang_single_clause);
> 
> Formatting, both lack of tab uses, and ( at the end of line is very ugly.
> Either try to use shorter GF_OMP_TARGET_KIND names, or say set a local
> variable to that value and use that as an argument to the function to make
> it shorter and more readable.
> 

I have fixed this and other formatting problems in the patch, though 
this particular code is modified in patch 08 (New OpenACC kernels region 
decompose algorithm) anyway.

Kwok


2019-07-16  Gergö Barany  <gergo@codesourcery.com>
             Kwok Cheung Yeung  <kcy@codesourcery.com>

	gcc/
	* omp-oacc-kernels.c: Include langhooks.h.
	(top_level_omp_for_in_stmt): New function.
	(make_gang_single_region): Likewise.
	(transform_kernels_loop_clauses, make_gang_parallel_loop_region):
	Likewise.
	(flatten_binds): Likewise.
	(make_data_region_try_statement): Likewise.
	(maybe_build_inner_data_region): Likewise.
	(decompose_kernels_region_body): Likewise.
	(transform_kernels_region): Delegate to decompose_kernels_region_body
	and make_data_region_try_statement.

	gcc/testsuite/
	* c-c++-common/goacc/kernels-conversion.c: Test for a gang-single
	region.
	* gfortran.dg/goacc/kernels-conversion.f95: Likewise.
---
  gcc/omp-oacc-kernels.c                             | 562 ++++++++++++++++++++-
  .../c-c++-common/goacc/kernels-conversion.c        |  11 +-
  .../gfortran.dg/goacc/kernels-conversion.f95       |  11 +-
  3 files changed, 561 insertions(+), 23 deletions(-)
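
For readers who want the grouping logic of decompose_kernels_region_body
without the GIMPLE details, the following is a rough standalone model.  It
is only a sketch under simplifying assumptions: statements are reduced to
three hypothetical kinds, regions are plain structs, and clause handling,
binds and try-finally wrappers are left out entirely.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified classification of a statement in the body.
enum class stmt_kind { acc_loop, simple_assign, other };

struct region
{
  std::string kind;        // "parallelized" or "gang_single"
  std::vector<int> stmts;  // indices of the statements in this region
};

static std::vector<region>
decompose (const std::vector<stmt_kind> &body)
{
  std::vector<region> out;
  std::vector<int> pending;  // models gang_single_seq
  bool only_simple_assignments = true;

  for (int i = 0; i < (int) body.size (); ++i)
    {
      if (body[i] == stmt_kind::acc_loop)
        {
          region loop_region = { "parallelized", {} };
          if (!pending.empty () && !only_simple_assignments)
            // Real sequential code precedes the loop: own region.
            out.push_back ({ "gang_single", pending });
          else
            // Only loop setup code (or nothing): keep it with the loop.
            loop_region.stmts = pending;
          loop_region.stmts.push_back (i);
          out.push_back (loop_region);
          pending.clear ();
          only_simple_assignments = true;
        }
      else
        {
          pending.push_back (i);
          if (body[i] != stmt_kind::simple_assign)
            only_simple_assignments = false;
        }
    }

  // Flush trailing sequential code; an empty body still yields one
  // (empty) gang-single region so that the construct is preserved.
  if (!pending.empty () || out.empty ())
    out.push_back ({ "gang_single", pending });
  return out;
}
```

Fed the shape of the kernels-conversion test case (loop, two sequential
statements, loop), this model yields two "parallelized" regions with one
"gang_single" region in between, matching the counts the test scans for.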

diff --git a/gcc/omp-oacc-kernels.c b/gcc/omp-oacc-kernels.c
index 5ec24f3..03c18dc 100644
--- a/gcc/omp-oacc-kernels.c
+++ b/gcc/omp-oacc-kernels.c
@@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "backend.h"
  #include "target.h"
  #include "tree.h"
+#include "langhooks.h"
  #include "gimple.h"
  #include "tree-pass.h"
  #include "cgraph.h"
@@ -45,16 +46,552 @@ along with GCC; see the file COPYING3.  If not see
     For now, the translation is as follows:
     - The entire kernels region is turned into a data region with clauses
       taken from the kernels region.  New "create" clauses are added for all
-     variables declared at the top level in the kernels region.  */
+     variables declared at the top level in the kernels region.
+   - Any loop annotated with an OpenACC loop directive is wrapped in a new
+     parallel region.  Gang/worker/vector annotations are copied from the
+     original kernels region if present.
+     * Loops without an explicit "independent" or "seq" annotation get an
+       "auto" annotation; other annotations are preserved on the loop or
+       moved to the new surrounding parallel region.  Which annotations are
+       moved is determined by the constraints in the OpenACC spec; for
+       example, loops in the kernels region may have a gang clause, but
+       such annotations must now be moved to the new parallel region.
+   - Any sequences of other code (non-loops, non-OpenACC loops) are wrapped
+     in new "gang-single" parallel regions: Worker/vector annotations are
+     copied from the original kernels region if present, but num_gangs is
+     explicitly set to 1.  */
+
+/* Helper function for decompose_kernels_region_body.  If STMT contains a
+   "top-level" OMP_FOR statement, returns a pointer to that statement;
+   returns NULL otherwise.
+
+   A "top-level" OMP_FOR statement is one that is possibly accompanied by
+   small snippets of setup code.  Specifically, this function accepts an
+   OMP_FOR possibly wrapped in a singleton bind and a singleton try
+   statement to allow for a local loop variable, but not an OMP_FOR
+   statement nested in any other constructs.  Alternatively, it accepts a
+   non-singleton bind containing only assignments and then an OMP_FOR
+   statement at the very end.  The former style can be generated by the C
+   frontend, the latter by the Fortran frontend.  */
+
+static gimple *
+top_level_omp_for_in_stmt (gimple *stmt)
+{
+  if (gimple_code (stmt) == GIMPLE_OMP_FOR)
+    return stmt;
+
+  if (gimple_code (stmt) == GIMPLE_BIND)
+    {
+      gimple_seq body = gimple_bind_body (as_a <gbind *> (stmt));
+      if (gimple_seq_singleton_p (body))
+	{
+	  /* Accept an OMP_FOR statement, or a try statement containing only
+	     a single OMP_FOR.  */
+	  gimple *maybe_for_or_try = gimple_seq_first_stmt (body);
+	  if (gimple_code (maybe_for_or_try) == GIMPLE_OMP_FOR)
+	    return maybe_for_or_try;
+	  else if (gimple_code (maybe_for_or_try) == GIMPLE_TRY)
+	    {
+	      gimple_seq try_body = gimple_try_eval (maybe_for_or_try);
+	      if (!gimple_seq_singleton_p (try_body))
+		return NULL;
+	      gimple *maybe_omp_for_stmt = gimple_seq_first_stmt (try_body);
+	      if (gimple_code (maybe_omp_for_stmt) == GIMPLE_OMP_FOR)
+		return maybe_omp_for_stmt;
+	    }
+	}
+      else
+	{
+	  gimple_stmt_iterator gsi;
+	  /* Accept only a block of optional assignments followed by an
+	     OMP_FOR at the end.  No other kinds of statements allowed.  */
+	  for (gsi = gsi_start (body); !gsi_end_p (gsi); gsi_next (&gsi))
+	    {
+	      gimple *body_stmt = gsi_stmt (gsi);
+	      if (gimple_code (body_stmt) == GIMPLE_ASSIGN)
+		continue;
+	      else if (gimple_code (body_stmt) == GIMPLE_OMP_FOR
+		       && gsi_one_before_end_p (gsi))
+		return body_stmt;
+	      else
+		return NULL;
+	    }
+	}
+    }
+
+  return NULL;
+}
+
+/* Construct a "gang-single" OpenACC parallel region at LOC containing the
+   STMTS.  The newly created region is annotated with CLAUSES, which must
+   not contain a num_gangs clause, and an additional "num_gangs(1)" clause
+   to force gang-single execution.  */
+
+static gimple *
+make_gang_single_region (location_t loc, gimple_seq stmts, tree clauses)
+{
+  /* This correctly unshares the entire clause chain rooted here.  */
+  clauses = unshare_expr (clauses);
+  /* Make a num_gangs(1) clause.  */
+  tree gang_single_clause = build_omp_clause (loc, OMP_CLAUSE_NUM_GANGS);
+  OMP_CLAUSE_OPERAND (gang_single_clause, 0) = integer_one_node;
+  OMP_CLAUSE_CHAIN (gang_single_clause) = clauses;
+
+  /* Build the gang-single region.  */
+  const enum gf_mask region_kind
+    = GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE;
+  gimple *single_region
+    = gimple_build_omp_target (NULL,
+			       region_kind,
+			       gang_single_clause);
+  gimple_set_location (single_region, loc);
+  gbind *single_body = gimple_build_bind (NULL, stmts, make_node (BLOCK));
+  gimple_omp_set_body (single_region, single_body);
+
+  return single_region;
+}
+
+/* Helper for make_gang_parallel_loop_region.  Transform OpenACC
+   'kernels'/'loop' construct clauses into OpenACC 'parallel'/'loop' ones.  */
+
+static tree
+transform_kernels_loop_clauses (gimple *omp_for,
+				tree num_gangs_clause,
+				tree clauses)
+{
+  /* If this loop in a kernels region does not have an explicit
+     "independent", "seq", or "auto" clause, we must give it an explicit
+     "auto" clause.  */
+  bool add_auto_clause = true;
+  tree loop_clauses = gimple_omp_for_clauses (omp_for);
+  for (tree c = loop_clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_AUTO
+	  || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_INDEPENDENT
+	  || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_SEQ)
+	{
+	  add_auto_clause = false;
+	  break;
+	}
+    }
+  if (add_auto_clause)
+    {
+      tree auto_clause = build_omp_clause (gimple_location (omp_for),
+					   OMP_CLAUSE_AUTO);
+      OMP_CLAUSE_CHAIN (auto_clause) = loop_clauses;
+      gimple_omp_for_set_clauses (omp_for, auto_clause);
+    }
+
+  /* If the kernels region had a num_gangs clause, add that to this new
+     parallel region.  */
+  if (num_gangs_clause != NULL)
+    {
+      tree parallel_num_gangs_clause = unshare_expr (num_gangs_clause);
+      OMP_CLAUSE_CHAIN (parallel_num_gangs_clause) = clauses;
+      clauses = parallel_num_gangs_clause;
+    }
+
+  return clauses;
+}
+
+/* Construct a possibly gang-parallel OpenACC parallel region containing the
+   STMT, which must be identical to, or a bind containing, the loop OMP_FOR
+   with OpenACC loop annotations.
+
+   The newly created region is annotated with the optional NUM_GANGS_CLAUSE
+   as well as the other CLAUSES, which must not contain a num_gangs clause.  */
+
+static gimple *
+make_gang_parallel_loop_region (gimple *omp_for, gimple *stmt,
+				tree num_gangs_clause, tree clauses)
+{
+  /* This correctly unshares the entire clause chain rooted here.  */
+  clauses = unshare_expr (clauses);
+
+  clauses = transform_kernels_loop_clauses (omp_for,
+					    num_gangs_clause,
+					    clauses);
+
+  /* Now build the parallel region containing this loop.  */
+  gimple_seq parallel_body = NULL;
+  gimple_seq_add_stmt (&parallel_body, stmt);
+  gimple *parallel_body_bind
+    = gimple_build_bind (NULL, parallel_body, make_node (BLOCK));
+  gimple *parallel_region
+    = gimple_build_omp_target (
+	parallel_body_bind,
+	GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED,
+	clauses);
+  gimple_set_location (parallel_region, gimple_location (stmt));
+
+  return parallel_region;
+}
+
+/* Eliminate any binds directly inside BIND by adding their statements to
+   BIND (i.e., modifying it in place), excluding binds that hold only an
+   OMP_FOR loop and associated setup/cleanup code.  Recurse into binds but
+   not other statements.  Return a chain of the local variables of eliminated
+   binds, i.e., the local variables found in nested binds.  If
+   INCLUDE_TOPLEVEL_VARS is true, this also includes the variables belonging
+   to BIND itself.  */
+
+static tree
+flatten_binds (gbind *bind, bool include_toplevel_vars = false)
+{
+  tree vars = NULL, last_var = NULL;
+
+  if (include_toplevel_vars)
+    {
+      vars = gimple_bind_vars (bind);
+      last_var = vars;
+    }
+
+  gimple_seq new_body = NULL;
+  gimple_seq body_sequence = gimple_bind_body (bind);
+  gimple_stmt_iterator gsi, gsi_n;
+  for (gsi = gsi_start (body_sequence); !gsi_end_p (gsi); gsi = gsi_n)
+    {
+      /* Advance the iterator here because otherwise it would be invalidated
+	 by moving statements below.  */
+      gsi_n = gsi;
+      gsi_next (&gsi_n);
+
+      gimple *stmt = gsi_stmt (gsi);
+      /* Flatten bind statements, except the ones that contain only an
+	 OpenACC for loop.  */
+      if (gimple_code (stmt) == GIMPLE_BIND
+	  && !top_level_omp_for_in_stmt (stmt))
+	{
+	  gbind *inner_bind = as_a <gbind *> (stmt);
+	  /* Flatten recursively, and collect all variables.  */
+	  tree inner_vars = flatten_binds (inner_bind, true);
+	  gimple_seq inner_sequence = gimple_bind_body (inner_bind);
+	  gcc_assert (gimple_code (inner_sequence) != GIMPLE_BIND
+		      || top_level_omp_for_in_stmt (inner_sequence));
+	  gimple_seq_add_seq (&new_body, inner_sequence);
+	  /* Find the last variable; we will append others to it.  */
+	  while (last_var != NULL && TREE_CHAIN (last_var) != NULL)
+	    last_var = TREE_CHAIN (last_var);
+	  if (last_var != NULL)
+	    {
+	      TREE_CHAIN (last_var) = inner_vars;
+	      last_var = inner_vars;
+	    }
+	  else
+	    {
+	      vars = inner_vars;
+	      last_var = vars;
+	    }
+	}
+      else
+	gimple_seq_add_stmt (&new_body, stmt);
+    }
+
+  /* Put the possibly transformed body back into the bind.  */
+  gimple_bind_set_body (bind, new_body);
+  return vars;
+}
+
+/* Helper function for places where we construct data regions.  Wraps the BODY
+   inside a try-finally construct at LOC that calls __builtin_GOACC_data_end
+   in its cleanup block.  Returns this try statement.  */
+
+static gimple *
+make_data_region_try_statement (location_t loc, gimple *body)
+{
+  tree data_end_fn = builtin_decl_explicit (BUILT_IN_GOACC_DATA_END);
+  gimple *call = gimple_build_call (data_end_fn, 0);
+  gimple_seq cleanup = NULL;
+  gimple_seq_add_stmt (&cleanup, call);
+  gimple *try_stmt = gimple_build_try (body, cleanup, GIMPLE_TRY_FINALLY);
+  gimple_set_location (body, loc);
+  return try_stmt;
+}
+
+/* If INNER_BIND_VARS holds variables, build an OpenACC data region with
+   location LOC containing BODY and having "create(var)" clauses for each
+   variable.  If INNER_CLEANUP is present, add a try-finally statement with
+   this cleanup code in the finally block.  Return the new data region, or
+   the original BODY if no data region was needed.  */
+
+static gimple *
+maybe_build_inner_data_region (location_t loc, gimple *body,
+			       tree inner_bind_vars, gimple *inner_cleanup)
+{
+  /* Build data "create(var)" clauses for these local variables.
+     Below we will add these to a data region enclosing the entire body
+     of the decomposed kernels region.  */
+  tree prev_mapped_var = NULL, next = NULL, artificial_vars = NULL,
+       inner_data_clauses = NULL;
+  bool generic_inst_p
+    = lang_hooks.decls.get_generic_function_decl (current_function_decl)
+      != NULL;
+
+  for (tree v = inner_bind_vars; v; v = next)
+    {
+      next = TREE_CHAIN (v);
+      if (DECL_ARTIFICIAL (v)
+	  || TREE_CODE (v) == CONST_DECL
+	  || generic_inst_p)
+	{
+	  /* If this is an artificial temporary, it need not be mapped.  We
+	     move its declaration into the bind inside the data region.
+	     Also avoid mapping variables if we are inside a template
+	     instantiation; the code does not contain all the copies to
+	     temporaries that would make this legal.  */
+	  TREE_CHAIN (v) = artificial_vars;
+	  artificial_vars = v;
+	  if (prev_mapped_var != NULL)
+	    TREE_CHAIN (prev_mapped_var) = next;
+	  else
+	    inner_bind_vars = next;
+	}
+      else
+	{
+	  /* Otherwise, build the map clause.  */
+	  tree new_clause = build_omp_clause (loc, OMP_CLAUSE_MAP);
+	  OMP_CLAUSE_SET_MAP_KIND (new_clause, GOMP_MAP_ALLOC);
+	  OMP_CLAUSE_DECL (new_clause) = v;
+	  OMP_CLAUSE_SIZE (new_clause) = DECL_SIZE_UNIT (v);
+	  OMP_CLAUSE_CHAIN (new_clause) = inner_data_clauses;
+	  inner_data_clauses = new_clause;
+
+	  prev_mapped_var = v;
+	}
+    }
+
+  if (artificial_vars)
+    body = gimple_build_bind (artificial_vars, body, make_node (BLOCK));
+
+  /* If we determined above that there are variables that need to be created
+     on the device, construct a data region for them and wrap the body
+     inside that.  */
+  if (inner_data_clauses != NULL)
+    {
+      gcc_assert (inner_bind_vars != NULL);
+      gimple *inner_data_region
+	= gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_OACC_DATA_KERNELS,
+				   inner_data_clauses);
+      gimple_set_location (inner_data_region, loc);
+      /* Make sure __builtin_GOACC_data_end is called at the end.  */
+      gimple *try_stmt = make_data_region_try_statement (loc, body);
+      gimple_omp_set_body (inner_data_region, try_stmt);
+      gimple *bind_body;
+      if (inner_cleanup != NULL)
+	/* Clobber all the inner variables that need to be clobbered.  */
+	bind_body = gimple_build_try (inner_data_region, inner_cleanup,
+				      GIMPLE_TRY_FINALLY);
+      else
+	bind_body = inner_data_region;
+      body = gimple_build_bind (inner_bind_vars, bind_body, make_node (BLOCK));
+    }
+
+  return body;
+}
+
+/* Decompose the body of the KERNELS_REGION, which was originally annotated
+   with the KERNELS_CLAUSES, into a series of parallel regions.  */
+
+static gimple *
+decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
+{
+  location_t loc = gimple_location (kernels_region);
+
+  /* The kernels clauses will be propagated to the child clauses unmodified,
+     except that the num_gangs clause will only be added to loop regions.
+     The other regions are "gang-single" and get an explicit num_gangs(1)
+     clause.  So separate out the num_gangs clause here.  */
+  tree num_gangs_clause = NULL, prev_clause = NULL;
+  tree parallel_clauses = kernels_clauses;
+  for (tree c = parallel_clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_NUM_GANGS)
+	{
+	  /* Cut this clause out of the chain.  */
+	  num_gangs_clause = c;
+	  if (prev_clause != NULL)
+	    OMP_CLAUSE_CHAIN (prev_clause) = OMP_CLAUSE_CHAIN (c);
+	  else
+	    kernels_clauses = OMP_CLAUSE_CHAIN (c);
+	  OMP_CLAUSE_CHAIN (num_gangs_clause) = NULL;
+	  break;
+	}
+      else
+	prev_clause = c;
+    }
+
+  gimple *kernels_body = gimple_omp_body (kernels_region);
+  gbind *kernels_bind = as_a <gbind *> (kernels_body);
+
+  /* The body of the region may contain other nested binds declaring inner
+     local variables.  Collapse all these binds into one to ensure that we
+     have a single sequence of statements to iterate over; also, collect all
+     inner variables.  */
+  tree inner_bind_vars = flatten_binds (kernels_bind);
+  gimple_seq body_sequence = gimple_bind_body (kernels_bind);
+
+  /* All these inner variables will get allocated on the device (below, by
+     calling maybe_build_inner_data_region).  Here we create "present"
+     clauses for them and add these clauses to the list of clauses to be
+     attached to each inner parallel region.  */
+  tree present_clauses = kernels_clauses;
+  for (tree var = inner_bind_vars; var; var = TREE_CHAIN (var))
+    {
+      if (!DECL_ARTIFICIAL (var) && TREE_CODE (var) != CONST_DECL)
+	{
+	  tree present_clause = build_omp_clause (loc, OMP_CLAUSE_MAP);
+	  OMP_CLAUSE_SET_MAP_KIND (present_clause, GOMP_MAP_FORCE_PRESENT);
+	  OMP_CLAUSE_DECL (present_clause) = var;
+	  OMP_CLAUSE_SIZE (present_clause) = DECL_SIZE_UNIT (var);
+	  OMP_CLAUSE_CHAIN (present_clause) = present_clauses;
+	  present_clauses = present_clause;
+	}
+    }
+  kernels_clauses = present_clauses;
+
+  /* In addition to nested binds, the "real" body of the region may be
+     nested inside a try-finally block.  Find its cleanup block, which
+     contains code to clobber the local variables that must be clobbered.  */
+  gimple *inner_cleanup = NULL;
+  if (body_sequence != NULL && gimple_code (body_sequence) == GIMPLE_TRY)
+    {
+      if (gimple_seq_singleton_p (body_sequence))
+	{
+	  /* The try statement is the only thing inside the bind.  */
+	  inner_cleanup = gimple_try_cleanup (body_sequence);
+	  body_sequence = gimple_try_eval (body_sequence);
+	}
+      else
+	{
+	  /* The bind's body starts with a try statement, but it is followed
+	     by other things.  */
+	  gimple_stmt_iterator gsi = gsi_start (body_sequence);
+	  gimple *try_stmt = gsi_stmt (gsi);
+	  inner_cleanup = gimple_try_cleanup (try_stmt);
+	  gimple *try_body = gimple_try_eval (try_stmt);
+
+	  gsi_remove (&gsi, false);
+	  /* Now gsi indicates the sequence of statements after the try
+	     statement in the bind.  Append the statement in the try body and
+	     the trailing statements from gsi.  */
+	  gsi_insert_seq_before (&gsi, try_body, GSI_CONTINUE_LINKING);
+	  body_sequence = gsi_stmt (gsi);
+	}
+    }
+
+  /* This sequence will collect all the top-level statements in the body of
+     the data region we are about to construct.  */
+  gimple_seq region_body = NULL;
+  /* This sequence will collect consecutive statements to be put into a
+     gang-single region.  */
+  gimple_seq gang_single_seq = NULL;
+  /* Flag recording whether the gang_single_seq only contains copies to
+     local variables.  These may be loop setup code that should not be
+     separated from the loop.  */
+  bool only_simple_assignments = true;
+
+  /* Iterate over the statements in the kernels region's body.  */
+  gimple_stmt_iterator gsi, gsi_n;
+  for (gsi = gsi_start (body_sequence); !gsi_end_p (gsi); gsi = gsi_n)
+    {
+      /* Advance the iterator here because otherwise it would be invalidated
+	 by moving statements below.  */
+      gsi_n = gsi;
+      gsi_next (&gsi_n);
+
+      gimple *stmt = gsi_stmt (gsi);
+      gimple *omp_for = top_level_omp_for_in_stmt (stmt);
+      if (omp_for != NULL)
+	{
+	  /* This is an OMP for statement, put it into a parallel region.
+	     But first, construct a gang-single region containing any
+	     complex sequential statements we may have seen.  */
+	  if (gang_single_seq != NULL && !only_simple_assignments)
+	    {
+	      gimple *single_region
+		= make_gang_single_region (loc, gang_single_seq,
+					   kernels_clauses);
+	      gimple_seq_add_stmt (&region_body, single_region);
+	    }
+	  else if (gang_single_seq != NULL && only_simple_assignments)
+	    {
+	      /* There is a sequence of sequential statements preceding this
+		 loop, but they are all simple assignments.  This is
+		 probably setup code for the loop; in particular, Fortran DO
+		 loops are preceded by code to copy the loop limit variable
+		 to a temporary.  Group this code together with the loop
+		 itself.  */
+	      gimple_seq_add_stmt (&gang_single_seq, stmt);
+	      stmt = gimple_build_bind (NULL, gang_single_seq,
+					make_node (BLOCK));
+	    }
+	  gang_single_seq = NULL;
+	  only_simple_assignments = true;
+
+	  gimple *parallel_region
+	    = make_gang_parallel_loop_region (omp_for, stmt,
+					      num_gangs_clause,
+					      kernels_clauses);
+	  gimple_seq_add_stmt (&region_body, parallel_region);
+	}
+      else
+	{
+	  /* This is not an OMP for statement, so it will be put into a
+	     gang-single region.  */
+	  gimple_seq_add_stmt (&gang_single_seq, stmt);
+	  /* Is this a simple assignment? We call it simple if it is an
+	     assignment to an artificial local variable.  This captures
+	     Fortran loop setup code computing loop bounds and offsets.  */
+	  bool is_simple_assignment
+	    = (gimple_code (stmt) == GIMPLE_ASSIGN
+	       && TREE_CODE (gimple_assign_lhs (stmt)) == VAR_DECL
+	       && DECL_ARTIFICIAL (gimple_assign_lhs (stmt)));
+	  if (!is_simple_assignment)
+	    only_simple_assignments = false;
+	}
+    }
+
+  /* If we did not emit a new region, and are not going to emit one now
+     (that is, the original region was empty), prepare to emit a dummy so as
+     to preserve the original construct, which other processing (at least
+     test cases) depends on.  */
+  if (region_body == NULL && gang_single_seq == NULL)
+    {
+      gimple *stmt = gimple_build_nop ();
+      gimple_set_location (stmt, loc);
+      gimple_seq_add_stmt (&gang_single_seq, stmt);
+    }
+
+  /* Gather up any remaining gang-single statements.  */
+  if (gang_single_seq != NULL)
+    {
+      gimple *single_region
+	= make_gang_single_region (loc, gang_single_seq, kernels_clauses);
+      gimple_seq_add_stmt (&region_body, single_region);
+    }
+
+  tree kernels_locals = gimple_bind_vars (as_a <gbind *> (kernels_body));
+  gimple *body = gimple_build_bind (kernels_locals, region_body,
+				    make_node (BLOCK));
+
+  /* If we found variables declared in nested scopes, build a data region to
+     map them to the device.  */
+  body = maybe_build_inner_data_region (loc, body, inner_bind_vars,
+					inner_cleanup);
+
+  return body;
+}

  /* Transform KERNELS_REGION, which is an OpenACC kernels region, into a data
-   region containing the original kernels region.  */
+   region containing the original kernels region's body cut up into a
+   sequence of parallel regions.  */

  static gimple *
  transform_kernels_region (gimple *kernels_region)
  {
    gcc_checking_assert (gimple_omp_target_kind (kernels_region)
  		       == GF_OMP_TARGET_KIND_OACC_KERNELS);
+  location_t loc = gimple_location (kernels_region);

    /* Collect the kernels region's data clauses and create the new data
       region with those clauses.  */
@@ -130,26 +667,17 @@ transform_kernels_region (gimple *kernels_region)
    gimple *data_region
      = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_OACC_DATA_KERNELS,
  			       data_clauses);
-  gimple_set_location (data_region, gimple_location (kernels_region));
-
-  /* For now, just construct a new parallel region inside the data region.  */
-  gimple *inner_region
-    = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_OACC_PARALLEL,
-			       kernels_clauses);
-  gimple_set_location (inner_region, gimple_location (kernels_region));
-  gimple_omp_set_body (inner_region, gimple_omp_body (kernels_region));
+  gimple_set_location (data_region, loc);

-  gbind *bind = gimple_build_bind (NULL, NULL, NULL);
-  gimple_bind_add_stmt (bind, inner_region);
+  /* Transform the body of the kernels region into a sequence of parallel
+     regions.  */
+  gimple *body = decompose_kernels_region_body (kernels_region,
+						kernels_clauses);

    /* Put the transformed pieces together.  The entire body of the region is
       wrapped in a try-finally statement that calls __builtin_GOACC_data_end
       for cleanup.  */
-  tree data_end_fn = builtin_decl_explicit (BUILT_IN_GOACC_DATA_END);
-  gimple *call = gimple_build_call (data_end_fn, 0);
-  gimple_seq cleanup = NULL;
-  gimple_seq_add_stmt (&cleanup, call);
-  gimple *try_stmt = gimple_build_try (bind, cleanup, GIMPLE_TRY_FINALLY);
+  gimple *try_stmt = make_data_region_try_statement (loc, body);
    gimple_omp_set_body (data_region, try_stmt);

    return data_region;
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
index c75db37..ec5db02 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-conversion.c
@@ -18,6 +18,7 @@ main (void)
        sum += a[i];

      sum++;
+    a[0]++;

      #pragma acc loop
      for (i = 0; i < N; ++i)
@@ -27,10 +28,14 @@ main (void)
    return 0;
  }

-/* Check that the kernels region is split into a data region and an enclosed
-   parallel region.  */
+/* Check that the kernels region is split into a data region and enclosed
+   parallel regions.  */
  /* { dg-final { scan-tree-dump-times "oacc_data_kernels" 1 "convert_oacc_kernels" } } */
-/* { dg-final { scan-tree-dump-times "oacc_parallel" 1 "convert_oacc_kernels" } } */
+
+/* The two loop regions are parallelized, the sequential part in between is
+   made gang-single.  */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 2 "convert_oacc_kernels" } } */
+/* { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 1 "convert_oacc_kernels" } } */

  /* Check that the original kernels region is removed.  */
  /* { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } } */
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
index 8c66330..4aba2b1 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
@@ -15,6 +15,7 @@ program main
    end do

    sum = sum + 1
+  a(1) = a(1) + 1

    !$acc loop
    do i = 1, N
@@ -24,10 +25,14 @@ program main
    !$acc end kernels
  end program main

-! Check that the kernels region is split into a data region and an enclosed
-! parallel region.
+! Check that the kernels region is split into a data region and enclosed
+! parallel regions.
  ! { dg-final { scan-tree-dump-times "oacc_data_kernels" 1 "convert_oacc_kernels" } }
-! { dg-final { scan-tree-dump-times "oacc_parallel" 1 "convert_oacc_kernels" } }
+
+! The two loop regions are parallelized, the sequential part in between is
+! made gang-single.
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_parallelized" 2 "convert_oacc_kernels" } }
+! { dg-final { scan-tree-dump-times "oacc_parallel_kernels_gang_single" 1 "convert_oacc_kernels" } }

  ! Check that the original kernels region is removed.
  ! { dg-final { scan-tree-dump-not "oacc_kernels" "convert_oacc_kernels" } }
-- 
2.8.1


* Re: [PATCH 06/10, OpenACC] Adjust parallelism of loops in gang-single parts of OpenACC kernels regions
  2019-07-17 21:13   ` [PATCH 06/10, OpenACC] Adjust parallelism of loops in gang-single parts of OpenACC " Kwok Cheung Yeung
@ 2019-08-05 22:17     ` Kwok Cheung Yeung
  0 siblings, 0 replies; 33+ messages in thread
From: Kwok Cheung Yeung @ 2019-08-05 22:17 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek; +Cc: Thomas Schwinge

The change to patch 04 (Turn OpenACC kernels regions into a sequence of 
parallel regions) necessitates an additional include of 
'diagnostic-core.h' in omp-oacc-kernels.c, as it is no longer indirectly 
included by 'cp/cp-tree.h'.

Kwok

On 17/07/2019 10:12 pm, Kwok Cheung Yeung wrote:
> Loops in gang-single parts of kernels regions cannot be executed in
> gang-redundant mode. If the user specified gang clauses on such loops, 
> emit an error and remove these clauses. Adjust automatic partitioning to 
> exclude gang partitioning in gang-single regions.
> 

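For illustration, here is a minimal sketch of the situation the quoted
description is about.  It is not taken from the patch series: the function
name 'add_if_enabled' and its parameters are made up.  The loop sits inside an
'if' statement, so it is not at the top level of the 'kernels' region and
would end up in a gang-single part.  As I read the description above, writing
'#pragma acc loop gang' on this loop is the case that gets diagnosed, with the
clause then removed; the sketch therefore leaves the clause out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: add DELTA to each of the N elements of A, but only
   if ENABLED is nonzero.  */
void
add_if_enabled (int *a, int n, int delta, int enabled)
{
  /* Precondition: a non-empty array must actually be provided.  */
  assert (a != NULL || n == 0);

#pragma acc kernels copy(a[0:n])
  {
    if (enabled)
      {
	/* This loop is nested inside an 'if' statement, so it is not at the
	   top level of the 'kernels' region.  It therefore belongs to a
	   gang-single part; no 'gang' clause is specified here.  */
#pragma acc loop
	for (int i = 0; i < n; ++i)
	  a[i] += delta;
      }
  }
}
```

Without '-fopenacc' the pragmas are ignored and the function simply runs
sequentially on the host, which is enough to check the intended result.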

* Re: [PATCH 02/10, OpenACC] Add OpenACC target kinds for decomposed kernels regions
  2019-07-18  9:28     ` Jakub Jelinek
@ 2019-08-05 22:31       ` Kwok Cheung Yeung
  0 siblings, 0 replies; 33+ messages in thread
From: Kwok Cheung Yeung @ 2019-08-05 22:31 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Thomas Schwinge

I have run the whole patch series through check_GNU_style.sh and fixed 
up the formatting where indicated. Do I need to post the reformatted 
patchset?

Thanks

Kwok

On 18/07/2019 10:24 am, Jakub Jelinek wrote:
> On Wed, Jul 17, 2019 at 10:04:10PM +0100, Kwok Cheung Yeung wrote:
>> @@ -2319,7 +2339,8 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
>>       {
>>         omp_context *tgt = enclosing_target_ctx (outer_ctx);
>>
>> -      if (!tgt || is_oacc_parallel (tgt))
>> +      if (!tgt || (is_oacc_parallel (tgt)
>> +                    && !was_originally_oacc_kernels (tgt)))
>>   	for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
>>   	  {
>>   	    char const *check = NULL;
> 
> Please watch up formatting, the above doesn't use tabs where it should.
> Have you run the series through contrib/check_GNU_style.sh ?
> 
> Otherwise, no concerns about this particular patch, assuming Thomas is ok
> with it.
> 
> 	Jakub
> 


* Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs (was: [og8] OpenACC 'kernels' construct changes: splitting of the construct into several regions)
  2019-02-01  0:00 [og8] OpenACC 'kernels' construct changes: splitting of the construct into several regions Thomas Schwinge
  2019-02-01 19:48 ` Thomas Schwinge
  2019-07-17 21:03 ` [PATCH 00/10, OpenACC] Rework handling of OpenACC kernels regions Kwok Cheung Yeung
@ 2020-11-13 22:22 ` Thomas Schwinge
  2020-11-15  9:14   ` Tobias Burnus
                     ` (5 more replies)
  2 siblings, 6 replies; 33+ messages in thread
From: Thomas Schwinge @ 2020-11-13 22:22 UTC (permalink / raw)
  To: gcc-patches, Frederik Harwath; +Cc: fortran, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 847 bytes --]

Hi!

On 2019-02-01T00:59:30+0100, I wrote:
> I've just pushed the attached nine patches to openacc-gcc-8-branch:
> OpenACC 'kernels' construct changes: splitting of the construct into
> several regions.

Now, slightly more polished, I've pushed to master branch a variant of
most of these patches combined in commit
e898ce7997733c29dcab9c3c62ca102c7f9fa6eb "Decompose OpenACC 'kernels'
constructs into parts, a sequence of compute constructs", see attached.

> There's more work to be done there, and we're aware of a number of TODO
> items, but nevertheless: it's a good first step.

That's still the case...  :-)
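
To make the decomposition concrete, here is a small sketch modeled on the
'kernels-conversion.c' test case from the og8 patches.  The function name
'sum_and_fill' and the data clauses are assumptions made for this
illustration; they are not part of the commit.  With
'-fopenacc -fopenacc-kernels=decompose', a region of this shape is expected
to become one 'oacc_data_kernels' region enclosing two
'oacc_parallel_kernels_parallelized' parts (the two loops) and one
'oacc_parallel_kernels_gang_single' part (the sequential statements in
between).

```c
#include <assert.h>

#define N 1024

/* Hypothetical example: sum the N elements of A, bump the sum and A[0], then
   overwrite every element of A with the sum.  Returns the sum.  */
unsigned int
sum_and_fill (unsigned int *a)
{
  /* Precondition: A points to at least N elements.  */
  assert (a != 0);

  unsigned int sum = 0;
  int i;

#pragma acc kernels copy(a[0:N]) copy(sum)
  {
    /* Part 1: a top-level 'loop' directive, wrapped in its own compute
       construct.  */
#pragma acc loop
    for (i = 0; i < N; ++i)
      sum += a[i];

    /* Part 2: sequential code between the loops, made "gang-single".  */
    sum++;
    a[0]++;

    /* Part 3: another top-level 'loop' directive, again wrapped in its own
       compute construct.  */
#pragma acc loop
    for (i = 0; i < N; ++i)
      a[i] = sum;
  }

  return sum;
}
```

Compiled without '-fopenacc', the pragmas are ignored and the code runs
sequentially on the host; that is sufficient to check the arithmetic, but not
the decomposition itself.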


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

[-- Attachment #2: 0001-Decompose-OpenACC-kernels-constructs-into-parts-a-se.patch --]
[-- Type: text/x-diff, Size: 116844 bytes --]

From e898ce7997733c29dcab9c3c62ca102c7f9fa6eb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Gerg=C3=B6=20Barany?= <gergo@codesourcery.com>
Date: Fri, 1 Feb 2019 00:59:30 +0100
Subject: [PATCH] Decompose OpenACC 'kernels' constructs into parts, a sequence
 of compute constructs

Not yet enabled by default: for now, the current mode of OpenACC 'kernels'
constructs handling still remains '-fopenacc-kernels=parloops', but that is to
change later.

	gcc/
	* omp-oacc-kernels-decompose.cc: New.
	* Makefile.in (OBJS): Add it.
	* passes.def: Instantiate it.
	* tree-pass.h (make_pass_omp_oacc_kernels_decompose): Declare.
	* flag-types.h (enum openacc_kernels): Add.
	* doc/invoke.texi (-fopenacc-kernels): Document.
	* gimple.h (enum gf_mask): Add
	'GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED',
	'GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE',
	'GF_OMP_TARGET_KIND_OACC_DATA_KERNELS'.
	(is_gimple_omp_oacc, is_gimple_omp_offloaded): Handle these.
	* gimple-pretty-print.c (dump_gimple_omp_target): Likewise.
	* omp-expand.c (expand_omp_target, build_omp_regions_1)
	(omp_make_gimple_edges): Likewise.
	* omp-low.c (scan_sharing_clauses, scan_omp_for)
	(check_omp_nesting_restrictions, lower_oacc_reductions)
	(lower_oacc_head_mark, lower_omp_target): Likewise.
	* omp-offload.c (execute_oacc_device_lower): Likewise.
	gcc/c-family/
	* c.opt (fopenacc-kernels): Add.
	gcc/fortran/
	* lang.opt (fopenacc-kernels): Add.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-1.c: New.
	* c-c++-common/goacc/kernels-decompose-2.c: New.
	* c-c++-common/goacc/kernels-decompose-ice-1.c: New.
	* c-c++-common/goacc/kernels-decompose-ice-2.c: New.
	* gfortran.dg/goacc/kernels-decompose-1.f95: New.
	* gfortran.dg/goacc/kernels-decompose-2.f95: New.
	* c-c++-common/goacc/if-clause-2.c: Adjust.
	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	New.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Adjust.
	* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
---
 gcc/Makefile.in                               |    1 +
 gcc/c-family/c.opt                            |   13 +
 gcc/doc/invoke.texi                           |   14 +-
 gcc/flag-types.h                              |    7 +
 gcc/fortran/lang.opt                          |    4 +
 gcc/gimple-pretty-print.c                     |    9 +
 gcc/gimple.h                                  |   14 +
 gcc/omp-expand.c                              |   22 +
 gcc/omp-low.c                                 |   66 +-
 gcc/omp-oacc-kernels-decompose.cc             | 1531 +++++++++++++++++
 gcc/omp-offload.c                             |   19 +
 gcc/passes.def                                |    1 +
 .../c-c++-common/goacc/if-clause-2.c          |   24 +-
 .../c-c++-common/goacc/kernels-decompose-1.c  |   83 +
 .../c-c++-common/goacc/kernels-decompose-2.c  |  141 ++
 .../goacc/kernels-decompose-ice-1.c           |  108 ++
 .../goacc/kernels-decompose-ice-2.c           |   16 +
 .../gfortran.dg/goacc/kernels-decompose-1.f95 |   81 +
 .../gfortran.dg/goacc/kernels-decompose-2.f95 |  142 ++
 .../gfortran.dg/goacc/kernels-tree.f95        |    5 +
 gcc/tree-pass.h                               |    1 +
 .../declare-vla-kernels-decompose-ice-1.c     |    8 +
 .../declare-vla-kernels-decompose.c           |    6 +
 .../libgomp.oacc-c-c++-common/declare-vla.c   |    6 +
 .../kernels-decompose-1.c                     |   38 +
 .../libgomp.oacc-fortran/pr94358-1.f90        |   11 +-
 26 files changed, 2355 insertions(+), 16 deletions(-)
 create mode 100644 gcc/omp-oacc-kernels-decompose.cc
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 978a08f7b04..273654cfa25 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1480,6 +1480,7 @@ OBJS = \
 	omp-expand.o \
 	omp-general.o \
 	omp-low.o \
+	omp-oacc-kernels-decompose.o \
 	omp-simd-clone.o \
 	opt-problem.o \
 	optabs.o \
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index a0083636aed..0532cb70ffc 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1796,6 +1796,19 @@ fopenacc-dim=
 C ObjC C++ ObjC++ LTO Joined Var(flag_openacc_dims)
 Specify default OpenACC compute dimensions.
 
+fopenacc-kernels=
+C ObjC C++ ObjC++ RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS)
+-fopenacc-kernels=[decompose|parloops]	Specify mode of OpenACC 'kernels' constructs handling.
+
+Enum
+Name(openacc_kernels) Type(enum openacc_kernels)
+
+EnumValue
+Enum(openacc_kernels) String(decompose) Value(OPENACC_KERNELS_DECOMPOSE)
+
+EnumValue
+Enum(openacc_kernels) String(parloops) Value(OPENACC_KERNELS_PARLOOPS)
+
 fopenmp
 C ObjC C++ ObjC++ LTO Var(flag_openmp)
 Enable OpenMP (implies -frecursive in Fortran).
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 85f7969d87f..8a164ef9788 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -201,7 +201,7 @@ in the following sections.
 -aux-info @var{filename}  -fallow-parameterless-variadic-functions @gol
 -fno-asm  -fno-builtin  -fno-builtin-@var{function}  -fgimple@gol
 -fhosted  -ffreestanding @gol
--fopenacc  -fopenacc-dim=@var{geom} @gol
+-fopenacc  -fopenacc-dim=@var{geom}  -fopenacc-kernels=@var{mode} @gol
 -fopenmp  -fopenmp-simd @gol
 -fms-extensions  -fplan9-extensions  -fsso-struct=@var{endianness} @gol
 -fallow-single-precision  -fcond-mismatch  -flax-vector-conversions @gol
@@ -2589,6 +2589,18 @@ not explicitly specify.  The @var{geom} value is a triple of
 ':'-separated sizes, in order 'gang', 'worker' and, 'vector'.  A size
 can be omitted, to use a target-specific default value.
 
+@item -fopenacc-kernels=@var{mode}
+@opindex fopenacc-kernels
+@cindex OpenACC accelerator programming
+Specify mode of OpenACC `kernels' constructs handling.
+With @option{-fopenacc-kernels=decompose}, OpenACC `kernels'
+constructs are decomposed into parts, a sequence of compute
+constructs, each then handled individually.
+This is work in progress.
+With @option{-fopenacc-kernels=parloops}, OpenACC `kernels' constructs
+are handled by the @samp{parloops} pass, en bloc.
+This is the current default.
+
 @item -fopenmp
 @opindex fopenmp
 @cindex OpenMP parallel
diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index a887c75cfc7..648ed096e30 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -415,6 +415,13 @@ enum evrp_mode
   EVRP_MODE_RVRP_DEBUG = EVRP_MODE_RVRP_ONLY | EVRP_MODE_DEBUG
 };
 
+/* Modes of OpenACC 'kernels' constructs handling.  */
+enum openacc_kernels
+{
+  OPENACC_KERNELS_DECOMPOSE,
+  OPENACC_KERNELS_PARLOOPS
+};
+
 #endif
 
 #endif /* ! GCC_FLAG_TYPES_H */
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index da4b1aa879a..96ed208cb85 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -687,6 +687,10 @@ fopenacc-dim=
 Fortran LTO Joined Var(flag_openacc_dims)
 ; Documented in C
 
+fopenacc-kernels=
+Fortran RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS)
+; Documented in C
+
 fopenmp
 Fortran LTO
 ; Documented in C
diff --git a/gcc/gimple-pretty-print.c b/gcc/gimple-pretty-print.c
index a01bf901657..d97a231e7e8 100644
--- a/gcc/gimple-pretty-print.c
+++ b/gcc/gimple-pretty-print.c
@@ -1700,6 +1700,15 @@ dump_gimple_omp_target (pretty_printer *buffer, const gomp_target *gs,
     case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
       kind = " oacc_host_data";
       break;
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+      kind = " oacc_parallel_kernels_parallelized";
+      break;
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+      kind = " oacc_parallel_kernels_gang_single";
+      break;
+    case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
+      kind = " oacc_data_kernels";
+      break;
     default:
       gcc_unreachable ();
     }
diff --git a/gcc/gimple.h b/gcc/gimple.h
index b935ad4f761..8a1db3cc7db 100644
--- a/gcc/gimple.h
+++ b/gcc/gimple.h
@@ -175,6 +175,15 @@ enum gf_mask {
     GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA = 10,
     GF_OMP_TARGET_KIND_OACC_DECLARE = 11,
     GF_OMP_TARGET_KIND_OACC_HOST_DATA = 12,
+    /* A 'GF_OMP_TARGET_KIND_OACC_PARALLEL' representing an OpenACC 'kernels'
+       decomposed part, parallelized.  */
+    GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED = 13,
+    /* A 'GF_OMP_TARGET_KIND_OACC_PARALLEL' representing an OpenACC 'kernels'
+       decomposed part, "gang-single".  */
+    GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE = 14,
+    /* A 'GF_OMP_TARGET_KIND_OACC_DATA' representing an OpenACC 'kernels'
+       decomposed parts' 'data' construct.  */
+    GF_OMP_TARGET_KIND_OACC_DATA_KERNELS = 15,
     GF_OMP_TEAMS_HOST		= 1 << 0,
 
     /* True on an GIMPLE_OMP_RETURN statement if the return does not require
@@ -6511,6 +6520,9 @@ is_gimple_omp_oacc (const gimple *stmt)
 	case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
 	case GF_OMP_TARGET_KIND_OACC_DECLARE:
 	case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
+	case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+	case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+	case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
 	  return true;
 	default:
 	  return false;
@@ -6536,6 +6548,8 @@ is_gimple_omp_offloaded (const gimple *stmt)
 	case GF_OMP_TARGET_KIND_OACC_PARALLEL:
 	case GF_OMP_TARGET_KIND_OACC_KERNELS:
 	case GF_OMP_TARGET_KIND_OACC_SERIAL:
+	case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+	case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
 	  return true;
 	default:
 	  return false;
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index c6ee3eb0857..b731fd69b1e 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -9257,11 +9257,14 @@ expand_omp_target (struct omp_region *region)
     case GF_OMP_TARGET_KIND_OACC_UPDATE:
     case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
     case GF_OMP_TARGET_KIND_OACC_DECLARE:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
       data_region = false;
       break;
     case GF_OMP_TARGET_KIND_DATA:
     case GF_OMP_TARGET_KIND_OACC_DATA:
     case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
+    case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
       data_region = true;
       break;
     default:
@@ -9307,6 +9310,16 @@ expand_omp_target (struct omp_region *region)
 	= tree_cons (get_identifier ("oacc serial"),
 		     NULL_TREE, DECL_ATTRIBUTES (child_fn));
       break;
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+      DECL_ATTRIBUTES (child_fn)
+	= tree_cons (get_identifier ("oacc parallel_kernels_parallelized"),
+		     NULL_TREE, DECL_ATTRIBUTES (child_fn));
+      break;
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+      DECL_ATTRIBUTES (child_fn)
+	= tree_cons (get_identifier ("oacc parallel_kernels_gang_single"),
+		     NULL_TREE, DECL_ATTRIBUTES (child_fn));
+      break;
     default:
       /* Make sure we don't miss any.  */
       gcc_checking_assert (!(is_gimple_omp_oacc (entry_stmt)
@@ -9517,10 +9530,13 @@ expand_omp_target (struct omp_region *region)
     case GF_OMP_TARGET_KIND_OACC_PARALLEL:
     case GF_OMP_TARGET_KIND_OACC_KERNELS:
     case GF_OMP_TARGET_KIND_OACC_SERIAL:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
       start_ix = BUILT_IN_GOACC_PARALLEL;
       break;
     case GF_OMP_TARGET_KIND_OACC_DATA:
     case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
+    case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
       start_ix = BUILT_IN_GOACC_DATA_START;
       break;
     case GF_OMP_TARGET_KIND_OACC_UPDATE:
@@ -9993,6 +10009,9 @@ build_omp_regions_1 (basic_block bb, struct omp_region *parent,
 		case GF_OMP_TARGET_KIND_OACC_SERIAL:
 		case GF_OMP_TARGET_KIND_OACC_DATA:
 		case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
+		case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+		case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+		case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
 		  break;
 		case GF_OMP_TARGET_KIND_UPDATE:
 		case GF_OMP_TARGET_KIND_ENTER_DATA:
@@ -10247,6 +10266,9 @@ omp_make_gimple_edges (basic_block bb, struct omp_region **region,
 	case GF_OMP_TARGET_KIND_OACC_SERIAL:
 	case GF_OMP_TARGET_KIND_OACC_DATA:
 	case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
+	case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+	case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+	case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
 	  break;
 	case GF_OMP_TARGET_KIND_UPDATE:
 	case GF_OMP_TARGET_KIND_ENTER_DATA:
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 2602189d687..a1604e0ee3c 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -193,8 +193,8 @@ static tree scan_omp_1_op (tree *, int *, void *);
       *handled_ops_p = false; \
       break;
 
-/* Return true if CTX corresponds to an OpenACC 'parallel' or 'serial'
-   region.  */
+/* Return whether CTX represents an OpenACC 'parallel' or 'serial' construct.
+   (This doesn't include OpenACC 'kernels' decomposed parts.)  */
 
 static bool
 is_oacc_parallel_or_serial (omp_context *ctx)
@@ -207,7 +207,8 @@ is_oacc_parallel_or_serial (omp_context *ctx)
 		  == GF_OMP_TARGET_KIND_OACC_SERIAL)));
 }
 
-/* Return true if CTX corresponds to an oacc kernels region.  */
+/* Return whether CTX represents an OpenACC 'kernels' construct.
+   (This doesn't include OpenACC 'kernels' decomposed parts.)  */
 
 static bool
 is_oacc_kernels (omp_context *ctx)
@@ -218,6 +219,21 @@ is_oacc_kernels (omp_context *ctx)
 	      == GF_OMP_TARGET_KIND_OACC_KERNELS));
 }
 
+/* Return whether CTX represents an OpenACC 'kernels' decomposed part.  */
+
+static bool
+is_oacc_kernels_decomposed_part (omp_context *ctx)
+{
+  enum gimple_code outer_type = gimple_code (ctx->stmt);
+  return ((outer_type == GIMPLE_OMP_TARGET)
+	  && ((gimple_omp_target_kind (ctx->stmt)
+	       == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED)
+	      || (gimple_omp_target_kind (ctx->stmt)
+		  == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE)
+	      || (gimple_omp_target_kind (ctx->stmt)
+		  == GF_OMP_TARGET_KIND_OACC_DATA_KERNELS)));
+}
+
 /* Return true if STMT corresponds to an OpenMP target region.  */
 static bool
 is_omp_target (gimple *stmt)
@@ -1200,6 +1216,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	    {
 	      /* No 'reduction' clauses on OpenACC 'kernels'.  */
 	      gcc_checking_assert (!is_oacc_kernels (ctx));
+	      /* Likewise, on OpenACC 'kernels' decomposed parts.  */
+	      gcc_checking_assert (!is_oacc_kernels_decomposed_part (ctx));
 
 	      ctx->local_reduction_clauses
 		= tree_cons (NULL, c, ctx->local_reduction_clauses);
@@ -2415,7 +2433,9 @@ enclosing_target_ctx (omp_context *ctx)
   return ctx;
 }
 
-/* Return true if ctx is part of an oacc kernels region.  */
+/* Return whether CTX's parent compute construct is an OpenACC 'kernels'
+   construct.
+   (This doesn't include OpenACC 'kernels' decomposed parts.)  */
 
 static bool
 ctx_in_oacc_kernels_region (omp_context *ctx)
@@ -2431,7 +2451,8 @@ ctx_in_oacc_kernels_region (omp_context *ctx)
   return false;
 }
 
-/* Check the parallelism clauses inside a kernels regions.
+/* Check the parallelism clauses inside an OpenACC 'kernels' region.
+   (This doesn't include OpenACC 'kernels' decomposed parts.)
    Until kernels handling moves to use the same loop indirection
    scheme as parallel, we need to do this checking early.  */
 
@@ -2533,6 +2554,10 @@ scan_omp_for (gomp_for *stmt, omp_context *outer_ctx)
 
 	    if (c_op0)
 	      {
+		/* By construction, this is impossible for OpenACC 'kernels'
+		   decomposed parts.  */
+		gcc_assert (!(tgt && is_oacc_kernels_decomposed_part (tgt)));
+
 		error_at (OMP_CLAUSE_LOCATION (c),
 			  "argument not permitted on %qs clause",
 			  omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
@@ -3070,6 +3095,8 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
 		  case GF_OMP_TARGET_KIND_OACC_PARALLEL:
 		  case GF_OMP_TARGET_KIND_OACC_KERNELS:
 		  case GF_OMP_TARGET_KIND_OACC_SERIAL:
+		  case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+		  case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
 		    ok = true;
 		    break;
 
@@ -3526,6 +3553,11 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
 	    case GF_OMP_TARGET_KIND_OACC_DECLARE: stmt_name = "declare"; break;
 	    case GF_OMP_TARGET_KIND_OACC_HOST_DATA: stmt_name = "host_data";
 	      break;
+	    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+	    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+	    case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
+	      /* OpenACC 'kernels' decomposed parts.  */
+	      stmt_name = "kernels"; break;
 	    default: gcc_unreachable ();
 	    }
 	  switch (gimple_omp_target_kind (ctx->stmt))
@@ -3541,6 +3573,11 @@ check_omp_nesting_restrictions (gimple *stmt, omp_context *ctx)
 	    case GF_OMP_TARGET_KIND_OACC_DATA: ctx_stmt_name = "data"; break;
 	    case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
 	      ctx_stmt_name = "host_data"; break;
+	    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+	    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
+	    case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
+	      /* OpenACC 'kernels' decomposed parts.  */
+	      ctx_stmt_name = "kernels"; break;
 	    default: gcc_unreachable ();
 	    }
 
@@ -6930,6 +6967,8 @@ lower_oacc_reductions (location_t loc, tree clauses, tree level, bool inner,
       {
 	/* No 'reduction' clauses on OpenACC 'kernels'.  */
 	gcc_checking_assert (!is_oacc_kernels (ctx));
+	/* Likewise, on OpenACC 'kernels' decomposed parts.  */
+	gcc_checking_assert (!is_oacc_kernels_decomposed_part (ctx));
 
 	tree orig = OMP_CLAUSE_DECL (c);
 	tree var = maybe_lookup_decl (orig, ctx);
@@ -7785,6 +7824,8 @@ lower_oacc_head_mark (location_t loc, tree ddvar, tree clauses,
   else if (is_oacc_kernels (tgt))
     /* Not using this loops handling inside OpenACC 'kernels' regions.  */
     gcc_unreachable ();
+  else if (is_oacc_kernels_decomposed_part (tgt))
+    ;
   else
     gcc_unreachable ();
 
@@ -7792,6 +7833,14 @@ lower_oacc_head_mark (location_t loc, tree ddvar, tree clauses,
   if (!tgt || is_oacc_parallel_or_serial (tgt))
     tag |= OLF_INDEPENDENT;
 
+  /* Loops inside OpenACC 'kernels' decomposed parts' regions are expected to
+     have an explicit 'seq' or 'independent' clause, and no 'auto' clause.  */
+  if (tgt && is_oacc_kernels_decomposed_part (tgt))
+    {
+      gcc_assert (tag & (OLF_SEQ | OLF_INDEPENDENT));
+      gcc_assert (!(tag & OLF_AUTO));
+    }
+
   if (tag & OLF_TILE)
     /* Tiling could use all 3 levels.  */ 
     levels = 3;
@@ -11639,11 +11688,14 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
     case GF_OMP_TARGET_KIND_OACC_UPDATE:
     case GF_OMP_TARGET_KIND_OACC_ENTER_EXIT_DATA:
     case GF_OMP_TARGET_KIND_OACC_DECLARE:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED:
+    case GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE:
       data_region = false;
       break;
     case GF_OMP_TARGET_KIND_DATA:
     case GF_OMP_TARGET_KIND_OACC_DATA:
     case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
+    case GF_OMP_TARGET_KIND_OACC_DATA_KERNELS:
       data_region = true;
       break;
     default:
@@ -11829,6 +11881,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	  {
 	    /* No 'firstprivate' clauses on OpenACC 'kernels'.  */
 	    gcc_checking_assert (!is_oacc_kernels (ctx));
+	    /* Likewise, on OpenACC 'kernels' decomposed parts.  */
+	    gcc_checking_assert (!is_oacc_kernels_decomposed_part (ctx));
 
 	    goto oacc_firstprivate;
 	  }
@@ -11861,6 +11915,8 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	  {
 	    /* No 'private' clauses on OpenACC 'kernels'.  */
 	    gcc_checking_assert (!is_oacc_kernels (ctx));
+	    /* Likewise, on OpenACC 'kernels' decomposed parts.  */
+	    gcc_checking_assert (!is_oacc_kernels_decomposed_part (ctx));
 
 	    break;
 	  }
diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc
new file mode 100644
index 00000000000..c585e5d092b
--- /dev/null
+++ b/gcc/omp-oacc-kernels-decompose.cc
@@ -0,0 +1,1531 @@
+/* Decompose OpenACC 'kernels' constructs into parts, a sequence of compute
+   constructs
+
+   Copyright (C) 2020 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "target.h"
+#include "tree.h"
+#include "cp/cp-tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "cgraph.h"
+#include "fold-const.h"
+#include "gimplify.h"
+#include "gimple-iterator.h"
+#include "gimple-walk.h"
+#include "gomp-constants.h"
+#include "omp-general.h"
+#include "diagnostic-core.h"
+
+
+/* This preprocessing pass is run immediately before lower_omp.  It decomposes
+   OpenACC 'kernels' constructs into parts, a sequence of compute constructs.
+
+   The translation is as follows:
+     - The entire 'kernels' region is turned into a 'data' region with clauses
+       taken from the 'kernels' region.  New 'create' clauses are added for all
+       variables declared at the top level in the kernels region.
+     - Any loop nests annotated with an OpenACC 'loop' directive are wrapped in
+       a new compute construct.
+	 - 'loop' directives without an explicit 'independent' or 'seq' clause
+	   get an 'auto' clause added; other clauses are preserved on the loop
+	   or moved to the new surrounding compute construct, as applicable.
+     - Any sequences of other code (non-loops, non-OpenACC 'loop's) are wrapped
+       in a new "gang-single" compute construct: 'worker'/'vector' parallelism
+       is preserved, but 'num_gangs (1)' is enforced.
+     - Both points above only apply at the topmost level in the region, that
+       is, the transformation does not introduce new compute constructs inside
+       nested statement bodies.  In particular, this means that a
+       gang-parallelizable loop inside an 'if' statement is made "gang-single".
+     - In order to make the host wait only once for the whole region instead
+       of once per device kernel launch, the new compute constructs are
+       annotated 'async'.  Unless the original 'kernels' construct already was
+       marked 'async', the entire region ends with a 'wait' directive.  If the
+       original 'kernels' construct was marked 'async', the synthesized 'async'
+       clauses use the original 'kernels' construct's 'async' argument
+       (possibly implicit).
+*/
+
+
+/*TODO Things are conceptually wrong here: 'loop' clauses may be hidden behind
+  'device_type', so we have to defer a lot of processing until we're in the
+  offloading compilation.  "Fortunately", GCC doesn't support the OpenACC
+  'device_type' clause yet, so we get away with that.  */
+
+
+/* Helper function for decompose_kernels_region_body.  If STMT contains a
+   "top-level" OMP_FOR statement, returns a pointer to that statement;
+   returns NULL otherwise.
+
+   A "top-level" OMP_FOR statement is one that is possibly accompanied by
+   small snippets of setup code.  Specifically, this function accepts an
+   OMP_FOR possibly wrapped in a singleton bind and a singleton try
+   statement to allow for a local loop variable, but not an OMP_FOR
+   statement nested in any other constructs.  Alternatively, it accepts a
+   non-singleton bind containing only assignments and then an OMP_FOR
+   statement at the very end.  The former style can be generated by the C
+   frontend, the latter by the Fortran frontend.  */
+
+static gimple *
+top_level_omp_for_in_stmt (gimple *stmt)
+{
+  if (gimple_code (stmt) == GIMPLE_OMP_FOR)
+    return stmt;
+
+  if (gimple_code (stmt) == GIMPLE_BIND)
+    {
+      gimple_seq body = gimple_bind_body (as_a <gbind *> (stmt));
+      if (gimple_seq_singleton_p (body))
+	{
+	  /* Accept an OMP_FOR statement, or a try statement containing only
+	     a single OMP_FOR.  */
+	  gimple *maybe_for_or_try = gimple_seq_first_stmt (body);
+	  if (gimple_code (maybe_for_or_try) == GIMPLE_OMP_FOR)
+	    return maybe_for_or_try;
+	  else if (gimple_code (maybe_for_or_try) == GIMPLE_TRY)
+	    {
+	      gimple_seq try_body = gimple_try_eval (maybe_for_or_try);
+	      if (!gimple_seq_singleton_p (try_body))
+		return NULL;
+	      gimple *maybe_omp_for_stmt = gimple_seq_first_stmt (try_body);
+	      if (gimple_code (maybe_omp_for_stmt) == GIMPLE_OMP_FOR)
+		return maybe_omp_for_stmt;
+	    }
+	}
+      else
+	{
+	  gimple_stmt_iterator gsi;
+	  /* Accept only a block of optional assignments followed by an
+	     OMP_FOR at the end.  No other kinds of statements allowed.  */
+	  for (gsi = gsi_start (body); !gsi_end_p (gsi); gsi_next (&gsi))
+	    {
+	      gimple *body_stmt = gsi_stmt (gsi);
+	      if (gimple_code (body_stmt) == GIMPLE_ASSIGN)
+		continue;
+	      else if (gimple_code (body_stmt) == GIMPLE_OMP_FOR
+		       && gsi_one_before_end_p (gsi))
+		return body_stmt;
+	      else
+		return NULL;
+	    }
+	}
+    }
+
+  return NULL;
+}
+
+/* Helper for adjust_region_code: evaluate the statement at GSI_P.  */
+
+static tree
+adjust_region_code_walk_stmt_fn (gimple_stmt_iterator *gsi_p,
+				 bool *handled_ops_p,
+				 struct walk_stmt_info *wi)
+{
+  int *region_code = (int *) wi->info;
+
+  gimple *stmt = gsi_stmt (*gsi_p);
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_OMP_FOR:
+      {
+	tree clauses = gimple_omp_for_clauses (stmt);
+	if (omp_find_clause (clauses, OMP_CLAUSE_INDEPENDENT))
+	  {
+	    /* Explicit 'independent' clause.  */
+	    /* Keep going; recurse into loop body.  */
+	    break;
+	  }
+	else if (omp_find_clause (clauses, OMP_CLAUSE_SEQ))
+	  {
+	    /* Explicit 'seq' clause.  */
+	    /* We'll "parallelize" if at some level a loop construct has been
+	       marked up by the user as unparallelizable ('seq' clause; we'll
+	       respect that in the later processing).  Given that the user has
+	       explicitly marked it up, this loop construct cannot be
+	       performance-critical, and in this case it's also fine to
+	       "parallelize" instead of "gang-single", because any outer or
+	       inner loops may still exploit the available parallelism.  */
+	    /* Keep going; recurse into loop body.  */
+	    break;
+	  }
+	else
+	  {
+	    /* Explicit or implicit 'auto' clause.  */
+	    /* The user would like this loop analyzed ('auto' clause) and
+	       typically parallelized, but the compiler logic to analyze it
+	       is not available yet, so we can't parallelize it here.  As
+	       executing it unparallelized would very likely cause a
+	       performance problem, forward the whole loop nest to
+	       'parloops'.  */
+	    *region_code = GF_OMP_TARGET_KIND_OACC_KERNELS;
+	    /* Terminate: final decision for this region.  */
+	    *handled_ops_p = true;
+	    return integer_zero_node;
+	  }
+	gcc_unreachable ();
+      }
+
+    case GIMPLE_COND:
+    case GIMPLE_GOTO:
+    case GIMPLE_SWITCH:
+    case GIMPLE_ASM:
+    case GIMPLE_TRANSACTION:
+    case GIMPLE_RETURN:
+      /* Statement that might constitute some looping/control flow pattern.  */
+      /* The user would like this code analyzed (implicit inside a 'kernels'
+	 region) and typically parallelized, but the compiler logic to
+	 analyze it is not available yet, so we can't parallelize it here.
+	 As executing it unparallelized would very likely cause a
+	 performance problem, forward the whole thing to
+	 'parloops'.  */
+      *region_code = GF_OMP_TARGET_KIND_OACC_KERNELS;
+      /* Terminate: final decision for this region.  */
+      *handled_ops_p = true;
+      return integer_zero_node;
+
+    default:
+      /* Keep going.  */
+      break;
+    }
+
+  return NULL;
+}
+
+/* Adjust the REGION_CODE for the region in GS.  */
+
+static void
+adjust_region_code (gimple_seq gs, int *region_code)
+{
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  wi.info = region_code;
+  walk_gimple_seq (gs, adjust_region_code_walk_stmt_fn, NULL, &wi);
+}
+
+/* Helper function for make_loops_gang_single for walking the tree.  If the
+   statement indicated by GSI_P is an OpenACC for loop with a gang clause,
+   issue a warning and remove the clause.  */
+
+static tree
+visit_loops_in_gang_single_region (gimple_stmt_iterator *gsi_p,
+				   bool *handled_ops_p,
+				   struct walk_stmt_info *)
+{
+  *handled_ops_p = false;
+
+  gimple *stmt = gsi_stmt (*gsi_p);
+  switch (gimple_code (stmt))
+    {
+    case GIMPLE_OMP_FOR:
+      /*TODO Given the current 'adjust_region_code' algorithm, this is
+	actually...  */
+      gcc_unreachable ();
+
+      {
+	tree clauses = gimple_omp_for_clauses (stmt);
+	tree prev_clause = NULL;
+	for (tree clause = clauses; clause; clause = OMP_CLAUSE_CHAIN (clause))
+	  {
+	    if (OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_GANG)
+	      {
+		/* It makes no sense to have a 'gang' clause in a "gang-single"
+		   region, so warn and remove it.  */
+		warning_at (gimple_location (stmt), 0,
+			    "conditionally executed loop in %<kernels%> region"
+			    " will be executed by a single gang;"
+			    " ignoring %<gang%> clause");
+		if (prev_clause != NULL)
+		  OMP_CLAUSE_CHAIN (prev_clause) = OMP_CLAUSE_CHAIN (clause);
+		else
+		  clauses = OMP_CLAUSE_CHAIN (clause);
+
+		break;
+	      }
+	    prev_clause = clause;
+	  }
+	gimple_omp_for_set_clauses (stmt, clauses);
+      }
+      /* No need to recurse into nested statements; no loop nested inside
+	 this loop can be gang-partitioned.  */
+      sorry ("%<gang%> loop in %<gang-single%> region");
+      *handled_ops_p = true;
+      break;
+
+    default:
+      break;
+    }
+
+  return NULL;
+}
+
+/* Visit all nested OpenACC loops in the sequence indicated by GS.  This
+   statement is expected to be inside a gang-single region.  Issue a warning
+   for any loops inside it that have gang clauses and remove the clauses.  */
+
+static void
+make_loops_gang_single (gimple_seq gs)
+{
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  walk_gimple_seq (gs, visit_loops_in_gang_single_region, NULL, &wi);
+}
+
+/* Construct a "gang-single" compute construct at LOC containing the STMTS.
+   Annotate with CLAUSES, which must not contain a 'num_gangs' clause, and an
+   additional 'num_gangs (1)' clause to force "gang-single" execution.  */
+
+static gimple *
+make_region_seq (location_t loc, gimple_seq stmts,
+		 tree num_gangs_clause,
+		 tree num_workers_clause,
+		 tree vector_length_clause,
+		 tree clauses)
+{
+  /* This correctly unshares the entire clause chain rooted here.  */
+  clauses = unshare_expr (clauses);
+
+  dump_user_location_t loc_stmts_first = gimple_seq_first (stmts);
+
+  /* Figure out the region code for this region.  */
+  /* Optimistic default: assume "setup code", no looping; thus not
+     performance-critical.  */
+  int region_code = GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE;
+  adjust_region_code (stmts, &region_code);
+
+  if (region_code == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE)
+    {
+      if (dump_enabled_p ())
+	/*TODO MSG_MISSED_OPTIMIZATION? */
+	dump_printf_loc (MSG_NOTE, loc_stmts_first,
+			 "beginning %<gang-single%> part"
+			 " in OpenACC %<kernels%> region\n");
+
+      /* Synthesize a 'num_gangs (1)' clause.  */
+      tree gang_single_clause = build_omp_clause (loc, OMP_CLAUSE_NUM_GANGS);
+      OMP_CLAUSE_OPERAND (gang_single_clause, 0) = integer_one_node;
+      OMP_CLAUSE_CHAIN (gang_single_clause) = clauses;
+      clauses = gang_single_clause;
+
+      /* Remove and issue warnings about gang clauses on any OpenACC
+	 loops nested inside this sequentially executed statement.  */
+      make_loops_gang_single (stmts);
+    }
+  else if (region_code == GF_OMP_TARGET_KIND_OACC_KERNELS)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, loc_stmts_first,
+			 "beginning %<parloops%> part"
+			 " in OpenACC %<kernels%> region\n");
+
+      /* As we're transforming a 'GF_OMP_TARGET_KIND_OACC_KERNELS' into another
+	 'GF_OMP_TARGET_KIND_OACC_KERNELS', this isn't doing any of the clauses
+	 mangling that 'make_region_loop_nest' is doing.  */
+      /* Re-assemble the clauses stripped off earlier.  */
+      if (num_gangs_clause != NULL)
+	{
+	  tree c = unshare_expr (num_gangs_clause);
+	  OMP_CLAUSE_CHAIN (c) = clauses;
+	  clauses = c;
+	}
+      if (num_workers_clause != NULL)
+	{
+	  tree c = unshare_expr (num_workers_clause);
+	  OMP_CLAUSE_CHAIN (c) = clauses;
+	  clauses = c;
+	}
+      if (vector_length_clause != NULL)
+	{
+	  tree c = unshare_expr (vector_length_clause);
+	  OMP_CLAUSE_CHAIN (c) = clauses;
+	  clauses = c;
+	}
+    }
+  else
+    gcc_unreachable ();
+
+  /* Build the region, "gang-single" or 'kernels' per REGION_CODE.  */
+  gimple *single_region = gimple_build_omp_target (NULL, region_code, clauses);
+  gimple_set_location (single_region, loc);
+  gbind *single_body = gimple_build_bind (NULL, stmts, make_node (BLOCK));
+  gimple_omp_set_body (single_region, single_body);
+
+  return single_region;
+}
+
+/* Helper function for make_region_loop_nest.  Adds a 'num_gangs'
+   ('num_workers', 'vector_length') clause to the given CLAUSES, either the one
+   from the parent compute construct (PARENT_CLAUSE) or a new one based on the
+   loop's own LOOP_CLAUSE ('gang (num: N)' or similar for 'worker' or 'vector'
+   clauses) with the given CLAUSE_CODE.  Does nothing if neither PARENT_CLAUSE
+   nor LOOP_CLAUSE exist.  Returns the new clauses.  */
+
+static tree
+add_parent_or_loop_num_clause (tree parent_clause, tree loop_clause,
+			       omp_clause_code clause_code, tree clauses)
+{
+  if (parent_clause != NULL)
+    {
+      tree num_clause = unshare_expr (parent_clause);
+      OMP_CLAUSE_CHAIN (num_clause) = clauses;
+      clauses = num_clause;
+    }
+  else if (loop_clause != NULL)
+    {
+      /* The kernels region does not have a 'num_gangs' clause, but the loop
+	 itself had a 'gang (num: N)' clause.  Honor it by adding a
+	 'num_gangs (N)' clause on the compute construct.  */
+      tree num = OMP_CLAUSE_OPERAND (loop_clause, 0);
+      tree new_num_clause
+	= build_omp_clause (OMP_CLAUSE_LOCATION (loop_clause), clause_code);
+      OMP_CLAUSE_OPERAND (new_num_clause, 0) = num;
+      OMP_CLAUSE_CHAIN (new_num_clause) = clauses;
+      clauses = new_num_clause;
+    }
+  return clauses;
+}
+
+/* Helper for make_region_loop_nest, looking for 'worker (num: N)' or 'vector
+   (length: N)' clauses in nested loops.  Removes the argument, transferring it
+   to the enclosing compute construct (via WI->INFO).  If arguments within the
+   same loop nest conflict, emits an error.
+
+   This function also decides whether to add an 'auto' clause on each of these
+   nested loops.  */
+
+struct adjust_nested_loop_clauses_wi_info
+{
+  tree *loop_gang_clause_ptr;
+  tree *loop_worker_clause_ptr;
+  tree *loop_vector_clause_ptr;
+};
+
+static tree
+adjust_nested_loop_clauses (gimple_stmt_iterator *gsi_p, bool *,
+			    struct walk_stmt_info *wi)
+{
+  struct adjust_nested_loop_clauses_wi_info *wi_info
+    = (struct adjust_nested_loop_clauses_wi_info *) wi->info;
+  gimple *stmt = gsi_stmt (*gsi_p);
+
+  if (gimple_code (stmt) == GIMPLE_OMP_FOR)
+    {
+      bool add_auto_clause = true;
+      tree loop_clauses = gimple_omp_for_clauses (stmt);
+      tree loop_clause = loop_clauses;
+      for (; loop_clause; loop_clause = OMP_CLAUSE_CHAIN (loop_clause))
+	{
+	  tree *outer_clause_ptr = NULL;
+	  switch (OMP_CLAUSE_CODE (loop_clause))
+	    {
+	    case OMP_CLAUSE_GANG:
+	      outer_clause_ptr = wi_info->loop_gang_clause_ptr;
+	      break;
+	    case OMP_CLAUSE_WORKER:
+	      outer_clause_ptr = wi_info->loop_worker_clause_ptr;
+	      break;
+	    case OMP_CLAUSE_VECTOR:
+	      outer_clause_ptr = wi_info->loop_vector_clause_ptr;
+	      break;
+	    case OMP_CLAUSE_SEQ:
+	    case OMP_CLAUSE_INDEPENDENT:
+	    case OMP_CLAUSE_AUTO:
+	      add_auto_clause = false;
+	    default:
+	      break;
+	    }
+	  if (outer_clause_ptr != NULL)
+	    {
+	      if (OMP_CLAUSE_OPERAND (loop_clause, 0) != NULL
+		  && *outer_clause_ptr == NULL)
+		{
+		  /* Transfer the clause to the enclosing compute construct and
+		     remove the numerical argument from the 'loop'.  */
+		  *outer_clause_ptr = unshare_expr (loop_clause);
+		  OMP_CLAUSE_OPERAND (loop_clause, 0) = NULL;
+		}
+	      else if (OMP_CLAUSE_OPERAND (loop_clause, 0) != NULL
+		       && OMP_CLAUSE_OPERAND (*outer_clause_ptr, 0) != NULL)
+		{
+		  /* See if both of these are the same constant.  If they
+		     aren't, emit an error.  */
+		  tree old_op = OMP_CLAUSE_OPERAND (*outer_clause_ptr, 0);
+		  tree new_op = OMP_CLAUSE_OPERAND (loop_clause, 0);
+		  if (!(cst_and_fits_in_hwi (old_op)
+			&& cst_and_fits_in_hwi (new_op)
+			&& int_cst_value (old_op) == int_cst_value (new_op)))
+		    {
+		      const char *clause_name
+			= omp_clause_code_name[OMP_CLAUSE_CODE (loop_clause)];
+		      error_at (gimple_location (stmt),
+				"cannot honor conflicting %qs clause",
+				clause_name);
+		      inform (OMP_CLAUSE_LOCATION (*outer_clause_ptr),
+			      "location of the previous clause"
+			      " in the same loop nest");
+		    }
+		  OMP_CLAUSE_OPERAND (loop_clause, 0) = NULL;
+		}
+	    }
+	}
+      if (add_auto_clause)
+	{
+	  tree auto_clause
+	    = build_omp_clause (gimple_location (stmt), OMP_CLAUSE_AUTO);
+	  OMP_CLAUSE_CHAIN (auto_clause) = loop_clauses;
+	  gimple_omp_for_set_clauses (stmt, auto_clause);
+	}
+    }
+
+  return NULL;
+}
+
+/* Helper for make_region_loop_nest.  Transform OpenACC 'kernels'/'loop'
+   construct clauses into OpenACC 'parallel'/'loop' construct ones.  */
+
+static tree
+transform_kernels_loop_clauses (gimple *omp_for,
+				tree num_gangs_clause,
+				tree num_workers_clause,
+				tree vector_length_clause,
+				tree clauses)
+{
+  /* If this loop in a kernels region does not have an explicit 'seq',
+     'independent', or 'auto' clause, we must give it an explicit 'auto'
+     clause.
+     We also check for 'gang (num: N)' clauses.  These must not appear in
+     kernels regions that have their own 'num_gangs' clause.  Otherwise, they
+     must be converted and put on the region; similarly for 'worker' and
+     'vector' clauses.  */
+  bool add_auto_clause = true;
+  tree loop_gang_clause = NULL, loop_worker_clause = NULL,
+       loop_vector_clause = NULL;
+  tree loop_clauses = gimple_omp_for_clauses (omp_for);
+  for (tree loop_clause = loop_clauses;
+       loop_clause;
+       loop_clause = OMP_CLAUSE_CHAIN (loop_clause))
+    {
+      bool found_num_clause = false;
+      tree *clause_ptr, clause_to_check;
+      switch (OMP_CLAUSE_CODE (loop_clause))
+	{
+	case OMP_CLAUSE_GANG:
+	  found_num_clause = true;
+	  clause_ptr = &loop_gang_clause;
+	  clause_to_check = num_gangs_clause;
+	  break;
+	case OMP_CLAUSE_WORKER:
+	  found_num_clause = true;
+	  clause_ptr = &loop_worker_clause;
+	  clause_to_check = num_workers_clause;
+	  break;
+	case OMP_CLAUSE_VECTOR:
+	  found_num_clause = true;
+	  clause_ptr = &loop_vector_clause;
+	  clause_to_check = vector_length_clause;
+	  break;
+	case OMP_CLAUSE_INDEPENDENT:
+	case OMP_CLAUSE_SEQ:
+	case OMP_CLAUSE_AUTO:
+	  add_auto_clause = false;
+	default:
+	  break;
+	}
+      if (found_num_clause && OMP_CLAUSE_OPERAND (loop_clause, 0) != NULL)
+	{
+	  if (clause_to_check)
+	    {
+	      const char *clause_name
+		= omp_clause_code_name[OMP_CLAUSE_CODE (loop_clause)];
+	      const char *parent_clause_name
+		= omp_clause_code_name[OMP_CLAUSE_CODE (clause_to_check)];
+	      error_at (OMP_CLAUSE_LOCATION (loop_clause),
+			"argument not permitted on %qs clause"
+			" in OpenACC %<kernels%> region with a %qs clause",
+			clause_name, parent_clause_name);
+	      inform (OMP_CLAUSE_LOCATION (clause_to_check),
+		      "location of OpenACC %<kernels%>");
+	    }
+	  /* Copy the 'gang (N)'/'worker (N)'/'vector (N)' clause to the
+	     enclosing compute construct.  */
+	  *clause_ptr = unshare_expr (loop_clause);
+	  OMP_CLAUSE_CHAIN (*clause_ptr) = NULL;
+	  /* Leave a 'gang'/'worker'/'vector' clause on the 'loop', but without
+	     argument.  */
+	  OMP_CLAUSE_OPERAND (loop_clause, 0) = NULL;
+	}
+    }
+  if (add_auto_clause)
+    {
+      tree auto_clause = build_omp_clause (gimple_location (omp_for),
+					   OMP_CLAUSE_AUTO);
+      OMP_CLAUSE_CHAIN (auto_clause) = loop_clauses;
+      loop_clauses = auto_clause;
+    }
+  gimple_omp_for_set_clauses (omp_for, loop_clauses);
+  /* We must also recurse into the loop; it might contain nested loops having
+     their own 'worker (num: W)' or 'vector (length: V)' clauses.  Turn these
+     into 'num_workers'/'vector_length' clauses on the compute construct.  */
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  struct adjust_nested_loop_clauses_wi_info wi_info;
+  wi_info.loop_gang_clause_ptr = &loop_gang_clause;
+  wi_info.loop_worker_clause_ptr = &loop_worker_clause;
+  wi_info.loop_vector_clause_ptr = &loop_vector_clause;
+  wi.info = &wi_info;
+  gimple *body = gimple_omp_body (omp_for);
+  walk_gimple_seq (body, adjust_nested_loop_clauses, NULL, &wi);
+  /* Check if there were conflicting numbers of workers or vector length.  */
+  if (loop_gang_clause != NULL
+      && OMP_CLAUSE_OPERAND (loop_gang_clause, 0) == NULL)
+    loop_gang_clause = NULL;
+  if (loop_worker_clause != NULL
+      && OMP_CLAUSE_OPERAND (loop_worker_clause, 0) == NULL)
+    loop_worker_clause = NULL;
+  if (loop_vector_clause != NULL
+      && OMP_CLAUSE_OPERAND (loop_vector_clause, 0) == NULL)
+    loop_vector_clause = NULL;
+
+  /* If the kernels region had 'num_gangs', 'num_workers', 'vector_length'
+     clauses, add these to this new compute construct.  */
+  clauses
+    = add_parent_or_loop_num_clause (num_gangs_clause, loop_gang_clause,
+				     OMP_CLAUSE_NUM_GANGS, clauses);
+  clauses
+    = add_parent_or_loop_num_clause (num_workers_clause, loop_worker_clause,
+				     OMP_CLAUSE_NUM_WORKERS, clauses);
+  clauses
+    = add_parent_or_loop_num_clause (vector_length_clause, loop_vector_clause,
+				     OMP_CLAUSE_VECTOR_LENGTH, clauses);
+
+  return clauses;
+}
+
+/* Construct a possibly gang-parallel compute construct containing the STMT,
+   which must be identical to, or a bind containing, the loop OMP_FOR.
+
+   The NUM_GANGS_CLAUSE, NUM_WORKERS_CLAUSE, and VECTOR_LENGTH_CLAUSE are
+   optional clauses from the original kernels region and must not be contained
+   in the other CLAUSES.  The newly created compute construct is annotated with
+   the optional NUM_GANGS_CLAUSE as well as the other CLAUSES.  If there is no
+   NUM_GANGS_CLAUSE but the loop has a 'gang (num: N)' clause, that is
+   converted to a 'num_gangs (N)' clause on the new compute construct, and
+   similarly for 'worker' and 'vector' clauses.
+
+   The outermost loop gets an 'auto' clause unless there already is a
+   'seq'/'independent'/'auto' clause.  Nested loops inside OMP_FOR are treated
+   similarly by the adjust_nested_loop_clauses function.  */
+
+static gimple *
+make_region_loop_nest (gimple *omp_for, gimple_seq stmts,
+		       tree num_gangs_clause,
+		       tree num_workers_clause,
+		       tree vector_length_clause,
+		       tree clauses)
+{
+  /* This correctly unshares the entire clause chain rooted here.  */
+  clauses = unshare_expr (clauses);
+
+  /* Figure out the region code for this region.  */
+  /* Optimistic default: assume that the loop nest is parallelizable
+     (essentially, no GIMPLE_OMP_FOR with (explicit or implicit) 'auto' clause,
+     and no un-annotated loops).  */
+  int region_code = GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED;
+  adjust_region_code (stmts, &region_code);
+
+  if (region_code == GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED)
+    {
+      if (dump_enabled_p ())
+	/* This is not MSG_OPTIMIZED_LOCATIONS, as we're just doing what the
+	   user asked us to.  */
+	dump_printf_loc (MSG_NOTE, omp_for,
+			 "parallelized loop nest"
+			 " in OpenACC %<kernels%> region\n");
+
+      clauses = transform_kernels_loop_clauses (omp_for,
+						num_gangs_clause,
+						num_workers_clause,
+						vector_length_clause,
+						clauses);
+    }
+  else if (region_code == GF_OMP_TARGET_KIND_OACC_KERNELS)
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, omp_for,
+			 "forwarded loop nest"
+			 " in OpenACC %<kernels%> region"
+			 " to %<parloops%> for analysis\n");
+
+      /* We're transforming one 'GF_OMP_TARGET_KIND_OACC_KERNELS' into another
+	 'GF_OMP_TARGET_KIND_OACC_KERNELS', so don't have to
+	 'transform_kernels_loop_clauses'.  */
+      /* Re-assemble the clauses stripped off earlier.  */
+      clauses
+	= add_parent_or_loop_num_clause (num_gangs_clause, NULL,
+					 OMP_CLAUSE_NUM_GANGS, clauses);
+      clauses
+	= add_parent_or_loop_num_clause (num_workers_clause, NULL,
+					 OMP_CLAUSE_NUM_WORKERS, clauses);
+      clauses
+	= add_parent_or_loop_num_clause (vector_length_clause, NULL,
+					 OMP_CLAUSE_VECTOR_LENGTH, clauses);
+    }
+  else
+    gcc_unreachable ();
+
+  gimple *parallel_body_bind
+    = gimple_build_bind (NULL, stmts, make_node (BLOCK));
+  gimple *parallel_region
+    = gimple_build_omp_target (parallel_body_bind, region_code, clauses);
+  gimple_set_location (parallel_region, gimple_location (omp_for));
+
+  return parallel_region;
+}
+
+/* Eliminate any binds directly inside BIND by adding their statements to
+   BIND (i.e., modifying it in place), excluding binds that hold only an
+   OMP_FOR loop and associated setup/cleanup code.  Recurse into binds but
+   not other statements.  Return a chain of the local variables of eliminated
+   binds, i.e., the local variables found in nested binds.  If
+   INCLUDE_TOPLEVEL_VARS is true, this also includes the variables belonging
+   to BIND itself.  */
+
+static tree
+flatten_binds (gbind *bind, bool include_toplevel_vars = false)
+{
+  tree vars = NULL, last_var = NULL;
+
+  if (include_toplevel_vars)
+    {
+      vars = gimple_bind_vars (bind);
+      last_var = vars;
+    }
+
+  gimple_seq new_body = NULL;
+  gimple_seq body_sequence = gimple_bind_body (bind);
+  gimple_stmt_iterator gsi, gsi_n;
+  for (gsi = gsi_start (body_sequence); !gsi_end_p (gsi); gsi = gsi_n)
+    {
+      /* Advance the iterator here because otherwise it would be invalidated
+	 by moving statements below.  */
+      gsi_n = gsi;
+      gsi_next (&gsi_n);
+
+      gimple *stmt = gsi_stmt (gsi);
+      /* Flatten bind statements, except the ones that contain only an
+	 OpenACC for loop.  */
+      if (gimple_code (stmt) == GIMPLE_BIND
+	  && !top_level_omp_for_in_stmt (stmt))
+	{
+	  gbind *inner_bind = as_a <gbind *> (stmt);
+	  /* Flatten recursively, and collect all variables.  */
+	  tree inner_vars = flatten_binds (inner_bind, true);
+	  gimple_seq inner_sequence = gimple_bind_body (inner_bind);
+	  gcc_assert (gimple_code (inner_sequence) != GIMPLE_BIND
+		      || top_level_omp_for_in_stmt (inner_sequence));
+	  gimple_seq_add_seq (&new_body, inner_sequence);
+	  /* Find the last variable; we will append others to it.  */
+	  while (last_var != NULL && TREE_CHAIN (last_var) != NULL)
+	    last_var = TREE_CHAIN (last_var);
+	  if (last_var != NULL)
+	    {
+	      TREE_CHAIN (last_var) = inner_vars;
+	      last_var = inner_vars;
+	    }
+	  else
+	    {
+	      vars = inner_vars;
+	      last_var = vars;
+	    }
+	}
+      else
+	gimple_seq_add_stmt (&new_body, stmt);
+    }
+
+  /* Put the possibly transformed body back into the bind.  */
+  gimple_bind_set_body (bind, new_body);
+  return vars;
+}
+
+/* Helper function for places where we construct data regions.  Wraps the BODY
+   inside a try-finally construct at LOC that calls __builtin_GOACC_data_end
+   in its cleanup block.  Returns this try statement.  */
+
+static gimple *
+make_data_region_try_statement (location_t loc, gimple *body)
+{
+  tree data_end_fn = builtin_decl_explicit (BUILT_IN_GOACC_DATA_END);
+  gimple *call = gimple_build_call (data_end_fn, 0);
+  gimple_seq cleanup = NULL;
+  gimple_seq_add_stmt (&cleanup, call);
+  gimple *try_stmt = gimple_build_try (body, cleanup, GIMPLE_TRY_FINALLY);
+  gimple_set_location (try_stmt, loc);
+  return try_stmt;
+}
+
+/* If INNER_BIND_VARS holds variables, build an OpenACC data region with
+   location LOC containing BODY and having 'create (var)' clauses for each
+   variable.  If INNER_CLEANUP is present, add a try-finally statement with
+   this cleanup code in the finally block.  Return the new data region, or
+   the original BODY if no data region was needed.  */
+
+static gimple *
+maybe_build_inner_data_region (location_t loc, gimple *body,
+			       tree inner_bind_vars, gimple *inner_cleanup)
+{
+  /* Build data 'create (var)' clauses for these local variables.
+     Below we will add these to a data region enclosing the entire body
+     of the decomposed kernels region.  */
+  tree prev_mapped_var = NULL, next = NULL, artificial_vars = NULL,
+       inner_data_clauses = NULL;
+  for (tree v = inner_bind_vars; v; v = next)
+    {
+      next = TREE_CHAIN (v);
+      if (DECL_ARTIFICIAL (v)
+	  || TREE_CODE (v) == CONST_DECL
+	  || (DECL_LANG_SPECIFIC (current_function_decl)
+	      && DECL_TEMPLATE_INSTANTIATION (current_function_decl)))
+	{
+	  /* If this is an artificial temporary, it need not be mapped.  We
+	     move its declaration into the bind inside the data region.
+	     Also avoid mapping variables if we are inside a template
+	     instantiation; the code does not contain all the copies to
+	     temporaries that would make this legal.  */
+	  TREE_CHAIN (v) = artificial_vars;
+	  artificial_vars = v;
+	  if (prev_mapped_var != NULL)
+	    TREE_CHAIN (prev_mapped_var) = next;
+	  else
+	    inner_bind_vars = next;
+	}
+      else
+	{
+	  /* Otherwise, build the map clause.  */
+	  tree new_clause = build_omp_clause (loc, OMP_CLAUSE_MAP);
+	  OMP_CLAUSE_SET_MAP_KIND (new_clause, GOMP_MAP_ALLOC);
+	  OMP_CLAUSE_DECL (new_clause) = v;
+	  OMP_CLAUSE_SIZE (new_clause) = DECL_SIZE_UNIT (v);
+	  OMP_CLAUSE_CHAIN (new_clause) = inner_data_clauses;
+	  inner_data_clauses = new_clause;
+
+	  prev_mapped_var = v;
+	}
+    }
+
+  if (artificial_vars)
+    body = gimple_build_bind (artificial_vars, body, make_node (BLOCK));
+
+  /* If we determined above that there are variables that need to be created
+     on the device, construct a data region for them and wrap the body
+     inside that.  */
+  if (inner_data_clauses != NULL)
+    {
+      gcc_assert (inner_bind_vars != NULL);
+      gimple *inner_data_region
+	= gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_OACC_DATA_KERNELS,
+				   inner_data_clauses);
+      gimple_set_location (inner_data_region, loc);
+      /* Make sure __builtin_GOACC_data_end is called at the end.  */
+      gimple *try_stmt = make_data_region_try_statement (loc, body);
+      gimple_omp_set_body (inner_data_region, try_stmt);
+      gimple *bind_body;
+      if (inner_cleanup != NULL)
+	/* Clobber all the inner variables that need to be clobbered.  */
+	bind_body = gimple_build_try (inner_data_region, inner_cleanup,
+				      GIMPLE_TRY_FINALLY);
+      else
+	bind_body = inner_data_region;
+      body = gimple_build_bind (inner_bind_vars, bind_body, make_node (BLOCK));
+    }
+
+  return body;
+}
+
+/* Helper function of decompose_kernels_region_body.  The statements in
+   REGION_BODY are expected to be decomposed parts; add an 'async' clause to
+   each.  Also add a 'wait' directive at the end of the sequence.  */
+
+static void
+add_async_clauses_and_wait (location_t loc, gimple_seq *region_body)
+{
+  tree default_async_queue
+    = build_int_cst (integer_type_node, GOMP_ASYNC_NOVAL);
+  for (gimple_stmt_iterator gsi = gsi_start (*region_body);
+       !gsi_end_p (gsi);
+       gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      tree target_clauses = gimple_omp_target_clauses (stmt);
+      tree new_async_clause = build_omp_clause (loc, OMP_CLAUSE_ASYNC);
+      OMP_CLAUSE_OPERAND (new_async_clause, 0) = default_async_queue;
+      OMP_CLAUSE_CHAIN (new_async_clause) = target_clauses;
+      target_clauses = new_async_clause;
+      gimple_omp_target_set_clauses (as_a <gomp_target *> (stmt),
+				     target_clauses);
+    }
+  /* A '#pragma acc wait' is just a call 'GOACC_wait (acc_async_sync, 0)'.  */
+  tree wait_fn = builtin_decl_explicit (BUILT_IN_GOACC_WAIT);
+  tree sync_arg = build_int_cst (integer_type_node, GOMP_ASYNC_SYNC);
+  gimple *wait_call = gimple_build_call (wait_fn, 2,
+					 sync_arg, integer_zero_node);
+  gimple_set_location (wait_call, loc);
+  gimple_seq_add_stmt (region_body, wait_call);
+}
+
+/* Auxiliary analysis of the body of a kernels region, to determine for each
+   OpenACC loop whether it is control-dependent (i.e., not necessarily
+   executed every time the kernels region is entered) or not.
+   We say that a loop is control-dependent if there is some cond, switch, or
+   goto statement that jumps over it, forwards or backwards.  For example,
+   if the loop is controlled by an if statement, then a jump to the true
+   block, the false block, or from one of those blocks to the control flow
+   join point will necessarily jump over the loop.
+   This analysis implements an ad-hoc union-find data structure classifying
+   statements into "control-flow regions" as follows: Most statements are in
+   the same region as their predecessor, except that each OpenACC loop is in
+   a region of its own, and each OpenACC loop's successor starts a new
+   region.  We then unite the regions of any statements linked by jumps,
+   placing any cond, switch, or goto statement in the same region as its
+   target label(s).
+   In the end, control dependence of OpenACC loops can be determined by
+   comparing their immediate predecessor and successor statements' regions.
+   A jump crosses the loop if and only if the predecessor and successor are
+   in the same region.  (If there is no predecessor or successor, the loop
+   is executed unconditionally.)
+   The methods in this class identify statements by their index in the
+   kernels region's body.  */
+
+class control_flow_regions
+{
+  public:
+    /* Initialize an instance and pre-compute the control-flow region
+       information for the statement sequence SEQ.  */
+    control_flow_regions (gimple_seq seq);
+
+    /* Return true if the statement with the given index IDX in the analyzed
+       statement sequence is an unconditionally executed OpenACC loop.  */
+    bool is_unconditional_oacc_for_loop (size_t idx);
+
+  private:
+    /* Find the region representative for the statement identified by index
+       STMT_IDX.  */
+    size_t find_rep (size_t stmt_idx);
+
+    /* Union the regions containing the statements represented by
+       representatives A and B.  */
+    void union_reps (size_t a, size_t b);
+
+    /* Helper for the constructor.  Performs the actual computation of the
+       control-flow regions in the statement sequence SEQ.  */
+    void compute_regions (gimple_seq seq);
+
+    /* The mapping from statement indices to region representatives.  */
+    vec <size_t> representatives;
+
+    /* A cache mapping statement indices to a flag indicating whether the
+       statement is a top level OpenACC for loop.  */
+    vec <bool> omp_for_loops;
+};
+
+control_flow_regions::control_flow_regions (gimple_seq seq)
+{
+  representatives.create (1);
+  omp_for_loops.create (1);
+  compute_regions (seq);
+}
+
+bool
+control_flow_regions::is_unconditional_oacc_for_loop (size_t idx)
+{
+  if (idx == 0 || idx == representatives.length () - 1)
+    /* The first or last statement in the kernels region.  This means that
+       there is no room before or after it for a jump or a label.  Thus
+       there cannot be a jump across it, so it is unconditional.  */
+    return true;
+  /* Otherwise, the loop is unconditional if the statements before and after
+     it are in different control flow regions.  Scan forward and backward,
+     skipping over neighboring OpenACC for loops, to find the nearest
+     non-loop statement on each side.  */
+  size_t prev_index = idx - 1;
+  while (prev_index > 0 && omp_for_loops [prev_index] == true)
+    prev_index--;
+  /* If all preceding statements are also OpenACC loops, all of these are
+     unconditional.  */
+  if (prev_index == 0)
+    return true;
+  size_t succ_index = idx + 1;
+  while (succ_index < omp_for_loops.length ()
+	 && omp_for_loops [succ_index] == true)
+    succ_index++;
+  /* If all following statements are also OpenACC loops, all of these are
+     unconditional.  */
+  if (succ_index == omp_for_loops.length ())
+    return true;
+  return (find_rep (prev_index) != find_rep (succ_index));
+}
+
+size_t
+control_flow_regions::find_rep (size_t stmt_idx)
+{
+  size_t rep = stmt_idx, aux = stmt_idx;
+  /* Find the root representative of this statement.  */
+  while (representatives[rep] != rep)
+    rep = representatives[rep];
+  /* Compress the path from the original statement to the representative.  */
+  while (representatives[aux] != rep)
+    {
+      size_t tmp = representatives[aux];
+      representatives[aux] = rep;
+      aux = tmp;
+    }
+  return rep;
+}
+
+void
+control_flow_regions::union_reps (size_t a, size_t b)
+{
+  a = find_rep (a);
+  b = find_rep (b);
+  representatives[b] = a;
+}
+
+void
+control_flow_regions::compute_regions (gimple_seq seq)
+{
+  hash_map <gimple *, size_t> control_flow_reps;
+  hash_map <tree, size_t> label_reps;
+  size_t current_region = 0, idx = 0;
+
+  /* In a first pass, assign an initial region to each statement.  Except in
+     the case of OpenACC loops, each statement simply gets the same region
+     representative as its predecessor.  */
+  for (gimple_stmt_iterator gsi = gsi_start (seq);
+       !gsi_end_p (gsi);
+       gsi_next (&gsi))
+    {
+      gimple *stmt = gsi_stmt (gsi);
+      gimple *omp_for = top_level_omp_for_in_stmt (stmt);
+      omp_for_loops.safe_push (omp_for != NULL);
+      if (omp_for != NULL)
+	{
+	  /* Assign a new region to this loop and to its successor.  */
+	  current_region = idx;
+	  representatives.safe_push (current_region);
+	  current_region++;
+	}
+      else
+	{
+	  representatives.safe_push (current_region);
+	  /* Remember any jumps and labels for the second pass below.  */
+	  if (gimple_code (stmt) == GIMPLE_COND
+	      || gimple_code (stmt) == GIMPLE_SWITCH
+	      || gimple_code (stmt) == GIMPLE_GOTO)
+	    control_flow_reps.put (stmt, current_region);
+	  else if (gimple_code (stmt) == GIMPLE_LABEL)
+	    label_reps.put (gimple_label_label (as_a <glabel *> (stmt)),
+			    current_region);
+	}
+      idx++;
+    }
+  gcc_assert (representatives.length () == omp_for_loops.length ());
+
+  /* Revisit all the control flow statements and union the region of each
+     cond, switch, or goto statement with the target labels' regions.  */
+  for (hash_map <gimple *, size_t>::iterator it = control_flow_reps.begin ();
+       it != control_flow_reps.end ();
+       ++it)
+    {
+      gimple *stmt = (*it).first;
+      size_t stmt_rep = (*it).second;
+      switch (gimple_code (stmt))
+	{
+	  tree label;
+	  unsigned int n;
+
+	case GIMPLE_COND:
+	  label = gimple_cond_true_label (as_a <gcond *> (stmt));
+	  union_reps (stmt_rep, *label_reps.get (label));
+	  label = gimple_cond_false_label (as_a <gcond *> (stmt));
+	  union_reps (stmt_rep, *label_reps.get (label));
+	  break;
+
+	case GIMPLE_SWITCH:
+	  n = gimple_switch_num_labels (as_a <gswitch *> (stmt));
+	  for (unsigned int i = 0; i < n; i++)
+	    {
+	      tree switch_case
+		= gimple_switch_label (as_a <gswitch *> (stmt), i);
+	      label = CASE_LABEL (switch_case);
+	      union_reps (stmt_rep, *label_reps.get (label));
+	    }
+	  break;
+
+	case GIMPLE_GOTO:
+	  label = gimple_goto_dest (stmt);
+	  union_reps (stmt_rep, *label_reps.get (label));
+	  break;
+
+	default:
+	  gcc_unreachable ();
+	}
+    }
+}
+
+/* Decompose the body of the KERNELS_REGION, which was originally annotated
+   with the KERNELS_CLAUSES, into a series of compute constructs.  */
+
+static gimple *
+decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
+{
+  location_t loc = gimple_location (kernels_region);
+
+  /* The kernels clauses will be propagated to the child regions unmodified,
+     except that the 'num_gangs', 'num_workers', and 'vector_length' clauses
+     will only be added to loop regions.  The other regions are "gang-single"
+     and get an explicit 'num_gangs (1)' clause.  So separate out the
+     'num_gangs', 'num_workers', and 'vector_length' clauses here.
+     Also check for the presence of an 'async' clause but do not remove it from
+     the 'kernels' clauses.  */
+  tree num_gangs_clause = NULL, num_workers_clause = NULL,
+       vector_length_clause = NULL;
+  tree async_clause = NULL;
+  tree prev_clause = NULL, next_clause = NULL;
+  tree parallel_clauses = kernels_clauses;
+  for (tree c = parallel_clauses; c; c = next_clause)
+    {
+      /* Preserve this here, as we might NULL it later.  */
+      next_clause = OMP_CLAUSE_CHAIN (c);
+
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_NUM_GANGS
+	  || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_NUM_WORKERS
+	  || OMP_CLAUSE_CODE (c) == OMP_CLAUSE_VECTOR_LENGTH)
+	{
+	  /* Cut this clause out of the chain.  */
+	  if (prev_clause != NULL)
+	    OMP_CLAUSE_CHAIN (prev_clause) = OMP_CLAUSE_CHAIN (c);
+	  else
+	    kernels_clauses = OMP_CLAUSE_CHAIN (c);
+	  OMP_CLAUSE_CHAIN (c) = NULL;
+	  switch (OMP_CLAUSE_CODE (c))
+	    {
+	    case OMP_CLAUSE_NUM_GANGS:
+	      num_gangs_clause = c;
+	      break;
+	    case OMP_CLAUSE_NUM_WORKERS:
+	      num_workers_clause = c;
+	      break;
+	    case OMP_CLAUSE_VECTOR_LENGTH:
+	      vector_length_clause = c;
+	      break;
+	    default:
+	      gcc_unreachable ();
+	    }
+	}
+      else
+	prev_clause = c;
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_ASYNC)
+	async_clause = c;
+    }
+
+  gimple *kernels_body = gimple_omp_body (kernels_region);
+  gbind *kernels_bind = as_a <gbind *> (kernels_body);
+
+  /* The body of the region may contain other nested binds declaring inner
+     local variables.  Collapse all these binds into one to ensure that we
+     have a single sequence of statements to iterate over; also, collect all
+     inner variables.  */
+  tree inner_bind_vars = flatten_binds (kernels_bind);
+  gimple_seq body_sequence = gimple_bind_body (kernels_bind);
+
+  /* All these inner variables will get allocated on the device (below, by
+     calling maybe_build_inner_data_region).  Here we create 'present'
+     clauses for them and add these clauses to the list of clauses to be
+     attached to each inner compute construct.  */
+  tree present_clauses = kernels_clauses;
+  for (tree var = inner_bind_vars; var; var = TREE_CHAIN (var))
+    {
+      if (!DECL_ARTIFICIAL (var) && TREE_CODE (var) != CONST_DECL)
+	{
+	  tree present_clause = build_omp_clause (loc, OMP_CLAUSE_MAP);
+	  OMP_CLAUSE_SET_MAP_KIND (present_clause, GOMP_MAP_FORCE_PRESENT);
+	  OMP_CLAUSE_DECL (present_clause) = var;
+	  OMP_CLAUSE_SIZE (present_clause) = DECL_SIZE_UNIT (var);
+	  OMP_CLAUSE_CHAIN (present_clause) = present_clauses;
+	  present_clauses = present_clause;
+	}
+    }
+  kernels_clauses = present_clauses;
+
+  /* In addition to nested binds, the "real" body of the region may be
+     nested inside a try-finally block.  Find its cleanup block, which
+     contains code to clobber the local variables that must be clobbered.  */
+  gimple *inner_cleanup = NULL;
+  if (body_sequence != NULL && gimple_code (body_sequence) == GIMPLE_TRY)
+    {
+      if (gimple_seq_singleton_p (body_sequence))
+	{
+	  /* The try statement is the only thing inside the bind.  */
+	  inner_cleanup = gimple_try_cleanup (body_sequence);
+	  body_sequence = gimple_try_eval (body_sequence);
+	}
+      else
+	{
+	  /* The bind's body starts with a try statement, but it is followed
+	     by other things.  */
+	  gimple_stmt_iterator gsi = gsi_start (body_sequence);
+	  gimple *try_stmt = gsi_stmt (gsi);
+	  inner_cleanup = gimple_try_cleanup (try_stmt);
+	  gimple *try_body = gimple_try_eval (try_stmt);
+
+	  gsi_remove (&gsi, false);
+	  /* Now gsi indicates the sequence of statements after the try
+	     statement in the bind.  Append the statement in the try body and
+	     the trailing statements from gsi.  */
+	  gsi_insert_seq_before (&gsi, try_body, GSI_CONTINUE_LINKING);
+	  body_sequence = gsi_stmt (gsi);
+	}
+    }
+
+  /* This sequence will collect all the top-level statements in the body of
+     the data region we are about to construct.  */
+  gimple_seq region_body = NULL;
+  /* This sequence will collect consecutive statements to be put into a
+     gang-single region.  */
+  gimple_seq gang_single_seq = NULL;
+  /* Flag recording whether the gang_single_seq only contains copies to
+     local variables.  These may be loop setup code that should not be
+     separated from the loop.  */
+  bool only_simple_assignments = true;
+
+  /* Precompute the control flow region information to determine whether an
+     OpenACC loop is executed conditionally or unconditionally.  */
+  control_flow_regions cf_regions (body_sequence);
+
+  /* Iterate over the statements in the kernels region's body.  */
+  size_t idx = 0;
+  gimple_stmt_iterator gsi, gsi_n;
+  for (gsi = gsi_start (body_sequence); !gsi_end_p (gsi); gsi = gsi_n, idx++)
+    {
+      /* Advance the iterator here because otherwise it would be invalidated
+	 by moving statements below.  */
+      gsi_n = gsi;
+      gsi_next (&gsi_n);
+
+      gimple *stmt = gsi_stmt (gsi);
+      gimple *omp_for = top_level_omp_for_in_stmt (stmt);
+      bool is_unconditional_oacc_for_loop = false;
+      if (omp_for != NULL)
+	is_unconditional_oacc_for_loop
+	  = cf_regions.is_unconditional_oacc_for_loop (idx);
+      if (omp_for != NULL
+	  && is_unconditional_oacc_for_loop)
+	{
+	  /* This is an OMP for statement, put it into a separate region.
+	     But first, construct a gang-single region containing any
+	     complex sequential statements we may have seen.  */
+	  if (gang_single_seq != NULL && !only_simple_assignments)
+	    {
+	      gimple *single_region
+		= make_region_seq (loc, gang_single_seq,
+				   num_gangs_clause,
+				   num_workers_clause,
+				   vector_length_clause,
+				   kernels_clauses);
+	      gimple_seq_add_stmt (&region_body, single_region);
+	    }
+	  else if (gang_single_seq != NULL && only_simple_assignments)
+	    {
+	      /* There is a sequence of sequential statements preceding this
+		 loop, but they are all simple assignments.  This is
+		 probably setup code for the loop; in particular, Fortran DO
+		 loops are preceded by code to copy the loop limit variable
+		 to a temporary.  Group this code together with the loop
+		 itself.  */
+	      gimple_seq_add_stmt (&gang_single_seq, stmt);
+	      stmt = gimple_build_bind (NULL, gang_single_seq,
+					make_node (BLOCK));
+	    }
+	  gang_single_seq = NULL;
+	  only_simple_assignments = true;
+
+	  gimple_seq parallel_seq = NULL;
+	  gimple_seq_add_stmt (&parallel_seq, stmt);
+	  gimple *parallel_region
+	    = make_region_loop_nest (omp_for, parallel_seq,
+				     num_gangs_clause,
+				     num_workers_clause,
+				     vector_length_clause,
+				     kernels_clauses);
+	  gimple_seq_add_stmt (&region_body, parallel_region);
+	}
+      else
+	{
+	  if (omp_for != NULL)
+	    {
+	      gcc_checking_assert (!is_unconditional_oacc_for_loop);
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, omp_for,
+				 "unparallelized loop nest"
+				 " in OpenACC %<kernels%> region:"
+				 " it's executed conditionally\n");
+	    }
+
+	  /* This is not an unconditional OMP for statement, so it will be
+	     put into a gang-single region.  */
+	  gimple_seq_add_stmt (&gang_single_seq, stmt);
+	  /* Is this a simple assignment? We call it simple if it is an
+	     assignment to an artificial local variable.  This captures
+	     Fortran loop setup code computing loop bounds and offsets.  */
+	  bool is_simple_assignment
+	    = (gimple_code (stmt) == GIMPLE_ASSIGN
+	       && TREE_CODE (gimple_assign_lhs (stmt)) == VAR_DECL
+	       && DECL_ARTIFICIAL (gimple_assign_lhs (stmt)));
+	  if (!is_simple_assignment)
+	    only_simple_assignments = false;
+	}
+    }
+
+  /* If we did not emit a new region, and are not going to emit one now
+     (that is, the original region was empty), prepare to emit a dummy so as
+     to preserve the original construct, which other processing (at least
+     test cases) depends on.  */
+  if (region_body == NULL && gang_single_seq == NULL)
+    {
+      gimple *stmt = gimple_build_nop ();
+      gimple_set_location (stmt, loc);
+      gimple_seq_add_stmt (&gang_single_seq, stmt);
+    }
+
+  /* Gather up any remaining gang-single statements.  */
+  if (gang_single_seq != NULL)
+    {
+      gimple *single_region
+	= make_region_seq (loc, gang_single_seq,
+			   num_gangs_clause,
+			   num_workers_clause,
+			   vector_length_clause,
+			   kernels_clauses);
+      gimple_seq_add_stmt (&region_body, single_region);
+    }
+
+  /* We want to launch these kernels asynchronously.  If the original
+     kernels region had an async clause, this is done automatically because
+     that async clause was copied to the individual regions we created.
+     Otherwise, add an async clause to each newly created region, as well as
+     a wait directive at the end.  */
+  if (async_clause == NULL)
+    add_async_clauses_and_wait (loc, &region_body);
+
+  tree kernels_locals = gimple_bind_vars (as_a <gbind *> (kernels_body));
+  gimple *body = gimple_build_bind (kernels_locals, region_body,
+				    make_node (BLOCK));
+
+  /* If we found variables declared in nested scopes, build a data region to
+     map them to the device.  */
+  body = maybe_build_inner_data_region (loc, body, inner_bind_vars,
+					inner_cleanup);
+
+  return body;
+}
+
+/* Decompose one OpenACC 'kernels' construct into an OpenACC 'data' construct
+   containing the original OpenACC 'kernels' construct's region cut up into a
+   sequence of compute constructs.  */
+
+static gimple *
+omp_oacc_kernels_decompose_1 (gimple *kernels_stmt)
+{
+  gcc_checking_assert (gimple_omp_target_kind (kernels_stmt)
+		       == GF_OMP_TARGET_KIND_OACC_KERNELS);
+  location_t loc = gimple_location (kernels_stmt);
+
+  /* Collect the data clauses of the OpenACC 'kernels' directive and create a
+     new OpenACC 'data' construct with those clauses.  */
+  tree kernels_clauses = gimple_omp_target_clauses (kernels_stmt);
+  tree data_clauses = NULL;
+  for (tree c = kernels_clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      /* Certain clauses are copied to the enclosing OpenACC 'data'.  Other
+	 clauses remain on the OpenACC 'kernels'.  */
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP)
+	{
+	  tree decl = OMP_CLAUSE_DECL (c);
+	  HOST_WIDE_INT map_kind = OMP_CLAUSE_MAP_KIND (c);
+	  switch (map_kind)
+	    {
+	    default:
+	      if (map_kind == GOMP_MAP_ALLOC
+		  && integer_zerop (OMP_CLAUSE_SIZE (c)))
+		/* ??? This is an alloc clause for mapping a pointer whose
+		   target is already mapped.  We leave these on the inner
+		   compute constructs because moving them to the outer data
+		   region causes runtime errors.  */
+		break;
+
+	      /* For non-artificial variables, and for non-declaration
+		 expressions like A[0:n], copy the clause to the data
+		 region.  */
+	      if ((DECL_P (decl) && !DECL_ARTIFICIAL (decl))
+		  || !DECL_P (decl))
+		{
+		  tree new_clause = build_omp_clause (OMP_CLAUSE_LOCATION (c),
+						      OMP_CLAUSE_MAP);
+		  OMP_CLAUSE_SET_MAP_KIND (new_clause, map_kind);
+		  /* This must be unshared here to avoid "incorrect sharing
+		     of tree nodes" errors from verify_gimple.  */
+		  OMP_CLAUSE_DECL (new_clause) = unshare_expr (decl);
+		  OMP_CLAUSE_SIZE (new_clause) = OMP_CLAUSE_SIZE (c);
+		  OMP_CLAUSE_CHAIN (new_clause) = data_clauses;
+		  data_clauses = new_clause;
+
+		  /* Now that this data is mapped, turn the data clause on the
+		     inner OpenACC 'kernels' into a 'present' clause.  */
+		  OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_FORCE_PRESENT);
+		}
+	      break;
+
+	    case GOMP_MAP_POINTER:
+	    case GOMP_MAP_TO_PSET:
+	    case GOMP_MAP_FORCE_TOFROM:
+	    case GOMP_MAP_FIRSTPRIVATE_POINTER:
+	    case GOMP_MAP_FIRSTPRIVATE_REFERENCE:
+	      /* ??? Copying these map kinds leads to internal compiler
+		 errors in later passes.  */
+	      break;
+	    }
+	}
+      else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_IF)
+	{
+	  /* If there is an 'if' clause, it must be duplicated to the
+	     enclosing data region.  Temporarily remove the if clause's
+	     chain to avoid copying it.  */
+	  tree saved_chain = OMP_CLAUSE_CHAIN (c);
+	  OMP_CLAUSE_CHAIN (c) = NULL;
+	  tree new_if_clause = unshare_expr (c);
+	  OMP_CLAUSE_CHAIN (c) = saved_chain;
+	  OMP_CLAUSE_CHAIN (new_if_clause) = data_clauses;
+	  data_clauses = new_if_clause;
+	}
+    }
+  /* Restore the original order of the clauses.  */
+  data_clauses = nreverse (data_clauses);
+
+  gimple *data_region
+    = gimple_build_omp_target (NULL, GF_OMP_TARGET_KIND_OACC_DATA_KERNELS,
+			       data_clauses);
+  gimple_set_location (data_region, loc);
+
+  /* Transform the body of the kernels region into a sequence of compute
+     constructs.  */
+  gimple *body = decompose_kernels_region_body (kernels_stmt,
+						kernels_clauses);
+
+  /* Put the transformed pieces together.  The entire body of the region is
+     wrapped in a try-finally statement that calls __builtin_GOACC_data_end
+     for cleanup.  */
+  gimple *try_stmt = make_data_region_try_statement (loc, body);
+  gimple_omp_set_body (data_region, try_stmt);
+
+  return data_region;
+}
+
+
+/* Decompose OpenACC 'kernels' constructs in the current function.  */
+
+static tree
+omp_oacc_kernels_decompose_callback_stmt (gimple_stmt_iterator *gsi_p,
+					  bool *handled_ops_p,
+					  struct walk_stmt_info *)
+{
+  gimple *stmt = gsi_stmt (*gsi_p);
+
+  if ((gimple_code (stmt) == GIMPLE_OMP_TARGET)
+      && gimple_omp_target_kind (stmt) == GF_OMP_TARGET_KIND_OACC_KERNELS)
+    {
+      gimple *stmt_new = omp_oacc_kernels_decompose_1 (stmt);
+      gsi_replace (gsi_p, stmt_new, false);
+      *handled_ops_p = true;
+    }
+  else
+    *handled_ops_p = false;
+
+  return NULL;
+}
+
+static unsigned int
+omp_oacc_kernels_decompose (void)
+{
+  gimple_seq body = gimple_body (current_function_decl);
+
+  struct walk_stmt_info wi;
+  memset (&wi, 0, sizeof (wi));
+  walk_gimple_seq_mod (&body, omp_oacc_kernels_decompose_callback_stmt, NULL,
+		       &wi);
+
+  gimple_set_body (current_function_decl, body);
+
+  return 0;
+}
+
+
+namespace {
+
+const pass_data pass_data_omp_oacc_kernels_decompose =
+{
+  GIMPLE_PASS, /* type */
+  "omp_oacc_kernels_decompose", /* name */
+  OPTGROUP_OMP, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_gimple_any, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_omp_oacc_kernels_decompose : public gimple_opt_pass
+{
+public:
+  pass_omp_oacc_kernels_decompose (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_omp_oacc_kernels_decompose, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+  {
+    return (flag_openacc
+	    && flag_openacc_kernels == OPENACC_KERNELS_DECOMPOSE);
+  }
+  virtual unsigned int execute (function *)
+  {
+    return omp_oacc_kernels_decompose ();
+  }
+
+}; // class pass_omp_oacc_kernels_decompose
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_omp_oacc_kernels_decompose (gcc::context *ctxt)
+{
+  return new pass_omp_oacc_kernels_decompose (ctxt);
+}
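
For readers following the control_flow_regions logic above, the sketch
below restates it as standalone C++ over a toy statement list.  It is an
illustration under stated assumptions, not the code from this patch: Kind,
Stmt, RegionSketch, and example_sequence are hypothetical names,
std::vector and std::map stand in for GCC's vec and hash_map, and a single
Jump kind with a list of targets stands in for GIMPLE_COND, GIMPLE_SWITCH,
and GIMPLE_GOTO alike.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the GIMPLE statement kinds the pass looks at.
enum class Kind { Other, Loop, Jump, Label };

struct Stmt
{
  Kind kind;
  // Jump: the target labels.  Label: its own name (one entry).
  std::vector<std::string> labels;
};

class RegionSketch
{
public:
  explicit RegionSketch (const std::vector<Stmt> &seq)
  {
    std::map<std::string, std::size_t> label_reps;
    std::vector<std::pair<std::size_t, std::vector<std::string>>> jumps;
    std::size_t current = 0;

    // First pass: a loop gets a region of its own and the statement after
    // it starts a fresh region; everything else inherits the current one.
    for (std::size_t idx = 0; idx < seq.size (); idx++)
      {
        const Stmt &s = seq[idx];
        is_loop.push_back (s.kind == Kind::Loop);
        if (s.kind == Kind::Loop)
          {
            rep.push_back (idx);
            current = idx + 1;
          }
        else
          {
            rep.push_back (current);
            if (s.kind == Kind::Jump)
              jumps.emplace_back (current, s.labels);
            else if (s.kind == Kind::Label)
              label_reps[s.labels.at (0)] = current;
          }
      }

    // Second pass: merge the region of each jump with its targets' regions.
    for (const auto &jump : jumps)
      for (const std::string &target : jump.second)
        unite (jump.first, label_reps.at (target));
  }

  // Same decision procedure as is_unconditional_oacc_for_loop above.
  bool is_unconditional_loop (std::size_t idx)
  {
    if (idx == 0 || idx == rep.size () - 1)
      return true;
    std::size_t prev = idx - 1;
    while (prev > 0 && is_loop[prev])
      prev--;
    if (prev == 0)
      return true;
    std::size_t succ = idx + 1;
    while (succ < is_loop.size () && is_loop[succ])
      succ++;
    if (succ == is_loop.size ())
      return true;
    return find (prev) != find (succ);
  }

private:
  // Union-find lookup with path compression.
  std::size_t find (std::size_t i)
  {
    std::size_t root = i;
    while (rep[root] != root)
      root = rep[root];
    while (rep[i] != root)
      {
        std::size_t next = rep[i];
        rep[i] = root;
        i = next;
      }
    return root;
  }

  void unite (std::size_t a, std::size_t b)
  {
    a = find (a);
    b = find (b);
    rep[b] = a;
  }

  std::vector<std::size_t> rep;
  std::vector<bool> is_loop;
};

// loop; x; if (c) goto D1; else goto D2; D1: loop; D2: ; loop
std::vector<Stmt> example_sequence ()
{
  return {
    {Kind::Loop, {}},
    {Kind::Other, {}},
    {Kind::Jump, {"D1", "D2"}},
    {Kind::Label, {"D1"}},
    {Kind::Loop, {}},
    {Kind::Label, {"D2"}},
    {Kind::Loop, {}},
  };
}
```

In example_sequence, the jump at index 2 merges the regions holding labels
D1 and D2, so the statements on either side of the loop at index 4 share a
representative and that loop is reported as conditional.  The loops at
indices 0 and 6 are the first and last statements and are therefore
unconditional.
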
diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 21583433d6d..90139615c00 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -1771,11 +1771,19 @@ execute_oacc_device_lower ()
   bool is_oacc_serial
     = (lookup_attribute ("oacc serial",
 			 DECL_ATTRIBUTES (current_function_decl)) != NULL);
+  bool is_oacc_parallel_kernels_parallelized
+    = (lookup_attribute ("oacc parallel_kernels_parallelized",
+			 DECL_ATTRIBUTES (current_function_decl)) != NULL);
+  bool is_oacc_parallel_kernels_gang_single
+    = (lookup_attribute ("oacc parallel_kernels_gang_single",
+			 DECL_ATTRIBUTES (current_function_decl)) != NULL);
   int fn_level = oacc_fn_attrib_level (attrs);
   bool is_oacc_routine = (fn_level >= 0);
   gcc_checking_assert (is_oacc_parallel
 		       + is_oacc_kernels
 		       + is_oacc_serial
+		       + is_oacc_parallel_kernels_parallelized
+		       + is_oacc_parallel_kernels_gang_single
 		       + is_oacc_routine
 		       == 1);
 
@@ -1795,6 +1803,12 @@ execute_oacc_device_lower ()
 		  ? "parallelized" : "unparallelized"));
       else if (is_oacc_serial)
 	fprintf (dump_file, "Function is OpenACC serial offload\n");
+      else if (is_oacc_parallel_kernels_parallelized)
+	fprintf (dump_file, "Function is %s OpenACC kernels offload\n",
+		 "parallel_kernels_parallelized");
+      else if (is_oacc_parallel_kernels_gang_single)
+	fprintf (dump_file, "Function is %s OpenACC kernels offload\n",
+		 "parallel_kernels_gang_single");
       else if (is_oacc_routine)
 	fprintf (dump_file, "Function is OpenACC routine level %d\n",
 		 fn_level);
@@ -1838,6 +1852,11 @@ execute_oacc_device_lower ()
       fprintf (dump_file, "]\n");
     }
 
+  /* Verify that for OpenACC 'kernels' decomposed "gang-single" parts we launch
+     a single gang only.  */
+  if (is_oacc_parallel_kernels_gang_single)
+    gcc_checking_assert (dims[GOMP_DIM_GANG] == 1);
+
   oacc_loop_process (loops);
   if (dump_file)
     {
diff --git a/gcc/passes.def b/gcc/passes.def
index c68231287b6..fc56e695b60 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_warn_unused_result);
   NEXT_PASS (pass_diagnose_omp_blocks);
   NEXT_PASS (pass_diagnose_tm_blocks);
+  NEXT_PASS (pass_omp_oacc_kernels_decompose);
   NEXT_PASS (pass_lower_omp);
   NEXT_PASS (pass_lower_cf);
   NEXT_PASS (pass_lower_tm);
diff --git a/gcc/testsuite/c-c++-common/goacc/if-clause-2.c b/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
index 5ab8459d732..7bb115316e8 100644
--- a/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
@@ -1,11 +1,21 @@
+/* { dg-additional-options "-fdump-tree-gimple" } */
+/* { dg-additional-options "-fopenacc-kernels=decompose" }
+   { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" } */
+
 void
 f (short c)
 {
-#pragma acc parallel if(c)
-  ;
-#pragma acc kernels if(c)
-  ;
-#pragma acc data if(c)
-  ;
-#pragma acc update device(c) if(c)
+#pragma acc parallel if(c) copy(c)
+  ++c;
+
+#pragma acc kernels if(c) copy(c)
+  /* { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_kernels map\(tofrom:c \[len: [0-9]+\]\) if\(_[0-9]+\)$} 1 "gimple" } } */
+  /* { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_data_kernels map\(tofrom:c \[len: [0-9]+\]\) if\(_[0-9]+\)$} 1 "omp_oacc_kernels_decompose" } }
+     { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_parallel_kernels_gang_single async\(-1\) num_gangs\(1\) map\(force_present:c \[len: [0-9]+\]\) if\(_[0-9]+\)$} 1 "omp_oacc_kernels_decompose" } } */
+  ++c;
+
+#pragma acc data if(c) copy(c)
+  ++c;
+
+#pragma acc update if(c) device(c)
 }
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
new file mode 100644
index 00000000000..92db33273eb
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
@@ -0,0 +1,83 @@
+/* Test OpenACC 'kernels' construct decomposition.  */
+
+/* { dg-additional-options "-fopt-info-omp-all" } */
+/* { dg-additional-options "-fdump-tree-gimple" } */
+/* { dg-additional-options "-fopenacc-kernels=decompose" }
+   { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" } */
+
+/* See also '../../gfortran.dg/goacc/kernels-decompose-1.f95'.  */
+
+#define N 1024
+
+unsigned int a[N];
+
+int
+main (void)
+{
+  int i;
+  unsigned int sum = 1;
+
+#pragma acc kernels copyin(a[0:N]) copy(sum)
+  /* { dg-bogus "optimized: assigned OpenACC seq loop parallelism" "TODO" { xfail *-*-* } .-1 }
+     TODO Is this maybe the report that belongs to the XFAILed report further down?  */
+  {
+    #pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
+    for (i = 0; i < N; ++i)
+      sum += a[i];
+
+    sum++; /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */
+    a[0]++;
+
+    #pragma acc loop independent /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
+    for (i = 0; i < N; ++i)
+      sum += a[i];
+
+    if (sum > 10) /* { dg-message "note: beginning 'parloops' part in OpenACC 'kernels' region" } */
+      {
+        #pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+	/* { dg-missed "unparallelized loop nest in OpenACC 'kernels' region: it's executed conditionally" "" { target *-*-* } l_loop_i$c_loop_i } */
+	/*TODO { dg-optimized "assigned OpenACC seq loop parallelism" "TODO" { xfail *-*-* } l_loop_i$c_loop_i } */
+        for (i = 0; i < N; ++i)
+          sum += a[i];
+      }
+
+    #pragma acc loop auto /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
+    for (i = 0; i < N; ++i)
+      sum += a[i];
+  }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_kernels map\(tofrom:sum \[len: [0-9]+\]\) map\(to:a\[0\] \[len: [0-9]+\]\) map\(firstprivate:a \[pointer assign, bias: 0\]\)$} 1 "gimple" } }
+
+   { dg-final { scan-tree-dump-times {(?n)#pragma acc loop private\(i\)$} 2 "gimple" } }
+   { dg-final { scan-tree-dump-times {(?n)#pragma acc loop independent private\(i\)$} 1 "gimple" } }
+   { dg-final { scan-tree-dump-times {(?n)#pragma acc loop auto private\(i\)$} 1 "gimple" } }
+   { dg-final { scan-tree-dump-times {(?n)#pragma acc loop} 4 "gimple" } } */
+
+/* Check that the OpenACC 'kernels' got decomposed into 'data' and an enclosed
+   sequence of compute constructs.
+   { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_data_kernels map\(tofrom:sum \[len: [0-9]+\]\) map\(to:a\[0\] \[len: [0-9]+\]\)$} 1 "omp_oacc_kernels_decompose" } }
+   As noted above, we get three "old-style" kernel regions, one gang-single region, and one parallelized loop region.
+   { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_kernels async\(-1\) map\(force_present:sum \[len: [0-9]+\]\) map\(force_present:a\[0\] \[len: [0-9]+\]\) map\(firstprivate:a \[pointer assign, bias: 0\]\)$} 3 "omp_oacc_kernels_decompose" } }
+   { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_parallel_kernels_parallelized async\(-1\) map\(force_present:sum \[len: [0-9]+\]\) map\(force_present:a\[0\] \[len: [0-9]+\]\) map\(firstprivate:a \[pointer assign, bias: 0\]\)$} 1 "omp_oacc_kernels_decompose" } }
+   { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_parallel_kernels_gang_single async\(-1\) num_gangs\(1\) map\(force_present:sum \[len: [0-9]+\]\) map\(force_present:a\[0\] \[len: [0-9]+\]\) map\(firstprivate:a \[pointer assign, bias: 0\]\)$} 1 "omp_oacc_kernels_decompose" } }
+
+   'data' plus five CCs.
+   { dg-final { scan-tree-dump-times {(?n)#pragma omp target } 6 "omp_oacc_kernels_decompose" } }
+
+   { dg-final { scan-tree-dump-times {(?n)#pragma acc loop private\(i\)$} 2 "omp_oacc_kernels_decompose" } }
+   { dg-final { scan-tree-dump-times {(?n)#pragma acc loop independent private\(i\)$} 1 "omp_oacc_kernels_decompose" } }
+   { dg-final { scan-tree-dump-times {(?n)#pragma acc loop auto private\(i\)$} 1 "omp_oacc_kernels_decompose" } }
+   { dg-final { scan-tree-dump-times {(?n)#pragma acc loop} 4 "omp_oacc_kernels_decompose" } }
+
+   Each of the parallel regions is async, and there is a final call to
+   __builtin_GOACC_wait.
+   { dg-final { scan-tree-dump-times "__builtin_GOACC_wait" 1 "omp_oacc_kernels_decompose" } } */
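
As a reading aid for the scan patterns above, here is roughly what the
decomposed region of kernels-decompose-1.c might look like if written by
hand at the source level.  This is a sketch under assumptions, not compiler
output: decomposed_by_hand is a hypothetical name, plain 'parallel' stands
in for the internal oacc_parallel_kernels_* kinds, a bare 'async' stands in
for async(-1), 'present' stands in for map(force_present:...), and the
array is made local and zero-initialized to keep the function
self-contained.  The test itself only checks diagnostics and dumps, so the
loop bodies are kept as they are there.

```cpp
#include <cassert>

#define N 1024

// Hypothetical hand-written counterpart of the decomposed 'kernels' region.
unsigned int
decomposed_by_hand (void)
{
  unsigned int a[N] = { 0 };
  unsigned int sum = 1;
  int i;

  // The data clauses move to an enclosing 'data' construct ...
#pragma acc data copyin(a[0:N]) copy(sum)
  {
    // ... and each part becomes its own asynchronous compute construct.
    // Part 1: plain 'loop', left as an "old-style" 'kernels' region.
#pragma acc kernels async present(a[0:N], sum)
    {
#pragma acc loop
      for (i = 0; i < N; ++i)
        sum += a[i];
    }

    // Part 2: sequential statements, run by one gang ("gang-single").
#pragma acc parallel async num_gangs(1) present(a[0:N], sum)
    {
      sum++;
      a[0]++;
    }

    // Part 3: 'loop independent', parallelized directly.
#pragma acc parallel async present(a[0:N], sum)
    {
#pragma acc loop independent
      for (i = 0; i < N; ++i)
        sum += a[i];
    }

    // Part 4: conditionally executed loop, again an "old-style" region.
#pragma acc kernels async present(a[0:N], sum)
    {
      if (sum > 10)
        {
#pragma acc loop
          for (i = 0; i < N; ++i)
            sum += a[i];
        }
    }

    // Part 5: 'loop auto', also left as an "old-style" region.
#pragma acc kernels async present(a[0:N], sum)
    {
#pragma acc loop auto
      for (i = 0; i < N; ++i)
        sum += a[i];
    }

    // One wait at the end, matching the single __builtin_GOACC_wait.
#pragma acc wait
  }

  return sum;
}
```

That gives one 'data' construct plus five compute constructs, which is the
count of six '#pragma omp target' lines the test scans for.  Built without
-fopenacc, the pragmas are ignored and the function simply runs
sequentially on the host.
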
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
new file mode 100644
index 00000000000..ec6c4af92aa
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
@@ -0,0 +1,141 @@
+/* Test OpenACC 'kernels' construct decomposition.  */
+
+/* { dg-additional-options "-fopt-info-omp-all" } */
+/* { dg-additional-options "-fopenacc-kernels=decompose" } */
+/* { dg-additional-options "-O2" } for 'parloops'.  */
+
+/* See also '../../gfortran.dg/goacc/kernels-decompose-2.f95'.  */
+
+#pragma acc routine gang
+extern int
+f_g (int);
+
+#pragma acc routine worker
+extern int
+f_w (int);
+
+#pragma acc routine vector
+extern int
+f_v (int);
+
+#pragma acc routine seq
+extern int
+f_s (int);
+
+int
+main ()
+{
+  int x, y, z;
+#define N 10
+  int a[N], b[N], c[N];
+
+#pragma acc kernels
+  {
+    x = 0; /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */
+    y = x < 10;
+    z = x++;
+    ;
+  }
+
+  { /*TODO Instead of using 'for (int i = 0; [...])', move 'int i' outside, to work around the ICE detailed in 'kernels-decompose-ice-1.c'.  */
+    int i;
+#pragma acc kernels /* { dg-optimized "assigned OpenACC gang loop parallelism" } */
+  for (i = 0; i < N; i++) /* { dg-message "note: beginning 'parloops' part in OpenACC 'kernels' region" } */
+    a[i] = 0;
+  }
+
+#pragma acc kernels loop /* { dg-line l_loop_i[incr c_loop_i] } */
+  /* { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
+  for (int i = 0; i < N; i++)
+    b[i] = a[N - i - 1];
+
+#pragma acc kernels
+  {
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
+    for (int i = 0; i < N; i++)
+      b[i] = a[N - i - 1];
+
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
+    for (int i = 0; i < N; i++)
+      c[i] = a[i] * b[i];
+
+    a[z] = 0; /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */
+
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
+    for (int i = 0; i < N; i++)
+      c[i] += a[i];
+
+#pragma acc loop seq /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
+    for (int i = 0 + 1; i < N; i++)
+      c[i] += c[i - 1];
+  }
+
+#pragma acc kernels
+  /*TODO What does this mean?
+    TODO { dg-optimized "assigned OpenACC worker vector loop parallelism" "" { target *-*-* } .-2 } */
+  {
+#pragma acc loop independent /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i } */
+    for (int i = 0; i < N; ++i)
+#pragma acc loop independent /* { dg-line l_loop_j[incr c_loop_j] } */
+      /* { dg-optimized "assigned OpenACC worker loop parallelism" "" { target *-*-* } l_loop_j$c_loop_j } */
+      for (int j = 0; j < N; ++j)
+#pragma acc loop independent /* { dg-line l_loop_k[incr c_loop_k] } */
+	/* { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } l_loop_k$c_loop_k } */
+	/* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_k$c_loop_k } */
+	for (int k = 0; k < N; ++k)
+	  a[(i + j + k) % N]
+	    = b[j]
+	    + f_v (c[k]); /* { dg-optimized "assigned OpenACC vector loop parallelism" } */
+
+    /*TODO Should the following turn into "gang-single" instead of "parloops"?
+      TODO The problem is that the first STMT is 'if (y <= 4) goto <D.2547>; else goto <D.2548>;', thus "parloops".  */
+    if (y < 5) /* { dg-message "note: beginning 'parloops' part in OpenACC 'kernels' region" } */
+#pragma acc loop independent /* { dg-line l_loop_j[incr c_loop_j] } */
+      /* { dg-missed "unparallelized loop nest in OpenACC 'kernels' region: it's executed conditionally" "" { target *-*-* } l_loop_j$c_loop_j } */
+      for (int j = 0; j < N; ++j)
+	b[j] = f_w (c[j]);
+  }
+
+#pragma acc kernels
+  {
+    y = f_g (a[5]); /* { dg-line l_part[incr c_part] } */
+    /*TODO If such a construct is placed in its own part (like it is, here), can't this actually use gang parallelism, instead of "gang-single"?
+      { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" "" { target *-*-* } l_part$c_part } */
+    /* { dg-optimized "assigned OpenACC gang worker vector loop parallelism" "" { target *-*-* } l_part$c_part } */
+
+#pragma acc loop independent /* { dg-line l_loop_j[incr c_loop_j] } */
+    /* { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_j$c_loop_j } */
+    /* { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_j$c_loop_j } */
+    for (int j = 0; j < N; ++j)
+      b[j] = y + f_w (c[j]); /* { dg-optimized "assigned OpenACC worker vector loop parallelism" } */
+  }
+
+#pragma acc kernels
+  {
+    y = 3; /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */
+
+#pragma acc loop independent /* { dg-line l_loop_j[incr c_loop_j] } */
+    /* { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_j$c_loop_j } */
+    /* { dg-optimized "assigned OpenACC gang worker loop parallelism" "" { target *-*-* } l_loop_j$c_loop_j } */
+    for (int j = 0; j < N; ++j)
+      b[j] = y + f_v (c[j]); /* { dg-optimized "assigned OpenACC vector loop parallelism" } */
+
+    z = 2; /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */
+  }
+
+#pragma acc kernels /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */
+  ;
+
+  return 0;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
new file mode 100644
index 00000000000..9e27d1fb9b5
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
@@ -0,0 +1,108 @@
+/* Test OpenACC 'kernels' construct decomposition.  */
+
+/* { dg-additional-options "-fopt-info-omp-all" } */
+/* { dg-additional-options "-fopenacc-kernels=decompose" } */
+/* { dg-ice "TODO" }
+   { dg-prune-output "during GIMPLE pass: omplower" } */
+
+/* Reduced from 'kernels-decompose-2.c'.
+   (Hopefully) similar instances:
+     - 'libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c'
+     - 'libgomp.oacc-c-c++-common/kernels-decompose-1.c'
+*/
+
+int
+main ()
+{
+#define N 10
+
+#pragma acc kernels
+  for (int i = 0; i < N; i++) /* { dg-message "note: beginning 'parloops' part in OpenACC 'kernels' region" } */
+    ;
+
+  return 0;
+}
+
+/*
+  In 'gimple' we've got:
+
+      main ()
+      {
+        int D.2087;
+      
+        {
+          int a[10];
+      
+          try
+            {
+              #pragma omp target oacc_kernels map(tofrom:a [len: 40])
+                {
+                  {
+                    int i;
+      
+                    i = 0;
+                    goto <D.2085>;
+      [...]
+
+  ..., which in 'omp_oacc_kernels_decompose' we turn into:
+
+      main ()
+      {
+        int D.2087;
+      
+        {
+          int a[10];
+      
+          try
+            {
+              #pragma omp target oacc_data_kernels map(tofrom:a [len: 40])
+                {
+                  try
+                    {
+                      {
+                        int i;
+      
+                        #pragma omp target oacc_data_kernels map(alloc:i [len: 4])
+                          {
+                            try
+                              {
+                                {
+                                  #pragma omp target oacc_kernels async(-1) map(force_present:i [len: 4]) map(force_present:a [len: 40])
+                                    {
+                                      i = 0;
+                                      goto <D.2085>;
+      [...]
+
+  ..., which results in ICE in:
+
+    #1  0x0000000000d2247b in lower_omp_target (gsi_p=gsi_p@entry=0x7fffffffbc90, ctx=ctx@entry=0x2c994c0) at [...]/gcc/omp-low.c:11981
+    11981                       gcc_assert (offloaded);
+    (gdb) list
+    11976                         talign = TYPE_ALIGN_UNIT (TREE_TYPE (TREE_TYPE (ovar)));
+    11977                       gimplify_assign (x, var, &ilist);
+    11978                     }
+    11979                   else if (is_gimple_reg (var))
+    11980                     {
+    11981                       gcc_assert (offloaded);
+    11982                       tree avar = create_tmp_var (TREE_TYPE (var));
+    11983                       mark_addressable (avar);
+    11984                       enum gomp_map_kind map_kind = OMP_CLAUSE_MAP_KIND (c);
+    11985                       if (GOMP_MAP_COPY_TO_P (map_kind)
+    (gdb) call debug_tree(var)
+     <var_decl 0x7ffff7feebd0 i
+        type <integer_type 0x7ffff67be5e8 int sizes-gimplified public SI
+            size <integer_cst 0x7ffff67a5f18 constant 32>
+            unit-size <integer_cst 0x7ffff67a5f30 constant 4>
+            align:32 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type 0x7ffff67be5e8 precision:32 min <integer_cst 0x7ffff67a5ed0 -2147483648> max <integer_cst 0x7ffff67a5ee8 2147483647>
+            pointer_to_this <pointer_type 0x7ffff67c69d8>>
+        used read SI [...]:15:12 size <integer_cst 0x7ffff67a5f18 32> unit-size <integer_cst 0x7ffff67a5f30 4>
+        align:32 warn_if_not_align:0 context <function_decl 0x7ffff68eea00 main>>
+
+  Just defusing the 'assert' is not sufficient:
+
+      libgomp: present clause: !acc_is_present (0x7ffe29cba3ec, 4 (0x4))
+
+  TODO Can't the 'omp_oacc_kernels_decompose' transformation be much simpler, such that we avoid the intermediate 'data' if we've got just one compute construct inside it?
+  TODO But it's not clear if that'd just resolve one simple instance of the general problem?
+
+*/
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
new file mode 100644
index 00000000000..839e6803851
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
@@ -0,0 +1,16 @@
+/* Test OpenACC 'kernels' construct decomposition.  */
+
+/* { dg-additional-options "-fopenacc-kernels=decompose" } */
+/* { dg-ice "TODO" }
+   { dg-prune-output "during GIMPLE pass: omp_oacc_kernels_decompose" } */
+
+/* Reduced from 'kernels-decompose-ice-1.c'.  */
+
+int
+main ()
+{
+#pragma acc kernels
+  {
+    int i;
+  }
+}
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
new file mode 100644
index 00000000000..95a78623ebf
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
@@ -0,0 +1,81 @@
+! Test OpenACC 'kernels' construct decomposition.
+
+! { dg-additional-options "-fopt-info-omp-all" }
+! { dg-additional-options "-fdump-tree-gimple" }
+! { dg-additional-options "-fopenacc-kernels=decompose" }
+! { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" }
+
+! See also '../../c-c++-common/goacc/kernels-decompose-1.c'.
+
+program main
+  implicit none
+  integer, parameter         :: N = 1024
+  integer, dimension (1:N)   :: a
+  integer                    :: i, sum
+
+  !$acc kernels copyin(a(1:N)) copy(sum)
+  ! { dg-bogus "optimized: assigned OpenACC seq loop parallelism" "TODO" { xfail *-*-* } .-1 }
+  !TODO Is this maybe the report that belongs to the XFAILed report further down?
+
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  do i = 1, N
+    sum = sum + a(i)
+  end do
+
+  sum = sum + 1 ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" }
+  a(1) = a(1) + 1
+
+  !$acc loop independent ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  do i = 1, N
+    sum = sum + a(i)
+  end do
+
+  if (sum .gt. 10) then ! { dg-message "note: beginning 'parloops' part in OpenACC 'kernels' region" }
+    !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+    ! { dg-missed "unparallelized loop nest in OpenACC 'kernels' region: it's executed conditionally" "" { target *-*-* } l_loop_i$c_loop_i }
+    !TODO { dg-optimized "assigned OpenACC seq loop parallelism" "TODO" { xfail *-*-* } l_loop_i$c_loop_i }
+    do i = 1, N
+      sum = sum + a(i)
+    end do
+  end if
+
+  !$acc loop auto ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  do i = 1, N
+    sum = sum + a(i)
+  end do
+
+  !$acc end kernels
+end program main
+
+! { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_kernels map\(to:a\[_[0-9]+\] \[len: _[0-9]+\]\) map\(alloc:a \[pointer assign, bias: _[0-9]+\]\) map\(tofrom:sum \[len: [0-9]+\]\)$} 1 "gimple" } }
+
+! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop private\(i\)$} 2 "gimple" } }
+! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop private\(i\) independent$} 1 "gimple" } }
+! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop private\(i\) auto$} 1 "gimple" } }
+! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop} 4 "gimple" } }
+
+! Check that the OpenACC 'kernels' got decomposed into 'data' and an enclosed
+! sequence of compute constructs.
+! { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_data_kernels map\(to:a\[_[0-9]+\] \[len: _[0-9]+\]\) map\(tofrom:sum \[len: [0-9]+\]\)$} 1 "omp_oacc_kernels_decompose" } }
+! As noted above, we get three "old-style" kernel regions, one gang-single region, and one parallelized loop region.
+! { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_kernels async\(-1\) map\(force_present:a\[_[0-9]+\] \[len: _[0-9]+\]\) map\(alloc:a \[pointer assign, bias: _[0-9]+\]\) map\(force_present:sum \[len: [0-9]+\]\)$} 3 "omp_oacc_kernels_decompose" } }
+! { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_parallel_kernels_parallelized async\(-1\) map\(force_present:a\[_[0-9]+\] \[len: _[0-9]+\]\) map\(alloc:a \[pointer assign, bias: _[0-9]+\]\) map\(force_present:sum \[len: [0-9]+\]\)$} 1 "omp_oacc_kernels_decompose" } }
+! { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_parallel_kernels_gang_single async\(-1\) num_gangs\(1\) map\(force_present:a\[_[0-9]+\] \[len: _[0-9]+\]\) map\(alloc:a \[pointer assign, bias: _[0-9]+\]\) map\(force_present:sum \[len: [0-9]+\]\)$} 1 "omp_oacc_kernels_decompose" } }
+!
+! 'data' plus five CCs.
+! { dg-final { scan-tree-dump-times {(?n)#pragma omp target } 6 "omp_oacc_kernels_decompose" } }
+
+! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop private\(i\)$} 2 "omp_oacc_kernels_decompose" } }
+! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop private\(i\) independent$} 1 "omp_oacc_kernels_decompose" } }
+! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop private\(i\) auto} 1 "omp_oacc_kernels_decompose" } }
+! { dg-final { scan-tree-dump-times {(?n)#pragma acc loop} 4 "omp_oacc_kernels_decompose" } }
+
+! Each of the parallel regions is async, and there is a final call to
+! __builtin_GOACC_wait.
+! { dg-final { scan-tree-dump-times "__builtin_GOACC_wait" 1 "omp_oacc_kernels_decompose" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
new file mode 100644
index 00000000000..58d687d4a0c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
@@ -0,0 +1,142 @@
+! Test OpenACC 'kernels' construct decomposition.
+
+! { dg-additional-options "-fopt-info-omp-all" }
+! { dg-additional-options "-fopenacc-kernels=decompose" }
+! { dg-additional-options "-O2" } for 'parloops'.
+
+! See also '../../c-c++-common/goacc/kernels-decompose-2.c'.
+
+program main
+  implicit none
+
+  integer, external :: f_g
+  !$acc routine (f_g) gang
+  integer, external :: f_w
+  !$acc routine (f_w) worker
+  integer, external :: f_v
+  !$acc routine (f_v) vector
+  integer, external :: f_s
+  !$acc routine (f_s) seq
+
+  integer :: i, j, k
+  integer :: x, y, z
+  logical :: y_l
+  integer, parameter :: N = 10
+  integer :: a(N), b(N), c(N)
+
+  !$acc kernels
+  x = 0 ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" }
+  y = 0
+  y_l = x < 10
+  z = x
+  x = x + 1
+  ;
+  !$acc end kernels
+
+  !$acc kernels ! { dg-optimized "assigned OpenACC gang loop parallelism" }
+  do i = 1, N ! { dg-message "note: beginning 'parloops' part in OpenACC 'kernels' region" }
+     a(i) = 0
+  end do
+  !$acc end kernels
+
+  !$acc kernels loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  do i = 1, N
+     b(i) = a(N - i + 1)
+  end do
+
+  !$acc kernels
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  do i = 1, N
+     b(i) = a(N - i + 1)
+  end do
+
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  do i = 1, N
+     c(i) = a(i) * b(i)
+  end do
+
+  a(z) = 0 ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" }
+
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  do i = 1, N
+     c(i) = c(i) + a(i)
+  end do
+
+  !$acc loop seq ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  do i = 1 + 1, N
+     c(i) = c(i) + c(i - 1)
+  end do
+  !$acc end kernels
+
+  !$acc kernels
+  !TODO What does this mean?
+  !TODO { dg-optimized "assigned OpenACC worker vector loop parallelism" "" { target *-*-* } .-2 }
+  !$acc loop independent ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
+  do i = 1, N
+     !$acc loop independent ! { dg-line l_loop_j[incr c_loop_j] }
+     ! { dg-optimized "assigned OpenACC worker loop parallelism" "" { target *-*-* } l_loop_j$c_loop_j }
+     do j = 1, N
+        !$acc loop independent ! { dg-line l_loop_k[incr c_loop_k] }
+        ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } l_loop_k$c_loop_k }
+        ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_k$c_loop_k }
+        do k = 1, N
+           a(1 + mod(i + j + k, N)) &
+                = b(j) &
+                + f_v (c(k)) ! { dg-optimized "assigned OpenACC vector loop parallelism" }
+        end do
+     end do
+  end do
+
+  !TODO Should the following turn into "gang-single" instead of "parloops"?
+  !TODO The problem is that the first STMT is 'if (y <= 4) goto <D.2547>; else goto <D.2548>;', thus "parloops".
+  if (y < 5) then ! { dg-message "note: beginning 'parloops' part in OpenACC 'kernels' region" }
+     !$acc loop independent ! { dg-line l_loop_j[incr c_loop_j] }
+     ! { dg-missed "unparallelized loop nest in OpenACC 'kernels' region: it's executed conditionally" "" { target *-*-* } l_loop_j$c_loop_j }
+     do j = 1, N
+        b(j) = f_w (c(j))
+     end do
+  end if
+  !$acc end kernels
+
+  !$acc kernels
+  y = f_g (a(5)) ! { dg-line l_part[incr c_part] }
+  !TODO If such a construct is placed in its own part (like it is, here), can't this actually use gang parallelism, instead of "gang-single"?
+  ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" "" { target *-*-* } l_part$c_part }
+  ! { dg-optimized "assigned OpenACC gang worker vector loop parallelism" "" { target *-*-* } l_part$c_part }
+
+  !$acc loop independent ! { dg-line l_loop_j[incr c_loop_j] }
+  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_j$c_loop_j }
+  ! { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_j$c_loop_j }
+  do j = 1, N
+     b(j) = y + f_w (c(j)) ! { dg-optimized "assigned OpenACC worker vector loop parallelism" }
+  end do
+  !$acc end kernels
+
+  !$acc kernels
+  y = 3 ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" }
+
+  !$acc loop independent ! { dg-line l_loop_j[incr c_loop_j] }
+  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_j$c_loop_j }
+  ! { dg-optimized "assigned OpenACC gang worker loop parallelism" "" { target *-*-* } l_loop_j$c_loop_j }
+  do j = 1, N
+     b(j) = y + f_v (c(j)) ! { dg-optimized "assigned OpenACC vector loop parallelism" }
+  end do
+
+  z = 2 ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" }
+  !$acc end kernels
+
+  !$acc kernels ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" }
+  !$acc end kernels  
+end program main
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
index 5583ffb4d04..d01eee2fa5d 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
@@ -1,5 +1,7 @@
 ! { dg-do compile } 
 ! { dg-additional-options "-fdump-tree-original" } 
+! { dg-additional-options "-fopenacc-kernels=decompose" }
+! { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" }
 
 program test
   implicit none
@@ -34,3 +36,6 @@ end program test
 ! { dg-final { scan-tree-dump-times "map\\(alloc:t\\)" 1 "original" } } 
 
 ! { dg-final { scan-tree-dump-times "map\\(force_deviceptr:u\\)" 1 "original" } } 
+
+! { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_data_kernels if\(D\.[0-9]+\)$} 1 "omp_oacc_kernels_decompose" } }
+! { dg-final { scan-tree-dump-times {(?n)#pragma omp target oacc_parallel_kernels_gang_single num_gangs\(1\) if\(D\.[0-9]+\) async\(-1\)$} 1 "omp_oacc_kernels_decompose" } }
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 9cb22acc243..cc4870e9711 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -416,6 +416,7 @@ extern gimple_opt_pass *make_pass_lower_switch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_switch_O0 (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_vector (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_vector_ssa (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_omp_oacc_kernels_decompose (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c
new file mode 100644
index 00000000000..c7eae12ec10
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c
@@ -0,0 +1,8 @@
+/* { dg-additional-options "-fopenacc-kernels=decompose" } */
+/* Hopefully, this is the same issue as '../../../gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c'.
+   { dg-ice "TODO" }
+   TODO { dg-prune-output "during GIMPLE pass: omplower" }
+   TODO { dg-do link } */
+
+#undef KERNELS_DECOMPOSE_ICE_HACK
+#include "declare-vla.c"
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c
new file mode 100644
index 00000000000..dd8a1c1d294
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c
@@ -0,0 +1,6 @@
+/* { dg-additional-options "-fopenacc-kernels=decompose" } */
+
+/* See also 'declare-vla-kernels-decompose-ice-1.c'.  */
+
+#define KERNELS_DECOMPOSE_ICE_HACK
+#include "declare-vla.c"
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
index 714935772c1..3bd6331879d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
@@ -38,6 +38,12 @@ f_data (void)
     for (i = 0; i < N; i++)
       A[i] = -i;
 
+    /* See 'declare-vla-kernels-decompose.c'.  */
+#ifdef KERNELS_DECOMPOSE_ICE_HACK
+    (volatile int *) &i;
+    (volatile int *) &N;
+#endif
+
 # pragma acc kernels
     for (i = 0; i < N; i++)
       A[i] = i;
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
new file mode 100644
index 00000000000..fa8ae6c79cd
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
@@ -0,0 +1,38 @@
+/* Test OpenACC 'kernels' construct decomposition.  */
+
+/* { dg-additional-options "-fopt-info-omp-all" } */
+/* { dg-additional-options "-fopenacc-kernels=decompose" } */
+
+#undef NDEBUG
+#include <assert.h>
+
+int main()
+{
+  int a = 0;
+  /*TODO Without making 'a' addressable, for GCN offloading we will not see the expected value copied out.  (But it does work for nvptx offloading, strange...)  */
+  (volatile int *) &a;
+#define N 123
+  int b[N] = { 0 };
+
+#pragma acc kernels
+  {
+    int c = 234; /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */
+
+    /*TODO Hopefully, this is the same issue as '../../../gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c'.  */
+    (volatile int *) &c;
+
+#pragma acc loop independent gang /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
+    for (int i = 0; i < N; ++i)
+      b[i] = c;
+
+    a = c; /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */
+  }
+
+  for (int i = 0; i < N; ++i)
+    assert (b[i] == 234);
+  assert (a == 234);
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
index 5013c5ba04b..82d8351f0e3 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
@@ -1,17 +1,22 @@
 ! { dg-do run }
 ! { dg-additional-options "-fopt-info-omp-all" }
+! { dg-additional-options "-fopenacc-kernels=decompose" }
 
 subroutine kernel(lo, hi, a, b, c)
   implicit none
   integer :: lo, hi, i
   real, dimension(lo:hi) :: a, b, c
 
-  !$acc kernels copyin(lo, hi) ! { dg-optimized "assigned OpenACC seq loop parallelism" }
-  !$acc loop independent
+  !$acc kernels copyin(lo, hi)
+  !$acc loop independent ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = lo, hi
      b(i) = a(i)
   end do
-  !$acc loop independent
+  !$acc loop independent ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = lo, hi
      c(i) = b(i)
   end do
-- 
2.17.1



* In 'gcc/omp-oacc-kernels-decompose.cc', use langhook instead of accessing language-specific decl information (was: [PATCH 04/10, OpenACC] Turn OpenACC kernels regions into a sequence of, parallel regions)
  2019-08-05 21:58       ` Kwok Cheung Yeung
@ 2020-11-13 22:33         ` Thomas Schwinge
  0 siblings, 0 replies; 33+ messages in thread
From: Thomas Schwinge @ 2020-11-13 22:33 UTC (permalink / raw)
  To: Kwok Cheung Yeung, Jakub Jelinek, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1942 bytes --]

Hi!

On 2019-08-05T22:51:22+0100, Kwok Cheung Yeung <kcy@codesourcery.com> wrote:
> On 18/07/2019 10:30 am, Jakub Jelinek wrote:
>> On Wed, Jul 17, 2019 at 10:06:07PM +0100, Kwok Cheung Yeung wrote:
>>> --- a/gcc/omp-oacc-kernels.c
>>> +++ b/gcc/omp-oacc-kernels.c
>>> @@ -30,6 +30,7 @@ along with GCC; see the file COPYING3.  If not see
>>>   #include "backend.h"
>>>   #include "target.h"
>>>   #include "tree.h"
>>> +#include "cp/cp-tree.h"
>>
>> No, you certainly don't want to do this.  Use langhooks if needed

ACK.

>> though
>> that can be only for stuff done before IPA.  After IPA, because of LTO FE, you
>> must not rely on anything that is not in the IL generically.

ACK, and this is very early:

       NEXT_PASS (pass_diagnose_omp_blocks);
       NEXT_PASS (pass_diagnose_tm_blocks);
    +  NEXT_PASS (pass_omp_oacc_kernels_decompose);
       NEXT_PASS (pass_lower_omp);

> I have modified the patch to use the get_generic_function_decl langhook
> to determine whether current_function_decl is an instantiation of a
> template (in this case, we don't care what the generic decl is - just
> whether the function decl has one).

To me, it's not obvious that the original:

    (DECL_LANG_SPECIFIC (current_function_decl)
     && DECL_TEMPLATE_INSTANTIATION (current_function_decl)))

... may be replaced with:

    (lang_hooks.decls.get_generic_function_decl (current_function_decl)
     != NULL)

..., so thanks, Kwok, that you've figured that out.  :-)
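
For readers not familiar with langhooks, here is a minimal sketch of the pattern. It is a simplified stand-alone model, not GCC's actual data structures: 'fn_decl', 'decl_hooks', 'c_hooks', 'cxx_hooks' and the two callbacks are hypothetical names made up for illustration, and only the hook name 'get_generic_function_decl' and the '!= NULL' test are taken from the patch. The point is that the language-independent code asks the front end through a callback table instead of reading front-end-specific fields itself.

```cpp
// Simplified model of a language-hook table; NOT GCC's real types.
// Every name except 'get_generic_function_decl' is hypothetical.

struct fn_decl
{
  const char *name;
  const fn_decl *generic;  // template this was instantiated from, or nullptr
};

// Table of callbacks that each front end fills in.
struct decl_hooks
{
  const fn_decl *(*get_generic_function_decl) (const fn_decl *);
};

// C-like front end: no templates, so there is never a generic decl.
static const fn_decl *
c_get_generic (const fn_decl *)
{
  return nullptr;
}

// C++-like front end: report the template an instantiation came from.
static const fn_decl *
cxx_get_generic (const fn_decl *fn)
{
  return fn->generic;
}

static const decl_hooks c_hooks = { c_get_generic };
static const decl_hooks cxx_hooks = { cxx_get_generic };

// Language-independent query: it only goes through the hook and never
// touches front-end-specific fields directly.  As in the patch, we do not
// care what the generic decl is, just whether there is one.
static bool
generic_inst_p (const decl_hooks &hooks, const fn_decl *fn)
{
  return hooks.get_generic_function_decl (fn) != nullptr;
}
```

With a 'fn_decl' whose 'generic' member points at a template, 'generic_inst_p (cxx_hooks, ...)' yields true, while the same query through 'c_hooks' yields false, without the caller including any front-end header.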

I've just pushed to master branch commit
ccd56db89806a5f6eb3be99fc3b4fe364cf35e98 "In
'gcc/omp-oacc-kernels-decompose.cc', use langhook instead of accessing
language-specific decl information", see attached.


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

[-- Attachment #2: 0001-In-gcc-omp-oacc-kernels-decompose.cc-use-langhook-in.patch --]
[-- Type: text/x-diff, Size: 2062 bytes --]

From ccd56db89806a5f6eb3be99fc3b4fe364cf35e98 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <kcy@codesourcery.com>
Date: Mon, 5 Aug 2019 22:51:22 +0100
Subject: [PATCH] In 'gcc/omp-oacc-kernels-decompose.cc', use langhook instead
 of accessing language-specific decl information

	gcc/
	* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
	Use langhook instead of accessing language-specific decl
	information.
---
 gcc/omp-oacc-kernels-decompose.cc | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc
index c585e5d092b..baad1b9a348 100644
--- a/gcc/omp-oacc-kernels-decompose.cc
+++ b/gcc/omp-oacc-kernels-decompose.cc
@@ -25,7 +25,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "backend.h"
 #include "target.h"
 #include "tree.h"
-#include "cp/cp-tree.h"
+#include "langhooks.h"
 #include "gimple.h"
 #include "tree-pass.h"
 #include "cgraph.h"
@@ -792,6 +792,12 @@ static gimple *
 maybe_build_inner_data_region (location_t loc, gimple *body,
 			       tree inner_bind_vars, gimple *inner_cleanup)
 {
+  /* Is this an instantiation of a template?  (In this case, we don't care what
+     the generic decl is - just whether the function decl has one.)  */
+  bool generic_inst_p
+    = (lang_hooks.decls.get_generic_function_decl (current_function_decl)
+       != NULL);
+
   /* Build data 'create (var)' clauses for these local variables.
      Below we will add these to a data region enclosing the entire body
      of the decomposed kernels region.  */
@@ -802,8 +808,7 @@ maybe_build_inner_data_region (location_t loc, gimple *body,
       next = TREE_CHAIN (v);
       if (DECL_ARTIFICIAL (v)
 	  || TREE_CODE (v) == CONST_DECL
-	  || (DECL_LANG_SPECIFIC (current_function_decl)
-	      && DECL_TEMPLATE_INSTANTIATION (current_function_decl)))
+	  || generic_inst_p)
 	{
 	  /* If this is an artificial temporary, it need not be mapped.  We
 	     move its declaration into the bind inside the data region.
-- 
2.17.1



* Re: Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs (was: [og8] OpenACC 'kernels' construct changes: splitting of the construct into several regions)
  2020-11-13 22:22 ` Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs (was: [og8] OpenACC 'kernels' construct changes: splitting of the construct into several regions) Thomas Schwinge
@ 2020-11-15  9:14   ` Tobias Burnus
  2021-04-19  8:29     ` Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs Thomas Schwinge
  2020-11-27 13:50   ` Thomas Schwinge
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 33+ messages in thread
From: Tobias Burnus @ 2020-11-15  9:14 UTC (permalink / raw)
  To: Thomas Schwinge, gcc-patches, Frederik Harwath; +Cc: Jakub Jelinek, fortran

Hi Thomas,

regarding the new option:

+fopenacc-kernels=
+C ObjC C++ ObjC++ RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS)
+-fopenacc-kernels=[decompose|parloops]	Specify mode of OpenACC 'kernels' constructs handling.
+

and

On 13.11.20 23:22, Thomas Schwinge wrote:
> Not yet enabled by default: for now, the current mode of OpenACC 'kernels'
> constructs handling still remains '-fopenacc-kernels=parloops', but that is to
> change later.

Do you intend that users will switch between those two options?

Or is this new option only an interim solution until decompose is handled?

If the latter, maybe using a --param makes more sense than keeping the latter forever.

Tobias



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs
  2020-11-13 22:22 ` Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs (was: [og8] OpenACC 'kernels' construct changes: splitting of the construct into several regions) Thomas Schwinge
  2020-11-15  9:14   ` Tobias Burnus
@ 2020-11-27 13:50   ` Thomas Schwinge
  2022-01-13  9:44   ` Enhance OpenACC 'kernels' decomposition testing (was: Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs) Thomas Schwinge
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 33+ messages in thread
From: Thomas Schwinge @ 2020-11-27 13:50 UTC (permalink / raw)
  To: gcc-patches, Frederik Harwath; +Cc: Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 4468 bytes --]

Hi!

On 2020-11-13T23:22:30+0100, I wrote:
> [...] I've pushed to master branch [...] commit
> e898ce7997733c29dcab9c3c62ca102c7f9fa6eb "Decompose OpenACC 'kernels'
> constructs into parts, a sequence of compute constructs", see attached.
>
>> There's more work to be done there, and we're aware of a number of TODO
>> items, but nevertheless: it's a good first step.

> --- /dev/null
> +++ b/gcc/omp-oacc-kernels-decompose.cc

> +/* Eliminate any binds directly inside BIND by adding their statements to
> +   BIND (i.e., modifying it in place), excluding binds that hold only an
> +   OMP_FOR loop and associated setup/cleanup code.  Recurse into binds but
> +   not other statements.  Return a chain of the local variables of eliminated
> +   binds, i.e., the local variables found in nested binds.  If
> +   INCLUDE_TOPLEVEL_VARS is true, this also includes the variables belonging
> +   to BIND itself. */
> +
> +static tree
> +flatten_binds (gbind *bind, bool include_toplevel_vars = false)
> +{
> +  tree vars = NULL, last_var = NULL;
> +
> +  if (include_toplevel_vars)
> +    {
> +      vars = gimple_bind_vars (bind);
> +      last_var = vars;
> +    }
> +
> +  gimple_seq new_body = NULL;
> +  gimple_seq body_sequence = gimple_bind_body (bind);
> +  gimple_stmt_iterator gsi, gsi_n;
> +  for (gsi = gsi_start (body_sequence); !gsi_end_p (gsi); gsi = gsi_n)
> +    {
> +      /* Advance the iterator here because otherwise it would be invalidated
> +      by moving statements below.  */
> +      gsi_n = gsi;
> +      gsi_next (&gsi_n);
> +
> +      gimple *stmt = gsi_stmt (gsi);
> +      /* Flatten bind statements, except the ones that contain only an
> +      OpenACC for loop.  */
> +      if (gimple_code (stmt) == GIMPLE_BIND
> +       && !top_level_omp_for_in_stmt (stmt))
> +     {
> +       gbind *inner_bind = as_a <gbind *> (stmt);
> +       /* Flatten recursively, and collect all variables.  */
> +       tree inner_vars = flatten_binds (inner_bind, true);
> +       gimple_seq inner_sequence = gimple_bind_body (inner_bind);
> +       gcc_assert (gimple_code (inner_sequence) != GIMPLE_BIND
> +                   || top_level_omp_for_in_stmt (inner_sequence));

First, this gives rise to the ICE documented in
'c-c++-common/goacc/kernels-decompose-ice-2.c': 'inner_sequence' is
'NULL' in this case (which is valid, meaning no statements in
'GIMPLE_BIND'), but 'gimple_code' then attempts to dereference 'NULL';
SIGSEGV.

Second, it seems strange to examine only the first statement of inner
'GIMPLE_BIND' (via 'inner_sequence' being a 'typedef gimple *gimple_seq'),
so I changed that to examine all statements contained therein, which I
suppose must've been the intention here.  (This also conveniently fixes
the ICE mentioned above.)
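
To illustrate the two points above outside of GCC, here is a minimal sketch
using toy stand-ins: 'toy_stmt', 'toy_seq', 'first_is_not_bind', and
'no_binds_in_seq' are hypothetical names, not GCC's 'gimple' API.  As with
'gimple_seq', a sequence is just a pointer to its first statement, and an
empty sequence is a null pointer.

```cpp
// Toy model only (an assumption for illustration, NOT the GCC data types).
enum toy_code { TOY_BIND, TOY_OTHER };

struct toy_stmt
{
  toy_code code;
  toy_stmt *next;
};

// Like 'typedef gimple *gimple_seq': a sequence is its first statement.
typedef toy_stmt *toy_seq;

// Shape of the original check: looks at the first statement only.
// Must not be called with an empty (null) sequence, as it would
// dereference a null pointer.
static bool
first_is_not_bind (toy_seq seq)
{
  return seq->code != TOY_BIND;
}

// Shape of the revised check: walks every statement.  An empty sequence
// is handled naturally, because the loop body never runs.
static bool
no_binds_in_seq (toy_seq seq)
{
  for (toy_stmt *s = seq; s != nullptr; s = s->next)
    if (s->code == TOY_BIND)
      return false;
  return true;
}
```

For a two-statement sequence whose second statement is a 'TOY_BIND', the
first-statement check passes while the full walk reports the bind; for an
empty sequence only the full walk is safe to call.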

I've pushed "In 'gcc/omp-oacc-kernels-decompose.cc:flatten_binds', don't
choke on empty GIMPLE sequence" to master branch in commit
4b5726fda653d11f882fb9a112e4cffa12f7ed61, see attached.


Grüße
 Thomas


> +       gimple_seq_add_seq (&new_body, inner_sequence);
> +       /* Find the last variable; we will append others to it.  */
> +       while (last_var != NULL && TREE_CHAIN (last_var) != NULL)
> +         last_var = TREE_CHAIN (last_var);
> +       if (last_var != NULL)
> +         {
> +           TREE_CHAIN (last_var) = inner_vars;
> +           last_var = inner_vars;
> +         }
> +       else
> +         {
> +           vars = inner_vars;
> +           last_var = vars;
> +         }
> +     }
> +      else
> +     gimple_seq_add_stmt (&new_body, stmt);
> +    }
> +
> +  /* Put the possibly transformed body back into the bind.  */
> +  gimple_bind_set_body (bind, new_body);
> +  return vars;
> +}

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
> @@ -0,0 +1,16 @@
> +/* Test OpenACC 'kernels' construct decomposition.  */
> +
> +/* { dg-additional-options "-fopenacc-kernels=decompose" } */
> +/* { dg-ice "TODO" }
> +   { dg-prune-output "during GIMPLE pass: omp_oacc_kernels_decompose" } */
> +
> +/* Reduced from 'kernels-decompose-ice-1.c'.  */
> +
> +int
> +main ()
> +{
> +#pragma acc kernels
> +  {
> +    int i;
> +  }
> +}


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

[-- Attachment #2: 0001-In-gcc-omp-oacc-kernels-decompose.cc-flatten_binds-d.patch --]
[-- Type: text/x-diff, Size: 3479 bytes --]

From 4b5726fda653d11f882fb9a112e4cffa12f7ed61 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Fri, 27 Nov 2020 11:54:50 +0100
Subject: [PATCH] In 'gcc/omp-oacc-kernels-decompose.cc:flatten_binds', don't
 choke on empty GIMPLE sequence

Also, instead of just examining the first statement of inner 'GIMPLE_BIND' (via
'inner_sequence' being a 'typedef gimple *gimple_seq'), in fact examine all
statements contained therein, which I suppose must've been the intention here.

This "fixes" the testcase 'c-c++-common/goacc/kernels-decompose-ice-2.c' (which
now runs into the same ICE as 'c-c++-common/goacc/kernels-decompose-ice-1.c',
etc.).

	gcc/
	* omp-oacc-kernels-decompose.cc (flatten_binds): Don't choke on
	empty GIMPLE sequence, and examine all statements contained in
	inner 'GIMPLE_BIND'.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-ice-1.c: Adjust.
	* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
---
 gcc/omp-oacc-kernels-decompose.cc                   | 13 +++++++++++--
 .../c-c++-common/goacc/kernels-decompose-ice-1.c    |  1 +
 .../c-c++-common/goacc/kernels-decompose-ice-2.c    |  2 +-
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc
index baad1b9a348..c46168e063a 100644
--- a/gcc/omp-oacc-kernels-decompose.cc
+++ b/gcc/omp-oacc-kernels-decompose.cc
@@ -740,8 +740,17 @@ flatten_binds (gbind *bind, bool include_toplevel_vars = false)
 	  /* Flatten recursively, and collect all variables.  */
 	  tree inner_vars = flatten_binds (inner_bind, true);
 	  gimple_seq inner_sequence = gimple_bind_body (inner_bind);
-	  gcc_assert (gimple_code (inner_sequence) != GIMPLE_BIND
-		      || top_level_omp_for_in_stmt (inner_sequence));
+	  if (flag_checking)
+	    {
+	      for (gimple_stmt_iterator inner_gsi = gsi_start (inner_sequence);
+		   !gsi_end_p (inner_gsi);
+		   gsi_next (&inner_gsi))
+		{
+		  gimple *inner_stmt = gsi_stmt (inner_gsi);
+		  gcc_assert (gimple_code (inner_stmt) != GIMPLE_BIND
+			      || top_level_omp_for_in_stmt (inner_stmt));
+		}
+	    }
 	  gimple_seq_add_seq (&new_body, inner_sequence);
 	  /* Find the last variable; we will append others to it.  */
 	  while (last_var != NULL && TREE_CHAIN (last_var) != NULL)
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
index 9e27d1fb9b5..82e7bd1495b 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
@@ -7,6 +7,7 @@
 
 /* Reduced from 'kernels-decompose-2.c'.
    (Hopefully) similar instances:
+     - 'kernels-decompose-ice-2.c'
      - 'libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c'
      - 'libgomp.oacc-c-c++-common/kernels-decompose-1.c'
 */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
index 839e6803851..569f87a59c9 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
@@ -2,7 +2,7 @@
 
 /* { dg-additional-options "-fopenacc-kernels=decompose" } */
 /* { dg-ice "TODO" }
-   { dg-prune-output "during GIMPLE pass: omp_oacc_kernels_decompose" } */
+   { dg-prune-output "during GIMPLE pass: omplower" } */
 
 /* Reduced from 'kernels-decompose-ice-1.c'.  */
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs
  2020-11-15  9:14   ` Tobias Burnus
@ 2021-04-19  8:29     ` Thomas Schwinge
  2021-04-19 12:38       ` Thomas Schwinge
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Schwinge @ 2021-04-19  8:29 UTC (permalink / raw)
  To: Tobias Burnus, gcc-patches, Jakub Jelinek, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 1488 bytes --]

Hi!

Would still like to get the following "cleanup" into GCC 11:

On 2020-11-15T09:14:32+0100, Tobias Burnus <burnus@net-b.de> wrote:
> regarding the new option:
>
> +fopenacc-kernels=
> +C ObjC C++ ObjC++ RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS)
> +-fopenacc-kernels=[decompose|parloops]       Specify mode of OpenACC 'kernels' constructs handling.
>
> and
>
> On 13.11.20 23:22, Thomas Schwinge wrote:
>> Not yet enabled by default: for now, the current mode of OpenACC 'kernels'
>> constructs handling still remains '-fopenacc-kernels=parloops', but that is to
>> change later.
>
> Do you intend that users will switch between those two options?
>
> Or is this new option only an interim solution until decompose is handled?
>
> If the latter, maybe using a --param makes more sense than keeping the latter forever.

Indeed the latter (had hoped to get more done in this GCC release cycle),
so thanks for suggesting to use a '--param', "subject to change without
notice in future releases".  Is that OK like in the attached "[OpenACC
'kernels'] '-fopenacc-kernels=[...]' -> '--param=openacc-kernels=[...]'"
(currently testing)?  (My first patch related to a '--param' -- how
exciting!)  ;-)
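
For reference, a minimal sketch of how a test case would select the
decomposition mode after this change.  The function name 'scale_in_kernels'
and the loop body are made up for illustration; only the
'--param=openacc-kernels=decompose' spelling is taken from the patch.  When
compiled without '-fopenacc', the pragma is ignored and the loop simply runs
on the host.

```cpp
/* { dg-additional-options "--param=openacc-kernels=decompose" } */

/* Hypothetical example function: doubles each of the N elements of A
   inside an OpenACC 'kernels' construct.  */
void
scale_in_kernels (int n, float *a)
{
#pragma acc kernels copy(a[0:n])
  {
    for (int i = 0; i < n; i++)
      a[i] = 2.0f * a[i];
  }
}
```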


Grüße
 Thomas


-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf

[-- Attachment #2: 0001-OpenACC-kernels-fopenacc-kernels-.-param-openacc-ker.patch --]
[-- Type: text/x-diff, Size: 15139 bytes --]

From ff83af3471c721575351c43b1e542256e1ae6229 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Mon, 19 Apr 2021 10:24:49 +0200
Subject: [PATCH] [OpenACC 'kernels'] '-fopenacc-kernels=[...]' ->
 '--param=openacc-kernels=[...]'

This configuration knob is temporary, and isn't really meant to be exposed to
users.

	gcc/
	* params.opt (-param=openacc-kernels=): Add.
	* omp-oacc-kernels-decompose.cc
	(pass_omp_oacc_kernels_decompose::gate): Use it.
	* doc/invoke.texi (-fopenacc-kernels=@var{mode}): Move...
	(--param): ... here, 'openacc-kernels'.
	gcc/c-family/
	* c.opt (fopenacc-kernels=): Remove.
	gcc/fortran/
	* lang.opt (fopenacc-kernels=): Remove.
	gcc/testsuite/
	* c-c++-common/goacc/if-clause-2.c: '-fopenacc-kernels=[...]' ->
	'--param=openacc-kernels=[...]'.
	* c-c++-common/goacc/kernels-decompose-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-ice-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
	* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
	* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	'-fopenacc-kernels=[...]' -> '--param=openacc-kernels=[...]'.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.
---
 gcc/c-family/c.opt                            | 13 ----------
 gcc/doc/invoke.texi                           | 24 +++++++++----------
 gcc/fortran/lang.opt                          |  4 ----
 gcc/omp-oacc-kernels-decompose.cc             |  2 +-
 gcc/params.opt                                | 13 ++++++++++
 .../c-c++-common/goacc/if-clause-2.c          |  2 +-
 .../c-c++-common/goacc/kernels-decompose-1.c  |  2 +-
 .../c-c++-common/goacc/kernels-decompose-2.c  |  2 +-
 .../goacc/kernels-decompose-ice-1.c           |  2 +-
 .../goacc/kernels-decompose-ice-2.c           |  2 +-
 .../gfortran.dg/goacc/kernels-decompose-1.f95 |  2 +-
 .../gfortran.dg/goacc/kernels-decompose-2.f95 |  2 +-
 .../gfortran.dg/goacc/kernels-tree.f95        |  2 +-
 .../declare-vla-kernels-decompose-ice-1.c     |  2 +-
 .../declare-vla-kernels-decompose.c           |  2 +-
 .../kernels-decompose-1.c                     |  2 +-
 .../libgomp.oacc-fortran/pr94358-1.f90        |  2 +-
 17 files changed, 37 insertions(+), 43 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index ed9a82599ef..3f8b72cdc00 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1873,19 +1873,6 @@ fopenacc-dim=
 C ObjC C++ ObjC++ LTO Joined Var(flag_openacc_dims)
 Specify default OpenACC compute dimensions.
 
-fopenacc-kernels=
-C ObjC C++ ObjC++ RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS)
--fopenacc-kernels=[decompose|parloops]	Specify mode of OpenACC 'kernels' constructs handling.
-
-Enum
-Name(openacc_kernels) Type(enum openacc_kernels)
-
-EnumValue
-Enum(openacc_kernels) String(decompose) Value(OPENACC_KERNELS_DECOMPOSE)
-
-EnumValue
-Enum(openacc_kernels) String(parloops) Value(OPENACC_KERNELS_PARLOOPS)
-
 fopenmp
 C ObjC C++ ObjC++ LTO Var(flag_openmp)
 Enable OpenMP (implies -frecursive in Fortran).
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 17551246477..390bebe7cf1 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -202,7 +202,7 @@ in the following sections.
 -aux-info @var{filename}  -fallow-parameterless-variadic-functions @gol
 -fno-asm  -fno-builtin  -fno-builtin-@var{function}  -fgimple@gol
 -fhosted  -ffreestanding @gol
--fopenacc  -fopenacc-dim=@var{geom}  -fopenacc-kernels=@var{mode} @gol
+-fopenacc  -fopenacc-dim=@var{geom} @gol
 -fopenmp  -fopenmp-simd @gol
 -fms-extensions  -fplan9-extensions  -fsso-struct=@var{endianness} @gol
 -fallow-single-precision  -fcond-mismatch  -flax-vector-conversions @gol
@@ -2619,18 +2619,6 @@ not explicitly specify.  The @var{geom} value is a triple of
 ':'-separated sizes, in order 'gang', 'worker' and, 'vector'.  A size
 can be omitted, to use a target-specific default value.
 
-@item -fopenacc-kernels=@var{mode}
-@opindex fopenacc-kernels
-@cindex OpenACC accelerator programming
-Specify mode of OpenACC `kernels' constructs handling.
-With @option{-fopenacc-kernels=decompose}, OpenACC `kernels'
-constructs are decomposed into parts, a sequence of compute
-constructs, each then handled individually.
-This is work in progress.
-With @option{-fopenacc-kernels=parloops}, OpenACC `kernels' constructs
-are handled by the @samp{parloops} pass, en bloc.
-This is the current default.
-
 @item -fopenmp
 @opindex fopenmp
 @cindex OpenMP parallel
@@ -14377,6 +14365,16 @@ The parameter is used only in GIMPLE FE.
 The maximum number of 'after supernode' exploded nodes within the analyzer
 per supernode, before terminating analysis.
 
+@item openacc-kernels
+Specify mode of OpenACC `kernels' constructs handling.
+With @option{--param=openacc-kernels=decompose}, OpenACC `kernels'
+constructs are decomposed into parts, a sequence of compute
+constructs, each then handled individually.
+This is work in progress.
+With @option{--param=openacc-kernels=parloops}, OpenACC `kernels'
+constructs are handled by the @samp{parloops} pass, en bloc.
+This is the current default.
+
 @end table
 
 The following choices of @var{name} are available on AArch64 targets:
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index 2b1977c523b..388ef8c0fdb 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -691,10 +691,6 @@ fopenacc-dim=
 Fortran LTO Joined Var(flag_openacc_dims)
 ; Documented in C
 
-fopenacc-kernels=
-Fortran RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS)
-; Documented in C
-
 fopenmp
 Fortran LTO
 ; Documented in C
diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc
index c624e26be88..4ba5758a906 100644
--- a/gcc/omp-oacc-kernels-decompose.cc
+++ b/gcc/omp-oacc-kernels-decompose.cc
@@ -1527,7 +1527,7 @@ public:
   virtual bool gate (function *)
   {
     return (flag_openacc
-	    && flag_openacc_kernels == OPENACC_KERNELS_DECOMPOSE);
+	    && param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE);
   }
   virtual unsigned int execute (function *)
   {
diff --git a/gcc/params.opt b/gcc/params.opt
index 0dd9ac406eb..b516489bc8e 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -777,6 +777,19 @@ The minimum probability of reaching a source block for interblock speculative sc
 Common Joined UInteger Var(param_min_vect_loop_bound) Param Optimization
 If -ftree-vectorize is used, the minimal loop bound of a loop to be considered for vectorization.
 
+-param=openacc-kernels=
+Common Joined Enum(openacc_kernels) Var(param_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS) Param
+--param=openacc-kernels=[decompose|parloops]	Specify mode of OpenACC 'kernels' constructs handling.
+
+Enum
+Name(openacc_kernels) Type(enum openacc_kernels)
+
+EnumValue
+Enum(openacc_kernels) String(decompose) Value(OPENACC_KERNELS_DECOMPOSE)
+
+EnumValue
+Enum(openacc_kernels) String(parloops) Value(OPENACC_KERNELS_PARLOOPS)
+
 -param=parloops-chunk-size=
 Common Joined UInteger Var(param_parloops_chunk_size) Param Optimization
 Chunk size of omp schedule for loops parallelized by parloops.
diff --git a/gcc/testsuite/c-c++-common/goacc/if-clause-2.c b/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
index 7bb115316e8..a48072509e1 100644
--- a/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
@@ -1,5 +1,5 @@
 /* { dg-additional-options "-fdump-tree-gimple" } */
-/* { dg-additional-options "-fopenacc-kernels=decompose" }
+/* { dg-additional-options "--param=openacc-kernels=decompose" }
    { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" } */
 
 void
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
index e906443cceb..87219c88fac 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
@@ -2,7 +2,7 @@
 
 /* { dg-additional-options "-fopt-info-omp-all" } */
 /* { dg-additional-options "-fdump-tree-gimple" } */
-/* { dg-additional-options "-fopenacc-kernels=decompose" }
+/* { dg-additional-options "--param=openacc-kernels=decompose" }
    { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" } */
 
 /* See also '../../gfortran.dg/goacc/kernels-decompose-1.f95'.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
index ec0f75c4a61..3781e75d0f2 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
@@ -1,7 +1,7 @@
 /* Test OpenACC 'kernels' construct decomposition.  */
 
 /* { dg-additional-options "-fopt-info-omp-all" } */
-/* { dg-additional-options "-fopenacc-kernels=decompose" }
+/* { dg-additional-options "--param=openacc-kernels=decompose" }
 /* { dg-additional-options "-O2" } for 'parloops'.  */
 
 /* See also '../../gfortran.dg/goacc/kernels-decompose-2.f95'.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
index 82e7bd1495b..d770b91dd09 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
@@ -1,7 +1,7 @@
 /* Test OpenACC 'kernels' construct decomposition.  */
 
 /* { dg-additional-options "-fopt-info-omp-all" } */
-/* { dg-additional-options "-fopenacc-kernels=decompose" } */
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
 /* { dg-ice "TODO" }
    { dg-prune-output "during GIMPLE pass: omplower" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
index 569f87a59c9..ae059eb354b 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
@@ -1,6 +1,6 @@
 /* Test OpenACC 'kernels' construct decomposition.  */
 
-/* { dg-additional-options "-fopenacc-kernels=decompose" } */
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
 /* { dg-ice "TODO" }
    { dg-prune-output "during GIMPLE pass: omplower" } */
 
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
index 7e513f84083..e2523504ef5 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
@@ -2,7 +2,7 @@
 
 ! { dg-additional-options "-fopt-info-omp-all" }
 ! { dg-additional-options "-fdump-tree-gimple" }
-! { dg-additional-options "-fopenacc-kernels=decompose" }
+! { dg-additional-options "--param=openacc-kernels=decompose" }
 ! { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" }
 
 ! See also '../../c-c++-common/goacc/kernels-decompose-1.c'.
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
index 22f65e5c694..cc12b77817b 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
@@ -1,7 +1,7 @@
 ! Test OpenACC 'kernels' construct decomposition.
 
 ! { dg-additional-options "-fopt-info-omp-all" }
-! { dg-additional-options "-fopenacc-kernels=decompose" }
+! { dg-additional-options "--param=openacc-kernels=decompose" }
 ! { dg-additional-options "-O2" } for 'parloops'.
 
 ! See also '../../c-c++-common/goacc/kernels-decompose-2.c'.
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
index d01eee2fa5d..63ef7e17a9e 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
@@ -1,6 +1,6 @@
 ! { dg-do compile } 
 ! { dg-additional-options "-fdump-tree-original" } 
-! { dg-additional-options "-fopenacc-kernels=decompose" }
+! { dg-additional-options "--param=openacc-kernels=decompose" }
 ! { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" }
 
 program test
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c
index c7eae12ec10..0777b612b63 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options "-fopenacc-kernels=decompose" } */
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
 /* Hopefully, this is the same issue as '../../../gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c'.
    { dg-ice "TODO" }
    TODO { dg-prune-output "during GIMPLE pass: omplower" }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c
index dd8a1c1d294..0369ae91f14 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options "-fopenacc-kernels=decompose" } */
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
 
 /* See also 'declare-vla-kernels-decompose-ice-1.c'.  */
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
index e76e4099f3a..dd83557b6aa 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
@@ -1,7 +1,7 @@
 /* Test OpenACC 'kernels' construct decomposition.  */
 
 /* { dg-additional-options "-fopt-info-omp-all" } */
-/* { dg-additional-options "-fopenacc-kernels=decompose" } */
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
 
 /* It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
    passed to 'incr' may be unset, and in that case, it will be set to [...]",
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
index 99a70418a4d..cf1d0e56927 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
@@ -1,6 +1,6 @@
 ! { dg-do run }
 ! { dg-additional-options "-fopt-info-omp-all" }
-! { dg-additional-options "-fopenacc-kernels=decompose" }
+! { dg-additional-options "--param=openacc-kernels=decompose" }
 
 ! It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
 ! passed to 'incr' may be unset, and in that case, it will be set to [...]",
-- 
2.17.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs
  2021-04-19  8:29     ` Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs Thomas Schwinge
@ 2021-04-19 12:38       ` Thomas Schwinge
  0 siblings, 0 replies; 33+ messages in thread
From: Thomas Schwinge @ 2021-04-19 12:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: Tobias Burnus, Jakub Jelinek, Richard Biener

[-- Attachment #1: Type: text/plain, Size: 1985 bytes --]

Hi!

On 2021-04-19T10:29:17+0200, Thomas Schwinge <thomas@codesourcery.com> wrote:
> On 2020-11-15T09:14:32+0100, Tobias Burnus <burnus@net-b.de> wrote:
>> regarding the new option:
>>
>> +fopenacc-kernels=
>> +C ObjC C++ ObjC++ RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS)
>> +-fopenacc-kernels=[decompose|parloops]       Specify mode of OpenACC 'kernels' constructs handling.
>>
>> and
>>
>> On 13.11.20 23:22, Thomas Schwinge wrote:
>>> Not yet enabled by default: for now, the current mode of OpenACC 'kernels'
>>> constructs handling still remains '-fopenacc-kernels=parloops', but that is to
>>> change later.
>>
>> Do you intend that users will switch between those two options?
>>
>> Or is this new option only an interim solution until decompose is handled?
>>
>> If the latter, maybe using a --param makes more sense than keeping the latter forever.
>
> Indeed the latter (had hoped to get more done in this GCC release cycle),
> so thanks for suggesting to use a '--param', "subject to change without
> notice in future releases".  Is that OK like in the attached "[OpenACC
> 'kernels'] '-fopenacc-kernels=[...]' -> '--param=openacc-kernels=[...]'"
> (currently testing)?  (My first patch related to a '--param' -- how
> exciting!)  ;-)

Tested fine, and Tobias approved privately.  As posted, I've pushed
"[OpenACC 'kernels'] '-fopenacc-kernels=[...]' ->
'--param=openacc-kernels=[...]'" to master branch in commit
3395dfc4da8ad1fccd346c62dfc9bd44b2b48c62, see attached.

(Well, no, actually not "as posted": I had forgotten to list one testcase
update in the ChangeLog snippet as "Likewise".  Looking forward to the
day when this strict GNU ChangeLog format is gone...)  %-)


Grüße
 Thomas



[-- Attachment #2: 0001-OpenACC-kernels-fopenacc-kernels-.-param-openacc-ker.patch --]
[-- Type: text/x-diff, Size: 15212 bytes --]

From 3395dfc4da8ad1fccd346c62dfc9bd44b2b48c62 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Mon, 19 Apr 2021 10:24:49 +0200
Subject: [PATCH] [OpenACC 'kernels'] '-fopenacc-kernels=[...]' ->
 '--param=openacc-kernels=[...]'

This configuration knob is temporary, and isn't really meant to be exposed to
users.

	gcc/
	* params.opt (-param=openacc-kernels=): Add.
	* omp-oacc-kernels-decompose.cc
	(pass_omp_oacc_kernels_decompose::gate): Use it.
	* doc/invoke.texi (-fopenacc-kernels=@var{mode}): Move...
	(--param): ... here, 'openacc-kernels'.
	gcc/c-family/
	* c.opt (fopenacc-kernels=): Remove.
	gcc/fortran/
	* lang.opt (fopenacc-kernels=): Remove.
	gcc/testsuite/
	* c-c++-common/goacc/if-clause-2.c: '-fopenacc-kernels=[...]' ->
	'--param=openacc-kernels=[...]'.
	* c-c++-common/goacc/kernels-decompose-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-ice-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
	* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
	* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
	* gfortran.dg/goacc/kernels-tree.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	'-fopenacc-kernels=[...]' -> '--param=openacc-kernels=[...]'.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.
---
 gcc/c-family/c.opt                            | 13 ----------
 gcc/doc/invoke.texi                           | 24 +++++++++----------
 gcc/fortran/lang.opt                          |  4 ----
 gcc/omp-oacc-kernels-decompose.cc             |  2 +-
 gcc/params.opt                                | 13 ++++++++++
 .../c-c++-common/goacc/if-clause-2.c          |  2 +-
 .../c-c++-common/goacc/kernels-decompose-1.c  |  2 +-
 .../c-c++-common/goacc/kernels-decompose-2.c  |  2 +-
 .../goacc/kernels-decompose-ice-1.c           |  2 +-
 .../goacc/kernels-decompose-ice-2.c           |  2 +-
 .../gfortran.dg/goacc/kernels-decompose-1.f95 |  2 +-
 .../gfortran.dg/goacc/kernels-decompose-2.f95 |  2 +-
 .../gfortran.dg/goacc/kernels-tree.f95        |  2 +-
 .../declare-vla-kernels-decompose-ice-1.c     |  2 +-
 .../declare-vla-kernels-decompose.c           |  2 +-
 .../kernels-decompose-1.c                     |  2 +-
 .../libgomp.oacc-fortran/pr94358-1.f90        |  2 +-
 17 files changed, 37 insertions(+), 43 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index ed9a82599ef..3f8b72cdc00 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1873,19 +1873,6 @@ fopenacc-dim=
 C ObjC C++ ObjC++ LTO Joined Var(flag_openacc_dims)
 Specify default OpenACC compute dimensions.
 
-fopenacc-kernels=
-C ObjC C++ ObjC++ RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS)
--fopenacc-kernels=[decompose|parloops]	Specify mode of OpenACC 'kernels' constructs handling.
-
-Enum
-Name(openacc_kernels) Type(enum openacc_kernels)
-
-EnumValue
-Enum(openacc_kernels) String(decompose) Value(OPENACC_KERNELS_DECOMPOSE)
-
-EnumValue
-Enum(openacc_kernels) String(parloops) Value(OPENACC_KERNELS_PARLOOPS)
-
 fopenmp
 C ObjC C++ ObjC++ LTO Var(flag_openmp)
 Enable OpenMP (implies -frecursive in Fortran).
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 096cebc8562..8b70fdf580d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -202,7 +202,7 @@ in the following sections.
 -aux-info @var{filename}  -fallow-parameterless-variadic-functions @gol
 -fno-asm  -fno-builtin  -fno-builtin-@var{function}  -fgimple@gol
 -fhosted  -ffreestanding @gol
--fopenacc  -fopenacc-dim=@var{geom}  -fopenacc-kernels=@var{mode} @gol
+-fopenacc  -fopenacc-dim=@var{geom} @gol
 -fopenmp  -fopenmp-simd @gol
 -fms-extensions  -fplan9-extensions  -fsso-struct=@var{endianness} @gol
 -fallow-single-precision  -fcond-mismatch  -flax-vector-conversions @gol
@@ -2619,18 +2619,6 @@ not explicitly specify.  The @var{geom} value is a triple of
 ':'-separated sizes, in order 'gang', 'worker' and, 'vector'.  A size
 can be omitted, to use a target-specific default value.
 
-@item -fopenacc-kernels=@var{mode}
-@opindex fopenacc-kernels
-@cindex OpenACC accelerator programming
-Specify mode of OpenACC `kernels' constructs handling.
-With @option{-fopenacc-kernels=decompose}, OpenACC `kernels'
-constructs are decomposed into parts, a sequence of compute
-constructs, each then handled individually.
-This is work in progress.
-With @option{-fopenacc-kernels=parloops}, OpenACC `kernels' constructs
-are handled by the @samp{parloops} pass, en bloc.
-This is the current default.
-
 @item -fopenmp
 @opindex fopenmp
 @cindex OpenMP parallel
@@ -14376,6 +14364,16 @@ The parameter is used only in GIMPLE FE.
 The maximum number of 'after supernode' exploded nodes within the analyzer
 per supernode, before terminating analysis.
 
+@item openacc-kernels
+Specify mode of OpenACC `kernels' constructs handling.
+With @option{--param=openacc-kernels=decompose}, OpenACC `kernels'
+constructs are decomposed into parts, a sequence of compute
+constructs, each then handled individually.
+This is work in progress.
+With @option{--param=openacc-kernels=parloops}, OpenACC `kernels'
+constructs are handled by the @samp{parloops} pass, en bloc.
+This is the current default.
+
 @end table
 
 The following choices of @var{name} are available on AArch64 targets:
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index 2b1977c523b..388ef8c0fdb 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -691,10 +691,6 @@ fopenacc-dim=
 Fortran LTO Joined Var(flag_openacc_dims)
 ; Documented in C
 
-fopenacc-kernels=
-Fortran RejectNegative Joined Enum(openacc_kernels) Var(flag_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS)
-; Documented in C
-
 fopenmp
 Fortran LTO
 ; Documented in C
diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc
index c624e26be88..4ba5758a906 100644
--- a/gcc/omp-oacc-kernels-decompose.cc
+++ b/gcc/omp-oacc-kernels-decompose.cc
@@ -1527,7 +1527,7 @@ public:
   virtual bool gate (function *)
   {
     return (flag_openacc
-	    && flag_openacc_kernels == OPENACC_KERNELS_DECOMPOSE);
+	    && param_openacc_kernels == OPENACC_KERNELS_DECOMPOSE);
   }
   virtual unsigned int execute (function *)
   {
diff --git a/gcc/params.opt b/gcc/params.opt
index 0dd9ac406eb..b516489bc8e 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -777,6 +777,19 @@ The minimum probability of reaching a source block for interblock speculative sc
 Common Joined UInteger Var(param_min_vect_loop_bound) Param Optimization
 If -ftree-vectorize is used, the minimal loop bound of a loop to be considered for vectorization.
 
+-param=openacc-kernels=
+Common Joined Enum(openacc_kernels) Var(param_openacc_kernels) Init(OPENACC_KERNELS_PARLOOPS) Param
+--param=openacc-kernels=[decompose|parloops]	Specify mode of OpenACC 'kernels' constructs handling.
+
+Enum
+Name(openacc_kernels) Type(enum openacc_kernels)
+
+EnumValue
+Enum(openacc_kernels) String(decompose) Value(OPENACC_KERNELS_DECOMPOSE)
+
+EnumValue
+Enum(openacc_kernels) String(parloops) Value(OPENACC_KERNELS_PARLOOPS)
+
 -param=parloops-chunk-size=
 Common Joined UInteger Var(param_parloops_chunk_size) Param Optimization
 Chunk size of omp schedule for loops parallelized by parloops.
diff --git a/gcc/testsuite/c-c++-common/goacc/if-clause-2.c b/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
index 7bb115316e8..a48072509e1 100644
--- a/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/if-clause-2.c
@@ -1,5 +1,5 @@
 /* { dg-additional-options "-fdump-tree-gimple" } */
-/* { dg-additional-options "-fopenacc-kernels=decompose" }
+/* { dg-additional-options "--param=openacc-kernels=decompose" }
    { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" } */
 
 void
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
index e906443cceb..87219c88fac 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
@@ -2,7 +2,7 @@
 
 /* { dg-additional-options "-fopt-info-omp-all" } */
 /* { dg-additional-options "-fdump-tree-gimple" } */
-/* { dg-additional-options "-fopenacc-kernels=decompose" }
+/* { dg-additional-options "--param=openacc-kernels=decompose" }
    { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" } */
 
 /* See also '../../gfortran.dg/goacc/kernels-decompose-1.f95'.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
index ec0f75c4a61..3781e75d0f2 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
@@ -1,7 +1,7 @@
 /* Test OpenACC 'kernels' construct decomposition.  */
 
 /* { dg-additional-options "-fopt-info-omp-all" } */
-/* { dg-additional-options "-fopenacc-kernels=decompose" }
+/* { dg-additional-options "--param=openacc-kernels=decompose" }
 /* { dg-additional-options "-O2" } for 'parloops'.  */
 
 /* See also '../../gfortran.dg/goacc/kernels-decompose-2.f95'.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
index 82e7bd1495b..d770b91dd09 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
@@ -1,7 +1,7 @@
 /* Test OpenACC 'kernels' construct decomposition.  */
 
 /* { dg-additional-options "-fopt-info-omp-all" } */
-/* { dg-additional-options "-fopenacc-kernels=decompose" } */
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
 /* { dg-ice "TODO" }
    { dg-prune-output "during GIMPLE pass: omplower" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
index 569f87a59c9..ae059eb354b 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
@@ -1,6 +1,6 @@
 /* Test OpenACC 'kernels' construct decomposition.  */
 
-/* { dg-additional-options "-fopenacc-kernels=decompose" } */
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
 /* { dg-ice "TODO" }
    { dg-prune-output "during GIMPLE pass: omplower" } */
 
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
index 7e513f84083..e2523504ef5 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
@@ -2,7 +2,7 @@
 
 ! { dg-additional-options "-fopt-info-omp-all" }
 ! { dg-additional-options "-fdump-tree-gimple" }
-! { dg-additional-options "-fopenacc-kernels=decompose" }
+! { dg-additional-options "--param=openacc-kernels=decompose" }
 ! { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" }
 
 ! See also '../../c-c++-common/goacc/kernels-decompose-1.c'.
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
index 22f65e5c694..cc12b77817b 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
@@ -1,7 +1,7 @@
 ! Test OpenACC 'kernels' construct decomposition.
 
 ! { dg-additional-options "-fopt-info-omp-all" }
-! { dg-additional-options "-fopenacc-kernels=decompose" }
+! { dg-additional-options "--param=openacc-kernels=decompose" }
 ! { dg-additional-options "-O2" } for 'parloops'.
 
 ! See also '../../c-c++-common/goacc/kernels-decompose-2.c'.
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
index d01eee2fa5d..63ef7e17a9e 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-tree.f95
@@ -1,6 +1,6 @@
 ! { dg-do compile } 
 ! { dg-additional-options "-fdump-tree-original" } 
-! { dg-additional-options "-fopenacc-kernels=decompose" }
+! { dg-additional-options "--param=openacc-kernels=decompose" }
 ! { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" }
 
 program test
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c
index c7eae12ec10..0777b612b63 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options "-fopenacc-kernels=decompose" } */
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
 /* Hopefully, this is the same issue as '../../../gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c'.
    { dg-ice "TODO" }
    TODO { dg-prune-output "during GIMPLE pass: omplower" }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c
index dd8a1c1d294..0369ae91f14 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c
@@ -1,4 +1,4 @@
-/* { dg-additional-options "-fopenacc-kernels=decompose" } */
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
 
 /* See also 'declare-vla-kernels-decompose-ice-1.c'.  */
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
index e76e4099f3a..dd83557b6aa 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
@@ -1,7 +1,7 @@
 /* Test OpenACC 'kernels' construct decomposition.  */
 
 /* { dg-additional-options "-fopt-info-omp-all" } */
-/* { dg-additional-options "-fopenacc-kernels=decompose" } */
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
 
 /* It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
    passed to 'incr' may be unset, and in that case, it will be set to [...]",
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
index 99a70418a4d..cf1d0e56927 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
@@ -1,6 +1,6 @@
 ! { dg-do run }
 ! { dg-additional-options "-fopt-info-omp-all" }
-! { dg-additional-options "-fopenacc-kernels=decompose" }
+! { dg-additional-options "--param=openacc-kernels=decompose" }
 
 ! It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
 ! passed to 'incr' may be unset, and in that case, it will be set to [...]",
-- 
2.30.2


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Enhance OpenACC 'kernels' decomposition testing (was: Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs)
  2020-11-13 22:22 ` Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs (was: [og8] OpenACC 'kernels' construct changes: splitting of the construct into several regions) Thomas Schwinge
  2020-11-15  9:14   ` Tobias Burnus
  2020-11-27 13:50   ` Thomas Schwinge
@ 2022-01-13  9:44   ` Thomas Schwinge
  2022-01-19 22:29   ` Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition [PR100400, PR103836, PR104061] " Thomas Schwinge
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 33+ messages in thread
From: Thomas Schwinge @ 2022-01-13  9:44 UTC (permalink / raw)
  To: gcc-patches, fortran

[-- Attachment #1: Type: text/plain, Size: 980 bytes --]

Hi!

On 2020-11-13T23:22:30+0100, I wrote:
> I've pushed to master branch [...] commit
> e898ce7997733c29dcab9c3c62ca102c7f9fa6eb "Decompose OpenACC 'kernels'
> constructs into parts, a sequence of compute constructs", see attached.
>
> On 2019-02-01T00:59:30+0100, I wrote:
>> There's more work to be done there, and we're aware of a number of TODO
>> items, but nevertheless: it's a good first step.
>
> That's still the case...  :-)

... and still is, but we're getting closer.
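
For anyone new to the thread, here is a rough sketch of the kind of code the
decomposition is about, loosely modeled on 'kernels-decompose-1.c' from the
attached patch.  The function name 'sum_twice' is made up, and I have used
'copy' instead of 'copyin' and added a 'reduction' clause so that the result
is the same with or without offloading; neither of those is taken from the
test case.

```c
#define N 1024

/* Hypothetical helper, not part of the GCC testsuite.  */
unsigned int
sum_twice (unsigned int *a)
{
  int i;
  unsigned int sum = 1;

#pragma acc kernels copy(a[0:N]) copy(sum)
  {
    /* A loop without 'independent': left to the compiler to analyze.  */
    #pragma acc loop
    for (i = 0; i < N; ++i)
      sum += a[i];

    /* Straight-line code between the loops.  */
    sum++;
    a[0]++;

    /* A loop the user asserts to be parallelizable.  */
    #pragma acc loop independent reduction(+:sum)
    for (i = 0; i < N; ++i)
      sum += a[i];
  }

  return sum;
}
```

Compiled with something like '-fopenacc --param=openacc-kernels=decompose
-fopt-info-omp-all', the test case this is modeled on expects notes such as
"forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis",
"beginning 'gang-single' part in OpenACC 'kernels' region", and "parallelized
loop nest in OpenACC 'kernels' region" for the corresponding statements.  I
have not verified the exact diagnostics for this modified sketch.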

In preparation for a forthcoming ICE fix, I've pushed to master branch
commit 862e5f398b7e0a62460e8bc3fe4045e9da6cbf3b
"Enhance OpenACC 'kernels' decomposition testing", see attached.


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Enhance-OpenACC-kernels-decomposition-testing.patch --]
[-- Type: text/x-diff, Size: 97418 bytes --]

From 862e5f398b7e0a62460e8bc3fe4045e9da6cbf3b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Mon, 20 Dec 2021 16:14:46 +0100
Subject: [PATCH] Enhance OpenACC 'kernels' decomposition testing

	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-1.c: Enhance.
	* c-c++-common/goacc/kernels-decompose-2.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-ice-1.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
	* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
	* gfortran.dg/goacc/kernels-decompose-2.f95: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	Enhance.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Likewise.
---
 .../c-c++-common/goacc/kernels-decompose-1.c  |  29 ++--
 .../c-c++-common/goacc/kernels-decompose-2.c  |  82 +++++++----
 .../goacc/kernels-decompose-ice-1.c           |   7 +-
 .../goacc/kernels-decompose-ice-2.c           |   6 +
 .../gfortran.dg/goacc/kernels-decompose-1.f95 |  29 ++--
 .../gfortran.dg/goacc/kernels-decompose-2.f95 |  68 ++++++---
 .../declare-vla-kernels-decompose-ice-1.c     |  14 ++
 .../declare-vla-kernels-decompose.c           |  23 ++++
 .../libgomp.oacc-c-c++-common/declare-vla.c   |  16 +++
 .../libgomp.oacc-c-c++-common/f-asyncwait-1.c | 129 +++++++++++++-----
 .../libgomp.oacc-c-c++-common/f-asyncwait-2.c |  70 ++++++++--
 .../libgomp.oacc-c-c++-common/f-asyncwait-3.c |  59 ++++++--
 .../kernels-decompose-1.c                     |  14 +-
 .../libgomp.oacc-fortran/asyncwait-1.f90      |  86 ++++++++++--
 .../libgomp.oacc-fortran/asyncwait-2.f90      |  47 ++++++-
 .../libgomp.oacc-fortran/asyncwait-3.f90      |  47 ++++++-
 .../libgomp.oacc-fortran/pr94358-1.f90        |  20 ++-
 17 files changed, 593 insertions(+), 153 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
index f549cbadfa7..e58bc179f30 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-1.c
@@ -1,10 +1,16 @@
 /* Test OpenACC 'kernels' construct decomposition.  */
 
 /* { dg-additional-options "-fopt-info-omp-all" } */
+
 /* { dg-additional-options "-fdump-tree-gimple" } */
+
 /* { dg-additional-options "--param=openacc-kernels=decompose" }
    { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" } */
 
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
 /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
    aspects of that functionality.  */
 
@@ -14,7 +20,7 @@
    passed to 'incr' may be unset, and in that case, it will be set to [...]",
    so to maintain compatibility with earlier Tcl releases, we manually
    initialize counter variables:
-   { dg-line l_dummy[variable c_loop_i 0] }
+   { dg-line l_dummy[variable c_compute 0 c_loop_i 0] }
    { dg-message "dummy" "" { target iN-VAl-Id } l_dummy } to avoid
    "WARNING: dg-line var l_dummy defined, but not used".  */
 
@@ -28,36 +34,43 @@ main (void)
   int i;
   unsigned int sum = 1;
 
-#pragma acc kernels copyin(a[0:N]) copy(sum)
-  /* { dg-bogus "optimized: assigned OpenACC seq loop parallelism" "TODO" { xfail *-*-* } .-1 }
+#pragma acc kernels copyin(a[0:N]) copy(sum) /* { dg-line l_compute[incr c_compute] } */
+  /* { dg-note {variable 'sum\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
+  /* { dg-bogus {optimized: assigned OpenACC seq loop parallelism} TODO { xfail *-*-* } l_compute$c_compute }
      TODO Is this maybe the report that belongs to the XFAILed report further down?  */
   {
     #pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
-    /* { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
     /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
     for (i = 0; i < N; ++i)
       sum += a[i];
 
-    sum++; /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+    sum++;
     a[0]++;
 
     #pragma acc loop independent /* { dg-line l_loop_i[incr c_loop_i] } */
-    /* { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
     /* { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
     for (i = 0; i < N; ++i)
       sum += a[i];
 
-    if (sum > 10) /* { dg-message "note: beginning 'parloops' part in OpenACC 'kernels' region" } */
+    /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+    if (sum > 10)
       { 
         #pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
 	/* { dg-missed "unparallelized loop nest in OpenACC 'kernels' region: it's executed conditionally" "" { target *-*-* } l_loop_i$c_loop_i } */
+	/* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
 	/*TODO { dg-optimized "assigned OpenACC seq loop parallelism" "TODO" { xfail *-*-* } l_loop_i$c_loop_i } */
         for (i = 0; i < N; ++i)
           sum += a[i];
       }
 
     #pragma acc loop auto /* { dg-line l_loop_i[incr c_loop_i] } */
-    /* { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
     /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
     for (i = 0; i < N; ++i)
       sum += a[i];
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
index cdf85d4bafa..4dd55eb4680 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-2.c
@@ -1,9 +1,14 @@
 /* Test OpenACC 'kernels' construct decomposition.  */
 
 /* { dg-additional-options "-fopt-info-omp-all" } */
+
 /* { dg-additional-options "--param=openacc-kernels=decompose" }
 /* { dg-additional-options "-O2" } for 'parloops'.  */
 
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
 /* { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
    aspects of that functionality.  */
 
@@ -13,7 +18,7 @@
    passed to 'incr' may be unset, and in that case, it will be set to [...]",
    so to maintain compatibility with earlier Tcl releases, we manually
    initialize counter variables:
-   { dg-line l_dummy[variable c_loop_i 0 c_loop_j 0 c_loop_k 0 c_part 0] }
+   { dg-line l_dummy[variable c_compute 0 c_loop_i 0 c_loop_j 0 c_loop_k 0 c_part 0] }
    { dg-message "dummy" "" { target iN-VAl-Id } l_dummy } to avoid
    "WARNING: dg-line var l_dummy defined, but not used".  */
 
@@ -40,9 +45,11 @@ main ()
 #define N 10
   int a[N], b[N], c[N];
 
-#pragma acc kernels
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {variable 'x\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
   {
-    x = 0; /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+    x = 0;
     y = x < 10;
     z = x++;
     ;
@@ -50,13 +57,17 @@ main ()
 
   { /*TODO Instead of using 'for (int i = 0; [...])', move 'int i' outside, to work around for ICE detailed in 'kernels-decompose-ice-1.c'.  */
     int i;
-#pragma acc kernels /* { dg-optimized "assigned OpenACC gang loop parallelism" } */
-  for (i = 0; i < N; i++) /* { dg-message "note: beginning 'parloops' part in OpenACC 'kernels' region" } */
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-optimized {assigned OpenACC gang loop parallelism} {} { target *-*-* } l_compute$c_compute } */
+  /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+  for (i = 0; i < N; i++)
     a[i] = 0;
   }
 
 #pragma acc kernels loop /* { dg-line l_loop_i[incr c_loop_i] } */
-  /* { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
   /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
   for (int i = 0; i < N; i++)
     b[i] = a[N - i - 1];
@@ -64,44 +75,60 @@ main ()
 #pragma acc kernels
   {
 #pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
-    /* { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
     /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
     for (int i = 0; i < N; i++)
       b[i] = a[N - i - 1];
 
 #pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
-    /* { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
     /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
     for (int i = 0; i < N; i++)
       c[i] = a[i] * b[i];
 
-    a[z] = 0; /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+    a[z] = 0;
 
 #pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
-    /* { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
     /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
     for (int i = 0; i < N; i++)
       c[i] += a[i];
 
 #pragma acc loop seq /* { dg-line l_loop_i[incr c_loop_i] } */
-    /* { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
     /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
     for (int i = 0 + 1; i < N; i++)
       c[i] += c[i - 1];
   }
 
-#pragma acc kernels
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  /* { dg-note {variable 'j' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
   /*TODO What does this mean?
-    TODO { dg-optimized "assigned OpenACC worker vector loop parallelism" "" { target *-*-* } .-2 } */
+    TODO { dg-optimized "assigned OpenACC worker vector loop parallelism" "" { target *-*-* } l_compute$c_compute } */
   {
 #pragma acc loop independent /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {variable 'j' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
     /* { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
-    /* { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } l_loop_i$c_loop_i } */
     for (int i = 0; i < N; ++i)
 #pragma acc loop independent /* { dg-line l_loop_j[incr c_loop_j] } */
+      /* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
+      /* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
       /* { dg-optimized "assigned OpenACC worker loop parallelism" "" { target *-*-* } l_loop_j$c_loop_j } */
       for (int j = 0; j < N; ++j)
 #pragma acc loop independent /* { dg-line l_loop_k[incr c_loop_k] } */
+	/* { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_k$c_loop_k } */
 	/* { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } l_loop_k$c_loop_k } */
 	/* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_k$c_loop_k } */
 	for (int k = 0; k < N; ++k)
@@ -111,23 +138,27 @@ main ()
 
     /*TODO Should the following turn into "gang-single" instead of "parloops"?
       TODO The problem is that the first STMT is 'if (y <= 4) goto <D.2547>; else goto <D.2548>;', thus "parloops".  */
-    if (y < 5) /* { dg-message "note: beginning 'parloops' part in OpenACC 'kernels' region" } */
+    /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+    if (y < 5)
 #pragma acc loop independent /* { dg-line l_loop_j[incr c_loop_j] } */
+      /* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
       /* { dg-missed "unparallelized loop nest in OpenACC 'kernels' region: it's executed conditionally" "" { target *-*-* } l_loop_j$c_loop_j } */
       for (int j = 0; j < N; ++j)
 	b[j] = f_w (c[j]);
   }
 
-#pragma acc kernels
-  /* { dg-bogus "warning: region contains gang partitioned code but is not gang partitioned" "TODO 'kernels'" { xfail *-*-* } .-1 } */
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  /* { dg-bogus "warning: region contains gang partitioned code but is not gang partitioned" "TODO 'kernels'" { xfail *-*-* } l_compute$c_compute } */
   {
     y = f_g (a[5]); /* { dg-line l_part[incr c_part] } */
     /*TODO If such a construct is placed in its own part (like it is, here), can't this actually use gang paralelism, instead of "gang-single"?
-      { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" "" { target *-*-* } l_part$c_part } */
+      { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } l_part$c_part } */
     /* { dg-optimized "assigned OpenACC gang worker vector loop parallelism" "" { target *-*-* } l_part$c_part } */
 
 #pragma acc loop independent /* { dg-line l_loop_j[incr c_loop_j] } */
-    /* { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_j$c_loop_j } */
+    /* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } l_loop_j$c_loop_j } */
+    /* { dg-note {variable 'j' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
+    /* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
     /* { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_j$c_loop_j } */
     for (int j = 0; j < N; ++j)
       b[j] = y + f_w (c[j]); /* { dg-optimized "assigned OpenACC worker vector loop parallelism" } */
@@ -135,18 +166,23 @@ main ()
 
 #pragma acc kernels
   {
-    y = 3; /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+    y = 3;
 
 #pragma acc loop independent /* { dg-line l_loop_j[incr c_loop_j] } */
-    /* { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_j$c_loop_j } */
+    /* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } l_loop_j$c_loop_j } */
+    /* { dg-note {variable 'j' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
+    /* { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j } */
     /* { dg-optimized "assigned OpenACC gang worker loop parallelism" "" { target *-*-* } l_loop_j$c_loop_j } */
     for (int j = 0; j < N; ++j)
       b[j] = y + f_v (c[j]); /* { dg-optimized "assigned OpenACC vector loop parallelism" } */
 
-    z = 2; /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+    z = 2;
   }
 
-#pragma acc kernels /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+#pragma acc kernels
   ;
 
   return 0;
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
index 8c3884bdc00..e83b451f2b8 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
@@ -1,10 +1,13 @@
 /* Test OpenACC 'kernels' construct decomposition.  */
 
 /* { dg-additional-options "-fopt-info-omp-all" } */
+
 /* { dg-additional-options "-fchecking --param=openacc-kernels=decompose" } */
 /* { dg-ice "TODO" }
    { dg-prune-output "during GIMPLE pass: omplower" } */
 
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
 /* Reduced from 'kernels-decompose-2.c'.
    (Hopefully) similar instances:
      - 'kernels-decompose-ice-2.c'
@@ -18,7 +21,9 @@ main ()
 #define N 10
 
 #pragma acc kernels
-  for (int i = 0; i < N; i++) /* { dg-message "note: beginning 'parloops' part in OpenACC 'kernels' region" } */
+  /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+  /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+  for (int i = 0; i < N; i++)
     ;
 
   return 0;
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
index 8bf60a9a509..16af57d5f87 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
@@ -1,15 +1,21 @@
 /* Test OpenACC 'kernels' construct decomposition.  */
 
+/* { dg-additional-options "-fopt-info-omp-all" } */
+
 /* { dg-additional-options "-fchecking --param=openacc-kernels=decompose" } */
 /* { dg-ice "TODO" }
    { dg-prune-output "during GIMPLE pass: omplower" } */
 
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
 /* Reduced from 'kernels-decompose-ice-1.c'.  */
 
 int
 main ()
 {
 #pragma acc kernels
+  /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .-1 } */
+  /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-2 } */
   {
     int i;
   }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
index ddaf7f8e43d..1a26844f96d 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1.f95
@@ -1,10 +1,16 @@
 ! Test OpenACC 'kernels' construct decomposition.
 
 ! { dg-additional-options "-fopt-info-omp-all" }
+
 ! { dg-additional-options "-fdump-tree-gimple" }
+
 ! { dg-additional-options "--param=openacc-kernels=decompose" }
 ! { dg-additional-options "-fdump-tree-omp_oacc_kernels_decompose" }
 
+! { dg-additional-options "--param=openacc-privatization=noisy" }
+! Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+! { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} }
+
 ! { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
 ! aspects of that functionality.
 
@@ -14,7 +20,7 @@
 ! passed to 'incr' may be unset, and in that case, it will be set to [...]",
 ! so to maintain compatibility with earlier Tcl releases, we manually
 ! initialize counter variables:
-! { dg-line l_dummy[variable c_loop_i 0] }
+! { dg-line l_dummy[variable c_compute 0 c_loop_i 0] }
 ! { dg-message "dummy" "" { target iN-VAl-Id } l_dummy } to avoid
 ! "WARNING: dg-line var l_dummy defined, but not used".
 
@@ -24,30 +30,36 @@ program main
   integer, dimension (1:N)   :: a
   integer                    :: i, sum
 
-  !$acc kernels copyin(a(1:N)) copy(sum)
-  ! { dg-bogus "optimized: assigned OpenACC seq loop parallelism" "TODO" { xfail *-*-* } .-1 }
+  !$acc kernels copyin(a(1:N)) copy(sum) ! { dg-line l_compute[incr c_compute] }
+  ! { dg-note {variable 'sum\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute }
+  ! { dg-bogus "optimized: assigned OpenACC seq loop parallelism" "TODO" { xfail *-*-* } l_compute$c_compute }
   !TODO Is this maybe the report that belongs to the XFAILed report further down?  */
 
   !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i }
   ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
     sum = sum + a(i)
   end do
 
-  sum = sum + 1 ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" }
+  ! { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 }
+  sum = sum + 1
   a(1) = a(1) + 1
 
   !$acc loop independent ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i }
   ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
     sum = sum + a(i)
   end do
 
-  if (sum .gt. 10) then ! { dg-message "note: beginning 'parloops' part in OpenACC 'kernels' region" }
+  ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 }
+  if (sum .gt. 10) then
     !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
     ! { dg-missed "unparallelized loop nest in OpenACC 'kernels' region: it's executed conditionally" "" { target *-*-* } l_loop_i$c_loop_i }
+    ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i }
     !TODO { dg-optimized "assigned OpenACC seq loop parallelism" "TODO" { xfail *-*-* } l_loop_i$c_loop_i }
     do i = 1, N
       sum = sum + a(i)
@@ -55,7 +67,8 @@ program main
   end if
 
   !$acc loop auto ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i }
   ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
     sum = sum + a(i)
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
index 238482b91a4..59990135c75 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-decompose-2.f95
@@ -1,9 +1,14 @@
 ! Test OpenACC 'kernels' construct decomposition.
 
 ! { dg-additional-options "-fopt-info-omp-all" }
+
 ! { dg-additional-options "--param=openacc-kernels=decompose" }
 ! { dg-additional-options "-O2" } for 'parloops'.
 
+! { dg-additional-options "--param=openacc-privatization=noisy" }
+! Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+! { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} }
+
 ! { dg-additional-options "-Wopenacc-parallelism" } for testing/documenting
 ! aspects of that functionality.
 
@@ -13,7 +18,7 @@
 ! passed to 'incr' may be unset, and in that case, it will be set to [...]",
 ! so to maintain compatibility with earlier Tcl releases, we manually
 ! initialize counter variables:
-! { dg-line l_dummy[variable c_loop_i 0 c_loop_j 0 c_loop_k 0 c_part 0] }
+! { dg-line l_dummy[variable c_compute 0 c_loop_i 0 c_loop_j 0 c_loop_k 0 c_part 0] }
 ! { dg-message "dummy" "" { target iN-VAl-Id } l_dummy } to avoid
 ! "WARNING: dg-line var l_dummy defined, but not used".
 
@@ -36,7 +41,8 @@ program main
   integer :: a(N), b(N), c(N)
 
   !$acc kernels
-  x = 0 ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" }
+  ! { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 }
+  x = 0
   y = 0
   y_l = x < 10
   z = x
@@ -44,14 +50,17 @@ program main
   ;
   !$acc end kernels
 
-  !$acc kernels ! { dg-optimized "assigned OpenACC gang loop parallelism" }
-  do i = 1, N ! { dg-message "note: beginning 'parloops' part in OpenACC 'kernels' region" }
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+  ! { dg-optimized {assigned OpenACC gang loop parallelism} {} { target *-*-* } l_compute$c_compute }
+  ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 }
+  do i = 1, N
      a(i) = 0
   end do
   !$acc end kernels
 
   !$acc kernels loop ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i }
   ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      b(i) = a(N - i + 1)
@@ -59,47 +68,55 @@ program main
 
   !$acc kernels
   !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i }
   ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      b(i) = a(N - i + 1)
   end do
 
   !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i }
   ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      c(i) = a(i) * b(i)
   end do
 
-  a(z) = 0 ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" }
+  ! { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 }
+  a(z) = 0
 
   !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i }
   ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      c(i) = c(i) + a(i)
   end do
 
   !$acc loop seq ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i }
   ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1 + 1, N
      c(i) = c(i) + c(i - 1)
   end do
   !$acc end kernels
 
-  !$acc kernels
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
   !TODO What does this mean?
-  !TODO { dg-optimized "assigned OpenACC worker vector loop parallelism" "" { target *-*-* } .-2 }
+  !TODO { dg-optimized "assigned OpenACC worker vector loop parallelism" "" { target *-*-* } l_compute$c_compute }
   !$acc loop independent ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i }
   ! { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      !$acc loop independent ! { dg-line l_loop_j[incr c_loop_j] }
+     ! { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j }
      ! { dg-optimized "assigned OpenACC worker loop parallelism" "" { target *-*-* } l_loop_j$c_loop_j }
      do j = 1, N
         !$acc loop independent ! { dg-line l_loop_k[incr c_loop_k] }
+        ! { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_k$c_loop_k }
         ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } l_loop_k$c_loop_k }
         ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_k$c_loop_k }
         do k = 1, N
@@ -112,24 +129,27 @@ program main
 
   !TODO Should the following turn into "gang-single" instead of "parloops"?
   !TODO The problem is that the first STMT is 'if (y <= 4) goto <D.2547>; else goto <D.2548>;', thus "parloops".
-  if (y < 5) then ! { dg-message "note: beginning 'parloops' part in OpenACC 'kernels' region" }
+  ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 }
+  if (y < 5) then
      !$acc loop independent ! { dg-line l_loop_j[incr c_loop_j] }
      ! { dg-missed "unparallelized loop nest in OpenACC 'kernels' region: it's executed conditionally" "" { target *-*-* } l_loop_j$c_loop_j }
+     ! { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j }
      do j = 1, N
         b(j) = f_w (c(j))
      end do
   end if
   !$acc end kernels
 
-  !$acc kernels
-  ! { dg-bogus "\[Ww\]arning: region contains gang partitioned code but is not gang partitioned" "TODO 'kernels'" { xfail *-*-* } .-1 }
+  !$acc kernels ! { dg-line l_compute[incr c_compute] }
+  ! { dg-bogus "\[Ww\]arning: region contains gang partitioned code but is not gang partitioned" "TODO 'kernels'" { xfail *-*-* } l_compute$c_compute }
   y = f_g (a(5)) ! { dg-line l_part[incr c_part] }
   !TODO If such a construct is placed in its own part (like it is, here), can't this actually use gang paralelism, instead of "gang-single"?
-  ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" "" { target *-*-* } l_part$c_part }
+  ! { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } l_part$c_part }
   ! { dg-optimized "assigned OpenACC gang worker vector loop parallelism" "" { target *-*-* } l_part$c_part }
 
   !$acc loop independent ! { dg-line l_loop_j[incr c_loop_j] }
-  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_j$c_loop_j }
+  ! { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } l_loop_j$c_loop_j }
+  ! { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j }
   ! { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_j$c_loop_j }
   do j = 1, N
      b(j) = y + f_w (c(j)) ! { dg-optimized "assigned OpenACC worker vector loop parallelism" }
@@ -137,18 +157,22 @@ program main
   !$acc end kernels
 
   !$acc kernels
-  y = 3 ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" }
+  ! { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 }
+  y = 3
 
   !$acc loop independent ! { dg-line l_loop_j[incr c_loop_j] }
-  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_j$c_loop_j }
+  ! { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } l_loop_j$c_loop_j }
+  ! { dg-note {variable 'j' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_j$c_loop_j }
   ! { dg-optimized "assigned OpenACC gang worker loop parallelism" "" { target *-*-* } l_loop_j$c_loop_j }
   do j = 1, N
      b(j) = y + f_v (c(j)) ! { dg-optimized "assigned OpenACC vector loop parallelism" }
   end do
 
-  z = 2 ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" }
+  ! { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 }
+  z = 2
   !$acc end kernels
 
-  !$acc kernels ! { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" }
+  ! { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 }
+  !$acc kernels
   !$acc end kernels  
 end program main
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c
index 0777b612b63..a6eb82b8719 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c
@@ -4,5 +4,19 @@
    TODO { dg-prune-output "during GIMPLE pass: omplower" }
    TODO { dg-do link } */
 
+/* { dg-additional-options "-fopt-info-omp-all" }
+   { dg-additional-options "-foffload=-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   { dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
 #undef KERNELS_DECOMPOSE_ICE_HACK
 #include "declare-vla.c"
+
+/* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } 27 } */
+
+/* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } 61 } */
+
+/* { dg-bogus {note: variable [^\n\r]+ candidate for adjusting OpenACC privatization level} {TODO 'data'} { xfail *-*-* } 42 } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c
index 0369ae91f14..142aceec9cd 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c
@@ -2,5 +2,28 @@
 
 /* See also 'declare-vla-kernels-decompose-ice-1.c'.  */
 
+/* { dg-additional-options "-fopt-info-omp-all" }
+   { dg-additional-options "-foffload=-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   { dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
 #define KERNELS_DECOMPOSE_ICE_HACK
 #include "declare-vla.c"
+
+/* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } 27 } */
+
+/* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } 61 } */
+
+/* { dg-bogus {note: variable [^\n\r]+ candidate for adjusting OpenACC privatization level} {TODO 'data'} { xfail *-*-* } 42 } */
+
+/* { dg-note {variable 'i\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } 58 }
+   { dg-note {variable 'N\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } 58 } */
+
+/* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target { ! __OPTIMIZE__ } } 24 }
+   { dg-optimized {assigned OpenACC gang loop parallelism} {} { target { __OPTIMIZE__ } } 24 } */
+
+/* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target { ! __OPTIMIZE__ } } 58 }
+   { dg-optimized {assigned OpenACC gang loop parallelism} {} { target { __OPTIMIZE__ } } 58 } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
index 3bd6331879d..4ce2e6d1f18 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
@@ -1,5 +1,13 @@
 /* Verify OpenACC 'declare' with VLAs.  */
 
+/* { dg-additional-options "-fopt-info-omp-all" }
+   { dg-additional-options "-foffload=-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   { dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
 #include <assert.h>
 
 
@@ -14,6 +22,8 @@ f (void)
     A[i] = -i;
 
 #pragma acc kernels
+  /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target { ! __OPTIMIZE__ } } .-1 }
+     { dg-optimized {assigned OpenACC gang loop parallelism} {} { target { __OPTIMIZE__ } } .-2 } */
   for (i = 0; i < N; i++)
     A[i] = i;
 
@@ -30,6 +40,7 @@ void
 f_data (void)
 {
 #pragma acc data
+  /* { dg-bogus {note: variable [^\n\r]+ candidate for adjusting OpenACC privatization level} {TODO 'data'} { xfail *-*-* } .-1 } */
   {
     int N = 1000;
     int i, A[N];
@@ -45,6 +56,8 @@ f_data (void)
 #endif
 
 # pragma acc kernels
+  /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target { ! __OPTIMIZE__ } } .-1 }
+     { dg-optimized {assigned OpenACC gang loop parallelism} {} { target { __OPTIMIZE__ } } .-2 } */
     for (i = 0; i < N; i++)
       A[i] = i;
 
@@ -65,3 +78,6 @@ main ()
 
   return 0;
 }
+
+
+/* { dg-note dummy "" { target n-on-e } } to disable 'prune_notes'.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c
index cf851707dc7..e4e58158cf7 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c
@@ -1,6 +1,25 @@
 /* { dg-do run } */
 
-/* Based on asyncwait-1.f90.  */
+/* Based on '../libgomp.oacc-fortran/asyncwait-1.f90'.  */
+
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
+/* TODO To avoid PR100280 ICE { dg-additional-options "--param=openacc-kernels=parloops" } */
+
+/* { dg-additional-options "-fopt-info-all-omp" }
+   { dg-additional-options "-foffload=-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   { dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable '[Di]\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+/* It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+   passed to 'incr' may be unset, and in that case, it will be set to [...]",
+   so to maintain compatibility with earlier Tcl releases, we manually
+   initialize counter variables:
+   { dg-line l_dummy[variable c_compute 0 c_loop_i 0] }
+   { dg-message "dummy" "" { target iN-VAl-Id } l_dummy } to avoid
+   "WARNING: dg-line var l_dummy defined, but not used".  */
 
 #include <stdlib.h>
 
@@ -26,8 +45,11 @@ main (void)
 #pragma acc data copy (a[0:N]) copy (b[0:N])
   {
 
-#pragma acc parallel async
-#pragma acc loop
+#pragma acc parallel async /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
     for (int i = 0; i < N; ++i)
       b[i] = a[i];
 
@@ -50,8 +72,11 @@ main (void)
 
 #pragma acc data copy (a[0:N]) copy (b[0:N])
   {
-#pragma acc parallel async (1)
-#pragma acc loop
+#pragma acc parallel async (1) /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
     for (int i = 0; i < N; ++i)
       b[i] = a[i];
 
@@ -75,17 +100,22 @@ main (void)
 #pragma acc data copy (a[0:N]) copy (b[0:N]) copy (c[0:N]) copy (d[0:N])
   {
 
-#pragma acc parallel async (1)
+#pragma acc parallel async (1) /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
     for (int i = 0; i < N; ++i)
       b[i] = (a[i] * a[i] * a[i]) / a[i];
 
-#pragma acc parallel async (1)
+#pragma acc parallel async (1) /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
     for (int i = 0; i < N; ++i)
       c[i] = (a[i] * 4) / a[i];
 
 
-#pragma acc parallel async (1)
-#pragma acc loop
+#pragma acc parallel async (1) /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
     for (int i = 0; i < N; ++i)
       d[i] = ((a[i] * a[i] + a[i]) / a[i]) - a[i];
 
@@ -116,23 +146,33 @@ main (void)
 #pragma acc data copy (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N])
   {
 
-#pragma acc parallel async (1)
+#pragma acc parallel async (1) /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
     for (int i = 0; i < N; ++i)
       b[i] = (a[i] * a[i] * a[i]) / a[i];
 
-#pragma acc parallel async (1)
-#pragma acc loop
+#pragma acc parallel async (1) /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
     for (int i = 0; i < N; ++i)
       c[i] = (a[i] * 4) / a[i];
 
-#pragma acc parallel async (1)
-#pragma acc loop
+#pragma acc parallel async (1) /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
     for (int i = 0; i < N; ++i)
       d[i] = ((a[i] * a[i] + a[i]) / a[i]) - a[i];
 
 
-#pragma acc parallel wait (1) async (1)
-#pragma acc loop
+#pragma acc parallel wait (1) async (1) /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
     for (int i = 0; i < N; ++i)
       e[i] = a[i] + b[i] + c[i] + d[i];
 
@@ -162,8 +202,11 @@ main (void)
 #pragma acc data copy (a[0:N]) copy (b[0:N])
   {
 
-#pragma acc kernels async
-#pragma acc loop
+#pragma acc kernels async /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+    /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
     for (int i = 0; i < N; ++i)
       b[i] = a[i];
 
@@ -186,8 +229,11 @@ main (void)
 
 #pragma acc data copy (a[0:N]) copy (b[0:N])
   {
-#pragma acc kernels async (1)
-#pragma acc loop
+#pragma acc kernels async (1) /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+    /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
     for (int i = 0; i < N; ++i)
       b[i] = a[i];
 
@@ -212,16 +258,25 @@ main (void)
 
 #pragma acc data copy (a[0:N]) copy (b[0:N]) copy (c[0:N]) copy (d[0:N])
   {
-#pragma acc kernels async (1)
+#pragma acc kernels async (1) /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+    /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target { ! __OPTIMIZE__ } } l_compute$c_compute }
+       { dg-optimized "assigned OpenACC gang loop parallelism" "" { target { __OPTIMIZE__ } } l_compute$c_compute } */
     for (int i = 0; i < N; ++i)
       b[i] = (a[i] * a[i] * a[i]) / a[i];
 
-#pragma acc kernels async (1)
+#pragma acc kernels async (1) /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+    /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target { ! __OPTIMIZE__ } } l_compute$c_compute }
+       { dg-optimized "assigned OpenACC gang loop parallelism" "" { target { __OPTIMIZE__ } } l_compute$c_compute } */
     for (int i = 0; i < N; ++i)
       c[i] = (a[i] * 4) / a[i];
 
-#pragma acc kernels async (1)
-#pragma acc loop
+#pragma acc kernels async (1) /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+    /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
     for (int i = 0; i < N; ++i)
       d[i] = ((a[i] * a[i] + a[i]) / a[i]) - a[i];
 
@@ -251,22 +306,34 @@ main (void)
 
 #pragma acc data copy (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N])
   {
-#pragma acc kernels async (1)
+#pragma acc kernels async (1) /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+    /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target { ! __OPTIMIZE__ } } l_compute$c_compute }
+       { dg-optimized "assigned OpenACC gang loop parallelism" "" { target { __OPTIMIZE__ } } l_compute$c_compute } */
     for (int i = 0; i < N; ++i)
       b[i] = (a[i] * a[i] * a[i]) / a[i];
 
-#pragma acc kernels async (1)
-#pragma acc loop
+#pragma acc kernels async (1) /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+    /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
     for (int i = 0; i < N; ++i)
       c[i] = (a[i] * 4) / a[i];
 
-#pragma acc kernels async (1)
-#pragma acc loop
+#pragma acc kernels async (1) /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+    /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
     for (int i = 0; i < N; ++i)
       d[i] = ((a[i] * a[i] + a[i]) / a[i]) - a[i];
 
-#pragma acc kernels wait (1) async (1)
-#pragma acc loop
+#pragma acc kernels wait (1) async (1) /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+    /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
     for (int i = 0; i < N; ++i)
       e[i] = a[i] + b[i] + c[i] + d[i];
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/f-asyncwait-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/f-asyncwait-2.c
index 5298e4c54f7..2dd7b5257be 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/f-asyncwait-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/f-asyncwait-2.c
@@ -1,6 +1,24 @@
 /* { dg-do run } */
 
-/* Based on asyncwait-2.f90.  */
+/* Based on '../libgomp.oacc-fortran/asyncwait-2.f90'.  */
+
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fopt-info-all-omp" }
+   { dg-additional-options "-foffload=-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   { dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable '[Di]\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+/* It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+   passed to 'incr' may be unset, and in that case, it will be set to [...]",
+   so to maintain compatibility with earlier Tcl releases, we manually
+   initialize counter variables:
+   { dg-line l_dummy[variable c_compute 0 c_loop_i 0] }
+   { dg-message "dummy" "" { target iN-VAl-Id } l_dummy } to avoid
+   "WARNING: dg-line var l_dummy defined, but not used".  */
 
 #include <stdlib.h>
 
@@ -15,18 +33,27 @@ main (void)
   b = (int *)malloc (N * sizeof (*b));
   c = (int *)malloc (N * sizeof (*c));
 
-#pragma acc parallel copy (a[0:N]) async (0)
-#pragma acc loop
+#pragma acc parallel copy (a[0:N]) async (0) /* { dg-line l_compute[incr c_compute] } */
+  /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+  /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
   for (int i = 0; i < N; ++i)
     a[i] = 1;
 
-#pragma acc parallel copy (b[0:N]) async (1)
-#pragma acc loop
+#pragma acc parallel copy (b[0:N]) async (1) /* { dg-line l_compute[incr c_compute] } */
+  /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+  /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
   for (int i = 0; i < N; ++i)
     b[i] = 1;
 
-#pragma acc parallel copy (a[0:N], b[0:N], c[0:N]) wait (0, 1)
-#pragma acc loop
+#pragma acc parallel copy (a[0:N], b[0:N], c[0:N]) wait (0, 1) /* { dg-line l_compute[incr c_compute] } */
+  /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+  /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
   for (int i = 0; i < N; ++i)
     c[i] = a[i] + b[i];
 
@@ -35,18 +62,35 @@ main (void)
       abort ();
 
 #if 1
-#pragma acc kernels copy (a[0:N]) async (0)
-#pragma acc loop
+#pragma acc kernels copy (a[0:N]) async (0) /* { dg-line l_compute[incr c_compute] } */
+  /* { dg-note {variable 'a\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+  /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
   for (int i = 0; i < N; ++i)
     a[i] = 1;
 
-#pragma acc kernels copy (b[0:N]) async (1)
-#pragma acc loop
+#pragma acc kernels copy (b[0:N]) async (1) /* { dg-line l_compute[incr c_compute] } */
+  /* { dg-note {variable 'b\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+  /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
   for (int i = 0; i < N; ++i)
     b[i] = 1;
 
-#pragma acc kernels copy (a[0:N], b[0:N], c[0:N]) wait (0, 1)
-#pragma acc loop
+#pragma acc kernels copy (a[0:N], b[0:N], c[0:N]) wait (0, 1) /* { dg-line l_compute[incr c_compute] } */
+  /* { dg-note {variable 'a\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+  /* { dg-note {variable 'b\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+  /* { dg-note {variable 'c\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+  /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
   for (int i = 0; i < N; ++i)
     c[i] = a[i] + b[i];
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/f-asyncwait-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/f-asyncwait-3.c
index 319eea61dc7..9d35250b02c 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/f-asyncwait-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/f-asyncwait-3.c
@@ -1,6 +1,24 @@
 /* { dg-do run } */
 
-/* Based on asyncwait-3.f90.  */
+/* Based on '../libgomp.oacc-fortran/asyncwait-3.f90'.  */
+
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fopt-info-all-omp" }
+   { dg-additional-options "-foffload=-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   { dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable '[Di]\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+/* It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+   passed to 'incr' may be unset, and in that case, it will be set to [...]",
+   so to maintain compatibility with earlier Tcl releases, we manually
+   initialize counter variables:
+   { dg-line l_dummy[variable c_compute 0 c_loop_i 0] }
+   { dg-message "dummy" "" { target iN-VAl-Id } l_dummy } to avoid
+   "WARNING: dg-line var l_dummy defined, but not used".  */
 
 #include <stdlib.h>
 
@@ -15,20 +33,29 @@ main (void)
   b = (int *)malloc (N * sizeof (*b));
   c = (int *)malloc (N * sizeof (*c));
 
-#pragma acc parallel copy (a[0:N]) async (0)
-#pragma acc loop
+#pragma acc parallel copy (a[0:N]) async (0) /* { dg-line l_compute[incr c_compute] } */
+  /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+  /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
   for (int i = 0; i < N; ++i)
     a[i] = 1;
 
-#pragma acc parallel copy (b[0:N]) async (1)
-#pragma acc loop
+#pragma acc parallel copy (b[0:N]) async (1) /* { dg-line l_compute[incr c_compute] } */
+  /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+  /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
   for (int i = 0; i < N; ++i)
     b[i] = 1;
 
 #pragma acc wait (0, 1)
 
-#pragma acc parallel copy (a[0:N], b[0:N], c[0:N])
-#pragma acc loop
+#pragma acc parallel copy (a[0:N], b[0:N], c[0:N]) /* { dg-line l_compute[incr c_compute] } */
+  /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+  /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
   for (int i = 0; i < N; ++i)
     c[i] = a[i] + b[i];
 
@@ -37,19 +64,31 @@ main (void)
       abort ();
 
 #pragma acc kernels copy (a[0:N]) async (0)
-#pragma acc loop
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+  /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
   for (int i = 0; i < N; ++i)
     a[i] = 1;
 
 #pragma acc kernels copy (b[0:N]) async (1)
-#pragma acc loop
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+  /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
   for (int i = 0; i < N; ++i)
     b[i] = 1;
 
 #pragma acc wait (0, 1)
 
 #pragma acc kernels copy (a[0:N], b[0:N], c[0:N])
-#pragma acc loop
+#pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+  /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
+  /* { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
   for (int i = 0; i < N; ++i)
     c[i] = a[i] + b[i];
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
index e08cfa56e3c..b3b4c490f7f 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
@@ -1,11 +1,11 @@
 /* Test OpenACC 'kernels' construct decomposition.  */
 
-/* { dg-additional-options "-fopt-info-omp-all" } */
 /* { dg-additional-options "--param=openacc-kernels=decompose" } */
 
 /* { dg-additional-options "-fopt-info-all-omp" }
-   { dg-additional-options "--param=openacc-privatization=noisy" }
-   { dg-additional-options "-foffload=-fopt-info-all-omp" }
+   { dg-additional-options "-foffload=-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
    { dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
    for testing/documenting aspects of that functionality.  */
 
@@ -30,7 +30,8 @@ int main()
 
 #pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
   {
-    int c = 234; /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+    int c = 234;
     /* { dg-note {variable 'c' declared in block is candidate for adjusting OpenACC privatization level} "" { target *-*-* } l_compute$c_compute }
        { dg-note {variable 'c\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_compute$c_compute } */
 
@@ -38,14 +39,15 @@ int main()
     (volatile int *) &c;
 
 #pragma acc loop independent gang /* { dg-line l_loop_i[incr c_loop_i] } */
-    /* { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } l_loop_i$c_loop_i } */
     /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
     /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i } */
     /* { dg-optimized "assigned OpenACC gang loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i } */
     for (int i = 0; i < N; ++i)
       b[i] = c;
 
-    a = c; /* { dg-message "note: beginning 'gang-single' part in OpenACC 'kernels' region" } */
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+    a = c;
   }
 
   for (int i = 0; i < N; ++i)
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/asyncwait-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/asyncwait-1.f90
index f027c31b4ea..9440cd7f1b5 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/asyncwait-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/asyncwait-1.f90
@@ -1,5 +1,25 @@
 ! { dg-do run }
 
+! See also '../libgomp.oacc-c-c++-common/f-asyncwait-1.c'.
+
+! { dg-additional-options "--param=openacc-kernels=decompose" }
+
+! { dg-additional-options "-fopt-info-all-omp" }
+! { dg-additional-options "-foffload=-fopt-info-all-omp" }
+
+! { dg-additional-options "--param=openacc-privatization=noisy" }
+! { dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
+! Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+! { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} }
+
+! It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+! passed to 'incr' may be unset, and in that case, it will be set to [...]",
+! so to maintain compatibility with earlier Tcl releases, we manually
+! initialize counter variables:
+! { dg-line l_dummy[variable c_compute 0 c_loop_i 0] }
+! { dg-message "dummy" "" { target iN-VAl-Id } l_dummy } to avoid
+! "WARNING: dg-line var l_dummy defined, but not used".
+
 program asyncwait
   integer, parameter :: N = 64
   real, allocatable :: a(:), b(:), c(:), d(:), e(:)
@@ -17,7 +37,9 @@ program asyncwait
   !$acc data copy (a(1:N)) copy (b(1:N))
 
   !$acc parallel async
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      b(i) = a(i)
   end do
@@ -37,7 +59,9 @@ program asyncwait
   !$acc data copy (a(1:N)) copy (b(1:N))
 
   !$acc parallel async (1)
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      b(i) = a(i)
   end do
@@ -71,7 +95,9 @@ program asyncwait
   !$acc end parallel
 
   !$acc parallel async (1)
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      d(i) = ((a(i) * a(i) + a(i)) / a(i)) - a(i)
   end do
@@ -102,21 +128,27 @@ program asyncwait
   !$acc end parallel
 
   !$acc parallel async (1)
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      c(i) = (a(i) * 4) / a(i)
   end do
   !$acc end parallel
 
   !$acc parallel async (1)
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      d(i) = ((a(i) * a(i) + a(i)) / a(i)) - a(i)
   end do
   !$acc end parallel
 
   !$acc parallel wait (1) async (1)
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      e(i) = a(i) + b(i) + c(i) + d(i)
   end do
@@ -139,7 +171,10 @@ program asyncwait
   !$acc data copy (a(1:N)) copy (b(1:N))
 
   !$acc kernels async
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      b(i) = a(i)
   end do
@@ -159,7 +194,10 @@ program asyncwait
   !$acc data copy (a(1:N)) copy (b(1:N))
 
   !$acc kernels async (1)
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      b(i) = a(i)
   end do
@@ -180,20 +218,27 @@ program asyncwait
 
   !$acc data copy (a(1:N)) copy (b(1:N)) copy (c(1:N)) copy (d(1:N))
 
-  !$acc kernels async (1)
+  !$acc kernels async (1) ! { dg-line l_compute[incr c_compute] }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_compute$c_compute }
+  ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 }
   do i = 1, N
      b(i) = (a(i) * a(i) * a(i)) / a(i)
   end do
   !$acc end kernels
 
-  !$acc kernels async (1)
+  !$acc kernels async (1) ! { dg-line l_compute[incr c_compute] }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_compute$c_compute }
+  ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 }
   do i = 1, N
      c(i) = (a(i) * 4) / a(i)
   end do
   !$acc end kernels
 
   !$acc kernels async (1)
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      d(i) = ((a(i) * a(i) + a(i)) / a(i)) - a(i)
   end do
@@ -217,28 +262,39 @@ program asyncwait
 
   !$acc data copy (a(1:N), b(1:N), c(1:N), d(1:N), e(1:N))
 
-  !$acc kernels async (1)
+  !$acc kernels async (1) ! { dg-line l_compute[incr c_compute] }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_compute$c_compute }
+  ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 }
   do i = 1, N
      b(i) = (a(i) * a(i) * a(i)) / a(i)
   end do
   !$acc end kernels
 
   !$acc kernels async (1)
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      c(i) = (a(i) * 4) / a(i)
   end do
   !$acc end kernels
 
   !$acc kernels async (1)
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      d(i) = ((a(i) * a(i) + a(i)) / a(i)) - a(i)
   end do
   !$acc end kernels
 
   !$acc kernels wait (1) async (1)
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
      e(i) = a(i) + b(i) + c(i) + d(i)
   end do
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/asyncwait-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/asyncwait-2.f90
index 7f5080a21b6..0cc07ad4d2a 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/asyncwait-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/asyncwait-2.f90
@@ -1,5 +1,25 @@
 ! { dg-do run }
 
+! See also '../libgomp.oacc-c-c++-common/f-asyncwait-2.c'.
+
+! { dg-additional-options "--param=openacc-kernels=decompose" } */
+
+! { dg-additional-options "-fopt-info-all-omp" }
+! { dg-additional-options "-foffload=-fopt-info-all-omp" } */
+
+! { dg-additional-options "--param=openacc-privatization=noisy" }
+! { dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
+! Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+! { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+! It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+! passed to 'incr' may be unset, and in that case, it will be set to [...]",
+! so to maintain compatibility with earlier Tcl releases, we manually
+! initialize counter variables:
+! { dg-line l_dummy[variable c_compute 0 c_loop_i 0] }
+! { dg-message "dummy" "" { target iN-VAl-Id } l_dummy } to avoid
+! "WARNING: dg-line var l_dummy defined, but not used".  */
+
 program asyncwait
   integer, parameter :: N = 64
   real, allocatable :: a(:), b(:), c(:)
@@ -10,21 +30,27 @@ program asyncwait
   allocate (c(N))
 
   !$acc parallel async (0)
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
     a(i) = 1
   end do
   !$acc end parallel
 
   !$acc parallel async (1)
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
     b(i) = 1
   end do
   !$acc end parallel
 
   !$acc parallel wait (0, 1)
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
     c(i) = a(i) + b(i)
   end do
@@ -35,21 +61,30 @@ program asyncwait
   end do
 
   !$acc kernels async (0)
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
     a(i) = 1
   end do
   !$acc end kernels
 
   !$acc kernels async (1)
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
     b(i) = 1
   end do
   !$acc end kernels
 
   !$acc kernels wait (0, 1)
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
     c(i) = a(i) + b(i)
   end do
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/asyncwait-3.f90 b/libgomp/testsuite/libgomp.oacc-fortran/asyncwait-3.f90
index 6d9ed0cf078..dbccec206da 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/asyncwait-3.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/asyncwait-3.f90
@@ -1,5 +1,25 @@
 ! { dg-do run }
 
+! See also '../libgomp.oacc-c-c++-common/f-asyncwait-3.c'.
+
+! { dg-additional-options "--param=openacc-kernels=decompose" } */
+
+! { dg-additional-options "-fopt-info-all-omp" }
+! { dg-additional-options "-foffload=-fopt-info-all-omp" } */
+
+! { dg-additional-options "--param=openacc-privatization=noisy" }
+! { dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
+! Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+! { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+! It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+! passed to 'incr' may be unset, and in that case, it will be set to [...]",
+! so to maintain compatibility with earlier Tcl releases, we manually
+! initialize counter variables:
+! { dg-line l_dummy[variable c_compute 0 c_loop_i 0] }
+! { dg-message "dummy" "" { target iN-VAl-Id } l_dummy } to avoid
+! "WARNING: dg-line var l_dummy defined, but not used".  */
+
 program asyncwait
   integer, parameter :: N = 64
   real, allocatable :: a(:), b(:), c(:)
@@ -10,14 +30,18 @@ program asyncwait
   allocate (c(N))
 
   !$acc parallel async (0)
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
     a(i) = 1
   end do
   !$acc end parallel
 
   !$acc parallel async (1)
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
     b(i) = 1
   end do
@@ -26,7 +50,9 @@ program asyncwait
   !$acc wait (0, 1)
 
   !$acc parallel
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
     c(i) = a(i) + b(i)
   end do
@@ -37,14 +63,20 @@ program asyncwait
   end do
 
   !$acc kernels async (0)
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
     a(i) = 1
   end do
   !$acc end kernels
 
   !$acc kernels async (1)
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
     b(i) = 1
   end do
@@ -53,7 +85,10 @@ program asyncwait
   !$acc wait (0, 1)
 
   !$acc kernels
-  !$acc loop
+  !$acc loop ! { dg-line l_loop_i[incr c_loop_i] }
+  ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-optimized "assigned OpenACC seq loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = 1, N
     c(i) = a(i) + b(i)
   end do
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
index cf1d0e56927..6db25719a97 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/pr94358-1.f90
@@ -1,12 +1,20 @@
 ! { dg-do run }
+
 ! { dg-additional-options "-fopt-info-omp-all" }
+! { dg-additional-options "-foffload=-fopt-info-all-omp" }
+
 ! { dg-additional-options "--param=openacc-kernels=decompose" }
 
+! { dg-additional-options "--param=openacc-privatization=noisy" }
+! { dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
+! Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+! { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} }
+
 ! It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
 ! passed to 'incr' may be unset, and in that case, it will be set to [...]",
 ! so to maintain compatibility with earlier Tcl releases, we manually
 ! initialize counter variables:
-! { dg-line l_dummy[variable c_loop_i 0] }
+! { dg-line l_dummy[variable c_compute 0 c_loop_i 0] }
 ! { dg-message "dummy" "" { target iN-VAl-Id } l_dummy } to avoid
 ! "WARNING: dg-line var l_dummy defined, but not used".
 
@@ -15,15 +23,19 @@ subroutine kernel(lo, hi, a, b, c)
   integer :: lo, hi, i
   real, dimension(lo:hi) :: a, b, c
 
-  !$acc kernels copyin(lo, hi)
+  !$acc kernels copyin(lo, hi) ! { dg-line l_compute[incr c_compute] }
+  ! { dg-note {variable 'lo\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute }
+  ! { dg-note {variable 'hi\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute }
   !$acc loop independent ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i }
   ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = lo, hi
      b(i) = a(i)
   end do
   !$acc loop independent ! { dg-line l_loop_i[incr c_loop_i] }
-  ! { dg-message "note: parallelized loop nest in OpenACC 'kernels' region" "" { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } l_loop_i$c_loop_i }
+  ! { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i }
   ! { dg-optimized "assigned OpenACC gang vector loop parallelism" "" { target *-*-* } l_loop_i$c_loop_i }
   do i = lo, hi
      c(i) = b(i)
-- 
2.34.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition [PR100400, PR103836, PR104061] (was: Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs)
  2020-11-13 22:22 ` Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs (was: [og8] OpenACC 'kernels' construct changes: splitting of the construct into several regions) Thomas Schwinge
                     ` (2 preceding siblings ...)
  2022-01-13  9:44   ` Enhance OpenACC 'kernels' decomposition testing (was: Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs) Thomas Schwinge
@ 2022-01-19 22:29   ` Thomas Schwinge
  2022-01-19 23:00     ` Jakub Jelinek
  2022-03-12 12:38   ` Add 'c-c++-common/goacc/kernels-decompose-pr104086-1.c' [PR104086] Thomas Schwinge
  2022-03-17  8:04   ` Enhance further testcases to verify Openacc 'kernels' decomposition Thomas Schwinge
  5 siblings, 1 reply; 33+ messages in thread
From: Thomas Schwinge @ 2022-01-19 22:29 UTC (permalink / raw)
  To: gcc-patches
  Cc: Frederik Harwath, Richard Biener, Jakub Jelinek, Arseny Solokha,
	Andrew Pinski

[-- Attachment #1: Type: text/plain, Size: 1308 bytes --]

Hi!

On 2020-11-13T23:22:30+0100, I wrote:
> I've pushed to master branch [...] commit
> e898ce7997733c29dcab9c3c62ca102c7f9fa6eb "Decompose OpenACC 'kernels'
> constructs into parts, a sequence of compute constructs", see attached.
>
> On 2019-02-01T00:59:30+0100, I wrote:
>> There's more work to be done there, and we're aware of a number of TODO
>> items, but nevertheless: it's a good first step.
>
> That's still the case...  :-)

(The pass is still disabled by default, by the way.)

We've found that 'gcc/omp-oacc-kernels-decompose.cc' is currently not at
all considerate of 'GIMPLE_DEBUG' statements -- and it's not always
straightforward how to handle these (not rocket science either; but
needs proper understanding and testing).

Actually fixing it is a separate task, but it seems prudent to at least
catch it, and document via a few test cases.  OK to push
"Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition
[PR100400, PR103836, PR104061]", see attached?
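
For readers who don't want to open the attachment, here is a rough standalone model of the decision the new check makes. This is only a sketch, not GCC code: 'StmtKind', 'Flags' and 'check_stmt' are hypothetical names invented for illustration, and the two booleans merely stand in for 'flag_compare_debug' and 'flag_compare_debug_opt' (which have different types in GCC). The diagnostic text is the one the new test cases match against.

```cpp
// Illustrative model only; not GCC code.  All names here are hypothetical.
#include <string>

// Stand-in for the statement kinds the decomposition pass may encounter.
enum class StmtKind { Assign, OmpFor, Debug };

// Stand-ins for the two '-fcompare-debug'-related flags tested by the patch.
struct Flags
{
  bool compare_debug = false;      // models 'flag_compare_debug'
  bool compare_debug_opt = false;  // models 'flag_compare_debug_opt'
};

// Return the diagnostic that would be emitted for a statement of the given
// kind, or an empty string if the check stays silent.
std::string
check_stmt (StmtKind kind, const Flags &flags)
{
  if (kind != StmtKind::Debug)
    return "";

  if (flags.compare_debug_opt || flags.compare_debug)
    // Stay silent, so that the usual '-fcompare-debug' analysis can bail
    // out, as necessary.
    return "";

  return "sorry, unimplemented: 'gimple_debug' not yet supported";
}
```

That is, a debug statement in a 'kernels' region gets reported as unsupported, unless a '-fcompare-debug' run is in progress, in which case the comparison itself is left to flag any difference.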


Grüße
 Thomas



[-- Attachment #2: 0001-Catch-GIMPLE_DEBUG-misbehavior-in-OpenACC-kernels-de.patch --]
[-- Type: text/x-diff, Size: 29014 bytes --]

From 568808ef7ccc97ebeae90bc7cb1aba6bd7659b24 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 19 Jan 2022 14:04:42 +0100
Subject: [PATCH] Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels'
 decomposition [PR100400, PR103836, PR104061]

Actually fixing it is a separate task, but it seems prudent to at least catch
it, and document via a few test cases.

	gcc/
	PR middle-end/100400
	PR middle-end/103836
	PR middle-end/104061
	* omp-oacc-kernels-decompose.cc (decompose_kernels_region_body):
	Catch 'GIMPLE_DEBUG'.
	gcc/testsuite/
	PR middle-end/100400
	PR middle-end/103836
	PR middle-end/104061
	* c-c++-common/goacc/kernels-decompose-pr100400-1-1.c: New.
	* c-c++-common/goacc/kernels-decompose-pr100400-1-2.c: New.
	* c-c++-common/goacc/kernels-decompose-pr100400-1-3.c: New.
	* c-c++-common/goacc/kernels-decompose-pr100400-1-4.c: New.
	* c-c++-common/goacc/kernels-decompose-pr103836-1-1.c: New.
	* c-c++-common/goacc/kernels-decompose-pr103836-1-2.c: New.
	* c-c++-common/goacc/kernels-decompose-pr103836-1-3.c: New.
	* c-c++-common/goacc/kernels-decompose-pr103836-1-4.c: New.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-1.c: New.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: New.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: New.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: New.
---
 gcc/omp-oacc-kernels-decompose.cc             | 10 +++++
 .../goacc/kernels-decompose-pr100400-1-1.c    | 33 ++++++++++++++
 .../goacc/kernels-decompose-pr100400-1-2.c    | 40 +++++++++++++++++
 .../goacc/kernels-decompose-pr100400-1-3.c    | 42 ++++++++++++++++++
 .../goacc/kernels-decompose-pr100400-1-4.c    | 40 +++++++++++++++++
 .../goacc/kernels-decompose-pr103836-1-1.c    | 26 +++++++++++
 .../goacc/kernels-decompose-pr103836-1-2.c    | 29 +++++++++++++
 .../goacc/kernels-decompose-pr103836-1-3.c    | 30 +++++++++++++
 .../goacc/kernels-decompose-pr103836-1-4.c    | 30 +++++++++++++
 .../goacc/kernels-decompose-pr104061-1-1.c    | 30 +++++++++++++
 .../goacc/kernels-decompose-pr104061-1-2.c    | 33 ++++++++++++++
 .../goacc/kernels-decompose-pr104061-1-3.c    | 43 +++++++++++++++++++
 .../goacc/kernels-decompose-pr104061-1-4.c    | 41 ++++++++++++++++++
 13 files changed, 427 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-2.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-3.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-4.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-2.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-3.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-4.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-2.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-3.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-4.c

diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc
index 21872db3ed3..98eafdbe3a1 100644
--- a/gcc/omp-oacc-kernels-decompose.cc
+++ b/gcc/omp-oacc-kernels-decompose.cc
@@ -1255,6 +1255,16 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
       gsi_next (&gsi_n);
 
       gimple *stmt = gsi_stmt (gsi);
+      if (gimple_code (stmt) == GIMPLE_DEBUG)
+	{
+	  if (flag_compare_debug_opt || flag_compare_debug)
+	    /* Let the usual '-fcompare-debug' analysis bail out, as
+	       necessary.  */
+	    ;
+	  else
+	    sorry_at (loc, "%qs not yet supported",
+		      gimple_code_name[gimple_code (stmt)]);
+	}
       gimple *omp_for = top_level_omp_for_in_stmt (stmt);
       bool is_unconditional_oacc_for_loop = false;
       if (omp_for != NULL)
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-1.c
new file mode 100644
index 00000000000..f63800514c4
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-1.c
@@ -0,0 +1,33 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-g0" } */
+/* { dg-additional-options "-O1" } */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
+int *p;
+
+void
+foo (void)
+{
+#pragma acc kernels
+  /* { dg-note {variable 'c' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } .-1 } */
+  /* { dg-note {variable 'c\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-2 } */
+  {
+    int c;
+
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+    p = &c;
+
+    /* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+#pragma acc loop independent
+    /* { dg-note {variable 'c\.0' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+    /* { dg-note {variable 'c' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } .-2 }
+       { dg-note {variable 'c' ought to be adjusted for OpenACC privatization level: 'vector'} {} { target *-*-* } .-3 } */
+    /* { dg-optimized {assigned OpenACC gang vector loop parallelism} {} { target *-*-* } .-4 } */
+    for (c = 0; c < 1; ++c)
+      ;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-2.c
new file mode 100644
index 00000000000..1eee3b07a75
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-2.c
@@ -0,0 +1,40 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fchecking" }
+   { dg-ice TODO { c++ } }
+   { dg-prune-output "during GIMPLE pass: omp_oacc_kernels_decompose" } */
+
+/* { dg-additional-options "-g" } */
+/* { dg-additional-options "-O1" } so that we may get some 'GIMPLE_DEBUG's.  */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
+int *p;
+
+void
+foo (void)
+{
+  /* { dg-bogus {sorry, unimplemented: 'gimple_debug' not yet supported} TODO { xfail *-*-* } .+1 } */
+#pragma acc kernels
+  /* { dg-note {variable 'c' declared in block is candidate for adjusting OpenACC privatization level} {} { xfail *-*-* } .-1 } */
+  /* { dg-note {variable 'c\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail *-*-* } .-2 } */
+  {
+    /* { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c++ } .-1 }
+       { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c } .+1 } */
+    int c;
+
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { xfail *-*-* } .+1 } */
+    p = &c;
+
+    /* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { xfail c++ } .+1 } */
+#pragma acc loop independent
+    /* { dg-note {variable 'c\.0' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail *-*-* } .-1 } */
+    /* { dg-note {variable 'c' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { xfail *-*-* } .-2 }
+       { dg-note {variable 'c' ought to be adjusted for OpenACC privatization level: 'vector'} {} { xfail *-*-* } .-3 } */
+    /* { dg-optimized {assigned OpenACC gang vector loop parallelism} {} { xfail *-*-* } .-4 } */
+    for (c = 0; c < 1; ++c)
+      ;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-3.c
new file mode 100644
index 00000000000..dce4e399fbe
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-3.c
@@ -0,0 +1,42 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fchecking" }
+   { dg-ice TODO { c++ } }
+   { dg-prune-output "during GIMPLE pass: omp_oacc_kernels_decompose" } */
+
+/* { dg-additional-options "-fcompare-debug" } -- w/o debug compiled first.
+   { dg-bogus {error: during '-fcompare-debug' recompilation} TODO { xfail c++ } 0 }
+   { dg-bogus {error: [^\n\r]+: '-fcompare-debug' failure \(length\)} TODO { xfail c++ } 0 } */
+/* { dg-additional-options "-O1" } so that we may get some 'GIMPLE_DEBUG's.  */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
+int *p;
+
+void
+foo (void)
+{
+  /* { dg-bogus {sorry, unimplemented: 'gimple_debug' not yet supported} {} { target *-*-* } .+1 } suppressed via '-fcompare-debug'.  */
+#pragma acc kernels
+  /* { dg-note {variable 'c' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } .-1 } */
+  /* { dg-note {variable 'c\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-2 } */
+  {
+    /* { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c++ } .-1 }
+       { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c } .+1 } */
+    int c;
+
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+    p = &c;
+
+    /* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+#pragma acc loop independent
+    /* { dg-note {variable 'c\.0' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+    /* { dg-note {variable 'c' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } .-2 }
+       { dg-note {variable 'c' ought to be adjusted for OpenACC privatization level: 'vector'} {} { target *-*-* } .-3 } */
+    /* { dg-optimized {assigned OpenACC gang vector loop parallelism} {} { target *-*-* } .-4 } */
+    for (c = 0; c < 1; ++c)
+      ;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-4.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-4.c
new file mode 100644
index 00000000000..7ca4440d075
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-4.c
@@ -0,0 +1,40 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fchecking" }
+   { dg-ice TODO { c++ } }
+   { dg-prune-output "during GIMPLE pass: omp_oacc_kernels_decompose" } */
+
+/* { dg-additional-options "-g -fcompare-debug" } -- w/ debug compiled first.  */
+/* { dg-additional-options "-O1" } so that we may get some 'GIMPLE_DEBUG's.  */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
+int *p;
+
+void
+foo (void)
+{
+  /* { dg-bogus {sorry, unimplemented: 'gimple_debug' not yet supported} {} { target *-*-* } .+1 } suppressed via '-fcompare-debug'.  */
+#pragma acc kernels
+  /* { dg-note {variable 'c' declared in block is candidate for adjusting OpenACC privatization level} {} { xfail c++ } .-1 } */
+  /* { dg-note {variable 'c\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail c++ } .-2 } */
+  {
+    /* { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c++ } .-1 }
+       { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c } .+1 } */
+    int c;
+
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { xfail c++ } .+1 } */
+    p = &c;
+
+    /* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { xfail c++ } .+1 } */
+#pragma acc loop independent
+    /* { dg-note {variable 'c\.0' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail c++ } .-1 } */
+    /* { dg-note {variable 'c' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { xfail c++ } .-2 }
+       { dg-note {variable 'c' ought to be adjusted for OpenACC privatization level: 'vector'} {} { xfail c++ } .-3 } */
+    /* { dg-optimized {assigned OpenACC gang vector loop parallelism} {} { xfail c++ } .-4 } */
+    for (c = 0; c < 1; ++c)
+      ;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-1.c
new file mode 100644
index 00000000000..46ca0c99d2f
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-1.c
@@ -0,0 +1,26 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-g0" } */
+/* { dg-additional-options "-O1" } */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
+extern int i;
+
+void
+f_acc_kernels (void)
+{
+#pragma acc kernels
+  /* { dg-note {variable 'i\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+  {
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop
+    /* { dg-note {variable 'i\.0' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-2 } */
+    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } .-3 } */
+    for (i = 0; i < 2; ++i)
+      ;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-2.c
new file mode 100644
index 00000000000..e0f24cee2db
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-2.c
@@ -0,0 +1,29 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-g" } */
+/* { dg-additional-options "-O1" } so that we may get some 'GIMPLE_DEBUG's.  */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
+extern int i;
+
+void
+f_acc_kernels (void)
+{
+  /* { dg-bogus {sorry, unimplemented: 'gimple_debug' not yet supported} TODO { xfail c++ } .+1 } */
+#pragma acc kernels
+  /* { dg-note {variable 'i\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail c++ } .-1 } */
+  {
+    /* { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c++ } .-1 } */
+
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop
+    /* { dg-note {variable 'i\.0' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail c++ } .-1 } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail c++ } .-2 } */
+    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { xfail c++ } .-3 } */
+    for (i = 0; i < 2; ++i)
+      ;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-3.c
new file mode 100644
index 00000000000..cbf1b7c3e25
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-3.c
@@ -0,0 +1,30 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fcompare-debug" } -- w/o debug compiled first.
+   { dg-bogus {error: [^\n\r]+: '-fcompare-debug' failure \(length\)} TODO { xfail c++ } 0 } */
+/* { dg-additional-options "-O1" } so that we may get some 'GIMPLE_DEBUG's.  */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
+extern int i;
+
+void
+f_acc_kernels (void)
+{
+  /* { dg-bogus {sorry, unimplemented: 'gimple_debug' not yet supported} {} { target *-*-* } .+1 } */
+#pragma acc kernels
+  /* { dg-note {variable 'i\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+  {
+    /* { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c++ } .-1 } */
+
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop
+    /* { dg-note {variable 'i\.0' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-2 } */
+    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } .-3 } */
+    for (i = 0; i < 2; ++i)
+      ;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-4.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-4.c
new file mode 100644
index 00000000000..21bbe37723f
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-4.c
@@ -0,0 +1,30 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-g -fcompare-debug" } -- w/ debug compiled first.
+   { dg-bogus {error: [^\n\r]+: '-fcompare-debug' failure \(length\)} TODO { xfail c++ } 0 } */
+/* { dg-additional-options "-O1" } so that we may get some 'GIMPLE_DEBUG's.  */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
+extern int i;
+
+void
+f_acc_kernels (void)
+{
+  /* { dg-bogus {sorry, unimplemented: 'gimple_debug' not yet supported} {} { target *-*-* } .+1 } */
+#pragma acc kernels
+  /* { dg-note {variable 'i\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+  {
+    /* { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c++ } .-1 } */
+
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop
+    /* { dg-note {variable 'i\.0' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-2 } */
+    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } .-3 } */
+    for (i = 0; i < 2; ++i)
+      ;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-1.c
new file mode 100644
index 00000000000..a58fce33426
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-1.c
@@ -0,0 +1,30 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-g0" } */
+/* { dg-additional-options "-O1" } */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+int arr_0;
+
+void
+foo (void)
+{
+#pragma acc kernels
+  /* { dg-note {variable 'arr_0\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+  {
+    int k;
+
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop
+    /* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+    /* { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-2 } */
+    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } .-3 } */
+    for (k = 0; k < 2; k++)
+      arr_0 += k;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-2.c
new file mode 100644
index 00000000000..d66dee6f8a7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-2.c
@@ -0,0 +1,33 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-g" } */
+/* { dg-additional-options "-O1" } so that we may get some 'GIMPLE_DEBUG's.  */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+int arr_0;
+
+void
+foo (void)
+{
+  /* { dg-bogus {sorry, unimplemented: 'gimple_debug' not yet supported} TODO { xfail *-*-* } .+1 } */
+#pragma acc kernels
+  /* { dg-note {variable 'arr_0\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail *-*-* } .-1 } */
+  {
+    /* { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c++ } .-1 }
+       { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c } .+1 } */
+    int k;
+
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop
+    /* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail *-*-* } .-1 } */
+    /* { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail *-*-* } .-2 } */
+    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { xfail *-*-* } .-3 } */
+    for (k = 0; k < 2; k++)
+      arr_0 += k;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-3.c
new file mode 100644
index 00000000000..20c84e2f3db
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-3.c
@@ -0,0 +1,43 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fchecking" }
+   { dg-ice TODO }
+   { dg-prune-output {D\.[0-9]+ = arr_0\.0 \+ k;} }
+   { dg-prune-output {during GIMPLE pass: lower} } */
+
+/* { dg-additional-options "-fcompare-debug" } -- w/o debug compiled first.
+   { dg-bogus {error: during '-fcompare-debug' recompilation} TODO { xfail *-*-* } 0 }
+   { dg-bogus {error: [^\n\r]+: '-fcompare-debug' failure \(length\)} TODO { xfail *-*-* } 0 } */
+/* { dg-additional-options "-O1" } so that we may get some 'GIMPLE_DEBUG's.  */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+int arr_0;
+
+void
+foo (void)
+{
+  /* { dg-bogus {sorry, unimplemented: 'gimple_debug' not yet supported} {} { target *-*-* } .+1 } suppressed via '-fcompare-debug'.  */
+#pragma acc kernels
+  /* { dg-bogus {note: variable 'k' declared in block is candidate for adjusting OpenACC privatization level} {w/ debug} { xfail *-*-* } .-1 } */
+  /* { dg-note {variable 'arr_0\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-2 } */
+  {
+    /* { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c++ } .-1 }
+       { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c } .+1 } */
+    int k;
+
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop
+    /* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+    /* { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-2 } */
+    /* { dg-bogus {note: variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {w/ debug} { xfail *-*-* } .-3 } */
+    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } .-4 } */
+    for (k = 0; k < 2; k++)
+      arr_0 += k;
+      /* { dg-bogus {error: invalid operands in binary operation} {w/ debug} { xfail *-*-* } .-1 } */
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-4.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-4.c
new file mode 100644
index 00000000000..6b6effe1791
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-4.c
@@ -0,0 +1,41 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fchecking" }
+   { dg-ice TODO }
+   { dg-prune-output {D\.[0-9]+ = arr_0\.0 \+ k;} }
+   { dg-prune-output {during GIMPLE pass: lower} } */
+
+/* { dg-additional-options "-g -fcompare-debug" } -- w/ debug compiled first.  */
+/* { dg-additional-options "-O1" } so that we may get some 'GIMPLE_DEBUG's.  */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+int arr_0;
+
+void
+foo (void)
+{
+  /* { dg-bogus {sorry, unimplemented: 'gimple_debug' not yet supported} {} { target *-*-* } .+1 } suppressed via '-fcompare-debug'.  */
+#pragma acc kernels
+  /* { dg-bogus {note: variable 'k' declared in block is candidate for adjusting OpenACC privatization level} {w/ debug} { xfail *-*-* } .-1 } */
+  /* { dg-note {variable 'arr_0\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-2 } */
+  {
+    /* { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c++ } .-1 }
+       { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c } .+1 } */
+    int k;
+
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop
+    /* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail *-*-* } .-1 } */
+    /* { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail *-*-* } .-2 } */
+    /* { dg-bogus {note: variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {w/ debug} { xfail *-*-* } .-3 } */
+    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { xfail *-*-* } .-4 } */
+    for (k = 0; k < 2; k++)
+      arr_0 += k;
+      /* { dg-bogus {error: invalid operands in binary operation} {w/ debug} { xfail *-*-* } .-1 } */
+  }
+}
-- 
2.25.1


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition [PR100400, PR103836, PR104061] (was: Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs)
  2022-01-19 22:29   ` Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition [PR100400, PR103836, PR104061] " Thomas Schwinge
@ 2022-01-19 23:00     ` Jakub Jelinek
  2022-01-20  8:26       ` Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition [PR100400, PR103836, PR104061] Thomas Schwinge
  0 siblings, 1 reply; 33+ messages in thread
From: Jakub Jelinek @ 2022-01-19 23:00 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: gcc-patches, Frederik Harwath, Richard Biener, Arseny Solokha,
	Andrew Pinski

On Wed, Jan 19, 2022 at 11:29:18PM +0100, Thomas Schwinge wrote:
> (The pass is still disabled by default, by the way.)
> 
> We've found that 'gcc/omp-oacc-kernels-decompose.cc' is currently not at
> all considerate of 'GIMPLE_DEBUG' statements -- and it's not always
> straight forward how to handle these (not rocket science either; but
> needs proper understanding and testing).

The general rule is that debug stmts shouldn't affect code generation
decisions, so when deciding what to optimize/how, they should be ignored,
and during actual transformation adjusted or worst case reset as needed.

> Actually fixing it is a separate task, but it seems prudent to at least
> catch it, and document via a few test cases.  OK to push
> "Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition
> [PR100400, PR103836, PR104061]", see attached?

> --- a/gcc/omp-oacc-kernels-decompose.cc
> +++ b/gcc/omp-oacc-kernels-decompose.cc
> @@ -1255,6 +1255,16 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
>        gsi_next (&gsi_n);
>  
>        gimple *stmt = gsi_stmt (gsi);
> +      if (gimple_code (stmt) == GIMPLE_DEBUG)
> +	{
> +	  if (flag_compare_debug_opt || flag_compare_debug)
> +	    /* Let the usual '-fcompare-debug' analysis bail out, as
> +	       necessary.  */
> +	    ;
> +	  else
> +	    sorry_at (loc, "%qs not yet supported",
> +		      gimple_code_name[gimple_code (stmt)]);
> +	}

This is wrong.  It shouldn't be dependent on flag_compare_debug* options,
those are just debugging aids to verify that -g/-g0 don't affect code
generation.  With the above you'd pretend they don't, but they actually
would (with -g you'd get sorry, without it it would compile fine).

If this code is analysing whether the kernels region body should be
decomposed or not, it should be if (is_gimple_debug (stmt)) continue;
or whatever else to just ignore them (in some opts already during analysis
phase we remember they are present and something about them, but not in
a way that would actually affect the code generation decisions).
And then when actually transforming it, it depends on what transformations
are done to the variables/values referenced in the debug stmts.
gimple_debug_bind_reset_value (stmt); update_stmt (stmt); is
what resets them and can be used as last resort, it will keep saying
that it describes some var, but will say that the var is optimized out.
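
To make the rule above concrete, here is a toy model in plain C.  It is
not GCC code: 'toy_stmt', 'toy_count_parts' and 'toy_reset_debug' are
hypothetical names invented for this sketch, and the splitting rule is a
simplified assumption rather than what the pass really does.  It only
shows the two halves: analysis skips debug statements, so the decision
is the same with and without them; transformation resets them as a last
resort.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for statement kinds; hypothetical, not GCC's types.  */
enum toy_code { TOY_ASSIGN, TOY_LOOP, TOY_DEBUG };

struct toy_stmt
{
  enum toy_code code;
  bool debug_value_valid;	/* Only meaningful for TOY_DEBUG.  */
};

/* Analysis: count the parts a region would be split into.  Each loop
   is one part, each maximal run of other statements is one part.
   Debug statements are skipped, so they cannot change the result.  */
static int
toy_count_parts (const struct toy_stmt *stmts, size_t n)
{
  assert (stmts != NULL || n == 0);

  int parts = 0;
  bool in_single = false;
  for (size_t i = 0; i < n; i++)
    {
      if (stmts[i].code == TOY_DEBUG)
	continue;
      if (stmts[i].code == TOY_LOOP)
	{
	  parts++;
	  in_single = false;
	}
      else if (!in_single)
	{
	  parts++;
	  in_single = true;
	}
    }
  return parts;
}

/* Transformation, last resort: mark every debug statement's value as
   no longer valid, analogous in spirit to resetting a debug bind.  */
static void
toy_reset_debug (struct toy_stmt *stmts, size_t n)
{
  assert (stmts != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    if (stmts[i].code == TOY_DEBUG)
      stmts[i].debug_value_valid = false;
}
```

In this model, a sequence "assign, debug, loop" and a sequence "assign,
loop" both yield two parts, which is the property that '-fcompare-debug'
checks for in the real compiler.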

	Jakub


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition [PR100400, PR103836, PR104061]
  2022-01-19 23:00     ` Jakub Jelinek
@ 2022-01-20  8:26       ` Thomas Schwinge
  2022-01-20  9:58         ` Jakub Jelinek
  0 siblings, 1 reply; 33+ messages in thread
From: Thomas Schwinge @ 2022-01-20  8:26 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: gcc-patches, Frederik Harwath, Richard Biener, Arseny Solokha,
	Andrew Pinski

[-- Attachment #1: Type: text/plain, Size: 4113 bytes --]

Hi Jakub!

Thanks for looking into this.

On 2022-01-20T00:00:23+0100, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Jan 19, 2022 at 11:29:18PM +0100, Thomas Schwinge wrote:
>> (The pass is still disabled by default, by the way.)
>>
>> We've found that 'gcc/omp-oacc-kernels-decompose.cc' is currently not at
>> all considerate of 'GIMPLE_DEBUG' statements -- and it's not always
>> straight forward how to handle these (not rocket science either; but
>> needs proper understanding and testing).
>
> The general rule is that debug stmts shouldn't affect code generation
> decisions, so when deciding what to optimize/how, they should be ignored

ACK.  (... and I'm confused why we didn't run into this when originally
doing the OpenACC 'kernels' decomposition work, three years ago...)

> and during actual transformation adjusted or worst case reset as needed.

That's what we need to look into, in particular: if we decompose an
OpenACC 'kernels' region (a GIMPLE sequence) into parts, how to move or
otherwise handle any 'GIMPLE_DEBUG's.

>> Actually fixing it is a separate task, but it seems prudent to at least
>> catch it, and document via a few test cases.  OK to push
>> "Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition
>> [PR100400, PR103836, PR104061]", see attached?
>
>> --- a/gcc/omp-oacc-kernels-decompose.cc
>> +++ b/gcc/omp-oacc-kernels-decompose.cc
>> @@ -1255,6 +1255,16 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
>>        gsi_next (&gsi_n);
>>
>>        gimple *stmt = gsi_stmt (gsi);
>> +      if (gimple_code (stmt) == GIMPLE_DEBUG)
>> +    {
>> +      if (flag_compare_debug_opt || flag_compare_debug)
>> +        /* Let the usual '-fcompare-debug' analysis bail out, as
>> +           necessary.  */
>> +        ;
>> +      else
>> +        sorry_at (loc, "%qs not yet supported",
>> +                  gimple_code_name[gimple_code (stmt)]);
>> +    }
>
> This is wrong.

I have a different understanding of what "wrong" means.  ;-)

> It shouldn't be dependent on flag_compare_debug* options,
> those are just debugging aids to verify that -g/-g0 don't affect code
> generation.  With the above you'd pretend they don't, but they actually
> would (with -g you'd get sorry, without it it would compile fine).

The idea there is: not all 'GIMPLE_DEBUG's are mishandled in the pass,
just some.  If '-fcompare-debug' is in effect, we know that it will
detect any cases of mishandling (code generation difference), so in that
case it's fine to skip the coarse-grained 'sorry' here.

> If this code is analysing whether the kernels region body should be
> decomposed or not

This place here is just a convenient one, where we iterate through the
whole GIMPLE sequence.

With these things now hopefully clarified, is the attached
"Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition
[PR100400, PR103836, PR104061]" OK to push?  It's of course not the final
fix, but it at least makes obvious any current silent miscompilation, and
is an incremental improvement over the current status.

> it should be if (is_gimple_debug (stmt)) continue;
> or whatever else to just ignore them (in some opts already during analysis
> phase we remember they are present and something about them, but not in
> a way that would actually affect the code generation decisions).
> And then when actually transforming it, it depends on what transformations
> are done to the variables/values referenced in the debug stmts.
> gimple_debug_bind_reset_value (stmt); update_stmt (stmt); is
> what resets them and can be used as last resort, it will keep saying
> that it describes some var, but will say that the var is optimized out.

Thanks, that'll be helpful later.


Grüße
 Thomas



[-- Attachment #2: 0001-Catch-GIMPLE_DEBUG-misbehavior-in-OpenACC-kernels-de.patch --]
[-- Type: text/x-diff, Size: 29014 bytes --]

From 568808ef7ccc97ebeae90bc7cb1aba6bd7659b24 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 19 Jan 2022 14:04:42 +0100
Subject: [PATCH] Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels'
 decomposition [PR100400, PR103836, PR104061]

Actually fixing it is a separate task, but it seems prudent to at least catch
it, and document via a few test cases.

	gcc/
	PR middle-end/100400
	PR middle-end/103836
	PR middle-end/104061
	* omp-oacc-kernels-decompose.cc (decompose_kernels_region_body):
	Catch 'GIMPLE_DEBUG'.
	gcc/testsuite/
	PR middle-end/100400
	PR middle-end/103836
	PR middle-end/104061
	* c-c++-common/goacc/kernels-decompose-pr100400-1-1.c: New.
	* c-c++-common/goacc/kernels-decompose-pr100400-1-2.c: New.
	* c-c++-common/goacc/kernels-decompose-pr100400-1-3.c: New.
	* c-c++-common/goacc/kernels-decompose-pr100400-1-4.c: New.
	* c-c++-common/goacc/kernels-decompose-pr103836-1-1.c: New.
	* c-c++-common/goacc/kernels-decompose-pr103836-1-2.c: New.
	* c-c++-common/goacc/kernels-decompose-pr103836-1-3.c: New.
	* c-c++-common/goacc/kernels-decompose-pr103836-1-4.c: New.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-1.c: New.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-2.c: New.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-3.c: New.
	* c-c++-common/goacc/kernels-decompose-pr104061-1-4.c: New.
---
 gcc/omp-oacc-kernels-decompose.cc             | 10 +++++
 .../goacc/kernels-decompose-pr100400-1-1.c    | 33 ++++++++++++++
 .../goacc/kernels-decompose-pr100400-1-2.c    | 40 +++++++++++++++++
 .../goacc/kernels-decompose-pr100400-1-3.c    | 42 ++++++++++++++++++
 .../goacc/kernels-decompose-pr100400-1-4.c    | 40 +++++++++++++++++
 .../goacc/kernels-decompose-pr103836-1-1.c    | 26 +++++++++++
 .../goacc/kernels-decompose-pr103836-1-2.c    | 29 +++++++++++++
 .../goacc/kernels-decompose-pr103836-1-3.c    | 30 +++++++++++++
 .../goacc/kernels-decompose-pr103836-1-4.c    | 30 +++++++++++++
 .../goacc/kernels-decompose-pr104061-1-1.c    | 30 +++++++++++++
 .../goacc/kernels-decompose-pr104061-1-2.c    | 33 ++++++++++++++
 .../goacc/kernels-decompose-pr104061-1-3.c    | 43 +++++++++++++++++++
 .../goacc/kernels-decompose-pr104061-1-4.c    | 41 ++++++++++++++++++
 13 files changed, 427 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-2.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-3.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-4.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-2.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-3.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-4.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-2.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-3.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-4.c

diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc
index 21872db3ed3..98eafdbe3a1 100644
--- a/gcc/omp-oacc-kernels-decompose.cc
+++ b/gcc/omp-oacc-kernels-decompose.cc
@@ -1255,6 +1255,16 @@ decompose_kernels_region_body (gimple *kernels_region, tree kernels_clauses)
       gsi_next (&gsi_n);
 
       gimple *stmt = gsi_stmt (gsi);
+      if (gimple_code (stmt) == GIMPLE_DEBUG)
+	{
+	  if (flag_compare_debug_opt || flag_compare_debug)
+	    /* Let the usual '-fcompare-debug' analysis bail out, as
+	       necessary.  */
+	    ;
+	  else
+	    sorry_at (loc, "%qs not yet supported",
+		      gimple_code_name[gimple_code (stmt)]);
+	}
       gimple *omp_for = top_level_omp_for_in_stmt (stmt);
       bool is_unconditional_oacc_for_loop = false;
       if (omp_for != NULL)
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-1.c
new file mode 100644
index 00000000000..f63800514c4
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-1.c
@@ -0,0 +1,33 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-g0" } */
+/* { dg-additional-options "-O1" } */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
+int *p;
+
+void
+foo (void)
+{
+#pragma acc kernels
+  /* { dg-note {variable 'c' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } .-1 } */
+  /* { dg-note {variable 'c\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-2 } */
+  {
+    int c;
+
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+    p = &c;
+
+    /* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+#pragma acc loop independent
+    /* { dg-note {variable 'c\.0' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+    /* { dg-note {variable 'c' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } .-2 }
+       { dg-note {variable 'c' ought to be adjusted for OpenACC privatization level: 'vector'} {} { target *-*-* } .-3 } */
+    /* { dg-optimized {assigned OpenACC gang vector loop parallelism} {} { target *-*-* } .-4 } */
+    for (c = 0; c < 1; ++c)
+      ;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-2.c
new file mode 100644
index 00000000000..1eee3b07a75
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-2.c
@@ -0,0 +1,40 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fchecking" }
+   { dg-ice TODO { c++ } }
+   { dg-prune-output "during GIMPLE pass: omp_oacc_kernels_decompose" } */
+
+/* { dg-additional-options "-g" } */
+/* { dg-additional-options "-O1" } so that we may get some 'GIMPLE_DEBUG's.  */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
+int *p;
+
+void
+foo (void)
+{
+  /* { dg-bogus {sorry, unimplemented: 'gimple_debug' not yet supported} TODO { xfail *-*-* } .+1 } */
+#pragma acc kernels
+  /* { dg-note {variable 'c' declared in block is candidate for adjusting OpenACC privatization level} {} { xfail *-*-* } .-1 } */
+  /* { dg-note {variable 'c\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail *-*-* } .-2 } */
+  {
+    /* { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c++ } .-1 }
+       { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c } .+1 } */
+    int c;
+
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { xfail *-*-* } .+1 } */
+    p = &c;
+
+    /* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { xfail c++ } .+1 } */
+#pragma acc loop independent
+    /* { dg-note {variable 'c\.0' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail *-*-* } .-1 } */
+    /* { dg-note {variable 'c' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { xfail *-*-* } .-2 }
+       { dg-note {variable 'c' ought to be adjusted for OpenACC privatization level: 'vector'} {} { xfail *-*-* } .-3 } */
+    /* { dg-optimized {assigned OpenACC gang vector loop parallelism} {} { xfail *-*-* } .-4 } */
+    for (c = 0; c < 1; ++c)
+      ;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-3.c
new file mode 100644
index 00000000000..dce4e399fbe
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-3.c
@@ -0,0 +1,42 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fchecking" }
+   { dg-ice TODO { c++ } }
+   { dg-prune-output "during GIMPLE pass: omp_oacc_kernels_decompose" } */
+
+/* { dg-additional-options "-fcompare-debug" } -- w/o debug compiled first.
+   { dg-bogus {error: during '-fcompare-debug' recompilation} TODO { xfail c++ } 0 }
+   { dg-bogus {error: [^\n\r]+: '-fcompare-debug' failure \(length\)} TODO { xfail c++ } 0 } */
+/* { dg-additional-options "-O1" } so that we may get some 'GIMPLE_DEBUG's.  */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
+int *p;
+
+void
+foo (void)
+{
+  /* { dg-bogus {sorry, unimplemented: 'gimple_debug' not yet supported} {} { target *-*-* } .+1 } suppressed via '-fcompare-debug'.  */
+#pragma acc kernels
+  /* { dg-note {variable 'c' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } .-1 } */
+  /* { dg-note {variable 'c\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-2 } */
+  {
+    /* { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c++ } .-1 }
+       { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c } .+1 } */
+    int c;
+
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+    p = &c;
+
+    /* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+#pragma acc loop independent
+    /* { dg-note {variable 'c\.0' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+    /* { dg-note {variable 'c' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } .-2 }
+       { dg-note {variable 'c' ought to be adjusted for OpenACC privatization level: 'vector'} {} { target *-*-* } .-3 } */
+    /* { dg-optimized {assigned OpenACC gang vector loop parallelism} {} { target *-*-* } .-4 } */
+    for (c = 0; c < 1; ++c)
+      ;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-4.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-4.c
new file mode 100644
index 00000000000..7ca4440d075
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100400-1-4.c
@@ -0,0 +1,40 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fchecking" }
+   { dg-ice TODO { c++ } }
+   { dg-prune-output "during GIMPLE pass: omp_oacc_kernels_decompose" } */
+
+/* { dg-additional-options "-g -fcompare-debug" } -- w/ debug compiled first.  */
+/* { dg-additional-options "-O1" } so that we may get some 'GIMPLE_DEBUG's.  */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
+int *p;
+
+void
+foo (void)
+{
+  /* { dg-bogus {sorry, unimplemented: 'gimple_debug' not yet supported} {} { target *-*-* } .+1 } suppressed via '-fcompare-debug'.  */
+#pragma acc kernels
+  /* { dg-note {variable 'c' declared in block is candidate for adjusting OpenACC privatization level} {} { xfail c++ } .-1 } */
+  /* { dg-note {variable 'c\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail c++ } .-2 } */
+  {
+    /* { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c++ } .-1 }
+       { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c } .+1 } */
+    int c;
+
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { xfail c++ } .+1 } */
+    p = &c;
+
+    /* { dg-note {parallelized loop nest in OpenACC 'kernels' region} {} { xfail c++ } .+1 } */
+#pragma acc loop independent
+    /* { dg-note {variable 'c\.0' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail c++ } .-1 } */
+    /* { dg-note {variable 'c' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { xfail c++ } .-2 }
+       { dg-note {variable 'c' ought to be adjusted for OpenACC privatization level: 'vector'} {} { xfail c++ } .-3 } */
+    /* { dg-optimized {assigned OpenACC gang vector loop parallelism} {} { xfail c++ } .-4 } */
+    for (c = 0; c < 1; ++c)
+      ;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-1.c
new file mode 100644
index 00000000000..46ca0c99d2f
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-1.c
@@ -0,0 +1,26 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-g0" } */
+/* { dg-additional-options "-O1" } */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
+extern int i;
+
+void
+f_acc_kernels (void)
+{
+#pragma acc kernels
+  /* { dg-note {variable 'i\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+  {
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop
+    /* { dg-note {variable 'i\.0' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-2 } */
+    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } .-3 } */
+    for (i = 0; i < 2; ++i)
+      ;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-2.c
new file mode 100644
index 00000000000..e0f24cee2db
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-2.c
@@ -0,0 +1,29 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-g" } */
+/* { dg-additional-options "-O1" } so that we may get some 'GIMPLE_DEBUG's.  */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
+extern int i;
+
+void
+f_acc_kernels (void)
+{
+  /* { dg-bogus {sorry, unimplemented: 'gimple_debug' not yet supported} TODO { xfail c++ } .+1 } */
+#pragma acc kernels
+  /* { dg-note {variable 'i\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail c++ } .-1 } */
+  {
+    /* { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c++ } .-1 } */
+
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop
+    /* { dg-note {variable 'i\.0' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail c++ } .-1 } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail c++ } .-2 } */
+    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { xfail c++ } .-3 } */
+    for (i = 0; i < 2; ++i)
+      ;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-3.c
new file mode 100644
index 00000000000..cbf1b7c3e25
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-3.c
@@ -0,0 +1,30 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fcompare-debug" } -- w/o debug compiled first.
+   { dg-bogus {error: [^\n\r]+: '-fcompare-debug' failure \(length\)} TODO { xfail c++ } 0 } */
+/* { dg-additional-options "-O1" } so that we may get some 'GIMPLE_DEBUG's.  */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
+extern int i;
+
+void
+f_acc_kernels (void)
+{
+  /* { dg-bogus {sorry, unimplemented: 'gimple_debug' not yet supported} {} { target *-*-* } .+1 } */
+#pragma acc kernels
+  /* { dg-note {variable 'i\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+  {
+    /* { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c++ } .-1 } */
+
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop
+    /* { dg-note {variable 'i\.0' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-2 } */
+    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } .-3 } */
+    for (i = 0; i < 2; ++i)
+      ;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-4.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-4.c
new file mode 100644
index 00000000000..21bbe37723f
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr103836-1-4.c
@@ -0,0 +1,30 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-g -fcompare-debug" } -- w/ debug compiled first.
+   { dg-bogus {error: [^\n\r]+: '-fcompare-debug' failure \(length\)} TODO { xfail c++ } 0 } */
+/* { dg-additional-options "-O1" } so that we may get some 'GIMPLE_DEBUG's.  */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
+extern int i;
+
+void
+f_acc_kernels (void)
+{
+  /* { dg-bogus {sorry, unimplemented: 'gimple_debug' not yet supported} {} { target *-*-* } .+1 } */
+#pragma acc kernels
+  /* { dg-note {variable 'i\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+  {
+    /* { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c++ } .-1 } */
+
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop
+    /* { dg-note {variable 'i\.0' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+    /* { dg-note {variable 'i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-2 } */
+    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } .-3 } */
+    for (i = 0; i < 2; ++i)
+      ;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-1.c
new file mode 100644
index 00000000000..a58fce33426
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-1.c
@@ -0,0 +1,30 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-g0" } */
+/* { dg-additional-options "-O1" } */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+int arr_0;
+
+void
+foo (void)
+{
+#pragma acc kernels
+  /* { dg-note {variable 'arr_0\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+  {
+    int k;
+
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop
+    /* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+    /* { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-2 } */
+    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } .-3 } */
+    for (k = 0; k < 2; k++)
+      arr_0 += k;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-2.c
new file mode 100644
index 00000000000..d66dee6f8a7
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-2.c
@@ -0,0 +1,33 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-g" } */
+/* { dg-additional-options "-O1" } so that we may get some 'GIMPLE_DEBUG's.  */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+int arr_0;
+
+void
+foo (void)
+{
+  /* { dg-bogus {sorry, unimplemented: 'gimple_debug' not yet supported} TODO { xfail *-*-* } .+1 } */
+#pragma acc kernels
+  /* { dg-note {variable 'arr_0\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail *-*-* } .-1 } */
+  {
+    /* { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c++ } .-1 }
+       { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c } .+1 } */
+    int k;
+
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop
+    /* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail *-*-* } .-1 } */
+    /* { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail *-*-* } .-2 } */
+    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { xfail *-*-* } .-3 } */
+    for (k = 0; k < 2; k++)
+      arr_0 += k;
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-3.c
new file mode 100644
index 00000000000..20c84e2f3db
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-3.c
@@ -0,0 +1,43 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fchecking" }
+   { dg-ice TODO }
+   { dg-prune-output {D\.[0-9]+ = arr_0\.0 \+ k;} }
+   { dg-prune-output {during GIMPLE pass: lower} } */
+
+/* { dg-additional-options "-fcompare-debug" } -- w/o debug compiled first.
+   { dg-bogus {error: during '-fcompare-debug' recompilation} TODO { xfail *-*-* } 0 }
+   { dg-bogus {error: [^\n\r]+: '-fcompare-debug' failure \(length\)} TODO { xfail *-*-* } 0 } */
+/* { dg-additional-options "-O1" } so that we may get some 'GIMPLE_DEBUG's.  */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+int arr_0;
+
+void
+foo (void)
+{
+  /* { dg-bogus {sorry, unimplemented: 'gimple_debug' not yet supported} {} { target *-*-* } .+1 } suppressed via '-fcompare-debug'.  */
+#pragma acc kernels
+  /* { dg-bogus {note: variable 'k' declared in block is candidate for adjusting OpenACC privatization level} {w/ debug} { xfail *-*-* } .-1 } */
+  /* { dg-note {variable 'arr_0\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-2 } */
+  {
+    /* { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c++ } .-1 }
+       { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c } .+1 } */
+    int k;
+
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop
+    /* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-1 } */
+    /* { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-2 } */
+    /* { dg-bogus {note: variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {w/ debug} { xfail *-*-* } .-3 } */
+    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } .-4 } */
+    for (k = 0; k < 2; k++)
+      arr_0 += k;
+      /* { dg-bogus {error: invalid operands in binary operation} {w/ debug} { xfail *-*-* } .-1 } */
+  }
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-4.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-4.c
new file mode 100644
index 00000000000..6b6effe1791
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104061-1-4.c
@@ -0,0 +1,41 @@
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fchecking" }
+   { dg-ice TODO }
+   { dg-prune-output {D\.[0-9]+ = arr_0\.0 \+ k;} }
+   { dg-prune-output {during GIMPLE pass: lower} } */
+
+/* { dg-additional-options "-g -fcompare-debug" } -- w/ debug compiled first.  */
+/* { dg-additional-options "-O1" } so that we may get some 'GIMPLE_DEBUG's.  */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" }
+   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
+   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
+
+int arr_0;
+
+void
+foo (void)
+{
+  /* { dg-bogus {sorry, unimplemented: 'gimple_debug' not yet supported} {} { target *-*-* } .+1 } suppressed via '-fcompare-debug'.  */
+#pragma acc kernels
+  /* { dg-bogus {note: variable 'k' declared in block is candidate for adjusting OpenACC privatization level} {w/ debug} { xfail *-*-* } .-1 } */
+  /* { dg-note {variable 'arr_0\.0' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } .-2 } */
+  {
+    /* { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c++ } .-1 }
+       { dg-bogus {note: beginning 'gang-single' part in OpenACC 'kernels' region} {w/ debug} { xfail c } .+1 } */
+    int k;
+
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } .+1 } */
+#pragma acc loop
+    /* { dg-note {variable 'k' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail *-*-* } .-1 } */
+    /* { dg-note {variable 'k' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { xfail *-*-* } .-2 } */
+    /* { dg-bogus {note: variable 'k' in 'private' clause is candidate for adjusting OpenACC privatization level} {w/ debug} { xfail *-*-* } .-3 } */
+    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { xfail *-*-* } .-4 } */
+    for (k = 0; k < 2; k++)
+      arr_0 += k;
+      /* { dg-bogus {error: invalid operands in binary operation} {w/ debug} { xfail *-*-* } .-1 } */
+  }
+}
-- 
2.25.1



* Re: Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition [PR100400, PR103836, PR104061]
  2022-01-20  8:26       ` Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition [PR100400, PR103836, PR104061] Thomas Schwinge
@ 2022-01-20  9:58         ` Jakub Jelinek
  0 siblings, 0 replies; 33+ messages in thread
From: Jakub Jelinek @ 2022-01-20  9:58 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches, Arseny Solokha

On Thu, Jan 20, 2022 at 09:26:50AM +0100, Thomas Schwinge wrote:
> That's what we need to look into, in particular: if we decompose (GIMPLE
> sequence) an OpenACC 'kernels' region into parts, how to move or
> otherwise handle any 'GIMPLE_DEBUG's.

I admit I haven't looked at the pass except now for the toplevel comment.
It says that OpenACC constructs in the region are perhaps adjusted but
their body is unchanged, so that suggests that debug stmts inside of those
bodies should be kept as is.
Next it says that sequential code in between those loops/whatever is
put into some sequential construct, so I guess if you decide so because
of some non-debug stmts, you can just move the debug stmts into that
construct as well, including those debug stmts before the first such
non-debug stmt and debug stmts after the last such non-debug stmt.
It is not a perfect solution, because normally debug stmts before
loops would affect also what is in the loop unless overridden, but
what the pass does seems terribly destructive for debug experience anyway.
There is then another case, only debug stmts e.g. in between or before
the loops or after them and nothing else.  Perhaps throwing them away at
this point is the best thing to do (but, all of this only after the pass
decides that it will change something).

Another thing is, this is apparently a very early pass, so most real
debug stmts don't exist, they are typically created later.
I'd expect you mostly see gimple_debug_begin_stmt_p stmts.
Those can be removed more easily: such a stmt doesn't mean that a var has
some value for the following code until stated otherwise, it just says
that here is the start of some source code statement.  So, if you drop
them, all that will work worse is 'break some_line'.
So citing from e.g. PR100400:
void foo ()
{
  # DEBUG BEGIN_STMT // Outside of region, don't touch this
  #pragma omp target oacc_kernels map(force_tofrom:p [len: 8])
    {
      int c.0;

      # DEBUG BEGIN_STMT   // Drop this
      try
        {
          # DEBUG BEGIN_STMT  // If p = &c; is moved somewhere, move the surrounding DEBUG BEGIN_STMTs with it
          # DEBUG BEGIN_STMT
          p = &c;
          # DEBUG BEGIN_STMT  // Up to here
          #pragma acc loop independent private(c.0) private(c)
          for (c.0 = 0; c.0 < 1; c.0 = c.0 + 1)
            {
              c = c.0;
              # DEBUG BEGIN_STMT // Keep this in the body
            }
        }
      finally
        {
          c = {CLOBBER};
        }
    }
}
If you don't have time for it right now, then, after deciding you are
going to transform it, just gsi_remove the gimple_debug_begin_stmt_p
stmts you don't know how to handle.
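
The policy described above can be illustrated with a small toy model.
This is only a sketch under stated assumptions: it works on a flat
array of statement kinds rather than on a real GIMPLE sequence, and
'enum stmt_kind' and 'filter_debug_markers' are hypothetical names
made up for illustration, not GCC interfaces.  A run of statements
between two loops is kept whole (debug markers included) if it
contains at least one non-debug statement; a run consisting only of
debug markers is dropped.

```c
#include <stddef.h>

/* Toy statement kinds; hypothetical, not GCC's 'enum gimple_code'.  */
enum stmt_kind { STMT_DEBUG_BEGIN, STMT_PLAIN, STMT_LOOP };

/* Copy the N statements in IN to OUT, dropping debug markers that sit
   in a sequential run without any non-debug statement.  OUT must have
   room for N entries.  Return the number of statements kept.  */
size_t
filter_debug_markers (const enum stmt_kind *in, size_t n,
                      enum stmt_kind *out)
{
  size_t kept = 0;
  size_t start = 0;

  while (start < n)
    {
      /* Find the end of the current run of sequential statements,
         noting whether it contains a non-debug statement.  */
      size_t end = start;
      int has_plain = 0;
      while (end < n && in[end] != STMT_LOOP)
        {
          if (in[end] == STMT_PLAIN)
            has_plain = 1;
          ++end;
        }

      /* A run with real code keeps its debug markers, including those
         before the first and after the last non-debug statement.  */
      if (has_plain)
        for (size_t i = start; i < end; ++i)
          out[kept++] = in[i];

      /* The loop that ended the run is always kept; markers inside its
         body are not modelled here and would stay untouched.  */
      if (end < n)
        out[kept++] = in[end++];

      start = end;
    }

  return kept;
}
```

In this model, the markers around 'p = &c;' in the PR100400 dump
survive together with the assignment, while a marker that stands alone
in front of a loop is dropped.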

> With these things now hopefully clarified, is the attached
> "Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition
> [PR100400, PR103836, PR104061]" OK to push?  It's of course not the final
> fix, but it at least makes obvious any current silent miscompilation, and
> is an incremental improvement over the current status.

No, users really don't want to see sorry messages just because they turned
-g on their code.  They might be ok with their kernels not being easily
debuggable, but they surely will not be ok with not being able to debug
the host code in the same TU.

	Jakub



* Add 'c-c++-common/goacc/kernels-decompose-pr104086-1.c' [PR104086]
  2020-11-13 22:22 ` Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs (was: [og8] OpenACC 'kernels' construct changes: splitting of the construct into several regions) Thomas Schwinge
                     ` (3 preceding siblings ...)
  2022-01-19 22:29   ` Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition [PR100400, PR103836, PR104061] " Thomas Schwinge
@ 2022-03-12 12:38   ` Thomas Schwinge
  2022-03-12 12:42     ` OpenACC 'kernels' decomposition: Mark variables used in 'present' clauses as addressable [PR100280, PR104086] Thomas Schwinge
  2022-03-17  8:04   ` Enhance further testcases to verify Openacc 'kernels' decomposition Thomas Schwinge
  5 siblings, 1 reply; 33+ messages in thread
From: Thomas Schwinge @ 2022-03-12 12:38 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Arseny Solokha

[-- Attachment #1: Type: text/plain, Size: 2551 bytes --]

Hi!

On 2020-11-13T23:22:30+0100, I wrote:
> On 2019-02-01T00:59:30+0100, I wrote:
>> I've just pushed the attached nine patches to openacc-gcc-8-branch:
>> OpenACC 'kernels' construct changes: splitting of the construct into
>> several regions.
>
> Now, slightly more polished, I've pushed to master branch a variant of
> most of these patches combined in commit
> e898ce7997733c29dcab9c3c62ca102c7f9fa6eb "Decompose OpenACC 'kernels'
> constructs into parts, a sequence of compute constructs", see attached.
>
>> There's more work to be done there, and we're aware of a number of TODO
>> items, but nevertheless: it's a good first step.
>
> That's still the case...  :-)

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c
> @@ -0,0 +1,8 @@
> +/* { dg-additional-options "-fopenacc-kernels=decompose" } */
> +/* Hopefully, this is the same issue as '../../../gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c'.
> +   { dg-ice "TODO" }
> +   TODO { dg-prune-output "during GIMPLE pass: omplower" }
> +   TODO { dg-do link } */
> +
> +#undef KERNELS_DECOMPOSE_ICE_HACK
> +#include "declare-vla.c"

Arseny had later reduced that, and filed <https://gcc.gnu.org/PR104086>.
To document the status quo, pushed to master branch
commit 9781ae3a254a8c17ef4ffa70f21ed1728ff3c707
"Add 'c-c++-common/goacc/kernels-decompose-pr104086-1.c' [PR104086]",
see attached.


Grüße
 Thomas


> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c
> @@ -0,0 +1,6 @@
> +/* { dg-additional-options "-fopenacc-kernels=decompose" } */
> +
> +/* See also 'declare-vla-kernels-decompose-ice-1.c'.  */
> +
> +#define KERNELS_DECOMPOSE_ICE_HACK
> +#include "declare-vla.c"

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
> @@ -38,6 +38,12 @@ f_data (void)
>      for (i = 0; i < N; i++)
>        A[i] = -i;
>
> +    /* See 'declare-vla-kernels-decompose.c'.  */
> +#ifdef KERNELS_DECOMPOSE_ICE_HACK
> +    (volatile int *) &i;
> +    (volatile int *) &N;
> +#endif
> +
>  # pragma acc kernels
>      for (i = 0; i < N; i++)
>        A[i] = i;


-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Add-c-c-common-goacc-kernels-decompose-pr104086-1.c-.patch --]
[-- Type: text/x-diff, Size: 1903 bytes --]

From 9781ae3a254a8c17ef4ffa70f21ed1728ff3c707 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Tue, 18 Jan 2022 17:22:14 +0100
Subject: [PATCH] Add 'c-c++-common/goacc/kernels-decompose-pr104086-1.c'
 [PR104086]

..., currently XFAILed with 'dg-ice', as it runs into
'gcc/omp-low.cc:lower_omp_target':

    13125			else if (is_gimple_reg (var))
    13126			  {
    13127			    gcc_assert (offloaded);

This means, the recent PR100280 etc. changes are still not sufficient.

	gcc/testsuite/
	PR middle-end/104086
	* c-c++-common/goacc/kernels-decompose-pr104086-1.c: New file.
---
 .../goacc/kernels-decompose-pr104086-1.c      | 25 +++++++++++++++++++
 1 file changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104086-1.c

diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104086-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104086-1.c
new file mode 100644
index 00000000000..eab10cf6c72
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104086-1.c
@@ -0,0 +1,25 @@
+/* Reduced from 'libgomp.oacc-c-c++-common/declare-vla.c'.  */
+
+/* { dg-additional-options "-fchecking" }
+   { dg-ice TODO }
+   { dg-prune-output {during GIMPLE pass: omplower} } */
+
+/* { dg-additional-options "--param openacc-kernels=decompose" } */
+
+/* { dg-additional-options "-fopt-info-all-omp" } */
+
+/* { dg-additional-options "--param=openacc-privatization=noisy" } */
+
+void
+foo (void)
+{
+#pragma acc data /* { dg-line l_data1 } */
+  /* { dg-bogus {note: variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {TODO 'data'} { xfail *-*-* } l_data1 } */
+  {
+    int i;
+
+#pragma acc kernels
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+    i = 0;
+  }
+}
-- 
2.34.1



* OpenACC 'kernels' decomposition: Mark variables used in 'present' clauses as addressable [PR100280, PR104086]
  2022-03-12 12:38   ` Add 'c-c++-common/goacc/kernels-decompose-pr104086-1.c' [PR104086] Thomas Schwinge
@ 2022-03-12 12:42     ` Thomas Schwinge
  0 siblings, 0 replies; 33+ messages in thread
From: Thomas Schwinge @ 2022-03-12 12:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Arseny Solokha

[-- Attachment #1: Type: text/plain, Size: 2871 bytes --]

Hi!

On 2022-03-12T13:38:38+0100, I wrote:
> On 2020-11-13T23:22:30+0100, I wrote:
>> On 2019-02-01T00:59:30+0100, I wrote:
>>> I've just pushed the attached nine patches to openacc-gcc-8-branch:
>>> OpenACC 'kernels' construct changes: splitting of the construct into
>>> several regions.
>>
>> Now, slightly more polished, I've pushed to master branch a variant of
>> most of these patches combined in commit
>> e898ce7997733c29dcab9c3c62ca102c7f9fa6eb "Decompose OpenACC 'kernels'
>> constructs into parts, a sequence of compute constructs", see attached.
>>
>>> There's more work to be done there, and we're aware of a number of TODO
>>> items, but nevertheless: it's a good first step.
>>
>> That's still the case...  :-)
>
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c
>> @@ -0,0 +1,8 @@
>> +/* { dg-additional-options "-fopenacc-kernels=decompose" } */
>> +/* Hopefully, this is the same issue as '../../../gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c'.

(Related, but not the same.)

>> +   { dg-ice "TODO" }
>> +   TODO { dg-prune-output "during GIMPLE pass: omplower" }
>> +   TODO { dg-do link } */
>> +
>> +#undef KERNELS_DECOMPOSE_ICE_HACK
>> +#include "declare-vla.c"
>
> Arseny had later reduced that, and filed <https://gcc.gnu.org/PR104086>.
> To document the status quo, pushed to master branch
> commit 9781ae3a254a8c17ef4ffa70f21ed1728ff3c707
> "Add 'c-c++-common/goacc/kernels-decompose-pr104086-1.c' [PR104086]"

>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c
>> @@ -0,0 +1,6 @@
>> +/* { dg-additional-options "-fopenacc-kernels=decompose" } */
>> +
>> +/* See also 'declare-vla-kernels-decompose-ice-1.c'.  */
>> +
>> +#define KERNELS_DECOMPOSE_ICE_HACK
>> +#include "declare-vla.c"

>> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
>> @@ -38,6 +38,12 @@ f_data (void)
>>      for (i = 0; i < N; i++)
>>        A[i] = -i;
>>
>> +    /* See 'declare-vla-kernels-decompose.c'.  */
>> +#ifdef KERNELS_DECOMPOSE_ICE_HACK
>> +    (volatile int *) &i;
>> +    (volatile int *) &N;
>> +#endif
>> +
>>  # pragma acc kernels
>>      for (i = 0; i < N; i++)
>>        A[i] = i;

Pushed to master branch commit 337ed336d7dd83526891bdb436f0bfe9e351f69d
"OpenACC 'kernels' decomposition: Mark variables used in 'present'
clauses as addressable [PR100280, PR104086]", see attached.
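
For readers without the attachment at hand, here is a minimal sketch of the
kind of code this is about, modeled on the 'f2' test case added to
'kernels-decompose-1.c' in the attached patch.  The function name
'kernels_twice_on_scalar' is made up for illustration.  The assumption is that
it is compiled with '-fopenacc --param=openacc-kernels=decompose'; without
'-fopenacc' the pragmas are ignored and the code simply runs on the host.

```c
#include <assert.h>

/* Hypothetical name; the body follows 'f2' from the attached patch.  */
int
kernels_twice_on_scalar (void)
{
  int last;

#pragma acc data
  {
    /* Scalar declared inside the 'data' region; its address is never
       taken by user code.  */
    int i;

    /* First 'kernels' construct using 'i'.  */
#pragma acc kernels
    i = 1;

    assert (i == 1);

    /* Second 'kernels' construct using the same 'i'.  */
#pragma acc kernels
    i = -1;

    assert (i == -1);

    last = i;
  }

  return last;
}
```

Going by the 'dg-note' directives in the patch, the first 'kernels' construct
should report "variable 'i' made addressable" and the second one "variable 'i'
already made addressable"; the latter is the duplicate case that
'scan_sharing_clauses' now handles gracefully.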


Grüße
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; Limited liability company; Managing directors: Thomas Heurung, Frank Thürauf; Registered office: München; Commercial register: München, HRB 106955

[-- Attachment #2: 0001-OpenACC-kernels-decomposition-Mark-variables-used-in.patch --]
[-- Type: text/x-diff, Size: 18431 bytes --]

From 337ed336d7dd83526891bdb436f0bfe9e351f69d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 17 Feb 2022 14:18:57 +0100
Subject: [PATCH] OpenACC 'kernels' decomposition: Mark variables used in
 'present' clauses as addressable [PR100280, PR104086]

... like in recent commit 9b32c1669aad5459dd053424f9967011348add83
"OpenACC 'kernels' decomposition: Mark variables used in synthesized
data clauses as addressable [PR100280]".  Otherwise, we may run into
'gcc/omp-low.cc:lower_omp_target':

    13125                       else if (is_gimple_reg (var))
    13126                         {
    13127                           gcc_assert (offloaded);

	PR middle-end/100280
	PR middle-end/104086
	gcc/
	* omp-oacc-kernels-decompose.cc (omp_oacc_kernels_decompose_1):
	Mark variables used in 'present' clauses as addressable.
	* omp-low.cc (scan_sharing_clauses) <OMP_CLAUSE_MAP>: Gracefully
	handle duplicate 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-decompose-pr104086-1.c: Adjust,
	extend.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	Merge this...
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c:
	..., and this...
	* testsuite/libgomp.oacc-c-c++-common/declare-vla.c: ... into
	this, and adjust.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Extend.
---
 gcc/omp-low.cc                                | 27 +++++---
 gcc/omp-oacc-kernels-decompose.cc             | 32 +++++++++
 .../goacc/kernels-decompose-pr104086-1.c      | 37 +++++++++--
 .../declare-vla-kernels-decompose-ice-1.c     | 22 -------
 .../declare-vla-kernels-decompose.c           | 29 --------
 .../libgomp.oacc-c-c++-common/declare-vla.c   | 38 ++++++-----
 .../kernels-decompose-1.c                     | 66 ++++++++++++++++++-
 7 files changed, 168 insertions(+), 83 deletions(-)
 delete mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c
 delete mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c

diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index d932d74cb03..cfc63d6a104 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -1501,11 +1501,14 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	    {
 	      gcc_checking_assert (DECL_P (decl));
 
-	      gcc_checking_assert (!TREE_ADDRESSABLE (decl));
-	      if (!make_addressable_vars)
-		make_addressable_vars = BITMAP_ALLOC (NULL);
-	      bitmap_set_bit (make_addressable_vars, DECL_UID (decl));
-	      TREE_ADDRESSABLE (decl) = 1;
+	      bool decl_addressable = TREE_ADDRESSABLE (decl);
+	      if (!decl_addressable)
+		{
+		  if (!make_addressable_vars)
+		    make_addressable_vars = BITMAP_ALLOC (NULL);
+		  bitmap_set_bit (make_addressable_vars, DECL_UID (decl));
+		  TREE_ADDRESSABLE (decl) = 1;
+		}
 
 	      if (dump_enabled_p ())
 		{
@@ -1517,10 +1520,16 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 # pragma GCC diagnostic push
 # pragma GCC diagnostic ignored "-Wformat"
 #endif
-		  dump_printf_loc (MSG_NOTE, d_u_loc,
-				   "variable %<%T%>"
-				   " made addressable\n",
-				   decl);
+		  if (!decl_addressable)
+		    dump_printf_loc (MSG_NOTE, d_u_loc,
+				     "variable %<%T%>"
+				     " made addressable\n",
+				     decl);
+		  else
+		    dump_printf_loc (MSG_NOTE, d_u_loc,
+				     "variable %<%T%>"
+				     " already made addressable\n",
+				     decl);
 #if __GNUC__ >= 10
 # pragma GCC diagnostic pop
 #endif
diff --git a/gcc/omp-oacc-kernels-decompose.cc b/gcc/omp-oacc-kernels-decompose.cc
index ecbd3071e5d..40b04539894 100644
--- a/gcc/omp-oacc-kernels-decompose.cc
+++ b/gcc/omp-oacc-kernels-decompose.cc
@@ -1468,6 +1468,38 @@ omp_oacc_kernels_decompose_1 (gimple *kernels_stmt)
 		  /* Now that this data is mapped, turn the data clause on the
 		     inner OpenACC 'kernels' into a 'present' clause.  */
 		  OMP_CLAUSE_SET_MAP_KIND (c, GOMP_MAP_FORCE_PRESENT);
+
+		  /* See <https://gcc.gnu.org/PR100280>,
+		     <https://gcc.gnu.org/PR104086>.  */
+		  if (DECL_P (decl)
+		      && !TREE_ADDRESSABLE (decl))
+		    {
+		      /* Request that OMP lowering make 'decl' addressable.  */
+		      OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE (new_clause) = 1;
+
+		      if (dump_enabled_p ())
+			{
+			  location_t loc = OMP_CLAUSE_LOCATION (new_clause);
+			  const dump_user_location_t d_u_loc
+			    = dump_user_location_t::from_location_t (loc);
+			  /* PR100695 "Format decoder, quoting in 'dump_printf'
+			     etc." */
+#if __GNUC__ >= 10
+# pragma GCC diagnostic push
+# pragma GCC diagnostic ignored "-Wformat"
+#endif
+			  dump_printf_loc
+			    (MSG_NOTE, d_u_loc,
+			     "OpenACC %<kernels%> decomposition:"
+			     " variable %<%T%> in %qs clause"
+			     " requested to be made addressable\n",
+			     decl,
+			     user_omp_clause_code_name (new_clause, true));
+#if __GNUC__ >= 10
+# pragma GCC diagnostic pop
+#endif
+			}
+		    }
 		}
 	      break;
 
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104086-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104086-1.c
index eab10cf6c72..83fb75e28b2 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104086-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr104086-1.c
@@ -1,8 +1,5 @@
-/* Reduced from 'libgomp.oacc-c-c++-common/declare-vla.c'.  */
-
-/* { dg-additional-options "-fchecking" }
-   { dg-ice TODO }
-   { dg-prune-output {during GIMPLE pass: omplower} } */
+/* Reduced from 'libgomp.oacc-c-c++-common/declare-vla.c', and then
+   extended.  */
 
 /* { dg-additional-options "--param openacc-kernels=decompose" } */
 
@@ -14,12 +11,38 @@ void
 foo (void)
 {
 #pragma acc data /* { dg-line l_data1 } */
-  /* { dg-bogus {note: variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {TODO 'data'} { xfail *-*-* } l_data1 } */
+  /* { dg-bogus {note: variable 'i' declared in block is candidate for adjusting OpenACC privatization level} {TODO 'data'} { xfail *-*-* } l_data1 } */
   {
     int i;
 
-#pragma acc kernels
+#pragma acc kernels /* { dg-line l_compute1 } */
+    /* { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute1 }
+       { dg-note {variable 'i' made addressable} {} { target *-*-* } l_compute1 } */
     /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
     i = 0;
+
+#pragma acc kernels /* { dg-line l_compute2 } */
+    /* { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute2 }
+       { dg-note {variable 'i' already made addressable} {} { target *-*-* } l_compute2 } */
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+    i = -1;
   }
 }
+
+void
+foo2 (void)
+{
+  int i[1];
+
+#pragma acc kernels /* { dg-line l2_compute1 } */
+  /* { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l2_compute1 }
+     { dg-note {variable 'i' made addressable} {} { target *-*-* } l2_compute1 } */
+  /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+  i[0] = 0;
+
+#pragma acc kernels /* { dg-line l2_compute2 } */
+  /* { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l2_compute2 }
+     { dg-note {variable 'i' already made addressable} {} { target *-*-* } l2_compute2 } */
+  /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+  i[0] = -1;
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c
deleted file mode 100644
index 3e5b6bab233..00000000000
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c
+++ /dev/null
@@ -1,22 +0,0 @@
-/* { dg-additional-options "--param=openacc-kernels=decompose" } */
-/* ICE similar to PR100280, but not the same.
-   { dg-ice "TODO" }
-   TODO { dg-prune-output "during GIMPLE pass: omplower" }
-   TODO { dg-do link } */
-
-/* { dg-additional-options "-fopt-info-omp-all" }
-   { dg-additional-options "-foffload=-fopt-info-all-omp" } */
-
-/* { dg-additional-options "--param=openacc-privatization=noisy" }
-   { dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
-   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
-   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
-
-#undef KERNELS_DECOMPOSE_ICE_HACK
-#include "declare-vla.c"
-
-/* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } 27 } */
-
-/* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } 61 } */
-
-/* { dg-bogus {note: variable [^\n\r]+ candidate for adjusting OpenACC privatization level} {TODO 'data'} { xfail *-*-* } 42 } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c
deleted file mode 100644
index 142aceec9cd..00000000000
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose.c
+++ /dev/null
@@ -1,29 +0,0 @@
-/* { dg-additional-options "--param=openacc-kernels=decompose" } */
-
-/* See also 'declare-vla-kernels-decompose-ice-1.c'.  */
-
-/* { dg-additional-options "-fopt-info-omp-all" }
-   { dg-additional-options "-foffload=-fopt-info-all-omp" } */
-
-/* { dg-additional-options "--param=openacc-privatization=noisy" }
-   { dg-additional-options "-foffload=--param=openacc-privatization=noisy" }
-   Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
-   { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
-
-#define KERNELS_DECOMPOSE_ICE_HACK
-#include "declare-vla.c"
-
-/* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } 27 } */
-
-/* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } 61 } */
-
-/* { dg-bogus {note: variable [^\n\r]+ candidate for adjusting OpenACC privatization level} {TODO 'data'} { xfail *-*-* } 42 } */
-
-/* { dg-note {variable 'i\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } 58 }
-   { dg-note {variable 'N\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } 58 } */
-
-/* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target { ! __OPTIMIZE__ } } 24 }
-   { dg-optimized {assigned OpenACC gang loop parallelism} {} { target { __OPTIMIZE__ } } 24 } */
-
-/* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target { ! __OPTIMIZE__ } } 58 }
-   { dg-optimized {assigned OpenACC gang loop parallelism} {} { target { __OPTIMIZE__ } } 58 } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
index 4ce2e6d1f18..f6fc3ffefa4 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/declare-vla.c
@@ -1,5 +1,7 @@
 /* Verify OpenACC 'declare' with VLAs.  */
 
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
+
 /* { dg-additional-options "-fopt-info-omp-all" }
    { dg-additional-options "-foffload=-fopt-info-all-omp" } */
 
@@ -8,6 +10,15 @@
    Prune a few: uninteresting, and potentially varying depending on GCC configuration (data types):
    { dg-prune-output {note: variable 'D\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} } */
 
+/* It's only with Tcl 8.5 (released in 2007) that "the variable 'varName'
+   passed to 'incr' may be unset, and in that case, it will be set to [...]",
+   so to maintain compatibility with earlier Tcl releases, we manually
+   initialize counter variables:
+   { dg-line l_dummy[variable c_compute 0] }
+   { dg-message "dummy" "" { target iN-VAl-Id } l_dummy } to avoid
+   "WARNING: dg-line var l_dummy defined, but not used".  */
+
+
 #include <assert.h>
 
 
@@ -21,9 +32,10 @@ f (void)
   for (i = 0; i < N; i++)
     A[i] = -i;
 
-#pragma acc kernels
-  /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target { ! __OPTIMIZE__ } } .-1 }
-     { dg-optimized {assigned OpenACC gang loop parallelism} {} { target { __OPTIMIZE__ } } .-2 } */
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target { ! __OPTIMIZE__ } } l_compute$c_compute }
+     { dg-optimized {assigned OpenACC gang loop parallelism} {} { target __OPTIMIZE__ } l_compute$c_compute } */
+  /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
   for (i = 0; i < N; i++)
     A[i] = i;
 
@@ -49,15 +61,14 @@ f_data (void)
     for (i = 0; i < N; i++)
       A[i] = -i;
 
-    /* See 'declare-vla-kernels-decompose.c'.  */
-#ifdef KERNELS_DECOMPOSE_ICE_HACK
-    (volatile int *) &i;
-    (volatile int *) &N;
-#endif
-
-# pragma acc kernels
-  /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target { ! __OPTIMIZE__ } } .-1 }
-     { dg-optimized {assigned OpenACC gang loop parallelism} {} { target { __OPTIMIZE__ } } .-2 } */
+# pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+       { dg-note {variable 'i' made addressable} {} { target *-*-* } l_compute$c_compute } */
+    /* { dg-note {OpenACC 'kernels' decomposition: variable 'N' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+       { dg-note {variable 'N' made addressable} {} { target *-*-* } l_compute$c_compute } */
+    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target { ! __OPTIMIZE__ } } l_compute$c_compute }
+       { dg-optimized {assigned OpenACC gang loop parallelism} {} { target __OPTIMIZE__ } l_compute$c_compute } */
+    /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
     for (i = 0; i < N; i++)
       A[i] = i;
 
@@ -78,6 +89,3 @@ main ()
 
   return 0;
 }
-
-
-/* { dg-note dummy "" { target n-on-e } } to disable 'prune_notes'.  */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
index 40786c750d1..eb424776b6b 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c
@@ -24,7 +24,9 @@
 static int g1;
 static int g2;
 
-int main()
+/* PR100280, etc. */
+
+static void f1 ()
 {
   int a = 0;
   /*TODO Without making 'a' addressable, for GCN offloading we will not see the expected value copied out.  (But it does work for nvptx offloading, strange...)  */
@@ -153,5 +155,67 @@ int main()
   assert (g2 == N * (N + 1) / 2);
   assert (f1 == 2432902008176640000ULL);
 
+#undef N
+}
+
+
+/* PR104086 */
+
+static void f2 ()
+{
+#pragma acc data
+  /* { dg-bogus {note: variable [^\n\r]+ candidate for adjusting OpenACC privatization level} {TODO 'data'} { xfail *-*-* } .-1 } */
+  {
+    int i;
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+       { dg-note {variable 'i' made addressable} {} { target *-*-* } l_compute$c_compute } */
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+    i = 1;
+
+    assert (i == 1);
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+    /* { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+       { dg-note {variable 'i' already made addressable} {} { target *-*-* } l_compute$c_compute } */
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+    i = -1;
+
+    assert (i == -1);
+  }
+
+
+  int ia[1];
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  /* { dg-note {OpenACC 'kernels' decomposition: variable 'ia' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+     { dg-note {variable 'ia' made addressable} {} { target *-*-* } l_compute$c_compute } */
+  /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+  ia[0] = -2;
+
+  assert (ia[0] == -2);
+
+#pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
+  /* { dg-note {OpenACC 'kernels' decomposition: variable 'ia' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+     { dg-note {variable 'ia' already made addressable} {} { target *-*-* } l_compute$c_compute } */
+  /* { dg-note {OpenACC 'kernels' decomposition: variable 'i' declared in block requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+     { dg-note {variable 'i' made addressable} {} { target *-*-* } l_compute$c_compute }
+     { dg-note {variable 'i' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute$c_compute } */
+  /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_compute$c_compute } */
+  /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
+  for (int i = 0; i < 100; ++i)
+    ++ia[0];
+
+  assert (ia[0] == -2 + 100);
+}
+
+
+int main()
+{
+  f1 ();
+
+  f2 ();
+
   return 0;
 }
-- 
2.34.1



* Enhance further testcases to verify Openacc 'kernels' decomposition
  2020-11-13 22:22 ` Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs (was: [og8] OpenACC 'kernels' construct changes: splitting of the construct into several regions) Thomas Schwinge
                     ` (4 preceding siblings ...)
  2022-03-12 12:38   ` Add 'c-c++-common/goacc/kernels-decompose-pr104086-1.c' [PR104086] Thomas Schwinge
@ 2022-03-17  8:04   ` Thomas Schwinge
  5 siblings, 0 replies; 33+ messages in thread
From: Thomas Schwinge @ 2022-03-17  8:04 UTC (permalink / raw)
  To: gcc-patches; +Cc: Kwok Cheung Yeung

[-- Attachment #1: Type: text/plain, Size: 605 bytes --]

Hi!

Pushed to master branch commit c43cb355f25dd22133d15819bd6ec03d3d3939fd
"Enhance further testcases to verify Openacc 'kernels' decomposition",
see attached.


Kwok, this ought to resolve some testsuite noise from your upcoming og12
branch.  Please let me know if there are any more such issues.


Grüße
 Thomas



[-- Attachment #2: 0001-Enhance-further-testcases-to-verify-Openacc-kernels-.patch --]
[-- Type: text/x-diff, Size: 39056 bytes --]

From c43cb355f25dd22133d15819bd6ec03d3d3939fd Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 16 Mar 2022 14:19:41 +0100
Subject: [PATCH] Enhance further testcases to verify Openacc 'kernels'
 decomposition

	gcc/testsuite/
	* c-c++-common/goacc-gomp/nesting-1.c: Enhance.
	* c-c++-common/goacc/kernels-loop-g.c: Likewise.
	* c-c++-common/goacc/nesting-1.c: Likewise.
	* gcc.dg/goacc/nested-function-1.c: Likewise.
	* gfortran.dg/goacc/common-block-3.f90: Likewise.
	* gfortran.dg/goacc/nested-function-1.f90: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c:
	Enhance.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c: Likewise.
	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
---
 .../c-c++-common/goacc-gomp/nesting-1.c       |  7 ++-
 .../c-c++-common/goacc/kernels-loop-g.c       |  3 ++
 gcc/testsuite/c-c++-common/goacc/nesting-1.c  | 18 +++++--
 .../gcc.dg/goacc/nested-function-1.c          | 22 ++++++++
 .../gfortran.dg/goacc/common-block-3.f90      | 18 +++++--
 .../gfortran.dg/goacc/nested-function-1.f90   | 10 ++++
 .../acc_prof-kernels-1.c                      | 19 +++++--
 .../kernels-loop-g.c                          |  3 ++
 .../testsuite/libgomp.oacc-fortran/if-1.f90   | 51 ++++++++++++++++++-
 9 files changed, 134 insertions(+), 17 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
index 39b92712b31..51c5e359f29 100644
--- a/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
+++ b/gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
+
 /* { dg-additional-options "-fopt-info-omp-note" } */
 
 /* { dg-additional-options "--param=openacc-privatization=noisy" }
@@ -21,10 +23,11 @@ void
 f_acc_kernels (void)
 {
 #pragma acc kernels
-  /* { dg-note {variable 'i' declared in block is candidate for adjusting OpenACC privatization level} "" { target *-*-* } .-1 }
-     { dg-note {variable 'i' ought to be adjusted for OpenACC privatization level: 'gang'} "" { target *-*-* } .-2 } */
+  /* { dg-note {variable 'i' declared in block is candidate for adjusting OpenACC privatization level} "" { target *-*-* } .-1 } */
   {
     int i;
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} "" { target c } .+3 }
+       { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} "" { target c++ } .+1 } */
 #pragma omp atomic write
     i = 0;
   }
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
index 73b469d7061..5bdaa40b02c 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
@@ -1,5 +1,8 @@
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
+
 /* { dg-additional-options "-O2" } */
 /* { dg-additional-options "-g" } */
+/*TODO PR100400 { dg-additional-options -fcompare-debug } */
 /* { dg-additional-options "-fdump-tree-parloops1-all" } */
 /* { dg-additional-options "-fdump-tree-optimized" } */
 
diff --git a/gcc/testsuite/c-c++-common/goacc/nesting-1.c b/gcc/testsuite/c-c++-common/goacc/nesting-1.c
index 83cbff767a4..8c3d1adc785 100644
--- a/gcc/testsuite/c-c++-common/goacc/nesting-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/nesting-1.c
@@ -1,3 +1,5 @@
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
+
 /* { dg-additional-options "-fopt-info-all-omp" } */
 
 /* { dg-additional-options "--param=openacc-privatization=noisy" } */
@@ -32,11 +34,13 @@ void
 f_acc_kernels (void)
 {
 #pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
-  /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_compute$c_compute } */
+  /* { dg-note {variable 'i\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
   {
 #pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
     /* { dg-note {variable 'i\.[0-9]+' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
     /* { dg-note {variable 'i' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_i$c_loop_i } */
+    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_loop_i$c_loop_i } */
     for (i = 0; i < 2; ++i)
       ;
   }
@@ -63,15 +67,17 @@ f_acc_data (void)
     }
 
 #pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
-    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_compute$c_compute } */
+    /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } l_compute$c_compute } */
     ;
 
 #pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
-    /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_compute$c_compute } */
+    /* { dg-note {variable 'i\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
     {
 #pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+      /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
       /* { dg-note {variable 'i\.[0-9]+' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
       /* { dg-note {variable 'i' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_i$c_loop_i } */
+      /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_loop_i$c_loop_i } */
       for (i = 0; i < 2; ++i)
 	;
     }
@@ -102,15 +108,17 @@ f_acc_data (void)
       }
 
 #pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
-      /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_compute$c_compute } */
+      /* { dg-note {beginning 'gang-single' part in OpenACC 'kernels' region} {} { target *-*-* } l_compute$c_compute } */
       ;
 
 #pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
-      /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_compute$c_compute } */
+      /* { dg-note {variable 'i\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
       {
 #pragma acc loop /* { dg-line l_loop_i[incr c_loop_i] } */
+	/* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_loop_i$c_loop_i } */
 	/* { dg-note {variable 'i\.[0-9]+' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_loop_i$c_loop_i } */
 	/* { dg-note {variable 'i' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_loop_i$c_loop_i } */
+	/* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_loop_i$c_loop_i } */
 	for (i = 0; i < 2; ++i)
 	  ;
       }
diff --git a/gcc/testsuite/gcc.dg/goacc/nested-function-1.c b/gcc/testsuite/gcc.dg/goacc/nested-function-1.c
index c34bcb0d601..2e48410b39d 100644
--- a/gcc/testsuite/gcc.dg/goacc/nested-function-1.c
+++ b/gcc/testsuite/gcc.dg/goacc/nested-function-1.c
@@ -2,6 +2,8 @@
 /* See gcc/testsuite/gfortran.dg/goacc/nested-function-1.f90 for the Fortran
    version.  */
 
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
+
 /* { dg-additional-options "-fopt-info-all-omp" } */
 
 /* { dg-additional-options "--param=openacc-privatization=noisy" }
@@ -42,6 +44,11 @@ int main ()
 #pragma acc kernels loop /* { dg-line l_compute_loop[incr c_compute_loop] } */ \
   gang(num:local_arg) worker(local_arg) vector(local_arg) \
   wait async(local_arg)
+    /* { dg-note {OpenACC 'kernels' decomposition: variable 'local_arg' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute_loop$c_compute_loop }
+       { dg-note {variable 'local_arg' made addressable} {} { target *-*-* } l_compute_loop$c_compute_loop } */
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_compute_loop$c_compute_loop } */
+    /* { dg-note {variable 'local_arg\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop } */
+    /* { dg-note {variable 'local_i\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop } */
     /* { dg-note {variable 'local_i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop } */
     /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_compute_loop$c_compute_loop } */
     for (local_i = 0; local_i < N; ++local_i)
@@ -61,6 +68,11 @@ int main ()
 #pragma acc kernels loop /* { dg-line l_compute_loop[incr c_compute_loop] } */ \
   gang(static:local_arg) worker(local_arg) vector(local_arg) \
   wait(local_arg, local_arg + 1, local_arg + 2) async
+    /* { dg-note {OpenACC 'kernels' decomposition: variable 'local_arg' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute_loop$c_compute_loop }
+       { dg-note {variable 'local_arg' already made addressable} {} { target *-*-* } l_compute_loop$c_compute_loop } */
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_compute_loop$c_compute_loop } */
+    /* { dg-note {variable 'local_arg\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop } */
+    /* { dg-note {variable 'local_i\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop } */
     /* { dg-note {variable 'local_i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop } */
     /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_compute_loop$c_compute_loop } */
     for (local_i = 0; local_i < N; ++local_i)
@@ -87,6 +99,11 @@ int main ()
 #pragma acc kernels loop /* { dg-line l_compute_loop[incr c_compute_loop] } */ \
   gang(num:nonlocal_arg) worker(nonlocal_arg) vector(nonlocal_arg) \
   wait async(nonlocal_arg)
+    /* { dg-note {OpenACC 'kernels' decomposition: variable 'nonlocal_arg' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute_loop$c_compute_loop }
+       { dg-note {variable 'nonlocal_arg' made addressable} {} { target *-*-* } l_compute_loop$c_compute_loop } */
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_compute_loop$c_compute_loop } */
+    /* { dg-note {variable 'nonlocal_arg\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop } */
+    /* { dg-note {variable 'nonlocal_i\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop } */
     /* { dg-note {variable 'nonlocal_i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop } */
     /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_compute_loop$c_compute_loop } */
     for (nonlocal_i = 0; nonlocal_i < N; ++nonlocal_i)
@@ -106,6 +123,11 @@ int main ()
 #pragma acc kernels loop /* { dg-line l_compute_loop[incr c_compute_loop] } */ \
   gang(static:nonlocal_arg) worker(nonlocal_arg) vector(nonlocal_arg) \
   wait(nonlocal_arg, nonlocal_arg + 1, nonlocal_arg + 2) async
+    /* { dg-note {OpenACC 'kernels' decomposition: variable 'nonlocal_arg' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute_loop$c_compute_loop }
+       { dg-note {variable 'nonlocal_arg' already made addressable} {} { target *-*-* } l_compute_loop$c_compute_loop } */
+    /* { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_compute_loop$c_compute_loop } */
+    /* { dg-note {variable 'nonlocal_arg\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop } */
+    /* { dg-note {variable 'nonlocal_i\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop } */
     /* { dg-note {variable 'nonlocal_i' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop } */
     /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l_compute_loop$c_compute_loop } */
     for (nonlocal_i = 0; nonlocal_i < N; ++nonlocal_i)
diff --git a/gcc/testsuite/gfortran.dg/goacc/common-block-3.f90 b/gcc/testsuite/gfortran.dg/goacc/common-block-3.f90
index 9dbfa4cd2f0..6f08d7eb8d5 100644
--- a/gcc/testsuite/gfortran.dg/goacc/common-block-3.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/common-block-3.f90
@@ -1,5 +1,7 @@
 ! { dg-options "-fopenacc -fdump-tree-omplower" }
 
+! { dg-additional-options "--param=openacc-kernels=decompose" }
+
 ! { dg-additional-options "-fopt-info-omp-all" }
 
 ! { dg-additional-options "--param=openacc-privatization=noisy" }
@@ -28,7 +30,11 @@ program main
      a(i) = b(i) + c
   end do
   !$acc kernels ! { dg-line l2 }
+  ! { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l2 }
+  !   { dg-note {variable 'i' made addressable} {} { target *-*-* } l2 }
+  ! { dg-note {variable 'c\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l2 }
   ! { dg-optimized {assigned OpenACC seq loop parallelism} {} { target *-*-* } l2 }
+  ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 }
   do i = 1, n
      x(i) = y(i) + c
   end do
@@ -39,10 +45,14 @@ end program main
 ! { dg-final { scan-tree-dump-times "omp target oacc_parallel .*map\\(tofrom:b \\\[len: 400\\\]\\\)" 1 "omplower" } }
 ! { dg-final { scan-tree-dump-times "omp target oacc_parallel .*map\\(tofrom:c \\\[len: 4\\\]\\)" 1 "omplower" } }
 
-! { dg-final { scan-tree-dump-times "omp target oacc_kernels .*map\\(force_tofrom:i \\\[len: 4\\\]\\)" 1 "omplower" } }
-! { dg-final { scan-tree-dump-times "omp target oacc_kernels .*map\\(tofrom:x \\\[len: 400\\\]\\)" 1 "omplower" } }
-! { dg-final { scan-tree-dump-times "omp target oacc_kernels .*map\\(tofrom:y \\\[len: 400\\\]\\\)" 1 "omplower" } }
-! { dg-final { scan-tree-dump-times "omp target oacc_kernels .*map\\(force_tofrom:c \\\[len: 4\\\]\\)" 1 "omplower" } }
+! { dg-final { scan-tree-dump-times "omp target oacc_data_kernels .*map\\(force_tofrom:i \\\[len: 4\\\]\\)" 1 "omplower" } }
+!   { dg-final { scan-tree-dump-times "omp target oacc_kernels .*map\\(force_present:i \\\[len: 4\\\]\\)" 1 "omplower" } }
+! { dg-final { scan-tree-dump-times "omp target oacc_data_kernels .*map\\(tofrom:x \\\[len: 400\\\]\\)" 1 "omplower" } }
+!   { dg-final { scan-tree-dump-times "omp target oacc_kernels .*map\\(force_present:x \\\[len: 400\\\]\\)" 1 "omplower" } }
+! { dg-final { scan-tree-dump-times "omp target oacc_data_kernels .*map\\(tofrom:y \\\[len: 400\\\]\\\)" 1 "omplower" } }
+!   { dg-final { scan-tree-dump-times "omp target oacc_kernels .*map\\(force_present:y \\\[len: 400\\\]\\\)" 1 "omplower" } }
+! { dg-final { scan-tree-dump-times "omp target oacc_data_kernels .*map\\(force_tofrom:c \\\[len: 4\\\]\\)" 1 "omplower" } }
+!   { dg-final { scan-tree-dump-times "omp target oacc_kernels .*map\\(force_present:c \\\[len: 4\\\]\\)" 1 "omplower" } }
 
 ! Expecting no mapping of un-referenced common-blocks variables
 
diff --git a/gcc/testsuite/gfortran.dg/goacc/nested-function-1.f90 b/gcc/testsuite/gfortran.dg/goacc/nested-function-1.f90
index 50fd0c82e14..c631f90b27f 100644
--- a/gcc/testsuite/gfortran.dg/goacc/nested-function-1.f90
+++ b/gcc/testsuite/gfortran.dg/goacc/nested-function-1.f90
@@ -1,6 +1,8 @@
 ! Exercise nested function decomposition, gcc/tree-nested.c.
 ! See gcc/testsuite/gcc.dg/goacc/nested-function-1.c for the C version.
 
+! { dg-additional-options "--param=openacc-kernels=decompose" }
+
 ! { dg-additional-options "-fopt-info-all-omp" }
 
 ! { dg-additional-options "--param=openacc-privatization=noisy" }
@@ -44,6 +46,8 @@ contains
     !$acc kernels loop &
     !$acc gang(num:local_arg) worker(local_arg) vector(local_arg) &
     !$acc wait async(local_arg) ! { dg-line l_compute_loop[incr c_compute_loop] }
+    ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_compute_loop$c_compute_loop }
+    ! { dg-note {variable 'local_i\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop }
     ! { dg-note {variable 'local_i' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute_loop$c_compute_loop }
     ! { dg-note {variable 'local_i\.[0-9]+' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop }
     ! { dg-note {variable 'parm\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop }
@@ -65,6 +69,8 @@ contains
     !$acc kernels loop &
     !$acc gang(static:local_arg) worker(local_arg) vector(local_arg) &
     !$acc wait(local_arg, local_arg + 1, local_arg + 2) async ! { dg-line l_compute_loop[incr c_compute_loop] }
+    ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_compute_loop$c_compute_loop }
+    ! { dg-note {variable 'local_i\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop }
     ! { dg-note {variable 'local_i' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute_loop$c_compute_loop }
     ! { dg-note {variable 'local_i\.[0-9]+' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop }
     ! { dg-note {variable 'parm\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop }
@@ -95,6 +101,8 @@ contains
     !$acc kernels loop &
     !$acc gang(num:nonlocal_arg) worker(nonlocal_arg) vector(nonlocal_arg) &
     !$acc wait async(nonlocal_arg) ! { dg-line l_compute_loop[incr c_compute_loop] }
+    ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_compute_loop$c_compute_loop }
+    ! { dg-note {variable 'nonlocal_i\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop }
     ! { dg-note {variable 'nonlocal_i' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute_loop$c_compute_loop }
     ! { dg-note {variable 'nonlocal_i\.[0-9]+' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop }
     ! { dg-note {variable 'parm\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop }
@@ -116,6 +124,8 @@ contains
     !$acc kernels loop &
     !$acc gang(static:nonlocal_arg) worker(nonlocal_arg) vector(nonlocal_arg) &
     !$acc wait(nonlocal_arg, nonlocal_arg + 1, nonlocal_arg + 2) async ! { dg-line l_compute_loop[incr c_compute_loop] }
+    ! { dg-note {forwarded loop nest in OpenACC 'kernels' region to 'parloops' for analysis} {} { target *-*-* } l_compute_loop$c_compute_loop }
+    ! { dg-note {variable 'nonlocal_i\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop }
     ! { dg-note {variable 'nonlocal_i' in 'private' clause is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute_loop$c_compute_loop }
     ! { dg-note {variable 'nonlocal_i\.[0-9]+' in 'private' clause isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop }
     ! { dg-note {variable 'parm\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute_loop$c_compute_loop }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
index c82a7edbfa0..2c853971474 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c
@@ -1,5 +1,7 @@
 /* Test dispatch of events to callbacks.  */
 
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
+
 /* { dg-additional-options "-fopt-info-omp-all" }
    { dg-additional-options "-foffload=-fopt-info-omp-all" } */
 
@@ -74,7 +76,7 @@ static void cb_enqueue_launch_start (acc_prof_info *prof_info, acc_event_info *e
   assert (prof_info->device_type == acc_device_type);
   assert (prof_info->device_number == acc_device_num);
   assert (prof_info->thread_id == -1);
-  assert (prof_info->async == acc_async_sync);
+  assert (prof_info->async == acc_async_noval);
   assert (prof_info->async_queue == prof_info->async);
   assert (prof_info->src_file == NULL);
   assert (prof_info->func_name == NULL);
@@ -181,10 +183,13 @@ int main()
 #define N 100
     int x[N];
 #pragma acc kernels /* { dg-line l_compute[incr c_compute] } */
-    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
+    /* { dg-note {OpenACC 'kernels' decomposition: variable 'i' declared in block requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+       { dg-note {variable 'i' made addressable} {} { target *-*-* } l_compute$c_compute } */
+    /* { dg-note {variable 'i' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute$c_compute } */
     /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target { ! __OPTIMIZE__ } } l_compute$c_compute }
        { dg-optimized {assigned OpenACC gang loop parallelism} {} { target __OPTIMIZE__ } l_compute$c_compute } */
     {
+      /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
       for (int i = 0; i < N; ++i)
 	x[i] = i * i;
     }
@@ -208,11 +213,14 @@ int main()
     int x[N];
 #pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ \
   num_gangs (30) num_workers (3) vector_length (5)
-    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
+    /* { dg-note {OpenACC 'kernels' decomposition: variable 'i' declared in block requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+       { dg-note {variable 'i' made addressable} {} { target *-*-* } l_compute$c_compute } */
+    /* { dg-note {variable 'i' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute$c_compute } */
     /* { dg-warning {using 'vector_length \(32\)', ignoring 5} {} { target { __OPTIMIZE__ && openacc_nvidia_accel_selected } } l_compute$c_compute } */
     /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target { ! __OPTIMIZE__ } } l_compute$c_compute }
        { dg-optimized {assigned OpenACC gang loop parallelism} {} { target __OPTIMIZE__ } l_compute$c_compute } */
     {
+      /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
       for (int i = 0; i < N; ++i)
 	x[i] = i * i;
     }
@@ -236,11 +244,14 @@ int main()
     int x[N];
 #pragma acc kernels /* { dg-line l_compute[incr c_compute] } */ \
   num_gangs (num_gangs) num_workers (num_workers) vector_length (vector_length)
-    /* { dg-note {variable 'i' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} {} { target *-*-* } l_compute$c_compute } */
+    /* { dg-note {OpenACC 'kernels' decomposition: variable 'i' declared in block requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+       { dg-note {variable 'i' made addressable} {} { target *-*-* } l_compute$c_compute } */
+    /* { dg-note {variable 'i' declared in block is candidate for adjusting OpenACC privatization level} {} { target *-*-* } l_compute$c_compute } */
     /* { dg-warning {using 'vector_length \(32\)', ignoring runtime setting} {} { target { __OPTIMIZE__ && openacc_nvidia_accel_selected } } l_compute$c_compute } */
     /* { dg-optimized {assigned OpenACC seq loop parallelism} {} { target { ! __OPTIMIZE__ } } l_compute$c_compute }
        { dg-optimized {assigned OpenACC gang loop parallelism} {} { target __OPTIMIZE__ } l_compute$c_compute } */
     {
+      /* { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} {} { target *-*-* } .+1 } */
       for (int i = 0; i < N; ++i)
 	x[i] = i * i;
     }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
index 88258be16bd..e513946ea71 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c
@@ -1,3 +1,6 @@
+/* { dg-additional-options "--param=openacc-kernels=decompose" } */
+
 /* { dg-additional-options "-g" } */
+/*TODO PR100400 { dg-additional-options -fcompare-debug } */
 
 #include "kernels-loop.c"
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/if-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/if-1.f90
index 3c4d9a6efb7..c6d67647d4a 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/if-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/if-1.f90
@@ -1,6 +1,8 @@
 ! { dg-do run }
 ! { dg-additional-options "-cpp" }
 
+! { dg-additional-options "--param=openacc-kernels=decompose" }
+
 ! { dg-additional-options "-fopt-info-note-omp" }
 ! { dg-additional-options "-foffload=-fopt-info-note-omp" }
 
@@ -490,6 +492,9 @@ program main
   a(:) = 4.0
 
   !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if (1 == 1) ! { dg-line l_compute[incr c_compute] }
+  ! { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+  !   { dg-note {variable 'i' made addressable} {} { target *-*-* } l_compute$c_compute } */
+  ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }
      do i = 1, N
         ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } l_compute$c_compute }
         if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
@@ -513,6 +518,9 @@ program main
   a(:) = 16.0
 
   !$acc kernels if (0 == 1) ! { dg-line l_compute[incr c_compute] }
+  ! { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+  !   { dg-note {variable 'i' already made addressable} {} { target *-*-* } l_compute$c_compute } */
+  ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }
      do i = 1, N
         ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } l_compute$c_compute }
        if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
@@ -530,6 +538,9 @@ program main
   a(:) = 8.0
 
   !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if (one == 1) ! { dg-line l_compute[incr c_compute] }
+  ! { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+  !   { dg-note {variable 'i' already made addressable} {} { target *-*-* } l_compute$c_compute } */
+  ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }
     do i = 1, N
        ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } l_compute$c_compute }
       if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
@@ -553,6 +564,9 @@ program main
   a(:) = 22.0
 
   !$acc kernels if (zero == 1) ! { dg-line l_compute[incr c_compute] }
+  ! { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+  !   { dg-note {variable 'i' already made addressable} {} { target *-*-* } l_compute$c_compute } */
+  ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }
     do i = 1, N
        ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } l_compute$c_compute }
       if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
@@ -570,6 +584,9 @@ program main
   a(:) = 16.0
 
   !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if (.TRUE.) ! { dg-line l_compute[incr c_compute] }
+  ! { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+  !   { dg-note {variable 'i' already made addressable} {} { target *-*-* } l_compute$c_compute } */
+  ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }
     do i = 1, N
        ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } l_compute$c_compute }
       if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
@@ -593,6 +610,9 @@ program main
   a(:) = 76.0
 
   !$acc kernels if (.FALSE.) ! { dg-line l_compute[incr c_compute] }
+  ! { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+  !   { dg-note {variable 'i' already made addressable} {} { target *-*-* } l_compute$c_compute } */
+  ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }
     do i = 1, N
        ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } l_compute$c_compute }
       if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
@@ -612,6 +632,9 @@ program main
   nn = 1
 
   !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if (nn == 1) ! { dg-line l_compute[incr c_compute] }
+  ! { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+  !   { dg-note {variable 'i' already made addressable} {} { target *-*-* } l_compute$c_compute } */
+  ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }
     do i = 1, N
        ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } l_compute$c_compute }
       if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
@@ -637,6 +660,9 @@ program main
   nn = 0
 
   !$acc kernels if (nn == 1) ! { dg-line l_compute[incr c_compute] }
+  ! { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+  !   { dg-note {variable 'i' already made addressable} {} { target *-*-* } l_compute$c_compute } */
+  ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }
     do i = 1, N
        ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } l_compute$c_compute }
       if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
@@ -656,6 +682,9 @@ program main
   nn = 1
 
   !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if ((nn + nn) > 0) ! { dg-line l_compute[incr c_compute] }
+  ! { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+  !   { dg-note {variable 'i' already made addressable} {} { target *-*-* } l_compute$c_compute } */
+  ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }
     do i = 1, N
        ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } l_compute$c_compute }
       if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
@@ -681,6 +710,9 @@ program main
   nn = 0;
 
   !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if ((nn + nn) > 0) ! { dg-line l_compute[incr c_compute] }
+  ! { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+  !   { dg-note {variable 'i' already made addressable} {} { target *-*-* } l_compute$c_compute } */
+  ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }
     do i = 1, N
        ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } l_compute$c_compute }
       if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
@@ -698,6 +730,9 @@ program main
   a(:) = 91.0
 
   !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if (-2 > 0) ! { dg-line l_compute[incr c_compute] }
+  ! { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+  !   { dg-note {variable 'i' already made addressable} {} { target *-*-* } l_compute$c_compute } */
+  ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }
     do i = 1, N
        ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } l_compute$c_compute }
       if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
@@ -715,6 +750,9 @@ program main
   a(:) = 43.0
 
   !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if (one == 1) ! { dg-line l_compute[incr c_compute] }
+  ! { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+  !   { dg-note {variable 'i' already made addressable} {} { target *-*-* } l_compute$c_compute } */
+  ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }
     do i = 1, N
        ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } l_compute$c_compute }
       if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
@@ -738,6 +776,9 @@ program main
   a(:) = 87.0
 
   !$acc kernels if (one == 0) ! { dg-line l_compute[incr c_compute] }
+  ! { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+  !   { dg-note {variable 'i' already made addressable} {} { target *-*-* } l_compute$c_compute } */
+  ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }
     do i = 1, N
       ! { dg-note {variable 'C\.[0-9]+' declared in block potentially has improper OpenACC privatization level: 'const_decl'} "TODO" { target *-*-* } l_compute$c_compute }
       if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
@@ -818,7 +859,10 @@ program main
   !$acc data copyin (a(1:N)) copyout (b(1:N)) if (1 == 1)
   ! { dg-note {variable 'parm\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-1 }
 
-    !$acc kernels present (a(1:N))
+    !$acc kernels present (a(1:N)) ! { dg-line l_compute[incr c_compute] }
+    ! { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+    !   { dg-note {variable 'i' already made addressable} {} { target *-*-* } l_compute$c_compute } */
+    ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }
        do i = 1, N
            b(i) = a(i)
        end do
@@ -862,7 +906,10 @@ program main
         !$acc data copyout (b(1:N)) if (1 == 1)
         ! { dg-note {variable 'parm\.[0-9]+' declared in block isn't candidate for adjusting OpenACC privatization level: not addressable} "" { target *-*-* } .-1 }
 
-        !$acc kernels present (a(1:N)) present (b(1:N))
+        !$acc kernels present (a(1:N)) present (b(1:N)) ! { dg-line l_compute[incr c_compute] }
+        ! { dg-note {OpenACC 'kernels' decomposition: variable 'i' in 'copy' clause requested to be made addressable} {} { target *-*-* } l_compute$c_compute }
+        !   { dg-note {variable 'i' already made addressable} {} { target *-*-* } l_compute$c_compute } */
+        ! { dg-note {beginning 'parloops' part in OpenACC 'kernels' region} "" { target *-*-* } .+1 }
           do i = 1, N
             b(i) = a(i)
           end do
-- 
2.34.1



end of thread, other threads:[~2022-03-17  8:05 UTC | newest]

Thread overview: 33+ messages
2019-02-01  0:00 [og8] OpenACC 'kernels' construct changes: splitting of the construct into several regions Thomas Schwinge
2019-02-01 19:48 ` Thomas Schwinge
2019-07-17 21:03 ` [PATCH 00/10, OpenACC] Rework handling of OpenACC kernels regions Kwok Cheung Yeung
2019-07-17 21:04   ` [PATCH 01/10, OpenACC] Use "-fopenacc-kernels=parloops" to document "parloops" test cases Kwok Cheung Yeung
2019-07-17 21:05   ` [PATCH 02/10, OpenACC] Add OpenACC target kinds for decomposed kernels regions Kwok Cheung Yeung
2019-07-18  9:28     ` Jakub Jelinek
2019-08-05 22:31       ` Kwok Cheung Yeung
2019-07-17 21:06   ` [PATCH 03/10, OpenACC] Separate OpenACC kernels regions in data and parallel parts Kwok Cheung Yeung
2019-07-17 21:11   ` [PATCH 04/10, OpenACC] Turn OpenACC kernels regions into a sequence of, parallel regions Kwok Cheung Yeung
2019-07-18 10:09     ` Jakub Jelinek
2019-08-05 21:58       ` Kwok Cheung Yeung
2020-11-13 22:33         ` In 'gcc/omp-oacc-kernels-decompose.cc', use langhook instead of accessing language-specific decl information (was: [PATCH 04/10, OpenACC] Turn OpenACC kernels regions into a sequence of, parallel regions) Thomas Schwinge
2019-07-17 21:12   ` [PATCH 05/10, OpenACC] Handle conditional execution of loops in OpenACC, kernels regions Kwok Cheung Yeung
2019-07-17 21:13   ` [PATCH 06/10, OpenACC] Adjust parallelism of loops in gang-single parts of OpenACC " Kwok Cheung Yeung
2019-08-05 22:17     ` Kwok Cheung Yeung
2019-07-17 21:24   ` [PATCH 07/10, OpenACC] Launch kernels asynchronously in " Kwok Cheung Yeung
2019-07-17 21:30   ` [PATCH 08/10, OpenACC] New OpenACC kernels region decompose algorithm Kwok Cheung Yeung
2019-07-17 21:32   ` [PATCH 09/10, OpenACC] Avoid introducing 'create' mapping clauses for loop index variables in kernels regions Kwok Cheung Yeung
2019-07-17 21:37   ` [PATCH 10/10, OpenACC] Make new OpenACC kernels conversion the default; adjust and add tests Kwok Cheung Yeung
2019-07-18  9:24   ` [PATCH 00/10, OpenACC] Rework handling of OpenACC kernels regions Jakub Jelinek
2020-11-13 22:22 ` Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs (was: [og8] OpenACC 'kernels' construct changes: splitting of the construct into several regions) Thomas Schwinge
2020-11-15  9:14   ` Tobias Burnus
2021-04-19  8:29     ` Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs Thomas Schwinge
2021-04-19 12:38       ` Thomas Schwinge
2020-11-27 13:50   ` Thomas Schwinge
2022-01-13  9:44   ` Enhance OpenACC 'kernels' decomposition testing (was: Decompose OpenACC 'kernels' constructs into parts, a sequence of compute constructs) Thomas Schwinge
2022-01-19 22:29   ` Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition [PR100400, PR103836, PR104061] " Thomas Schwinge
2022-01-19 23:00     ` Jakub Jelinek
2022-01-20  8:26       ` Catch 'GIMPLE_DEBUG' misbehavior in OpenACC 'kernels' decomposition [PR100400, PR103836, PR104061] Thomas Schwinge
2022-01-20  9:58         ` Jakub Jelinek
2022-03-12 12:38   ` Add 'c-c++-common/goacc/kernels-decompose-pr104086-1.c' [PR104086] Thomas Schwinge
2022-03-12 12:42     ` OpenACC 'kernels' decomposition: Mark variables used in 'present' clauses as addressable [PR100280, PR104086] Thomas Schwinge
2022-03-17  8:04   ` Enhance further testcases to verify Openacc 'kernels' decomposition Thomas Schwinge
