public inbox for gcc-patches@gcc.gnu.org
* Update OpenACC test cases
@ 2016-03-30 14:22 Thomas Schwinge
  2016-03-30 14:38 ` Jakub Jelinek
From: Thomas Schwinge @ 2016-03-30 14:22 UTC
  To: gcc-patches, Jakub Jelinek

Hi!

This is to integrate into trunk a large number of the test case updates
that we have accumulated on gomp-4_0-branch.  OK to commit?

	gcc/testsuite/
	* c-c++-common/goacc/combined-directives.c: Clean up dg-*
	directives.
	* c-c++-common/goacc/loop-clauses.c: Likewise.
	* g++.dg/goacc/template.C: Likewise.
	* gfortran.dg/goacc/combined-directives.f90: Likewise.
	* gfortran.dg/goacc/loop-1.f95: Likewise.
	* gfortran.dg/goacc/loop-5.f95: Likewise.
	* gfortran.dg/goacc/loop-6.f95: Likewise.
	* gfortran.dg/goacc/loop-tree-1.f90: Likewise.
	* c-c++-common/goacc-gomp/nesting-1.c: Update.
	* c-c++-common/goacc-gomp/nesting-fail-1.c: Likewise.
	* c-c++-common/goacc/clauses-fail.c: Likewise.
	* c-c++-common/goacc/parallel-1.c: Likewise.
	* c-c++-common/goacc/reduction-1.c: Likewise.
	* c-c++-common/goacc/reduction-2.c: Likewise.
	* c-c++-common/goacc/reduction-3.c: Likewise.
	* c-c++-common/goacc/reduction-4.c: Likewise.
	* c-c++-common/goacc/routine-3.c: Likewise.
	* c-c++-common/goacc/routine-4.c: Likewise.
	* c-c++-common/goacc/routine-5.c: Likewise.
	* c-c++-common/goacc/tile.c: Likewise.
	* g++.dg/goacc/template.C: Likewise.
	* gfortran.dg/goacc/combined-directives.f90: Likewise.
	* c-c++-common/goacc/nesting-1.c: Move dg-error test cases into...
	* c-c++-common/goacc/nesting-fail-1.c: ... this file.  Update.
	* c-c++-common/goacc/kernels-1.c: Update.  Incorporate...
	* c-c++-common/goacc/kernels-empty.c: ... this file, and...
	* c-c++-common/goacc/kernels-eternal.c: ... this file, and...
	* c-c++-common/goacc/kernels-noreturn.c: ... this file.
	* c-c++-common/goacc/host_data-1.c: New file.  Incorporate...
	* c-c++-common/goacc/use_device-1.c: ... this file.
	* c-c++-common/goacc/host_data-2.c: New file.  Incorporate...
	* c-c++-common/goacc/host_data-5.c: ... this file, and...
	* c-c++-common/goacc/host_data-6.c: ... this file.
	* c-c++-common/goacc/loop-2-kernels.c: New file.
	* c-c++-common/goacc/loop-2-parallel.c: Likewise.
	* c-c++-common/goacc/loop-3.c: Likewise.
	* g++.dg/goacc/reference.C: Likewise.
	* g++.dg/goacc/routine-1.C: Likewise.
	* g++.dg/goacc/routine-2.C: Likewise.
	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/clauses-1.c: Update.
	* testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/vector-loop.c: Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/asyncwait-3.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/declare-1.f90: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Likewise.
	XFAIL.
	* testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: Update.
	Incorporate...
	* testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c: ... this
	file.
	* testsuite/libgomp.oacc-c++/template-reduction.C: New file.
	* testsuite/libgomp.oacc-c-c++-common/gang-static-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/gang-static-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-clauses.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/private-variables.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c: Likewise.
	* testsuite/libgomp.oacc-fortran/clauses-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/default-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/firstprivate-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/gang-static-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/implicit-firstprivate-ref.f90:
	Likewise.
	* testsuite/libgomp.oacc-fortran/pr68813.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/private-variables.f90: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Merge this
	file...
	* testsuite/libgomp.oacc-c-c++-common/parallel-1.c: ..., and this
	file into...
	* testsuite/libgomp.oacc-c-c++-common/data-clauses.h: ... this new
	file.  Update.
	* testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c: New
	file.
	* testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-2.c: Rename to...
	* testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c:
	... this new file.  Update.
	* testsuite/libgomp.oacc-c-c++-common/parallel-2.c: Rename to...
	* testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c:
	... this new file.  Update.
	* testsuite/libgomp.oacc-c-c++-common/mode-transitions.c: New
	file.  Incorporate...
	* testsuite/libgomp.oacc-c-c++-common/worker-single-1a.c: ... this
	file, and...
	* testsuite/libgomp.oacc-c-c++-common/worker-single-4.c: ... this
	file, and...
	* testsuite/libgomp.oacc-c-c++-common/worker-single-6.c: ... this
	file.
	* testsuite/libgomp.oacc-c-c++-common/update-1-2.c: Remove file.

 gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c  |   2 +-
 .../c-c++-common/goacc-gomp/nesting-fail-1.c       |  36 +-
 gcc/testsuite/c-c++-common/goacc/clauses-fail.c    |  12 +
 .../c-c++-common/goacc/combined-directives.c       |   7 +-
 .../goacc/{use_device-1.c => host_data-1.c}        |  12 +-
 gcc/testsuite/c-c++-common/goacc/host_data-2.c     |  78 ++
 gcc/testsuite/c-c++-common/goacc/host_data-5.c     |  23 -
 gcc/testsuite/c-c++-common/goacc/host_data-6.c     |  25 -
 gcc/testsuite/c-c++-common/goacc/kernels-1.c       |  43 +-
 gcc/testsuite/c-c++-common/goacc/kernels-empty.c   |   6 -
 gcc/testsuite/c-c++-common/goacc/kernels-eternal.c |  11 -
 .../c-c++-common/goacc/kernels-noreturn.c          |  12 -
 gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c  | 189 ++++
 gcc/testsuite/c-c++-common/goacc/loop-2-parallel.c | 162 ++++
 gcc/testsuite/c-c++-common/goacc/loop-3.c          |  58 ++
 gcc/testsuite/c-c++-common/goacc/loop-clauses.c    |   4 -
 gcc/testsuite/c-c++-common/goacc/nesting-1.c       |   8 -
 gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c  |  29 +
 gcc/testsuite/c-c++-common/goacc/parallel-1.c      |  36 +-
 gcc/testsuite/c-c++-common/goacc/reduction-1.c     |  57 +-
 gcc/testsuite/c-c++-common/goacc/reduction-2.c     |  42 +-
 gcc/testsuite/c-c++-common/goacc/reduction-3.c     |  42 +-
 gcc/testsuite/c-c++-common/goacc/reduction-4.c     |  40 +-
 gcc/testsuite/c-c++-common/goacc/routine-3.c       | 128 ++-
 gcc/testsuite/c-c++-common/goacc/routine-4.c       |  73 ++
 gcc/testsuite/c-c++-common/goacc/routine-5.c       |  15 +
 gcc/testsuite/c-c++-common/goacc/tile.c            | 258 +++++-
 gcc/testsuite/g++.dg/goacc/reference.C             |  39 +
 gcc/testsuite/g++.dg/goacc/routine-1.C             |  13 +
 gcc/testsuite/g++.dg/goacc/routine-2.C             |  42 +
 gcc/testsuite/g++.dg/goacc/template.C              |  81 +-
 .../gfortran.dg/goacc/combined-directives.f90      |  29 +-
 gcc/testsuite/gfortran.dg/goacc/loop-1.f95         |  15 +-
 gcc/testsuite/gfortran.dg/goacc/loop-5.f95         |   6 -
 gcc/testsuite/gfortran.dg/goacc/loop-6.f95         |   8 -
 gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90    |   6 -
 .../libgomp.oacc-c++/template-reduction.C          |  98 +++
 .../libgomp.oacc-c-c++-common/asyncwait-1.c        | 434 ++++++++++
 .../libgomp.oacc-c-c++-common/clauses-1.c          |  26 +
 ...parallel-2.c => data-clauses-kernels-ipa-pta.c} |   2 +-
 .../data-clauses-kernels.c                         |   2 +
 ...kernels-2.c => data-clauses-parallel-ipa-pta.c} |   2 +-
 .../data-clauses-parallel.c                        |   2 +
 .../{parallel-1.c => data-clauses.h}               |  92 +-
 .../libgomp.oacc-c-c++-common/deviceptr-1.c        |  23 +-
 .../libgomp.oacc-c-c++-common/firstprivate-1.c     | 114 ++-
 .../libgomp.oacc-c-c++-common/firstprivate-2.c     |  31 -
 .../libgomp.oacc-c-c++-common/gang-static-1.c      |  48 ++
 .../libgomp.oacc-c-c++-common/gang-static-2.c      | 100 +++
 libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c | 354 +++++++-
 .../libgomp.oacc-c-c++-common/kernels-1.c          | 184 ----
 .../kernels-loop-clauses.c                         |  62 ++
 .../libgomp.oacc-c-c++-common/mode-transitions.c   | 895 +++++++++++++++++++
 .../libgomp.oacc-c-c++-common/private-variables.c  | 953 +++++++++++++++++++++
 .../libgomp.oacc-c-c++-common/reduction-7.c        | 129 +++
 .../libgomp.oacc-c-c++-common/routine-1.c          |  88 ++
 .../libgomp.oacc-c-c++-common/routine-4.c          | 123 +++
 .../libgomp.oacc-c-c++-common/routine-wv-2.c       |  76 ++
 .../libgomp.oacc-c-c++-common/update-1-2.c         | 361 --------
 .../libgomp.oacc-c-c++-common/vector-loop.c        |   2 +-
 .../libgomp.oacc-c-c++-common/worker-single-1a.c   |  28 -
 .../libgomp.oacc-c-c++-common/worker-single-4.c    |  28 -
 .../libgomp.oacc-c-c++-common/worker-single-6.c    |  46 -
 .../testsuite/libgomp.oacc-fortran/asyncwait-1.f90 | 122 +++
 .../testsuite/libgomp.oacc-fortran/asyncwait-2.f90 |  29 +-
 .../testsuite/libgomp.oacc-fortran/asyncwait-3.f90 |  31 +-
 .../testsuite/libgomp.oacc-fortran/clauses-1.f90   | 290 +++++++
 .../testsuite/libgomp.oacc-fortran/declare-1.f90   |  41 +-
 .../testsuite/libgomp.oacc-fortran/default-1.f90   |  54 ++
 .../libgomp.oacc-fortran/firstprivate-1.f90        |  42 +
 .../libgomp.oacc-fortran/gang-static-1.f90         |  79 ++
 libgomp/testsuite/libgomp.oacc-fortran/if-1.f90    | 886 +++++++++++++++++++
 .../implicit-firstprivate-ref.f90                  |  42 +
 libgomp/testsuite/libgomp.oacc-fortran/pr68813.f90 |  19 +
 .../libgomp.oacc-fortran/private-variables.f90     | 544 ++++++++++++
 75 files changed, 7017 insertions(+), 1112 deletions(-)
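
For readers skimming the diff: the new loop-2-parallel.c tests mostly check
invalid loop nesting (for example, a gang loop inside a vector loop is expected
to be diagnosed as "incorrectly nested").  Below is a minimal sketch of the
valid counterpart, with gang on the outer loop and vector on the inner one.  It
is not part of the patch; the names grid, row_sum, fill_grid and sum_rows are
made up for illustration.  Without -fopenacc the pragmas are ignored and the
loops simply run sequentially.

```c
/* Illustrative only; not part of the patch.  All names are hypothetical.  */
#define ROWS 8
#define COLS 4

static int grid[ROWS][COLS];
static int row_sum[ROWS];

/* Fill grid so that grid[i][j] == i * COLS + j.  */
void
fill_grid (void)
{
  int i, j;

  for (i = 0; i < ROWS; i++)
    for (j = 0; j < COLS; j++)
      grid[i][j] = i * COLS + j;
}

/* Well-nested loops: gang on the outer loop, vector on the inner one.
   The reverse order (gang inside vector) is what loop-2-parallel.c
   expects to be rejected.  */
void
sum_rows (void)
{
  int i, j;

#pragma acc parallel loop gang
  for (i = 0; i < ROWS; i++)
    {
      int s = 0;

#pragma acc loop vector reduction (+:s)
      for (j = 0; j < COLS; j++)
        s += grid[i][j];

      row_sum[i] = s;
    }
}
```

Calling fill_grid () and then sum_rows () leaves the per-row totals in row_sum,
whether or not the code was built with -fopenacc.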

diff --git gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
index dabba8c..aaf0e7a 100644
--- gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
+++ gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
@@ -20,12 +20,12 @@ f_acc_kernels (void)
   }
 }
 
+#pragma acc routine vector
 void
 f_acc_loop (void)
 {
   int i;
 
-#pragma acc parallel
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
diff --git gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
index 5e3f183..1a33242 100644
--- gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
+++ gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
@@ -1,4 +1,5 @@
 extern int i;
+#pragma acc declare create(i)
 
 void
 f_omp (void)
@@ -14,6 +15,9 @@ f_omp (void)
 #pragma acc update host(i) /* { dg-error "OpenACC construct inside of non-OpenACC region" } */
 #pragma acc enter data copyin(i) /* { dg-error "OpenACC construct inside of non-OpenACC region" } */
 #pragma acc exit data delete(i) /* { dg-error "OpenACC construct inside of non-OpenACC region" } */
+#pragma acc loop /* { dg-error "loop directive must be associated with an OpenACC compute region" } */
+    for (i = 0; i < 2; ++i)
+      ;
   }
 
 #pragma omp for
@@ -358,85 +362,77 @@ f_acc_data (void)
   }
 }
 
+#pragma acc routine
 void
 f_acc_loop (void)
 {
-#pragma acc parallel
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
-#pragma omp parallel /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp parallel /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
       ;
     }
 
-#pragma acc parallel
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
-#pragma omp for /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp for /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
       for (i = 0; i < 3; i++)
 	;
     }
 
-#pragma acc parallel
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
-#pragma omp sections /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp sections /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
       {
 	;
       }
     }
 
-#pragma acc parallel
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
-#pragma omp single /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp single /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
       ;
     }
 
-#pragma acc parallel
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
-#pragma omp task /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp task /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
       ;
     }
 
-#pragma acc parallel
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
-#pragma omp master /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp master /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
       ;
     }
 
-#pragma acc parallel
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
-#pragma omp critical /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp critical /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
       ;
     }
 
-#pragma acc parallel
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
-#pragma omp ordered /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp ordered /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
       ;
     }
 
-#pragma acc parallel
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
-#pragma omp target /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp target /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
       ;
-#pragma omp target data map(i) /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp target data map(i) /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
       ;
-#pragma omp target update to(i) /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp target update to(i) /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
     }
 }
 
diff --git gcc/testsuite/c-c++-common/goacc/clauses-fail.c gcc/testsuite/c-c++-common/goacc/clauses-fail.c
index 661d364..853d010 100644
--- gcc/testsuite/c-c++-common/goacc/clauses-fail.c
+++ gcc/testsuite/c-c++-common/goacc/clauses-fail.c
@@ -1,3 +1,5 @@
+/* Miscellaneous tests where clause parsing is expected to fail.  */
+
 void
 f (void)
 {
@@ -17,3 +19,13 @@ f (void)
   for (i = 0; i < 2; ++i)
     ;
 }
+
+
+void
+f2 (void)
+{
+  int a, b[100];
+
+#pragma acc parallel firstprivate (b[10:20]) /* { dg-error "expected ... before ... token" } */
+  ;
+}
diff --git gcc/testsuite/c-c++-common/goacc/combined-directives.c gcc/testsuite/c-c++-common/goacc/combined-directives.c
index c387285..c2a3c57 100644
--- gcc/testsuite/c-c++-common/goacc/combined-directives.c
+++ gcc/testsuite/c-c++-common/goacc/combined-directives.c
@@ -1,10 +1,7 @@
-// { dg-do compile }
-// { dg-options "-fopenacc -fdump-tree-gimple" }
+// { dg-additional-options "-fdump-tree-gimple" }
 
-// This error is temporary.  Remove when support is added for these clauses
-// in the middle end.  Also remove the comments from the reduction test
+// Remove the comments from the reduction test
 // after the FE learns that reduction variables may appear in data clauses too.
-// { dg-prune-output "sorry, unimplemented" }
 
 void
 test ()
diff --git gcc/testsuite/c-c++-common/goacc/use_device-1.c gcc/testsuite/c-c++-common/goacc/host_data-1.c
similarity index 61%
rename from gcc/testsuite/c-c++-common/goacc/use_device-1.c
rename to gcc/testsuite/c-c++-common/goacc/host_data-1.c
index 9a4f6d0..0c7a857 100644
--- gcc/testsuite/c-c++-common/goacc/use_device-1.c
+++ gcc/testsuite/c-c++-common/goacc/host_data-1.c
@@ -1,4 +1,14 @@
-/* { dg-do compile } */
+/* Test valid use of host_data directive.  */
+
+int v1[3][3];
+
+void
+f (void)
+{
+#pragma acc host_data use_device(v1)
+  ;
+}
+
 
 void bar (float *, float *);
 
diff --git gcc/testsuite/c-c++-common/goacc/host_data-2.c gcc/testsuite/c-c++-common/goacc/host_data-2.c
new file mode 100644
index 0000000..bdce424
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/host_data-2.c
@@ -0,0 +1,78 @@
+/* Test invalid use of host_data directive.  */
+
+int v0;
+#pragma acc host_data use_device(v0) /* { dg-error "expected declaration specifiers before" } */
+
+
+void
+f (void)
+{
+  int v2 = 3;
+#pragma acc host_data copy(v2) /* { dg-error ".copy. is not valid for ..pragma acc host_data." } */
+  ;
+
+#pragma acc host_data use_device(v2)
+  ;
+  /* { dg-error ".use_device_ptr. variable is neither a pointer nor an array" "" { target c } 14 } */
+  /* { dg-error ".use_device_ptr. variable is neither a pointer, nor an arraynor reference to pointer or array" "" { target c++ } 14 } */
+  
+#pragma acc host_data use_device(v0)
+  ;
+  /* { dg-error ".use_device_ptr. variable is neither a pointer nor an array" "" { target c } 19 } */
+  /* { dg-error ".use_device_ptr. variable is neither a pointer, nor an arraynor reference to pointer or array" "" { target c++ } 19 } */
+}
+
+
+void
+f2 (void)
+{
+  int x[100];
+
+#pragma acc enter data copyin (x)
+  /* Specifying an array index is not valid for host_data/use_device.  */
+#pragma acc host_data use_device (x[4]) /* { dg-error "expected '\\\)' before '\\\[' token" } */
+  ;
+#pragma acc exit data delete (x)
+}
+
+
+void
+f3 (void)
+{
+  int x[100];
+
+#pragma acc data copyin (x[25:50])
+  {
+    int *xp;
+#pragma acc host_data use_device (x)
+    {
+      /* This use of the present clause is undefined behavior for OpenACC.  */
+#pragma acc parallel present (x) copyout (xp) /* { dg-error "variable .x. declared in enclosing .host_data. region" } */
+      {
+        xp = x;
+      }
+    }
+  }
+}
+
+
+void
+f4 (void)
+{
+  int x[50];
+
+#pragma acc data copyin (x[10:30])
+  {
+    int *xp;
+#pragma acc host_data use_device (x)
+    {
+      /* Here 'x' being implicitly firstprivate for the parallel region
+	 conflicts with it being declared as use_device in the enclosing
+	 host_data region.  */
+#pragma acc parallel copyout (xp)
+      {
+        xp = x; /* { dg-error "variable .x. declared in enclosing .host_data. region" } */
+      }
+    }
+  }
+}
diff --git gcc/testsuite/c-c++-common/goacc/host_data-5.c gcc/testsuite/c-c++-common/goacc/host_data-5.c
deleted file mode 100644
index a4206c8..0000000
--- gcc/testsuite/c-c++-common/goacc/host_data-5.c
+++ /dev/null
@@ -1,23 +0,0 @@
-/* { dg-do compile } */
-
-#define N 1024
-
-int main (int argc, char* argv[])
-{
-  int x[N];
-
-#pragma acc data copyin (x[0:N])
-  {
-    int *xp;
-#pragma acc host_data use_device (x)
-    {
-      /* This use of the present clause is undefined behavior for OpenACC.  */
-#pragma acc parallel present (x) copyout (xp) /* { dg-error "variable 'x' declared in enclosing 'host_data' region" } */
-      {
-        xp = x;
-      }
-    }
-  }
-
-  return 0;
-}
diff --git gcc/testsuite/c-c++-common/goacc/host_data-6.c gcc/testsuite/c-c++-common/goacc/host_data-6.c
deleted file mode 100644
index 8be7912..0000000
--- gcc/testsuite/c-c++-common/goacc/host_data-6.c
+++ /dev/null
@@ -1,25 +0,0 @@
-/* { dg-do compile } */
-
-#define N 1024
-
-int main (int argc, char* argv[])
-{
-  int x[N];
-
-#pragma acc data copyin (x[0:N])
-  {
-    int *xp;
-#pragma acc host_data use_device (x)
-    {
-      /* Here 'x' being implicitly firstprivate for the parallel region
-	 conflicts with it being declared as use_device in the enclosing
-	 host_data region.  */
-#pragma acc parallel copyout (xp)
-      {
-        xp = x; /* { dg-error "variable 'x' declared in enclosing 'host_data' region" } */
-      }
-    }
-  }
-
-  return 0;
-}
diff --git gcc/testsuite/c-c++-common/goacc/kernels-1.c gcc/testsuite/c-c++-common/goacc/kernels-1.c
index e91b81c..4fcf86e 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-1.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-1.c
@@ -1,6 +1,45 @@
-void
-foo (void)
+int
+kernels_empty (void)
 {
 #pragma acc kernels
   ;
+
+  return 0;
+}
+
+int
+kernels_eternal (void)
+{
+#pragma acc kernels
+  {
+    while (1)
+      ;
+  }
+
+  return 0;
+}
+
+int
+kernels_noreturn (void)
+{
+#pragma acc kernels
+  __builtin_abort ();
+
+  return 0;
+}
+
+
+float b[10][15][10];
+
+void
+kernels_loop_ptr_it (void)
+{
+  float *i;
+
+#pragma acc kernels
+  {
+#pragma acc loop
+    for (i = &b[0][0][0]; i < &b[0][0][10]; i++)
+      ;
+  }
 }
diff --git gcc/testsuite/c-c++-common/goacc/kernels-empty.c gcc/testsuite/c-c++-common/goacc/kernels-empty.c
deleted file mode 100644
index e91b81c..0000000
--- gcc/testsuite/c-c++-common/goacc/kernels-empty.c
+++ /dev/null
@@ -1,6 +0,0 @@
-void
-foo (void)
-{
-#pragma acc kernels
-  ;
-}
diff --git gcc/testsuite/c-c++-common/goacc/kernels-eternal.c gcc/testsuite/c-c++-common/goacc/kernels-eternal.c
deleted file mode 100644
index edc17d2..0000000
--- gcc/testsuite/c-c++-common/goacc/kernels-eternal.c
+++ /dev/null
@@ -1,11 +0,0 @@
-int
-main (void)
-{
-#pragma acc kernels
-  {
-    while (1)
-      ;
-  }
-
-  return 0;
-}
diff --git gcc/testsuite/c-c++-common/goacc/kernels-noreturn.c gcc/testsuite/c-c++-common/goacc/kernels-noreturn.c
deleted file mode 100644
index 1a8cc67..0000000
--- gcc/testsuite/c-c++-common/goacc/kernels-noreturn.c
+++ /dev/null
@@ -1,12 +0,0 @@
-int
-main (void)
-{
-
-#pragma acc kernels
-  {
-    __builtin_abort ();
-  }
-
-  return 0;
-}
-
diff --git gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
new file mode 100644
index 0000000..01ad32d
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
@@ -0,0 +1,189 @@
+void K(void)
+{
+  int i, j;
+
+#pragma acc kernels
+  {
+#pragma acc loop auto
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang(5)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang(num:5)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang(static:5)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang(static:*)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang
+    for (i = 0; i < 10; i++)
+      {
+#pragma acc loop vector 
+	for (j = 0; j < 10; j++)
+	  { }
+#pragma acc loop worker 
+	for (j = 0; j < 10; j++)
+	  { }
+#pragma acc loop gang // { dg-error "inner loop uses same" }
+	for (j = 0; j < 10; j++)
+	  { }
+      }
+#pragma acc loop seq gang // { dg-error "'seq' overrides" }
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop worker
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker(5)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker(num:5)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker
+    for (i = 0; i < 10; i++)
+      {
+#pragma acc loop vector 
+	for (j = 0; j < 10; j++)
+	  { }
+#pragma acc loop worker // { dg-error "inner loop uses same" }
+	for (j = 0; j < 10; j++)
+	  { }
+#pragma acc loop gang
+	for (j = 0; j < 10; j++)
+	  { }
+      }
+#pragma acc loop seq worker // { dg-error "'seq' overrides" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang worker
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop vector
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector(5)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector(length:5)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector
+    for (i = 0; i < 10; i++)
+      {
+#pragma acc loop vector // { dg-error "inner loop uses same" }
+	for (j = 1; j < 10; j++)
+	  { }
+#pragma acc loop worker
+	for (j = 1; j < 10; j++)
+	  { }
+#pragma acc loop gang
+	for (j = 1; j < 10; j++)
+	  { }
+      }
+#pragma acc loop seq vector // { dg-error "'seq' overrides" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang vector
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker vector
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop auto
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop seq auto // { dg-error "'seq' overrides" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang auto // { dg-error "'auto' conflicts" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker auto // { dg-error "'auto' conflicts" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector auto // { dg-error "'auto' conflicts" }
+    for (i = 0; i < 10; i++)
+      { }
+  }
+
+#pragma acc kernels loop auto
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop gang
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop gang(5)
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop gang(num:5)
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop gang(static:5)
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop gang(static:*)
+  for (i = 0; i < 10; i++)
+    { }
+
+#pragma acc kernels loop worker
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop worker(5)
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop worker(num:5)
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop seq worker // { dg-error "'seq' overrides" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'seq' overrides" "" { target c++ } }
+    { }
+#pragma acc kernels loop gang worker
+  for (i = 0; i < 10; i++)
+    { }
+
+#pragma acc kernels loop vector
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop vector(5)
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop vector(length:5)
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop seq vector // { dg-error "'seq' overrides" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'seq' overrides" "" { target c++ } }
+    { }
+#pragma acc kernels loop gang vector
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop worker vector
+  for (i = 0; i < 10; i++)
+    { }
+
+#pragma acc kernels loop auto
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop seq auto // { dg-error "'seq' overrides" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'seq' overrides" "" { target c++ } }
+    { }
+#pragma acc kernels loop gang auto // { dg-error "'auto' conflicts" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'auto' conflicts" "" { target c++ } }
+    { }
+#pragma acc kernels loop worker auto // { dg-error "'auto' conflicts" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'auto' conflicts" "" { target c++ } }
+    { }
+#pragma acc kernels loop vector auto // { dg-error "'auto' conflicts" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'auto' conflicts" "" { target c++ } }
+    { }
+}
diff --git gcc/testsuite/c-c++-common/goacc/loop-2-parallel.c gcc/testsuite/c-c++-common/goacc/loop-2-parallel.c
new file mode 100644
index 0000000..0ef5741
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/loop-2-parallel.c
@@ -0,0 +1,162 @@
+void P(void)
+{
+  int i, j;
+
+#pragma acc parallel
+  {
+#pragma acc loop auto
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang(static:5)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang(static:*)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang // { dg-message "containing loop" }
+    for (i = 0; i < 10; i++)
+      {
+#pragma acc loop vector
+	for (j = 1; j < 10; j++)
+	  { }
+#pragma acc loop worker 
+	for (j = 1; j < 10; j++)
+	  { }
+#pragma acc loop gang // { dg-error "inner loop uses same" }
+	for (j = 1; j < 10; j++)
+	  { }
+      }
+#pragma acc loop seq gang // { dg-error "'seq' overrides" }
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop worker
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker // { dg-message "containing loop" 2 }
+    for (i = 0; i < 10; i++)
+      {
+#pragma acc loop vector 
+	for (j = 1; j < 10; j++)
+	  { }
+#pragma acc loop worker // { dg-error "inner loop uses same" }
+	for (j = 1; j < 10; j++)
+	  { }
+#pragma acc loop gang // { dg-error "incorrectly nested" }
+	for (j = 1; j < 10; j++)
+	  { }
+      }
+#pragma acc loop seq worker // { dg-error "'seq' overrides" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang worker
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop vector
+    for (i = 0; i < 10; i++)
+      { }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector // { dg-message "containing loop" 3 }
+    for (i = 0; i < 10; i++)
+      {
+#pragma acc loop vector // { dg-error "inner loop uses same" }
+	for (j = 1; j < 10; j++)
+	  { }
+#pragma acc loop worker // { dg-error "incorrectly nested" }
+	for (j = 1; j < 10; j++)
+	  { }
+#pragma acc loop gang // { dg-error "incorrectly nested" }
+	for (j = 1; j < 10; j++)
+	  { }
+      }
+#pragma acc loop seq vector // { dg-error "'seq' overrides" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang vector
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker vector
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop auto
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop seq auto // { dg-error "'seq' overrides" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang auto // { dg-error "'auto' conflicts" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker auto // { dg-error "'auto' conflicts" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector auto // { dg-error "'auto' conflicts" }
+    for (i = 0; i < 10; i++)
+      { }
+
+  }
+
+#pragma acc parallel loop auto
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop gang
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop gang(static:5)
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop gang(static:*)
+  for (i = 0; i < 10; i++)
+    { }
+
+#pragma acc parallel loop seq gang // { dg-error "'seq' overrides" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'seq' overrides" "" { target c++ } }
+    { }
+
+#pragma acc parallel loop worker
+  for (i = 0; i < 10; i++)
+    { }
+
+#pragma acc parallel loop seq worker // { dg-error "'seq' overrides" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'seq' overrides" "" { target c++ } }
+    { }
+#pragma acc parallel loop gang worker
+  for (i = 0; i < 10; i++)
+    { }
+
+#pragma acc parallel loop vector
+  for (i = 0; i < 10; i++)
+    { }
+
+#pragma acc parallel loop seq vector // { dg-error "'seq' overrides" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'seq' overrides" "" { target c++ } }
+    { }
+#pragma acc parallel loop gang vector
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop worker vector
+  for (i = 0; i < 10; i++)
+    { }
+
+#pragma acc parallel loop auto
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop seq auto // { dg-error "'seq' overrides" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'seq' overrides" "" { target c++ } }
+    { }
+#pragma acc parallel loop gang auto // { dg-error "'auto' conflicts" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'auto' conflicts" "" { target c++ } }
+    { }
+#pragma acc parallel loop worker auto // { dg-error "'auto' conflicts" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'auto' conflicts" "" { target c++ } }
+    { }
+#pragma acc parallel loop vector auto // { dg-error "'auto' conflicts" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'auto' conflicts" "" { target c++ } }
+    { }
+}
diff --git gcc/testsuite/c-c++-common/goacc/loop-3.c gcc/testsuite/c-c++-common/goacc/loop-3.c
new file mode 100644
index 0000000..44b65a8
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/loop-3.c
@@ -0,0 +1,58 @@
+void par1 (void)
+{
+  int i, j;
+
+#pragma acc parallel
+  {
+#pragma acc loop gang(5) // { dg-error "argument not permitted" }
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop gang(num:5) // { dg-error "argument not permitted" }
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop worker(5) // { dg-error "argument not permitted" }
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop worker(num:5) // { dg-error "argument not permitted" }
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop vector(5) // { dg-error "argument not permitted" }
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop vector(length:5) // { dg-error "argument not permitted" }
+    for (i = 0; i < 10; i++)
+      { }
+
+   }
+}
+
+void p2 (void)
+{
+  int i, j;
+
+#pragma acc parallel loop gang(5) // { dg-error "argument not permitted" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "argument not permitted" "" { target c++ } }
+    { }
+#pragma acc parallel loop gang(num:5) // { dg-error "argument not permitted" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "argument not permitted" "" { target c++ } }
+    { }
+
+#pragma acc parallel loop worker(5) // { dg-error "argument not permitted" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "argument not permitted" "" { target c++ } }
+    { }
+#pragma acc parallel loop worker(num:5) // { dg-error "argument not permitted" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "argument not permitted" "" { target c++ } }
+    { }
+
+#pragma acc parallel loop vector(5) // { dg-error "argument not permitted" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "argument not permitted" "" { target c++ } }
+    { }
+#pragma acc parallel loop vector(length:5) // { dg-error "argument not permitted" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "argument not permitted" "" { target c++ } }
+    { }
+}
diff --git gcc/testsuite/c-c++-common/goacc/loop-clauses.c gcc/testsuite/c-c++-common/goacc/loop-clauses.c
index 97b8786..f3c7207 100644
--- gcc/testsuite/c-c++-common/goacc/loop-clauses.c
+++ gcc/testsuite/c-c++-common/goacc/loop-clauses.c
@@ -1,7 +1,3 @@
-/* { dg-do compile } */
-
-/* { dg-prune-output "sorry, unimplemented" } */
-
 int
 main ()
 {
diff --git gcc/testsuite/c-c++-common/goacc/nesting-1.c gcc/testsuite/c-c++-common/goacc/nesting-1.c
index 3a8f838..cab4f98 100644
--- gcc/testsuite/c-c++-common/goacc/nesting-1.c
+++ gcc/testsuite/c-c++-common/goacc/nesting-1.c
@@ -58,10 +58,6 @@ f_acc_data (void)
 
 #pragma acc exit data delete(i)
 
-#pragma acc loop /* { dg-error "loop directive must be associated with an OpenACC compute region" } */
-    for (i = 0; i < 2; ++i)
-      ;
-
 #pragma acc data
     {
 #pragma acc parallel
@@ -92,10 +88,6 @@ f_acc_data (void)
 #pragma acc enter data copyin(i)
 
 #pragma acc exit data delete(i)
-
-#pragma acc loop /* { dg-error "loop directive must be associated with an OpenACC compute region" } */
-      for (i = 0; i < 2; ++i)
-	;
     }
   }
 }
diff --git gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c
index 506a1ae..93a9111 100644
--- gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c
+++ gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c
@@ -38,6 +38,25 @@ f_acc_kernels (void)
   }
 }
 
+void
+f_acc_data (void)
+{
+  unsigned int i;
+#pragma acc data
+  {
+#pragma acc loop /* { dg-error "loop directive must be associated with an OpenACC compute region" } */
+    for (i = 0; i < 2; ++i)
+      ;
+
+#pragma acc data
+    {
+#pragma acc loop /* { dg-error "loop directive must be associated with an OpenACC compute region" } */
+      for (i = 0; i < 2; ++i)
+	;
+    }
+  }
+}
+
 #pragma acc routine
 void
 f_acc_routine (void)
@@ -45,3 +64,13 @@ f_acc_routine (void)
 #pragma acc parallel /* { dg-error "OpenACC region inside of OpenACC routine, nested parallelism not supported yet" } */
   ;
 }
+
+void
+f (void)
+{
+  int i, v = 0;
+
+#pragma acc loop gang reduction (+:v) /* { dg-error "loop directive must be associated with an OpenACC compute region" } */
+  for (i = 0; i < 10; i++)
+    v++;
+}
diff --git gcc/testsuite/c-c++-common/goacc/parallel-1.c gcc/testsuite/c-c++-common/goacc/parallel-1.c
index a860526..6c6cc88 100644
--- gcc/testsuite/c-c++-common/goacc/parallel-1.c
+++ gcc/testsuite/c-c++-common/goacc/parallel-1.c
@@ -1,6 +1,38 @@
-void
-foo (void)
+int
+parallel_empty (void)
 {
 #pragma acc parallel
   ;
+
+  return 0;
+}
+
+int
+parallel_eternal (void)
+{
+#pragma acc parallel
+  {
+    while (1)
+      ;
+  }
+
+  return 0;
+}
+
+int
+parallel_noreturn (void)
+{
+#pragma acc parallel
+  __builtin_abort ();
+
+  return 0;
+}
+
+int
+parallel_clauses (void)
+{
+  int a, b[100];
+
+#pragma acc parallel firstprivate (a, b)
+  ;
 }
diff --git gcc/testsuite/c-c++-common/goacc/reduction-1.c gcc/testsuite/c-c++-common/goacc/reduction-1.c
index de97125..3c1c2dd 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-1.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-1.c
@@ -1,70 +1,65 @@
-/* { dg-require-effective-target alloca } */
 /* Integer reductions.  */
 
-#define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   int result, array[n];
   int lresult;
 
   /* '+' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
   /* '*' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (*:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-//   result = 0;
-//   vresult = 0;
-// 
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-//
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* 'max' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&:result)
   for (i = 0; i < n; i++)
     result &= array[i];
 
   /* '|' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (|:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (|:result)
   for (i = 0; i < n; i++)
     result |= array[i];
 
   /* '^' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (^:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (^:result)
   for (i = 0; i < n; i++)
     result ^= array[i];
 
   /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
 
   /* '||' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
 
diff --git gcc/testsuite/c-c++-common/goacc/reduction-2.c gcc/testsuite/c-c++-common/goacc/reduction-2.c
index 2964236..c3105a2 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-2.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-2.c
@@ -1,49 +1,47 @@
-/* { dg-require-effective-target alloca } */
 /* float reductions.  */
 
-#define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   float result, array[n];
   int lresult;
 
   /* '+' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
   /* '*' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (*:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-// 
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* 'max' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
 
   /* '||' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
 
diff --git gcc/testsuite/c-c++-common/goacc/reduction-3.c gcc/testsuite/c-c++-common/goacc/reduction-3.c
index 34c51c2..4dbde04 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-3.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-3.c
@@ -1,49 +1,47 @@
-/* { dg-require-effective-target alloca } */
 /* double reductions.  */
 
-#define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   double result, array[n];
   int lresult;
 
   /* '+' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
   /* '*' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (*:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-// 
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* 'max' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
 
   /* '||' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
 
diff --git gcc/testsuite/c-c++-common/goacc/reduction-4.c gcc/testsuite/c-c++-common/goacc/reduction-4.c
index 328c0d4..c4572b9 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-4.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-4.c
@@ -1,51 +1,35 @@
-/* { dg-require-effective-target alloca } */
 /* complex reductions.  */
 
-#define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   __complex__ double result, array[n];
   int lresult;
 
   /* '+' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
-  /* Needs support for complex multiplication.  */
-
-//   /* '*' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (*:result)
-//   for (i = 0; i < n; i++)
-//     result *= array[i];
-//
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-// 
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* '*' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
+  for (i = 0; i < n; i++)
+    result *= array[i];
 
   /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (__real__(result) > __real__(array[i]));
 
   /* '||' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (__real__(result) > __real__(array[i]));
 
diff --git gcc/testsuite/c-c++-common/goacc/routine-3.c gcc/testsuite/c-c++-common/goacc/routine-3.c
index e6f83bd..b322d26 100644
--- gcc/testsuite/c-c++-common/goacc/routine-3.c
+++ gcc/testsuite/c-c++-common/goacc/routine-3.c
@@ -1,52 +1,118 @@
+/* Test invalid calls to routines.  */
+
 #pragma acc routine gang
-void gang (void) /* { dg-message "declared here" 3 } */
+int
+gang () /* { dg-message "declared here" 3 } */
 {
+  #pragma acc loop gang worker vector
+  for (int i = 0; i < 10; i++)
+    {
+    }
+
+  return 1;
 }
 
 #pragma acc routine worker
-void worker (void) /* { dg-message "declared here" 2 } */
+int
+worker () /* { dg-message "declared here" 2 } */
 {
+  #pragma acc loop worker vector
+  for (int i = 0; i < 10; i++)
+    {
+    }
+
+  return 1;
 }
 
 #pragma acc routine vector
-void vector (void) /* { dg-message "declared here" 1 } */
+int
+vector () /* { dg-message "declared here" } */
 {
+  #pragma acc loop vector
+  for (int i = 0; i < 10; i++)
+    {
+    }
+
+  return 1;
 }
 
 #pragma acc routine seq
-void seq (void)
+int
+seq ()
 {
+  return 1;
 }
 
-int main ()
+int
+main ()
 {
-
-#pragma acc parallel num_gangs (32) num_workers (32) vector_length (32)
+  int red = 0;
+#pragma acc parallel copy (red)
   {
-    #pragma acc loop gang /* { dg-message "loop here" 1 } */
-    for (int i = 0; i < 10; i++)
-      {
-	gang (); /*  { dg-error "routine call uses same" } */
-	worker ();
-	vector ();
-	seq ();
-      }
-    #pragma acc loop worker /* { dg-message "loop here" 2 } */
-    for (int i = 0; i < 10; i++)
-      {
-	gang (); /*  { dg-error "routine call uses same" } */
-	worker (); /*  { dg-error "routine call uses same" } */
-	vector ();
-	seq ();
-      }
-    #pragma acc loop vector /* { dg-message "loop here" 3 } */
-    for (int i = 0; i < 10; i++)
-      {
-	gang (); /*  { dg-error "routine call uses same" } */
-	worker (); /*  { dg-error "routine call uses same" } */
-	vector (); /*  { dg-error "routine call uses same" } */
-	seq ();
-      }
+    /* Independent/seq loop tests.  */
+#pragma acc loop reduction (+:red) // { dg-warning "insufficient partitioning" }
+    for (int i = 0; i < 10; i++)
+      red += gang ();
+
+#pragma acc loop reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += worker ();
+
+#pragma acc loop reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += vector ();
+
+    /* Gang routine tests.  */
+#pragma acc loop gang reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += gang (); // { dg-error "routine call uses same" }
+
+#pragma acc loop worker reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += gang (); // { dg-error "routine call uses same" }
+
+#pragma acc loop vector reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += gang (); // { dg-error "routine call uses same" }
+
+    /* Worker routine tests.  */
+#pragma acc loop gang reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += worker ();
+
+#pragma acc loop worker reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += worker (); // { dg-error "routine call uses same" }
+
+#pragma acc loop vector reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += worker (); // { dg-error "routine call uses same" }
+
+    /* Vector routine tests.  */
+#pragma acc loop gang reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += vector ();
+
+#pragma acc loop worker reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += vector ();
+
+#pragma acc loop vector reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += vector (); // { dg-error "routine call uses same" }
+
+    /* Seq routine tests.  */
+#pragma acc loop gang reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += seq ();
+
+#pragma acc loop worker reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += seq ();
+
+#pragma acc loop vector reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += seq ();
   }
 
   return 0;
diff --git gcc/testsuite/c-c++-common/goacc/routine-4.c gcc/testsuite/c-c++-common/goacc/routine-4.c
index 004d713..3e5fc4f 100644
--- gcc/testsuite/c-c++-common/goacc/routine-4.c
+++ gcc/testsuite/c-c++-common/goacc/routine-4.c
@@ -1,3 +1,4 @@
+/* Test invalid intra-routine parallelism.  */
 
 void gang (void);
 void worker (void);
@@ -14,6 +15,24 @@ void seq (void)
   worker ();  /* { dg-error "routine call uses" } */
   vector ();  /* { dg-error "routine call uses" } */
   seq ();
+
+  int red;
+
+#pragma acc loop reduction (+:red) // { dg-warning "insufficient partitioning" }
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop gang reduction (+:red) // { dg-error "disallowed by containing routine" }
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop worker reduction (+:red) // { dg-error "disallowed by containing routine" }
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop vector reduction (+:red) // { dg-error "disallowed by containing routine" }
+  for (int i = 0; i < 10; i++)
+    red ++;
 }
 
 void vector (void) /* { dg-message "declared here" 1 } */
@@ -22,6 +41,24 @@ void vector (void) /* { dg-message "declared here" 1 } */
   worker ();  /* { dg-error "routine call uses" } */
   vector ();
   seq ();
+
+  int red;
+
+#pragma acc loop reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop gang reduction (+:red) // { dg-error "disallowed by containing routine" }
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop worker reduction (+:red) // { dg-error "disallowed by containing routine" }
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop vector reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
 }
 
 void worker (void) /* { dg-message "declared here" 2 } */
@@ -30,6 +67,24 @@ void worker (void) /* { dg-message "declared here" 2 } */
   worker ();
   vector ();
   seq ();
+
+  int red;
+
+#pragma acc loop reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop gang reduction (+:red) // { dg-error "disallowed by containing routine" }
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop worker reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop vector reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
 }
 
 void gang (void) /* { dg-message "declared here" 3 } */
@@ -38,4 +93,22 @@ void gang (void) /* { dg-message "declared here" 3 } */
   worker ();
   vector ();
   seq ();
+
+  int red;
+
+#pragma acc loop reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop gang reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop worker reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop vector reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
 }
diff --git gcc/testsuite/c-c++-common/goacc/routine-5.c gcc/testsuite/c-c++-common/goacc/routine-5.c
index c34838f..2a9db90 100644
--- gcc/testsuite/c-c++-common/goacc/routine-5.c
+++ gcc/testsuite/c-c++-common/goacc/routine-5.c
@@ -46,6 +46,21 @@ using namespace g;
   
 #pragma acc routine (c) /* { dg-error "does not refer to" } */
 
+
+void Bar ();
+
+void Foo ()
+{
+  Bar ();
+}
+
+#pragma acc routine (Bar) // { dg-error "must be applied before use" }
+
+#pragma acc routine (Foo) gang // { dg-error "must be applied before definition" }
+
+#pragma acc routine (Baz) // { dg-error "not been declared" }
+
+
 int vb1;		/* { dg-error "directive for use" } */
 extern int vb2;		/* { dg-error "directive for use" } */
 static int vb3;		/* { dg-error "directive for use" } */
diff --git gcc/testsuite/c-c++-common/goacc/tile.c gcc/testsuite/c-c++-common/goacc/tile.c
index 2a81427..8e70e71 100644
--- gcc/testsuite/c-c++-common/goacc/tile.c
+++ gcc/testsuite/c-c++-common/goacc/tile.c
@@ -1,5 +1,3 @@
-/* { dg-do compile } */
-
 int
 main ()
 {
@@ -71,3 +69,259 @@ main ()
 
   return 0;
 }
+
+
+void par (void)
+{
+  int i, j;
+
+#pragma acc parallel
+  {
+#pragma acc loop tile // { dg-error "expected" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile() // { dg-error "expected" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(1) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(2) 
+    for (i = 0; i < 10; i++)
+      {
+	for (j = 1; j < 10; j++)
+	  { }
+      }
+#pragma acc loop tile(-2) // { dg-warning "'tile' value must be positive" }
+    for (i = 1; i < 10; i++)
+      { }
+#pragma acc loop tile(i)
+    for (i = 1; i < 10; i++)
+      { }
+#pragma acc loop tile(2, 2, 1)
+    for (i = 1; i < 3; i++)
+      {
+	for (j = 4; j < 6; j++)
+	  { }
+      } 
+#pragma acc loop tile(2, 2)
+    for (i = 1; i < 5; i+=2)
+      {
+	for (j = i + 1; j < 7; j+=i)
+	  { }
+      }
+#pragma acc loop vector tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector gang tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector worker tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang worker tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+  }
+}
+void p3 (void)
+{
+  int i, j;
+
+  
+#pragma acc parallel loop tile // { dg-error "expected" }
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop tile() // { dg-error "expected" }
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop tile(1) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop tile(*, 1) 
+  for (i = 0; i < 10; i++)
+    {
+      for (j = 1; j < 10; j++)
+	{ }
+    }
+#pragma acc parallel loop tile(-2) // { dg-warning "'tile' value must be positive" }
+  for (i = 1; i < 10; i++)
+    { }
+#pragma acc parallel loop tile(i)
+  for (i = 1; i < 10; i++)
+    { }
+#pragma acc parallel loop tile(2, 2, 1)
+  for (i = 1; i < 3; i++)
+    {
+      for (j = 4; j < 6; j++)
+        { }
+    }    
+#pragma acc parallel loop tile(2, 2)
+  for (i = 1; i < 5; i+=2)
+    {
+      for (j = i + 1; j < 7; j++)
+        { }
+    }
+#pragma acc parallel loop vector tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop worker tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop gang tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop vector gang tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop vector worker tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop gang worker tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+
+}
+
+
+void
+kern (void)
+{
+  int i, j;
+
+#pragma acc kernels
+  {
+#pragma acc loop tile // { dg-error "expected" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile() // { dg-error "expected" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(1)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(2)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(6-2) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(6+2) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(*, 1) 
+    for (i = 0; i < 10; i++)
+      {
+	for (j = 0; j < 10; j++)
+	  { }
+      }
+#pragma acc loop tile(-2) // { dg-warning "'tile' value must be positive" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(i)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(2, 2, 1)
+    for (i = 2; i < 4; i++)
+      for (j = 4; j < 6; j++)
+	{ }
+#pragma acc loop tile(2, 2)
+    for (i = 1; i < 5; i+=2)
+	for (j = i+1; j < 7; j++)
+	{ }
+#pragma acc loop vector tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector gang tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector worker tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang worker tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+   }
+}
+
+
+void k3 (void)
+{
+  int i, j;
+
+#pragma acc kernels loop tile // { dg-error "expected" }
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop tile() // { dg-error "expected" }
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop tile(1) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop tile(*, 1) 
+  for (i = 0; i < 10; i++)
+    {
+      for (j = 1; j < 10; j++)
+	{ }
+    }
+#pragma acc kernels loop tile(-2) // { dg-warning "'tile' value must be positive" }
+  for (i = 1; i < 10; i++)
+    { }
+#pragma acc kernels loop tile(i)
+  for (i = 1; i < 10; i++)
+    { }
+#pragma acc kernels loop tile(2, 2, 1)
+  for (i = 1; i < 3; i++)
+    {
+      for (j = 4; j < 6; j++)
+	{ }
+    }    
+#pragma acc kernels loop tile(2, 2)
+  for (i = 1; i < 5; i++)
+    {
+      for (j = i + 1; j < 7; j += i)
+	{ }
+    }
+#pragma acc kernels loop vector tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop worker tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop gang tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop vector gang tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop vector worker tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop gang worker tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+}
diff --git gcc/testsuite/g++.dg/goacc/reference.C gcc/testsuite/g++.dg/goacc/reference.C
new file mode 100644
index 0000000..b000668
--- /dev/null
+++ gcc/testsuite/g++.dg/goacc/reference.C
@@ -0,0 +1,39 @@
+int
+test1 (int &ref)
+{
+#pragma acc kernels copy (ref)
+  {
+    ref = 10;
+  }
+}
+
+int
+test2 (int &ref)
+{
+  int b;
+#pragma acc kernels copyout (b)
+  {
+    b = ref + 10;
+  }
+
+#pragma acc parallel copyout (b)
+  {
+    b = ref + 10;
+  }
+
+  ref = b;
+}
+
+int
+main()
+{
+  int a = 0;
+  int &ref_a = a;
+
+  #pragma acc parallel copy (a, ref_a)
+  {
+    ref_a = 5;
+  }
+
+  return a;
+}
diff --git gcc/testsuite/g++.dg/goacc/routine-1.C gcc/testsuite/g++.dg/goacc/routine-1.C
new file mode 100644
index 0000000..a73a73d
--- /dev/null
+++ gcc/testsuite/g++.dg/goacc/routine-1.C
@@ -0,0 +1,13 @@
+/* Test valid use of the routine directive.  */
+
+namespace N
+{
+  extern void foo1();
+  extern void foo2();
+#pragma acc routine (foo1)
+#pragma acc routine
+  void foo3()
+  {
+  }
+}
+#pragma acc routine (N::foo2)
diff --git gcc/testsuite/g++.dg/goacc/routine-2.C gcc/testsuite/g++.dg/goacc/routine-2.C
new file mode 100644
index 0000000..2d16466
--- /dev/null
+++ gcc/testsuite/g++.dg/goacc/routine-2.C
@@ -0,0 +1,42 @@
+/* Test invalid use of the routine directive.  */
+
+template <typename T>
+extern T one_d();
+#pragma acc routine (one_d) /* { dg-error "names a set of overloads" } */
+
+template <typename T>
+T
+one()
+{
+  return 1;
+}
+#pragma acc routine (one) /* { dg-error "names a set of overloads" } */
+
+int incr (int);
+float incr (float);
+int inc;
+
+#pragma acc routine (incr) /* { dg-error "names a set of overloads" } */
+
+#pragma acc routine (increment) /* { dg-error "has not been declared" } */
+
+#pragma acc routine (inc) /* { dg-error "does not refer to a function" } */
+
+#pragma acc routine (+) /* { dg-error "expected unqualified-id before '.' token" } */
+
+int sum (int, int);
+
+namespace foo {
+#pragma acc routine (sum)
+  int sub (int, int);
+}
+
+#pragma acc routine (foo::sub)
+
+/* It's strange to apply a routine directive to a subset of overloaded
+   functions, but that is permissible in OpenACC 2.x.  */
+
+int decr (int a);
+
+#pragma acc routine
+float decr (float a);
diff --git gcc/testsuite/g++.dg/goacc/template.C gcc/testsuite/g++.dg/goacc/template.C
index f7a717b..f139dc2 100644
--- gcc/testsuite/g++.dg/goacc/template.C
+++ gcc/testsuite/g++.dg/goacc/template.C
@@ -1,8 +1,3 @@
-// This error is temporary.  Remove when support is added for these clauses
-// in the middle end.  Also remove the comments from the reduction test
-// after the FE learns that reduction variables may appear in data clauses too.
-// { dg-prune-output "sorry, unimplemented" }
-
 #pragma acc routine
 template <typename T> T
 accDouble(int val)
@@ -20,55 +15,62 @@ oacc_parallel_copy (T a)
   double z = 4;
 
 #pragma acc parallel num_gangs (a) num_workers (a) vector_length (a) default (none) copyout (b) copyin (a)
-  {
+#pragma acc loop gang worker vector
+  for (int i = 0; i < 1; i++)
     b = a;
-  }
 
 #pragma acc parallel num_gangs (a) copy (w, x, y, z)
-  {
-    w = accDouble<char>(w);
-    x = accDouble<int>(x);
-    y = accDouble<float>(y);
-    z = accDouble<double>(z);
-  }
+#pragma acc loop
+  for (int i = 0; i < 1; i++)
+    {
+      w = accDouble<char>(w);
+      x = accDouble<int>(x);
+      y = accDouble<float>(y);
+      z = accDouble<double>(z);
+    }
 
 #pragma acc parallel num_gangs (a) if (1)
   {
+#pragma acc loop independent collapse (2) gang
+    for (int i = 0; i < a; i++)
+      for (int j = 0; j < 5; j++)
+	b = a;
+
 #pragma acc loop auto tile (a, 3)
-  for (int i = 0; i < a; i++)
-    for (int j = 0; j < 5; j++)
-      b = a;
+    for (int i = 0; i < a; i++)
+      for (int j = 0; j < 5; j++)
+	b = a;
 
 #pragma acc loop seq
-  for (int i = 0; i < a; i++)
-    b = a;
+    for (int i = 0; i < a; i++)
+      b = a;
   }
 
   T c;
 
 #pragma acc parallel num_workers (10)
-  {
+#pragma acc loop worker
+  for (int i = 0; i < 1; i++)
+    {
 #pragma acc atomic capture
-    c = b++;
+      c = b++;
 
 #pragma atomic update
-    c++;
+      c++;
 
 #pragma acc atomic read
-    b = a;
+      b = a;
 
 #pragma acc atomic write
-    b = a;
-  }
+      b = a;
+    }
 
-//#pragma acc parallel reduction (+:c)
-//  {
-//    c = 1;
-//  }
+#pragma acc parallel reduction (+:c)
+  c = 1;
 
 #pragma acc data if (1) copy (b)
   {
-    #pragma acc parallel
+#pragma acc parallel
     {
       b = a;
     }
@@ -76,9 +78,9 @@ oacc_parallel_copy (T a)
 
 #pragma acc enter data copyin (b)
 #pragma acc parallel present (b)
-    {
-      b = a;
-    }
+  {
+    b = a;
+  }
 
 #pragma acc update host (b)
 #pragma acc update self (b)
@@ -109,11 +111,9 @@ oacc_kernels_copy (T a)
 #pragma acc kernels copyout (b) copyin (a)
   b = a;
 
-//#pragma acc kernels loop reduction (+:c)
-//  for (int i = 0; i < 10; i++)
-//    {
-//      c = 1;
-//    }
+#pragma acc kernels loop reduction (+:c)
+  for (int i = 0; i < 10; i++)
+    c = 1;
 
 #pragma acc data if (1) copy (b)
   {
@@ -125,9 +125,10 @@ oacc_kernels_copy (T a)
 
 #pragma acc enter data copyin (b)
 #pragma acc kernels present (b)
-    {
-      b = a;
-    }
+  {
+    b = a;
+  }
+
   return b;
 }
 
diff --git gcc/testsuite/gfortran.dg/goacc/combined-directives.f90 gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
index 6977525..42a447a 100644
--- gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
+++ gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
@@ -1,17 +1,10 @@
 ! Exercise combined OpenACC directives.
 
-! { dg-do compile }
-! { dg-options "-fopenacc -fdump-tree-gimple" }
-
-! This error is temporary.  Remove when support is added for these clauses
-! in the middle end.
-! { dg-prune-output "sorry, unimplemented" }
-
-! Update the reduction tests.
+! { dg-additional-options "-fdump-tree-gimple" }
 
 subroutine test
   implicit none
-  integer a(100), i, j, z
+  integer a(100), i, j, y, z
 
   ! PARALLEL
   
@@ -73,10 +66,10 @@ subroutine test
   end do
   !$acc end parallel loop
 
-!  !$acc parallel loop reduction (+:z) copy (z)
-!  do i = 1, 100
-!  end do
-!  !$acc end parallel loop
+  !$acc parallel loop reduction (+:y) copy (y)
+  do i = 1, 100
+  end do
+  !$acc end parallel loop
 
   ! KERNELS
 
@@ -138,10 +131,10 @@ subroutine test
   end do
   !$acc end kernels loop
 
-!  !$acc kernels loop reduction (+:z) copy (z)
-!  do i = 1, 100
-!  end do
-!  !$acc end kernels loop
+  !$acc kernels loop reduction (+:y) copy (y)
+  do i = 1, 100
+  end do
+  !$acc end kernels loop
 end subroutine test
 
 ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. collapse.2." 2 "gimple" } }
@@ -153,3 +146,5 @@ end subroutine test
 ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. tile.2, 3" 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "acc loop private.i. independent" 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "private.z" 2 "gimple" } }
+! { dg-final { scan-tree-dump-times "omp target oacc_\[^ \]+ map.force_tofrom:y" 2 "gimple" } }
+! { dg-final { scan-tree-dump-times "acc loop private.i. reduction..:y." 2 "gimple" } }
diff --git gcc/testsuite/gfortran.dg/goacc/loop-1.f95 gcc/testsuite/gfortran.dg/goacc/loop-1.f95
index 817039f..b5f9e03 100644
--- gcc/testsuite/gfortran.dg/goacc/loop-1.f95
+++ gcc/testsuite/gfortran.dg/goacc/loop-1.f95
@@ -1,5 +1,3 @@
-! { dg-do compile } 
-! { dg-additional-options "-fmax-errors=100" } 
 module test
   implicit none
 contains
@@ -29,14 +27,18 @@ subroutine test1
        i = i + 1
   end do
   !$acc loop
-  do 300 d = 1, 30, 6 ! { dg-error "integer" }
+  do 300 d = 1, 30, 6
       i = d
   300 a(i) = 1
+  ! { dg-warning "Deleted feature: Loop variable at .1. must be integer" "" { target *-*-* } 30 }
+  ! { dg-error "ACC LOOP iteration variable must be of type integer" "" { target *-*-* } 30 }
   !$acc loop
-  do d = 1, 30, 5 ! { dg-error "integer" }
+  do d = 1, 30, 5
        i = d
       a(i) = 2
   end do
+  ! { dg-warning "Deleted feature: Loop variable at .1. must be integer" "" { target *-*-* } 36 }
+  ! { dg-error "ACC LOOP iteration variable must be of type integer" "" { target *-*-* } 36 }
   !$acc loop
   do i = 1, 30
       if (i .eq. 16) exit ! { dg-error "EXIT statement" }
@@ -144,8 +146,10 @@ subroutine test1
     end do
     !$acc parallel loop collapse(2)
     do i = 1, 3
-        do r = 4, 6    ! { dg-error "integer" }
+        do r = 4, 6
         end do
+        ! { dg-warning "Deleted feature: Loop variable at .1. must be integer" "" { target *-*-* } 149 }
+        ! { dg-error "ACC LOOP iteration variable must be of type integer" "" { target *-*-* } 149 }
     end do
 
     ! Both seq and independent are not allowed
@@ -167,4 +171,3 @@ subroutine test1
 
 end subroutine test1
 end module test
-! { dg-prune-output "Deleted" }
diff --git gcc/testsuite/gfortran.dg/goacc/loop-5.f95 gcc/testsuite/gfortran.dg/goacc/loop-5.f95
index 5cbd975..d059cf7 100644
--- gcc/testsuite/gfortran.dg/goacc/loop-5.f95
+++ gcc/testsuite/gfortran.dg/goacc/loop-5.f95
@@ -1,9 +1,3 @@
-! { dg-do compile }
-! { dg-additional-options "-fmax-errors=100" }
-
-! { dg-prune-output "sorry, unimplemented" }
-! { dg-prune-output "Error: work-sharing region" }
-
 program test
   implicit none
   integer :: i, j
diff --git gcc/testsuite/gfortran.dg/goacc/loop-6.f95 gcc/testsuite/gfortran.dg/goacc/loop-6.f95
index e844468..d0855b4 100644
--- gcc/testsuite/gfortran.dg/goacc/loop-6.f95
+++ gcc/testsuite/gfortran.dg/goacc/loop-6.f95
@@ -1,11 +1,3 @@
-! { dg-do compile }
-! { dg-additional-options "-fmax-errors=100" }
-
-! This error is temporary.  Remove when support is added for these clauses
-! in the middle end.
-! { dg-prune-output "sorry, unimplemented" }
-! { dg-prune-output "Error: work-sharing region" }
-
 program test
   implicit none
   integer :: i, j
diff --git gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90 gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90
index 6cfd715..81bdc23 100644
--- gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90
+++ gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90
@@ -1,13 +1,7 @@
-! { dg-do compile } 
 ! { dg-additional-options "-fdump-tree-original -std=f2008" } 
 
 ! test for tree-dump-original and spaces-commas
 
-! This error is temporary.  Remove when support is added for these clauses
-! in the middle end.
-! { dg-prune-output "sorry, unimplemented" }
-! { dg-prune-output "Error: work-sharing region" }
-
 program test
   implicit none
   integer :: i, j, k, m, sum
diff --git libgomp/testsuite/libgomp.oacc-c++/template-reduction.C libgomp/testsuite/libgomp.oacc-c++/template-reduction.C
new file mode 100644
index 0000000..fb5924c
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c++/template-reduction.C
@@ -0,0 +1,98 @@
+const int n = 100;
+
+// Check explicit template copy map
+
+template<typename T> T
+sum (T array[])
+{
+  T s = 0;
+
+#pragma acc parallel loop num_gangs (10) gang reduction (+:s) copy (s, array[0:n])
+  for (int i = 0; i < n; i++)
+    s += array[i];
+
+  return s;
+}
+
+// Check implicit template copy map
+
+template<typename T> T
+sum ()
+{
+  T s = 0;
+  T array[n];
+
+  for (int i = 0; i < n; i++)
+    array[i] = i+1;
+
+#pragma acc parallel loop num_gangs (10) gang reduction (+:s) copy (s)
+  for (int i = 0; i < n; i++)
+    s += array[i];
+
+  return s;
+}
+
+// Check present and async
+
+template<typename T> T
+async_sum (T array[])
+{
+  T s = 0;
+
+#pragma acc parallel loop num_gangs (10) gang async (1) present (array[0:n])
+  for (int i = 0; i < n; i++)
+    array[i] = i+1;
+
+#pragma acc parallel loop num_gangs (10) gang reduction (+:s) present (array[0:n]) copy (s) async wait (1)
+  for (int i = 0; i < n; i++)
+    s += array[i];
+
+#pragma acc wait
+
+  return s;
+}
+
+// Check present and async and an explicit firstprivate
+
+template<typename T> T
+async_sum (int c)
+{
+  T s = 0;
+
+#pragma acc parallel loop num_gangs (10) gang reduction (+:s) copy(s) firstprivate (c) async wait (1)
+  for (int i = 0; i < n; i++)
+    s += i+c;
+
+#pragma acc wait
+
+  return s;
+}
+
+int
+main()
+{
+  int a[n];
+  int result = 0;
+
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = i+1;
+      result += i+1;
+    }
+
+  if (sum (a) != result)
+    __builtin_abort ();
+
+  if (sum<int> () != result)
+    __builtin_abort ();
+
+#pragma acc enter data copyin (a)
+  if (async_sum (a) != result)
+    __builtin_abort ();
+
+  if (async_sum<int> (1) != result)
+    __builtin_abort ();
+#pragma acc exit data delete (a)
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
index 22cef6d..f3b490a 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
@@ -1,4 +1,6 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* <http://news.gmane.org/find-root.php?message_id=%3C87pp0aaksc.fsf%40kepler.schwinge.homeip.net%3E>.
+   { dg-xfail-run-if "TODO" { *-*-* } } */
 /* { dg-additional-options "-lcuda" } */
 
 #include <openacc.h>
@@ -460,6 +462,438 @@ main (int argc, char **argv)
             abort ();
     }
 
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 0.0;
+    }
+
+#pragma acc data copy (a[0:N]) copy (b[0:N]) copyin (N)
+    {
+
+#pragma acc kernels async
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+#pragma acc wait
+
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 3.0)
+            abort ();
+
+        if (b[i] != 3.0)
+            abort ();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 2.0;
+        b[i] = 0.0;
+    }
+
+#pragma acc data copy (a[0:N]) copy (b[0:N]) copyin (N)
+    {
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+#pragma acc wait (1)
+
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 2.0)
+            abort ();
+
+        if (b[i] != 2.0)
+            abort ();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 0.0;
+        c[i] = 0.0;
+        d[i] = 0.0;
+    }
+
+#pragma acc data copy (a[0:N]) copy (b[0:N]) copy (c[0:N]) copy (d[0:N]) copyin (N)
+    {
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii];
+    }
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii];
+    }
+
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii];
+    }
+
+#pragma acc wait (1)
+
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 3.0)
+            abort ();
+
+        if (b[i] != 9.0)
+            abort ();
+
+        if (c[i] != 4.0)
+            abort ();
+
+        if (d[i] != 1.0)
+            abort ();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 2.0;
+        b[i] = 0.0;
+        c[i] = 0.0;
+        d[i] = 0.0;
+        e[i] = 0.0;
+    }
+
+#pragma acc data copy (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N]) copyin (N)
+    {
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii];
+    }
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii];
+    }
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii];
+    }
+
+#pragma acc kernels wait (1) async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            e[ii] = a[ii] + b[ii] + c[ii] + d[ii];
+    }
+
+#pragma acc wait (1)
+
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 2.0)
+            abort ();
+
+        if (b[i] != 4.0)
+            abort ();
+
+        if (c[i] != 4.0)
+            abort ();
+
+        if (d[i] != 1.0)
+            abort ();
+
+        if (e[i] != 11.0)
+            abort ();
+    }
+
+
+    r = cuStreamCreate (&stream1, CU_STREAM_NON_BLOCKING);
+    if (r != CUDA_SUCCESS)
+    {
+        fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+        abort ();
+    }
+
+    acc_set_cuda_stream (1, stream1);
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 5.0;
+        b[i] = 0.0;
+    }
+
+#pragma acc data copy (a[0:N], b[0:N]) copyin (N)
+    {
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+#pragma acc wait (1)
+
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 5.0)
+            abort ();
+
+        if (b[i] != 5.0)
+            abort ();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 7.0;
+        b[i] = 0.0;
+        c[i] = 0.0;
+        d[i] = 0.0;
+    }
+
+#pragma acc data copy (a[0:N]) copy (b[0:N]) copy (c[0:N]) copy (d[0:N]) copyin (N)
+    {
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii];
+    }
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii];
+    }
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii];
+    }
+
+#pragma acc wait (1)
+
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 7.0)
+            abort ();
+
+        if (b[i] != 49.0)
+            abort ();
+
+        if (c[i] != 4.0)
+            abort ();
+
+        if (d[i] != 1.0)
+            abort ();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 0.0;
+        c[i] = 0.0;
+        d[i] = 0.0;
+        e[i] = 0.0;
+    }
+
+#pragma acc data copy (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N]) copyin (N)
+    {
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii];
+    }
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii];
+    }
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii];
+    }
+
+#pragma acc kernels wait (1) async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            e[ii] = a[ii] + b[ii] + c[ii] + d[ii];
+    }
+
+#pragma acc wait (1)
+
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 3.0)
+            abort ();
+
+        if (b[i] != 9.0)
+            abort ();
+
+        if (c[i] != 4.0)
+            abort ();
+
+        if (d[i] != 1.0)
+            abort ();
+
+        if (e[i] != 17.0)
+            abort ();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 4.0;
+        b[i] = 0.0;
+        c[i] = 0.0;
+        d[i] = 0.0;
+        e[i] = 0.0;
+    }
+
+#pragma acc data copyin (a[0:N], b[0:N], c[0:N]) copyin (N)
+    {
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii];
+    }
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii];
+    }
+
+#pragma acc update host (a[0:N], b[0:N], c[0:N]) wait (1)
+
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 4.0)
+            abort ();
+
+        if (b[i] != 16.0)
+            abort ();
+
+        if (c[i] != 4.0)
+            abort ();
+    }
+
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 5.0;
+        b[i] = 0.0;
+        c[i] = 0.0;
+        d[i] = 0.0;
+        e[i] = 0.0;
+    }
+
+#pragma acc data copyin (a[0:N], b[0:N], c[0:N]) copyin (N)
+    {
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii];
+    }
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii];
+    }
+
+#pragma acc update host (a[0:N], b[0:N], c[0:N]) async (1)
+
+#pragma acc wait (1)
+
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 5.0)
+            abort ();
+
+        if (b[i] != 25.0)
+            abort ();
+
+        if (c[i] != 4.0)
+            abort ();
+    }
+
     acc_shutdown (acc_device_nvidia);
 
     return 0;
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-1.c
index 51c0cf5..410c46c 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-1.c
@@ -586,6 +586,32 @@ main (int argc, char **argv)
 
     for (i = 0; i < N; i++)
     {
+        a[i] = 6.0;
+        b[i] = 0.0;
+    }
+
+#pragma acc parallel pcopy (a[0:N], b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 6.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
         a[i] = 5.0;
         b[i] = 7.0;
     }
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
similarity index 75%
rename from libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-2.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
index d9fff6f..2cd98bd 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
@@ -1,4 +1,4 @@
 /* { dg-do run { target lto } } */
 /* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
 
-#include "parallel-1.c"
+#include "data-clauses-kernels.c"
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
new file mode 100644
index 0000000..f7f2d1c
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
@@ -0,0 +1,2 @@
+#define CONSTRUCT kernels
+#include "data-clauses.h"
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
similarity index 75%
rename from libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-2.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
index f76c926..ddcf4e3 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
@@ -1,4 +1,4 @@
 /* { dg-do run { target lto } } */
 /* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
 
-#include "kernels-1.c"
+#include "data-clauses-parallel.c"
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
new file mode 100644
index 0000000..e734b2f
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
@@ -0,0 +1,2 @@
+#define CONSTRUCT parallel
+#include "data-clauses.h"
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h
similarity index 56%
rename from libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h
index fd9df33..d557bef 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h
@@ -1,7 +1,3 @@
-/* { dg-do run } */
-
-#include <stdlib.h>
-
 int i;
 
 int main(void)
@@ -11,145 +7,145 @@ int main(void)
   i = -1;
   j = -2;
   v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) copyin (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) copyin (i, j)
   {
     if (i != -1 || j != -2)
-      abort ();
+      __builtin_abort ();
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
 #if ACC_MEM_SHARED
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 #else
   if (v != 1 || i != -1 || j != -2)
-    abort ();
+    __builtin_abort ();
 #endif
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) copyout (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) copyout (i, j)
   {
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) copy (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) copy (i, j)
   {
     if (i != -1 || j != -2)
-      abort ();
+      __builtin_abort ();
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) create (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) create (i, j)
   {
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
 #if ACC_MEM_SHARED
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 #else
   if (v != 1 || i != -1 || j != -2)
-    abort ();
+    __builtin_abort ();
 #endif
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copyin (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) present_or_copyin (i, j)
   {
     if (i != -1 || j != -2)
-      abort ();
+      __builtin_abort ();
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
   if (v != 1)
-    abort ();
+    __builtin_abort ();
 #if ACC_MEM_SHARED
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 #else
   if (v != 1 || i != -1 || j != -2)
-    abort ();
+    __builtin_abort ();
 #endif
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copyout (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) present_or_copyout (i, j)
   {
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copy (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) present_or_copy (i, j)
   {
     if (i != -1 || j != -2)
-      abort ();
+      __builtin_abort ();
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_create (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) present_or_create (i, j)
   {
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
   if (v != 1)
-    abort ();
+    __builtin_abort ();
 #if ACC_MEM_SHARED
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 #else
   if (v != 1 || i != -1 || j != -2)
-    abort ();
+    __builtin_abort ();
 #endif
 
   i = -1;
@@ -158,23 +154,23 @@ int main(void)
 
 #pragma acc data copyin (i, j)
   {
-#pragma acc parallel /* copyout */ present_or_copyout (v) present (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) present (i, j)
     {
       if (i != -1 || j != -2)
-        abort ();
+	__builtin_abort ();
       i = 2;
       j = 1;
       if (i != 2 || j != 1)
-        abort ();
+	__builtin_abort ();
       v = 1;
     }
   }
 #if ACC_MEM_SHARED
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 #else
   if (v != 1 || i != -1 || j != -2)
-    abort ();
+    __builtin_abort ();
 #endif
 
   i = -1;
@@ -183,23 +179,23 @@ int main(void)
 
 #pragma acc data copyin(i, j)
   {
-#pragma acc parallel /* copyout */ present_or_copyout (v)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v)
     {
       if (i != -1 || j != -2)
-        abort ();
+	__builtin_abort ();
       i = 2;
       j = 1;
       if (i != 2 || j != 1)
-        abort ();
+	__builtin_abort ();
       v = 1;
     }
   }
 #if ACC_MEM_SHARED
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 #else
   if (v != 1 || i != -1 || j != -2)
-    abort ();
+    __builtin_abort ();
 #endif
 
   return 0;
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c
index e271a37..8247e7b 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c
@@ -1,5 +1,3 @@
-/* { dg-do run } */
-
 #include <stdlib.h>
 
 int main (void)
@@ -28,5 +26,26 @@ int main (void)
     abort ();
 #endif
 
+  a_1 = a_2 = 0;
+
+#pragma acc data deviceptr (a)
+#pragma acc parallel copyout (a_1, a_2)
+  {
+    a_1 = a;
+    a_2 = &a;
+  }
+
+  if (a != A)
+    abort ();
+  if (a_1 != a)
+    abort ();
+#if ACC_MEM_SHARED
+  if (a_2 != &a)
+    abort ();
+#else
+  if (a_2 == &a)
+    abort ();
+#endif
+
   return 0;
 }
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c
index 7f5d3d3..689a443 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c
@@ -1,8 +1,7 @@
-/* { dg-do run } */
-
 #include  <openacc.h>
 
-int main ()
+
+void t1 ()
 {
   int ok = 1;
   int val = 2;
@@ -28,14 +27,115 @@ int main ()
   if (ondev)
     {
       if (!ok)
-	return 1;
+	__builtin_abort ();
       if (val != 2)
-	return 1;
+	__builtin_abort ();
 
       for (int i = 0; i < 32; i++)
 	if (ary[i] != 2 + i)
-	  return 1;
+	  __builtin_abort ();
     }
-  
+}
+
+
+void t2 ()
+{
+  int ok = 1;
+  int val = 2;
+
+#pragma acc data copy(val)
+  {
+#pragma acc parallel present (val)
+    {
+      val = 7;
+    }
+
+#pragma acc parallel firstprivate (val) copy(ok)
+    {
+      ok  = val == 7;
+      val = 9;
+    }
+  }
+
+  if (!ok)
+    __builtin_abort ();
+  if (val != 7)
+    __builtin_abort ();
+}
+
+
+#define N 100
+void t3 ()
+{
+  int a, b[N], c, d, i;
+  int n = acc_get_device_type () == acc_device_nvidia ? N : 1;
+
+  a = 5;
+  for (i = 0; i < n; i++)
+    b[i] = -1;
+
+  #pragma acc parallel num_gangs (n) firstprivate (a)
+  #pragma acc loop gang
+  for (i = 0; i < n; i++)
+    {
+      a = a + i;
+      b[i] = a;
+    }
+
+  for (i = 0; i < n; i++)
+    if (a + i != b[i])
+      __builtin_abort ();
+
+  #pragma acc data copy (a)
+  {
+    #pragma acc parallel firstprivate (a) copyout (c)
+    {
+      a = 10;
+      c = a;
+    }
+
+    /* This version of 'a' should still be 5.  */
+    #pragma acc parallel copyout (d) present (a)
+    {
+      d = a;
+    }
+  }
+
+  if (c != 10)
+    __builtin_abort ();
+  if (d != 5)
+    __builtin_abort ();
+}
+#undef N
+
+
+void t4 ()
+{
+  int x = 5, i, arr[32];
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 3;
+
+#pragma acc parallel firstprivate(x) copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  {
+#pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      arr[i] += x;
+  }
+
+  for (i = 0; i < 32; i++)
+    if (arr[i] != 8)
+      __builtin_abort ();
+}
+
+
+int
+main()
+{
+  t1 ();
+  t2 ();
+  t3 ();
+  t4 ();
+
   return 0;
 }
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c
deleted file mode 100644
index 9666542..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c
+++ /dev/null
@@ -1,31 +0,0 @@
-/* { dg-do run } */
-
-#include  <openacc.h>
-
-int main ()
-{
-  int ok = 1;
-  int val = 2;
-
-#pragma acc data copy(val)
-  {
-#pragma acc parallel present (val)
-    {
-      val = 7;
-    }
-
-#pragma acc parallel firstprivate (val) copy(ok)
-    {
-      ok  = val == 7;
-      val = 9;
-    }
-
-  }
-
-  if (!ok)
-    return 1;
-  if(val != 7)
-    return 1;
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-1.c
new file mode 100644
index 0000000..d8ab958
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-1.c
@@ -0,0 +1,48 @@
+#include <assert.h>
+
+#define N 100
+
+void
+test (int *a, int *b, int sarg)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    assert (a[i] == b[i] + sarg);
+}
+
+int
+main ()
+{
+  int a[N], b[N];
+  int i;
+
+  for (i = 0; i < N; i++)
+    b[i] = i+1;
+
+#pragma acc parallel loop gang (static:*) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = b[i] + 0;
+
+  test (a, b, 0);
+
+#pragma acc parallel loop gang (static:1) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = b[i] + 1;
+
+  test (a, b, 1);
+
+#pragma acc parallel loop gang (static:5) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = b[i] + 5;
+
+  test (a, b, 5);
+
+#pragma acc parallel loop gang (static:20) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = b[i] + 20;
+
+  test (a, b, 20);
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
new file mode 100644
index 0000000..ce9632c
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
@@ -0,0 +1,100 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* This code uses nvptx inline assembly guarded with acc_on_device, which is
+   not optimized away at -O0, and then confuses the target assembler.
+   { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
+
+#include <assert.h>
+#include <openacc.h>
+
+#define N 100
+
+#define GANG_ID(I)						\
+  (acc_on_device (acc_device_nvidia)				\
+   ? ({unsigned __r;						\
+       __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (__r));	\
+       __r; }) : (I))
+
+void
+test_static(int *a, int num_gangs, int sarg)
+{
+  int i, j;
+
+  if (sarg == 0)
+    sarg = 1;
+
+  for (i = 0; i < N / sarg; i++)
+    for (j = 0; j < sarg; j++)
+      assert (a[i*sarg+j] == i % num_gangs);
+}
+
+void
+test_nonstatic(int *a, int gangs)
+{
+  int i, j;
+
+  for (i = 0; i < N; i+=gangs)
+    for (j = 0; j < gangs; j++)
+      assert (a[i+j] == i/gangs);
+}
+
+int
+main ()
+{
+  int a[N];
+  int i, x;
+
+#pragma acc parallel loop gang (static:*) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = GANG_ID (i);
+
+  test_nonstatic (a, 10);
+
+#pragma acc parallel loop gang (static:1) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = GANG_ID (i);
+
+  test_static (a, 10, 1);
+
+#pragma acc parallel loop gang (static:2) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = GANG_ID (i);
+
+  test_static (a, 10, 2);
+
+#pragma acc parallel loop gang (static:5) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = GANG_ID (i);
+
+  test_static (a, 10, 5);
+
+#pragma acc parallel loop gang (static:20) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = GANG_ID (i);
+
+  test_static (a, 10, 20);
+
+  /* Non-static gang.  */
+#pragma acc parallel loop gang num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = GANG_ID (i);
+
+  test_nonstatic (a, 10);
+
+  /* Static arguments with a variable expression.  */
+
+  x = 20;
+#pragma acc parallel loop gang (static:0+x) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = GANG_ID (i);
+
+  test_static (a, 10, 20);
+
+  x = 20;
+#pragma acc parallel loop gang (static:x) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = GANG_ID (i);
+
+  test_static (a, 10, 20);
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
index 6aa3bb7..5398905 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
@@ -1,5 +1,3 @@
-/* { dg-do run } */
-
 #include <openacc.h>
 #include <stdlib.h>
 #include <stdbool.h>
@@ -608,5 +606,357 @@ main(int argc, char **argv)
 	abort ();
 #endif
 
+    for (i = 0; i < N; i++)
+        a[i] = 4.0;
+
+#pragma acc kernels copyin(a[0:N]) copyout(b[0:N]) if(1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 5.0;
+#else
+    exp = 4.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 16.0;
+
+#pragma acc kernels if(0)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 17.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 8.0;
+
+#pragma acc kernels copyin(a[0:N]) copyout(b[0:N]) if(one)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 9.0;
+#else
+    exp = 8.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 22.0;
+
+#pragma acc kernels if(zero)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 23.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 16.0;
+
+#pragma acc kernels copyin(a[0:N]) copyout(b[0:N]) if(true)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 17.0;
+#else
+    exp = 16.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 76.0;
+
+#pragma acc kernels if(false)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 77.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 22.0;
+
+    n = 1;
+
+#pragma acc kernels copyin(a[0:N]) copyout(b[0:N]) if(n)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 23.0;
+#else
+    exp = 22.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 18.0;
+
+    n = 0;
+
+#pragma acc kernels if(n)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 19.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 49.0;
+
+    n = 1;
+
+#pragma acc kernels copyin(a[0:N]) copyout(b[0:N]) if(n + n)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 50.0;
+#else
+    exp = 49.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 38.0;
+
+    n = 0;
+
+#pragma acc kernels if(n + n)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 39.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 91.0;
+
+#pragma acc kernels copyin(a[0:N]) copyout(b[0:N]) if(-2)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 92.0;
+#else
+    exp = 91.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 43.0;
+
+#pragma acc kernels copyin(a[0:N]) copyout(b[0:N]) if(one == 1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 44.0;
+#else
+    exp = 43.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 87.0;
+
+#pragma acc kernels if(one == 0)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 88.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 9.0;
+    }
+
+#if ACC_MEM_SHARED
+    exp = 0.0;
+    exp2 = 0.0;
+#else
+    acc_map_data (a, d_a, N * sizeof (float));
+    acc_map_data (b, d_b, N * sizeof (float));
+    exp = 3.0;
+    exp2 = 9.0;
+#endif
+
     return 0;
 }
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
deleted file mode 100644
index 3acfdf5..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
+++ /dev/null
@@ -1,184 +0,0 @@
-/* { dg-do run } */
-
-#include <stdlib.h>
-
-int i;
-
-int main (void)
-{
-  int j, v;
-
-#if 0
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) copyin (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) copyout (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) copy (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) create (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copyin (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1)
-    abort ();
-#if ACC_MEM_SHARED
-  if (i != 2 || j != 1)
-    abort ();
-#else
-  if (i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copyout (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copy (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_create (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1)
-    abort ();
-#if ACC_MEM_SHARED
-  if (i != 2 || j != 1)
-    abort ();
-#else
-  if (i != -1 || j != -2)
-    abort ();
-#endif
-
-#if 0
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) present (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#endif
-
-#if 0
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#endif
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-clauses.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-clauses.c
new file mode 100644
index 0000000..2c42497
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-clauses.c
@@ -0,0 +1,62 @@
+/* Exercise the auto, independent and seq loop clauses inside
+   kernels regions.  */
+
+#include <assert.h>
+
+#define N 100
+
+void
+check (int *a, int *b)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    assert (a[i] == b[i]);
+}
+
+int
+main ()
+{
+  int i, a[N], b[N];
+
+#pragma acc kernels copy(a)
+  {
+#pragma acc loop auto
+    for (i = 0; i < N; i++)
+      a[i] = i;
+  }
+
+  for (i = 0; i < N; i++)
+    b[i] = i;
+
+  check (a, b);
+
+#pragma acc kernels copyout(a)
+  {
+#pragma acc loop independent
+    for (i = 0; i < N; i++)
+      a[i] = i;
+  }
+
+  check (a, b);
+
+#pragma acc kernels present_or_copy(a)
+  {
+#pragma acc loop seq
+    for (i = 0; i < N; i++)
+      a[i] = i;
+  }
+
+  check (a, b);
+
+#pragma acc kernels pcopyout(a) present_or_copyin(b)
+  {
+#pragma acc loop seq
+    for (i = 0; i < N; i++)
+      a[i] = b[i];
+  }
+
+  check (a, b);
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/mode-transitions.c libgomp/testsuite/libgomp.oacc-c-c++-common/mode-transitions.c
new file mode 100644
index 0000000..2394ac8
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/mode-transitions.c
@@ -0,0 +1,895 @@
+/* Miscellaneous test cases for gang/worker/vector mode transitions.  */
+
+#include <assert.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <math.h>
+#include <openacc.h>
+
+
+/* Test basic vector-partitioned mode transitions.  */
+
+void t1()
+{
+  int n = 0, arr[32], i;
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 0;
+
+  #pragma acc parallel copy(n, arr) \
+		       num_gangs(1) num_workers(1) vector_length(32)
+  {
+    int j;
+    n++;
+    #pragma acc loop vector
+    for (j = 0; j < 32; j++)
+      arr[j]++;
+    n++;
+  }
+
+  assert (n == 2);
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == 1);
+}
+
+
+/* Test vector-partitioned, gang-partitioned mode.  */
+
+void t2()
+{
+  int n[32], arr[1024], i;
+  
+  for (i = 0; i < 1024; i++)
+    arr[i] = 0;
+
+  for (i = 0; i < 32; i++)
+    n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) \
+		       num_gangs(32) num_workers(1) vector_length(32)
+  {
+    int j, k;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      #pragma acc loop vector
+      for (k = 0; k < 32; k++)
+	arr[j * 32 + k]++;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (n[i] == 2);
+
+  for (i = 0; i < 1024; i++)
+    assert (arr[i] == 1);
+}
+
+
+/* Test conditions inside vector-partitioned loops.  */
+
+void t4()
+{
+  int n[32], arr[1024], i;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  for (i = 0; i < 32; i++)
+    n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) \
+		       num_gangs(32) num_workers(1) vector_length(32)
+  {
+    int j, k;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop vector
+	for (k = 0; k < 32; k++)
+	  if ((arr[j * 32 + k] % 2) != 0)
+	    arr[j * 32 + k] *= 2;
+      }
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (n[i] == 2);
+
+  for (i = 0; i < 1024; i++)
+    assert (arr[i] == ((i % 2) == 0 ? i : i * 2));
+}
+
+
+/* Test conditions inside gang-partitioned/vector-partitioned loops.  */
+
+void t5()
+{
+  int n[32], arr[1024], i;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  for (i = 0; i < 32; i++)
+    n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) \
+		       num_gangs(32) num_workers(1) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+
+    #pragma acc loop gang vector
+    for (j = 0; j < 1024; j++)
+      if ((arr[j] % 2) != 0)
+	arr[j] *= 2;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (n[i] == 2);
+
+  for (i = 0; i < 1024; i++)
+    assert (arr[i] == ((i % 2) == 0 ? i : i * 2));
+}
+
+
+/* Test trivial operation of vector-single mode.  */
+
+void t7()
+{
+  int n = 0;
+  #pragma acc parallel copy(n) \
+		       num_gangs(1) num_workers(1) vector_length(32)
+  {
+    n++;
+  }
+  assert (n == 1);
+}
+
+
+/* Test vector-single, gang-partitioned mode.  */
+
+void t8()
+{
+  int arr[1024];
+  int gangs;
+
+  for (gangs = 1; gangs <= 1024; gangs <<= 1)
+    {
+      int i;
+
+      for (i = 0; i < 1024; i++)
+	arr[i] = 0;
+
+      #pragma acc parallel copy(arr) \
+			   num_gangs(gangs) num_workers(1) vector_length(32)
+      {
+	int j;
+	#pragma acc loop gang
+	for (j = 0; j < 1024; j++)
+	  arr[j]++;
+      }
+
+      for (i = 0; i < 1024; i++)
+	assert (arr[i] == 1);
+    }
+}
+
+
+/* Test conditions in vector-single mode.  */
+
+void t9()
+{
+  int arr[1024];
+  int gangs;
+
+  for (gangs = 1; gangs <= 1024; gangs <<= 1)
+    {
+      int i;
+
+      for (i = 0; i < 1024; i++)
+	arr[i] = 0;
+
+      #pragma acc parallel copy(arr) \
+			   num_gangs(gangs) num_workers(1) vector_length(32)
+      {
+	int j;
+	#pragma acc loop gang
+	for (j = 0; j < 1024; j++)
+	  if ((j % 3) == 0)
+	    arr[j]++;
+	  else
+	    arr[j] += 2;
+      }
+
+      for (i = 0; i < 1024; i++)
+	assert (arr[i] == (((i % 3) == 0) ? 1 : 2));
+    }
+}
+
+
+/* Test switch in vector-single mode.  */
+
+void t10()
+{
+  int arr[1024];
+  int gangs;
+
+  for (gangs = 1; gangs <= 1024; gangs <<= 1)
+    {
+      int i;
+
+      for (i = 0; i < 1024; i++)
+	arr[i] = 0;
+
+      #pragma acc parallel copy(arr) \
+			   num_gangs(gangs) num_workers(1) vector_length(32)
+      {
+	int j;
+	#pragma acc loop gang
+	for (j = 0; j < 1024; j++)
+	  switch (j % 5)
+	    {
+	    case 0: arr[j] += 1; break;
+	    case 1: arr[j] += 2; break;
+	    case 2: arr[j] += 3; break;
+	    case 3: arr[j] += 4; break;
+	    case 4: arr[j] += 5; break;
+	    default: arr[j] += 99;
+	    }
+      }
+
+      for (i = 0; i < 1024; i++)
+	assert (arr[i] == (i % 5) + 1);
+    }
+}
+
+
+/* Test switch in vector-single mode, initialise array on device.  */
+
+void t11()
+{
+  int arr[1024];
+  int i;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = 99;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(1024) num_workers(1) vector_length(32)
+  {
+    int j;
+
+    /* This loop and the one following must be distributed to available gangs
+       in the same way to ensure data dependencies are not violated (hence the
+       "static" clauses).  */
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 1024; j++)
+      arr[j] = 0;
+    
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 1024; j++)
+      switch (j % 5)
+	{
+	case 0: arr[j] += 1; break;
+	case 1: arr[j] += 2; break;
+	case 2: arr[j] += 3; break;
+	case 3: arr[j] += 4; break;
+	case 4: arr[j] += 5; break;
+	default: arr[j] += 99;
+	}
+  }
+
+  for (i = 0; i < 1024; i++)
+    assert (arr[i] == (i % 5) + 1);
+}
+
+
+/* Test multiple conditions in vector-single mode.  */
+
+#define NUM_GANGS 4096
+void t12()
+{
+  bool fizz[NUM_GANGS], buzz[NUM_GANGS], fizzbuzz[NUM_GANGS];
+  int i;
+
+  #pragma acc parallel copyout(fizz, buzz, fizzbuzz) \
+		       num_gangs(NUM_GANGS) num_workers(1) vector_length(32)
+  {
+    int j;
+    
+    /* This loop and the one following must be distributed to available gangs
+       in the same way to ensure data dependencies are not violated (hence the
+       "static" clauses).  */
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < NUM_GANGS; j++)
+      fizz[j] = buzz[j] = fizzbuzz[j] = 0;
+    
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < NUM_GANGS; j++)
+      {
+	if ((j % 3) == 0 && (j % 5) == 0)
+	  fizzbuzz[j] = 1;
+	else
+	  {
+	    if ((j % 3) == 0)
+	      fizz[j] = 1;
+	    else if ((j % 5) == 0)
+	      buzz[j] = 1;
+	  }
+      }
+  }
+
+  for (i = 0; i < NUM_GANGS; i++)
+    {
+      assert (fizzbuzz[i] == ((i % 3) == 0 && (i % 5) == 0));
+      assert (fizz[i] == ((i % 3) == 0 && (i % 5) != 0));
+      assert (buzz[i] == ((i % 3) != 0 && (i % 5) == 0));
+    }
+}
+#undef NUM_GANGS
+
+
+/* Test worker-partitioned/vector-single mode.  */
+
+void t13()
+{
+  int arr[32 * 8], i;
+
+  for (i = 0; i < 32 * 8; i++)
+    arr[i] = 0;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	int k;
+	#pragma acc loop worker
+	for (k = 0; k < 8; k++)
+          arr[j * 8 + k] += j * 8 + k;
+      }
+  }
+
+  for (i = 0; i < 32 * 8; i++)
+    assert (arr[i] == i);
+}
+
+
+/* Test worker-single/worker-partitioned transitions.  */
+
+void t16()
+{
+  int n[32], arr[32 * 32], i;
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = 0;
+
+  for (i = 0; i < 32; i++)
+    n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) \
+		       num_gangs(8) num_workers(16) vector_length(32)
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	int k;
+
+	n[j]++;
+
+	#pragma acc loop worker
+	for (k = 0; k < 32; k++)
+          arr[j * 32 + k]++;
+
+	n[j]++;
+
+	#pragma acc loop worker
+	for (k = 0; k < 32; k++)
+          arr[j * 32 + k]++;
+
+	n[j]++;
+
+	#pragma acc loop worker
+	for (k = 0; k < 32; k++)
+          arr[j * 32 + k]++;
+
+	n[j]++;
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (n[i] == 4);
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == 3);
+}
+
+
+/* Test correct synchronisation between worker-partitioned loops.  */
+
+void t17()
+{
+  int arr_a[32 * 32], arr_b[32 * 32], i;
+  int num_workers, num_gangs;
+
+  for (num_workers = 1; num_workers <= 32; num_workers <<= 1)
+    for (num_gangs = 1; num_gangs <= 32; num_gangs <<= 1)
+      {
+	for (i = 0; i < 32 * 32; i++)
+	  arr_a[i] = i;
+
+	#pragma acc parallel copyin(arr_a) copyout(arr_b) \
+			     num_gangs(num_gangs) num_workers(num_workers) vector_length(32)
+	{
+	  int j;
+	  #pragma acc loop gang
+	  for (j = 0; j < 32; j++)
+	    {
+	      int k;
+
+	      #pragma acc loop worker
+	      for (k = 0; k < 32; k++)
+        	arr_b[j * 32 + (31 - k)] = arr_a[j * 32 + k] * 2;
+
+	      #pragma acc loop worker
+	      for (k = 0; k < 32; k++)
+        	arr_a[j * 32 + (31 - k)] = arr_b[j * 32 + k] * 2;
+
+	      #pragma acc loop worker
+	      for (k = 0; k < 32; k++)
+        	arr_b[j * 32 + (31 - k)] = arr_a[j * 32 + k] * 2;
+	    }
+	}
+
+	for (i = 0; i < 32 * 32; i++)
+	  assert (arr_b[i] == (i ^ 31) * 8);
+      }
+}
+
+
+/* Test correct synchronisation between worker+vector-partitioned loops.  */
+
+void t18()
+{
+  int arr_a[32 * 32 * 32], arr_b[32 * 32 * 32], i;
+  int num_workers, num_gangs;
+
+  for (num_workers = 1; num_workers <= 32; num_workers <<= 1)
+    for (num_gangs = 1; num_gangs <= 32; num_gangs <<= 1)
+      {
+	for (i = 0; i < 32 * 32 * 32; i++)
+	  arr_a[i] = i;
+
+	#pragma acc parallel copyin(arr_a) copyout(arr_b) \
+			     num_gangs(num_gangs) num_workers(num_workers) vector_length(32)
+	{
+	  int j;
+	  #pragma acc loop gang
+	  for (j = 0; j < 32; j++)
+	    {
+	      int k;
+
+	      #pragma acc loop worker vector
+	      for (k = 0; k < 32 * 32; k++)
+        	arr_b[j * 32 * 32 + (1023 - k)] = arr_a[j * 32 * 32 + k] * 2;
+
+	      #pragma acc loop worker vector
+	      for (k = 0; k < 32 * 32; k++)
+        	arr_a[j * 32 * 32 + (1023 - k)] = arr_b[j * 32 * 32 + k] * 2;
+
+	      #pragma acc loop worker vector
+	      for (k = 0; k < 32 * 32; k++)
+        	arr_b[j * 32 * 32 + (1023 - k)] = arr_a[j * 32 * 32 + k] * 2;
+	    }
+	}
+
+	for (i = 0; i < 32 * 32 * 32; i++)
+	  assert (arr_b[i] == (i ^ 1023) * 8);
+      }
+}
+
+
+/* Test correct synchronisation between vector-partitioned loops in
+   worker-partitioned mode.  */
+
+void t19()
+{
+  int n[32 * 32], arr_a[32 * 32 * 32], arr_b[32 * 32 * 32], i;
+  int num_workers, num_gangs;
+
+  for (num_workers = 1; num_workers <= 32; num_workers <<= 1)
+    for (num_gangs = 1; num_gangs <= 32; num_gangs <<= 1)
+      {
+	for (i = 0; i < 32 * 32 * 32; i++)
+	  arr_a[i] = i;
+
+	for (i = 0; i < 32 * 32; i++)
+          n[i] = 0;
+
+	#pragma acc parallel copy (n) copyin(arr_a) copyout(arr_b) \
+			     num_gangs(num_gangs) num_workers(num_workers) vector_length(32)
+	{
+	  int j;
+	  #pragma acc loop gang
+	  for (j = 0; j < 32; j++)
+	    {
+	      int k;
+
+	      #pragma acc loop worker
+	      for (k = 0; k < 32; k++)
+		{
+		  int m;
+
+		  n[j * 32 + k]++;
+
+		  #pragma acc loop vector
+		  for (m = 0; m < 32; m++)
+		    {
+	              if (((j * 1024 + k * 32 + m) % 2) == 0)
+			arr_b[j * 1024 + k * 32 + (31 - m)]
+			  = arr_a[j * 1024 + k * 32 + m] * 2;
+		      else
+			arr_b[j * 1024 + k * 32 + (31 - m)]
+			  = arr_a[j * 1024 + k * 32 + m] * 3;
+		    }
+
+		  /* Test returning to vector-single mode...  */
+		  n[j * 32 + k]++;
+
+		  #pragma acc loop vector
+		  for (m = 0; m < 32; m++)
+		    {
+	              if (((j * 1024 + k * 32 + m) % 3) == 0)
+			arr_a[j * 1024 + k * 32 + (31 - m)]
+			  = arr_b[j * 1024 + k * 32 + m] * 5;
+		      else
+			arr_a[j * 1024 + k * 32 + (31 - m)]
+			  = arr_b[j * 1024 + k * 32 + m] * 7;
+		    }
+
+		  /* ...and back-to-back vector loops.  */
+
+		  #pragma acc loop vector
+		  for (m = 0; m < 32; m++)
+		    {
+	              if (((j * 1024 + k * 32 + m) % 2) == 0)
+			arr_b[j * 1024 + k * 32 + (31 - m)]
+			  = arr_a[j * 1024 + k * 32 + m] * 3;
+		      else
+			arr_b[j * 1024 + k * 32 + (31 - m)]
+			  = arr_a[j * 1024 + k * 32 + m] * 2;
+		    }
+		}
+	    }
+	}
+
+	for (i = 0; i < 32 * 32; i++)
+          assert (n[i] == 2);
+
+	for (i = 0; i < 32 * 32 * 32; i++)
+          {
+	    int m = 6 * ((i % 3) == 0 ? 5 : 7);
+	    assert (arr_b[i] == (i ^ 31) * m);
+	  }
+      }
+}
+
+
+/* With -O0, variables are on the stack, not in registers.  Check that worker
+   state propagation handles the stack frame.  */
+
+void t20()
+{
+  int w0 = 0;
+  int w1 = 0;
+  int w2 = 0;
+  int w3 = 0;
+  int w4 = 0;
+  int w5 = 0;
+  int w6 = 0;
+  int w7 = 0;
+
+  int i;
+
+#pragma acc parallel copy (w0, w1, w2, w3, w4, w5, w6, w7) \
+		     num_gangs (1) num_workers (8)
+  {
+    int internal = 100;
+
+#pragma acc loop worker
+    for (i = 0; i < 8; i++)
+      {
+	switch (i)
+	  {
+	  case 0: w0 = internal; break;
+	  case 1: w1 = internal; break;
+	  case 2: w2 = internal; break;
+	  case 3: w3 = internal; break;
+	  case 4: w4 = internal; break;
+	  case 5: w5 = internal; break;
+	  case 6: w6 = internal; break;
+	  case 7: w7 = internal; break;
+	  default: break;
+	  }
+      }
+  }
+
+  if (w0 != 100
+      || w1 != 100
+      || w2 != 100
+      || w3 != 100
+      || w4 != 100
+      || w5 != 100
+      || w6 != 100
+      || w7 != 100)
+    __builtin_abort ();
+}
+
+
+/* Test worker-single/vector-single mode.  */
+
+void t21()
+{
+  int arr[32], i;
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 0;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      arr[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == 1);
+}
+
+
+/* Test worker-single/vector-single mode with an atomic update.  */
+
+void t22()
+{
+  int arr[32], i;
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 0;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc atomic
+	arr[j]++;
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == 1);
+}
+
+
+/* Test condition in worker-single/vector-single mode.  */
+
+void t23()
+{
+  int arr[32], i;
+
+  for (i = 0; i < 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      if ((arr[j] % 2) != 0)
+	arr[j]++;
+      else
+	arr[j] += 2;
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == (((i % 2) != 0) ? i + 1 : i + 2));
+}
+
+
+/* Test switch in worker-single/vector-single mode.  */
+
+void t24()
+{
+  int arr[32], i;
+
+  for (i = 0; i < 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      switch (arr[j] % 5)
+	{
+	case 0: arr[j] += 1; break;
+	case 1: arr[j] += 2; break;
+	case 2: arr[j] += 3; break;
+	case 3: arr[j] += 4; break;
+	case 4: arr[j] += 5; break;
+	default: arr[j] += 99;
+	}
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == i + (i % 5) + 1);
+}
+
+
+/* Test worker-single/vector-partitioned mode.  */
+
+void t25()
+{
+  int arr[32 * 32], i;
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	int k;
+	#pragma acc loop vector
+	for (k = 0; k < 32; k++)
+	  {
+	    #pragma acc atomic
+	    arr[j * 32 + k]++;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + 1);
+}
+
+
+/* Test worker-single, vector-partitioned, gang-redundant mode.  */
+
+#define ACTUAL_GANGS 8
+void t27()
+{
+  int n, arr[32], i;
+  int ondev;
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 0;
+
+  n = 0;
+
+  #pragma acc parallel copy(n, arr) copyout(ondev) \
+	  num_gangs(ACTUAL_GANGS) num_workers(8) vector_length(32)
+  {
+    int j;
+
+    ondev = acc_on_device (acc_device_not_host);
+
+    #pragma acc atomic
+    n++;
+
+    #pragma acc loop vector
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc atomic
+	arr[j] += 1;
+      }
+
+    #pragma acc atomic
+    n++;
+  }
+
+  int m = ondev ? ACTUAL_GANGS : 1;
+  
+  assert (n == m * 2);
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == m);
+}
+#undef ACTUAL_GANGS
+
+
+/* Check if worker-single variables get broadcast to vectors.  */
+
+#pragma acc routine
+float t28_routine ()
+{
+  return 2.71;
+}
+
+#define N 32
+void t28()
+{
+  float threads[N], v1 = 3.14;
+
+  for (int i = 0; i < N; i++)
+    threads[i] = -1;
+
+#pragma acc parallel num_gangs (1) vector_length (32) copy (v1)
+  {
+    float val = t28_routine ();
+
+#pragma acc loop vector
+    for (int i = 0; i < N; i++)
+      threads[i] = val + v1*i;
+  }
+
+  for (int i = 0; i < N; i++)
+    assert (fabs (threads[i] - (t28_routine () + v1*i)) < 0.0001);
+}
+#undef N
+
+
+int main()
+{
+  t1();
+  t2();
+  t4();
+  t5();
+  t7();
+  t8();
+  t9();
+  t10();
+  t11();
+  t12();
+  t13();
+  t16();
+  t17();
+  t18();
+  t19();
+  t20();
+  t21();
+  t22();
+  t23();
+  t24();
+  t25();
+  t27();
+  t28();
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-variables.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-variables.c
new file mode 100644
index 0000000..53f03d1
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/private-variables.c
@@ -0,0 +1,953 @@
+#include <assert.h>
+#include <openacc.h>
+
+typedef struct {
+  int x, y;
+} vec2;
+
+typedef struct {
+  int x, y, z;
+  int attr[13];
+} vec3_attr;
+
+
+/* Test of gang-private variables declared in local scope with parallel
+   directive.  */
+
+void local_g_1()
+{
+  int i, arr[32];
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 3;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  {
+    int x;
+
+    #pragma acc loop gang(static:1)
+    for (i = 0; i < 32; i++)
+      x = i * 2;
+
+    #pragma acc loop gang(static:1)
+    for (i = 0; i < 32; i++)
+      {
+	if (acc_on_device (acc_device_host))
+	  x = i * 2;
+	arr[i] += x;
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == 3 + i * 2);
+}
+
+
+/* Test of worker-private variables declared in a local scope, broadcasting
+   to vector-partitioned mode.  Back-to-back worker loops.  */
+
+void local_w_1()
+{
+  int i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    int x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+
+	#pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    int x = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared in a local scope, broadcasting
+   to vector-partitioned mode.  Successive vector loops.  */
+
+void local_w_2()
+{
+  int i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    int x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	    
+	    x = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared in a local scope, broadcasting
+   to vector-partitioned mode.  Aggregate worker variable.  */
+
+void local_w_3()
+{
+  int i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    vec2 pt;
+	    
+	    pt.x = i ^ j * 3;
+	    pt.y = i | j * 5;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt.x * k;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt.y * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared in a local scope, broadcasting
+   to vector-partitioned mode.  Addressable worker variable.  */
+
+void local_w_4()
+{
+  int i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    vec2 pt, *ptp;
+	    
+	    ptp = &pt;
+	    
+	    pt.x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += ptp->x * k;
+
+	    ptp->y = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt.y * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared in a local scope, broadcasting
+   to vector-partitioned mode.  Array worker variable.  */
+
+void local_w_5()
+{
+  int i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    int pt[2];
+	    
+	    pt[0] = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt[0] * k;
+
+	    pt[1] = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt[1] * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of gang-private variables declared on loop directive.  */
+
+void loop_g_1()
+{
+  int x = 5, i, arr[32];
+
+  for (i = 0; i < 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  {
+    #pragma acc loop gang private(x)
+    for (i = 0; i < 32; i++)
+      {
+	x = i * 2;
+	arr[i] += x;
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == i * 3);
+}
+
+
+/* Test of gang-private variables declared on loop directive, with broadcasting
+   to partitioned workers.  */
+
+void loop_g_2()
+{
+  int x = 5, i, arr[32 * 32];
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  {
+    #pragma acc loop gang private(x)
+    for (i = 0; i < 32; i++)
+      {
+	x = i * 2;
+
+	#pragma acc loop worker
+	for (int j = 0; j < 32; j++)
+	  arr[i * 32 + j] += x;
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + (i / 32) * 2);
+}
+
+
+/* Test of gang-private variables declared on loop directive, with broadcasting
+   to partitioned vectors.  */
+
+void loop_g_3()
+{
+  int x = 5, i, arr[32 * 32];
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  {
+    #pragma acc loop gang private(x)
+    for (i = 0; i < 32; i++)
+      {
+	x = i * 2;
+
+	#pragma acc loop vector
+	for (int j = 0; j < 32; j++)
+	  arr[i * 32 + j] += x;
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + (i / 32) * 2);
+}
+
+
+/* Test of gang-private addressable variable declared on loop directive, with
+   broadcasting to partitioned workers.  */
+
+void loop_g_4()
+{
+  int x = 5, i, arr[32 * 32];
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  {
+    #pragma acc loop gang private(x)
+    for (i = 0; i < 32; i++)
+      {
+        int *p = &x;
+
+	x = i * 2;
+
+	#pragma acc loop worker
+	for (int j = 0; j < 32; j++)
+	  arr[i * 32 + j] += x;
+
+	(*p)--;
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + (i / 32) * 2);
+}
+
+
+/* Test of gang-private array variable declared on loop directive, with
+   broadcasting to partitioned workers.  */
+
+void loop_g_5()
+{
+  int x[8], i, arr[32 * 32];
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  {
+    #pragma acc loop gang private(x)
+    for (i = 0; i < 32; i++)
+      {
+        for (int j = 0; j < 8; j++)
+	  x[j] = j * 2;
+
+	#pragma acc loop worker
+	for (int j = 0; j < 32; j++)
+	  arr[i * 32 + j] += x[j % 8];
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + (i % 8) * 2);
+}
+
+
+/* Test of gang-private aggregate variable declared on loop directive, with
+   broadcasting to partitioned workers.  */
+
+void loop_g_6()
+{
+  int i, arr[32 * 32];
+  vec3_attr pt;
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  {
+    #pragma acc loop gang private(pt)
+    for (i = 0; i < 32; i++)
+      {
+        pt.x = i;
+	pt.y = i * 2;
+	pt.z = i * 4;
+	pt.attr[5] = i * 6;
+
+	#pragma acc loop worker
+	for (int j = 0; j < 32; j++)
+	  arr[i * 32 + j] += pt.x + pt.y + pt.z + pt.attr[5];
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + (i / 32) * 13);
+}
+
+
+/* Test of vector-private variables declared on loop directive.  */
+
+void loop_v_1()
+{
+  int x, i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+
+	    #pragma acc loop vector private(x)
+	    for (k = 0; k < 32; k++)
+	      {
+		x = i ^ j * 3;
+		arr[i * 1024 + j * 32 + k] += x * k;
+	      }
+
+	    #pragma acc loop vector private(x)
+	    for (k = 0; k < 32; k++)
+	      {
+		x = i | j * 5;
+		arr[i * 1024 + j * 32 + k] += x * k;
+	      }
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of vector-private variables declared on loop directive.  Array type.  */
+
+void loop_v_2()
+{
+  int pt[2], i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+
+	    #pragma acc loop vector private(pt)
+	    for (k = 0; k < 32; k++)
+	      {
+	        pt[0] = i ^ j * 3;
+		pt[1] = i | j * 5;
+		arr[i * 1024 + j * 32 + k] += pt[0] * k;
+		arr[i * 1024 + j * 32 + k] += pt[1] * k;
+	      }
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared on a loop directive.  */
+
+void loop_w_1()
+{
+  int x = 5, i, arr[32 * 32];
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker private(x)
+	for (j = 0; j < 32; j++)
+	  {
+	    x = i ^ j * 3;
+	    /* Try to ensure accesses to 'x' don't get optimized into a
+	       temporary.  */
+	    __asm__ __volatile__ ("");
+	    arr[i * 32 + j] += x;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + ((i / 32) ^ (i % 32) * 3));
+}
+
+
+/* Test of worker-private variables declared on a loop directive, broadcasting
+   to vector-partitioned mode.  */
+
+void loop_w_2()
+{
+  int x = 5, i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker private(x)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared on a loop directive, broadcasting
+   to vector-partitioned mode.  Back-to-back worker loops.  */
+
+void loop_w_3()
+{
+  int x = 5, i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker private(x)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+
+	#pragma acc loop worker private(x)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    x = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared on a loop directive, broadcasting
+   to vector-partitioned mode.  Successive vector loops.  */
+
+void loop_w_4()
+{
+  int x = 5, i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker private(x)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	    
+	    x = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared on a loop directive, broadcasting
+   to vector-partitioned mode.  Addressable worker variable.  */
+
+void loop_w_5()
+{
+  int x = 5, i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker private(x)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    int *p = &x;
+	    
+	    x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	    
+	    *p = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared on a loop directive, broadcasting
+   to vector-partitioned mode.  Aggregate worker variable.  */
+
+void loop_w_6()
+{
+  int i, arr[32 * 32 * 32];
+  vec2 pt;
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker private(pt)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    
+	    pt.x = i ^ j * 3;
+	    pt.y = i | j * 5;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt.x * k;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt.y * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared on a loop directive, broadcasting
+   to vector-partitioned mode.  Array worker variable.  */
+
+void loop_w_7()
+{
+  int i, arr[32 * 32 * 32];
+  int pt[2];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  /* "pt" is treated as "present_or_copy" on the parallel directive because it
+     is an array variable.  */
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        /* But here, it is made private per-worker.  */
+        #pragma acc loop worker private(pt)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    
+	    pt[0] = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt[0] * k;
+
+	    pt[1] = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt[1] * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of gang-private variables declared on the parallel directive.  */
+
+void parallel_g_1()
+{
+  int x = 5, i, arr[32];
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 3;
+
+  #pragma acc parallel private(x) copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  {
+    #pragma acc loop gang(static:1)
+    for (i = 0; i < 32; i++)
+      x = i * 2;
+
+    #pragma acc loop gang(static:1)
+    for (i = 0; i < 32; i++)
+      {
+	if (acc_on_device (acc_device_host))
+	  x = i * 2;
+	arr[i] += x;
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == 3 + i * 2);
+}
+
+
+/* Test of gang-private array variable declared on the parallel directive.  */
+
+void parallel_g_2()
+{
+  int x[32], i, arr[32 * 32];
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel private(x) copy(arr) num_gangs(32) num_workers(2) vector_length(32)
+  {
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        int j;
+	for (j = 0; j < 32; j++)
+	  x[j] = j * 2;
+	
+	#pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  arr[i * 32 + j] += x[31 - j];
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + (31 - (i % 32)) * 2);
+}
+
+
+int main ()
+{
+  local_g_1();
+  local_w_1();
+  local_w_2();
+  local_w_3();
+  local_w_4();
+  local_w_5();
+  loop_g_1();
+  loop_g_2();
+  loop_g_3();
+  loop_g_4();
+  loop_g_5();
+  loop_g_6();
+  loop_v_1();
+  loop_v_2();
+  loop_w_1();
+  loop_w_2();
+  loop_w_3();
+  loop_w_4();
+  loop_w_5();
+  loop_w_6();
+  loop_w_7();
+  parallel_g_1();
+  parallel_g_2();
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c
new file mode 100644
index 0000000..b23c758
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c
@@ -0,0 +1,129 @@
+/* Tests of reduction on loop directive.  */
+
+#include <assert.h>
+
+
+/* Test of reduction on loop directive (gangs, non-private reduction
+   variable).  */
+
+void g_np_1()
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  res = hres = 1;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang reduction(*:res)
+    for (i = 0; i < 12; i++)
+      res *= arr[i];
+  }
+
+  for (i = 0; i < 12; i++)
+    hres *= arr[i];
+
+  assert (res == hres);
+}
+
+
+/* Test of reduction on loop directive (gangs and vectors, non-private
+   reduction variable).  */
+
+void gv_np_1()
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang vector reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+}
+
+
+/* Test of reduction on loop directive (gangs and workers, non-private
+   reduction variable).  */
+
+void gw_np_1()
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang worker reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+}
+
+
+/* Test of reduction on loop directive (gangs, workers and vectors, non-private
+   reduction variable).  */
+
+void gwv_np_1()
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang worker vector reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+}
+
+
+int main()
+{
+  g_np_1();
+  gv_np_1();
+  gw_np_1();
+  gwv_np_1();
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/routine-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/routine-1.c
new file mode 100644
index 0000000..f112457
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-1.c
@@ -0,0 +1,88 @@
+// { dg-additional-options "-fno-exceptions" }
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#pragma acc routine
+int fact(int n)
+{
+  if (n == 0 || n == 1)
+    return 1;
+  else
+    return n * fact (n - 1);
+}
+
+int main()
+{
+  int *s, *g, *w, *v, *gw, *gv, *wv, *gwv, i, n = 10;
+
+  s = (int *) malloc (sizeof (int) * n);
+  g = (int *) malloc (sizeof (int) * n);
+  w = (int *) malloc (sizeof (int) * n);
+  v = (int *) malloc (sizeof (int) * n);
+  gw = (int *) malloc (sizeof (int) * n);
+  gv = (int *) malloc (sizeof (int) * n);
+  wv = (int *) malloc (sizeof (int) * n);
+  gwv = (int *) malloc (sizeof (int) * n);
+
+#pragma acc parallel loop async copyout(s[0:n]) seq
+  for (i = 0; i < n; i++)
+    s[i] = fact (i);
+
+#pragma acc parallel loop async copyout(g[0:n]) gang
+  for (i = 0; i < n; i++)
+    g[i] = fact (i);
+
+#pragma acc parallel loop async copyout(w[0:n]) worker
+  for (i = 0; i < n; i++)
+    w[i] = fact (i);
+
+#pragma acc parallel loop async copyout(v[0:n]) vector
+  for (i = 0; i < n; i++)
+    v[i] = fact (i);
+
+#pragma acc parallel loop async copyout(gw[0:n]) gang worker
+  for (i = 0; i < n; i++)
+    gw[i] = fact (i);
+
+#pragma acc parallel loop async copyout(gv[0:n]) gang vector
+  for (i = 0; i < n; i++)
+    gv[i] = fact (i);
+
+#pragma acc parallel loop async copyout(wv[0:n]) worker vector
+  for (i = 0; i < n; i++)
+    wv[i] = fact (i);
+
+#pragma acc parallel loop async copyout(gwv[0:n]) gang worker vector
+  for (i = 0; i < n; i++)
+    gwv[i] = fact (i);
+
+#pragma acc wait
+
+  for (i = 0; i < n; i++)
+    if (s[i] != fact (i))
+      abort ();
+  for (i = 0; i < n; i++)
+    if (g[i] != s[i])
+      abort ();
+  for (i = 0; i < n; i++)
+    if (w[i] != s[i])
+      abort ();
+  for (i = 0; i < n; i++)
+    if (v[i] != s[i])
+      abort ();
+  for (i = 0; i < n; i++)
+    if (gw[i] != s[i])
+      abort ();
+  for (i = 0; i < n; i++)
+    if (gv[i] != s[i])
+      abort ();
+  for (i = 0; i < n; i++)
+    if (wv[i] != s[i])
+      abort ();
+  for (i = 0; i < n; i++)
+    if (gwv[i] != s[i])
+      abort ();
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c
new file mode 100644
index 0000000..d6ff44d
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c
@@ -0,0 +1,123 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+#define M 8
+#define N 32
+
+#pragma acc routine vector
+void
+vector (int *a)
+{
+  int i;
+
+#pragma acc loop vector
+  for (i = 0; i < N; i++)
+    a[i] -= a[i]; 
+}
+
+#pragma acc routine worker
+void
+worker (int *b)
+{
+  int i, j;
+
+#pragma acc loop worker
+  for (i = 0; i < N; i++)
+    {
+#pragma acc loop vector
+      for (j = 0; j < M; j++)
+        b[i * M + j] += b[i  * M + j]; 
+    }
+}
+
+#pragma acc routine gang
+void
+gang (int *a)
+{
+  int i;
+
+#pragma acc loop gang worker vector
+  for (i = 0; i < N; i++)
+    a[i] -= i; 
+}
+
+#pragma acc routine seq
+void
+seq (int *a)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    a[i] += 1;
+}
+
+int
+main(int argc, char **argv)
+{
+  int i;
+  int a[N];
+  int b[M * N];
+
+  for (i = 0; i < M * N; i++)
+    b[i] = 0;
+  for (i = 0; i < N; i++)
+    a[i] = 0;
+
+#pragma acc parallel copy (a[0:N])
+  {
+#pragma acc loop seq
+    for (i = 0; i < N; i++)
+      seq (&a[0]);
+  }
+
+  for (i = 0; i < N; i++)
+    {
+      if (a[i] != N)
+	abort ();
+    }
+
+#pragma acc parallel copy (a[0:N])
+  {
+#pragma acc loop seq
+    for (i = 0; i < N; i++)
+      gang (&a[0]);
+  }
+
+  for (i = 0; i < N; i++)
+    {
+      if (a[i] != N + (N * (-1 * i)))
+	abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    a[i] = i;
+
+#pragma acc parallel copy (b[0:M*N])
+  {
+    worker (&b[0]);
+  }
+
+  for (i = 0; i < N; i++)
+    {
+      if (a[i] != i)
+	abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    a[i] = i;
+
+#pragma acc parallel copy (a[0:N])
+  {
+#pragma acc loop
+    for (i = 0; i < N; i++)
+      vector (&a[0]);
+  }
+
+  for (i = 0; i < N; i++)
+    {
+      if (a[i] != 0)
+	abort ();
+    }
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c
new file mode 100644
index 0000000..b5cbc90
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c
@@ -0,0 +1,76 @@
+/* This code uses nvptx inline assembly guarded with acc_on_device, which is
+   not optimized away at -O0, and then confuses the target assembler.
+   { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
+
+#include <stdio.h>
+#include <openacc.h>
+
+#define NUM_WORKERS 16
+#define NUM_VECTORS 32
+#define WIDTH 64
+#define HEIGHT 32
+
+#define WORK_ID(I,N)						\
+  (acc_on_device (acc_device_nvidia)				\
+   ? ({unsigned __r;						\
+       __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (__r));	\
+       __r; }) : (I % N))
+#define VEC_ID(I,N)						\
+  (acc_on_device (acc_device_nvidia)				\
+   ? ({unsigned __r;						\
+       __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (__r));	\
+       __r; }) : (I % N))
+
+#pragma acc routine worker
+void __attribute__ ((noinline))
+  WorkVec (int *ptr, int w, int h, int nw, int nv)
+{
+#pragma acc loop worker
+  for (int i = 0; i < h; i++)
+#pragma acc loop vector
+    for (int j = 0; j < w; j++)
+      ptr[i*w + j] = (WORK_ID (i, nw) << 8) | VEC_ID(j, nv);
+}
+
+int DoWorkVec (int nw)
+{
+  int ary[HEIGHT][WIDTH];
+  int err = 0;
+
+  for (int ix = 0; ix != HEIGHT; ix++)
+    for (int jx = 0; jx != WIDTH; jx++)
+      ary[ix][jx] = 0xdeadbeef;
+
+  printf ("spawning %d ...", nw); fflush (stdout);
+  
+#pragma acc parallel num_workers(nw) vector_length (NUM_VECTORS) copy (ary)
+  {
+    WorkVec ((int *)ary, WIDTH, HEIGHT, nw, NUM_VECTORS);
+  }
+
+  for (int ix = 0; ix != HEIGHT; ix++)
+    for (int jx = 0; jx != WIDTH; jx++)
+      {
+	int exp = ((ix % nw) << 8) | (jx % NUM_VECTORS);
+	
+	if (ary[ix][jx] != exp)
+	  {
+	    printf ("\nary[%d][%d] = %#x expected %#x", ix, jx,
+		    ary[ix][jx], exp);
+	    err = 1;
+	  }
+      }
+  printf (err ? " failed\n" : " ok\n");
+  
+  return err;
+}
+
+int main ()
+{
+  int err = 0;
+
+  for (int W = 1; W <= NUM_WORKERS; W <<= 1)
+    err |= DoWorkVec (W);
+
+  return err;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/update-1-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/update-1-2.c
deleted file mode 100644
index 82c3192..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/update-1-2.c
+++ /dev/null
@@ -1,361 +0,0 @@
-/* Copy of update-1.c with self exchanged with host for #pragma acc update.  */
-
-/* { dg-do run } */
-/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
-
-#include <openacc.h>
-#include <string.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <stdbool.h>
-
-int
-main (int argc, char **argv)
-{
-    int N = 8;
-    int NDIV2 = N / 2;
-    float *a, *b, *c;
-    float *d_a, *d_b, *d_c;
-    int i;
-
-    a = (float *) malloc (N * sizeof (float));
-    b = (float *) malloc (N * sizeof (float));
-    c = (float *) malloc (N * sizeof (float));
-
-    d_a = (float *) acc_malloc (N * sizeof (float));
-    d_b = (float *) acc_malloc (N * sizeof (float));
-    d_c = (float *) acc_malloc (N * sizeof (float));
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 3.0;
-        b[i] = 0.0;
-    }
-
-    acc_map_data (a, d_a, N * sizeof (float));
-    acc_map_data (b, d_b, N * sizeof (float));
-    acc_map_data (c, d_c, N * sizeof (float));
-
-#pragma acc update device (a[0:N], b[0:N])
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update self (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        if (a[i] != 3.0)
-            abort ();
-
-        if (b[i] != 3.0)
-            abort ();
-    }
-
-    if (!acc_is_present (&a[0], (N * sizeof (float))))
-      abort ();
-
-    if (!acc_is_present (&b[0], (N * sizeof (float))))
-      abort ();
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 5.0;
-        b[i] = 1.0;
-    }
-
-#pragma acc update device (a[0:N], b[0:N])
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update self (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        if (a[i] != 5.0)
-            abort ();
-
-        if (b[i] != 5.0)
-            abort ();
-    }
-
-    if (!acc_is_present (&a[0], (N * sizeof (float))))
-      abort ();
-
-    if (!acc_is_present (&b[0], (N * sizeof (float))))
-      abort ();
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 5.0;
-        b[i] = 1.0;
-    }
-
-#pragma acc update device (a[0:N], b[0:N])
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update host (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        if (a[i] != 5.0)
-            abort ();
-
-        if (b[i] != 5.0)
-            abort ();
-    }
-
-    if (!acc_is_present (&a[0], (N * sizeof (float))))
-      abort ();
-
-    if (!acc_is_present (&b[0], (N * sizeof (float))))
-      abort ();
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 6.0;
-        b[i] = 0.0;
-    }
-
-#pragma acc update device (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 9.0;
-    }
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update self (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        if (a[i] != 6.0)
-            abort ();
-
-        if (b[i] != 6.0)
-            abort ();
-    }
-
-    if (!acc_is_present (&a[0], (N * sizeof (float))))
-      abort ();
-
-    if (!acc_is_present (&b[0], (N * sizeof (float))))
-      abort ();
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 7.0;
-        b[i] = 2.0;
-    }
-
-#pragma acc update device (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 9.0;
-    }
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update self (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        if (a[i] != 7.0)
-            abort ();
-
-        if (b[i] != 7.0)
-            abort ();
-    }
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 9.0;
-    }
-
-#pragma acc update device (a[0:N])
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update self (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        if (a[i] != 9.0)
-            abort ();
-
-        if (b[i] != 9.0)
-            abort ();
-    }
-
-    if (!acc_is_present (&a[0], (N * sizeof (float))))
-      abort ();
-
-    if (!acc_is_present (&b[0], (N * sizeof (float))))
-      abort ();
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 5.0;
-    }
-
-#pragma acc update device (a[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 6.0;
-    }
-
-#pragma acc update device (a[0:NDIV2])
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update self (a[0:N], b[0:N])
-
-    for (i = 0; i < NDIV2; i++)
-    {
-        if (a[i] != 6.0)
-            abort ();
-
-        if (b[i] != 6.0)
-            abort ();
-    }
-
-    for (i = NDIV2; i < N; i++)
-    {
-        if (a[i] != 5.0)
-            abort ();
-
-        if (b[i] != 5.0)
-            abort ();
-    }
-
-    if (!acc_is_present (&a[0], (N * sizeof (float))))
-      abort ();
-
-    if (!acc_is_present (&b[0], (N * sizeof (float))))
-      abort ();
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 0.0;
-    }
-
-#pragma acc update device (a[0:4])
-
-#pragma acc parallel present (a[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            a[ii] = a[ii] + 1.0;
-    }
-
-#pragma acc update self (a[4:4])
-
-    for (i = 0; i < NDIV2; i++)
-    {
-        if (a[i] != 0.0)
-            abort ();
-    }
-
-    for (i = NDIV2; i < N; i++)
-    {
-        if (a[i] != 6.0)
-            abort ();
-    }
-
-#pragma acc update self (a[0:4])
-
-    for (i = 0; i < NDIV2; i++)
-    {
-        if (a[i] != 1.0)
-            abort ();
-    }
-
-    for (i = NDIV2; i < N; i++)
-    {
-        if (a[i] != 6.0)
-            abort ();
-    }
-
-    a[2] = 9;
-    a[3] = 9;
-    a[4] = 9;
-    a[5] = 9;
-
-#pragma acc update device (a[2:4])
-
-#pragma acc parallel present (a[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            a[ii] = a[ii] + 1.0;
-    }
-
-#pragma acc update self (a[2:4])
-
-    for (i = 0; i < 2; i++)
-    {
-      if (a[i] != 1.0)
-	abort ();
-    }
-
-    for (i = 2; i < 6; i++)
-    {
-      if (a[i] != 10.0)
-	abort ();
-    }
-
-    for (i = 6; i < N; i++)
-    {
-        if (a[i] != 6.0)
-            abort ();
-    }
-
-    return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c
index 8a51ee3..807347f 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c
@@ -20,7 +20,7 @@ main (void)
 
 #pragma acc parallel vector_length (32) copyin (a,b) copyout (c)
   {
-#pragma acc loop /* vector clause is missing, since it's not yet supported.  */
+#pragma acc loop vector
     for (unsigned int i = 0; i < n; i++)
       c[i] = a[i] + b[i];
   }
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-1a.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-1a.c
deleted file mode 100644
index 99c6dfb..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-1a.c
+++ /dev/null
@@ -1,28 +0,0 @@
-#include <assert.h>
-
-/* Test worker-single/vector-single mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int arr[32], i;
-
-  for (i = 0; i < 32; i++)
-    arr[i] = 0;
-
-  #pragma acc parallel copy(arr) num_gangs(8) num_workers(8) vector_length(32)
-  {
-    int j;
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      {
-	#pragma acc atomic
-	arr[j]++;
-      }
-  }
-
-  for (i = 0; i < 32; i++)
-    assert (arr[i] == 1);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-4.c
deleted file mode 100644
index 84080d0..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-4.c
+++ /dev/null
@@ -1,28 +0,0 @@
-#include <assert.h>
-
-/* Test worker-single/vector-partitioned mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int arr[32], i;
-
-  for (i = 0; i < 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(1) num_workers(8) vector_length(32)
-      {
-	int k;
-	#pragma acc loop vector
-	for (k = 0; k < 32; k++)
-	  {
-	    #pragma acc atomic
-	    arr[k]++;
-	  }
-      }
-
-  for (i = 0; i < 32; i++)
-    assert (arr[i] == i + 1);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-6.c
deleted file mode 100644
index cbc3e37..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-6.c
+++ /dev/null
@@ -1,46 +0,0 @@
-#include <assert.h>
-
-#if defined(ACC_DEVICE_TYPE_host)
-#define ACTUAL_GANGS 1
-#else
-#define ACTUAL_GANGS 8
-#endif
-
-/* Test worker-single, vector-partitioned, gang-redundant mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int n, arr[32], i;
-
-  for (i = 0; i < 32; i++)
-    arr[i] = 0;
-
-  n = 0;
-
-  #pragma acc parallel copy(n, arr) num_gangs(ACTUAL_GANGS) num_workers(8) \
-	  vector_length(32)
-  {
-    int j;
-
-    #pragma acc atomic
-    n++;
-
-    #pragma acc loop vector
-    for (j = 0; j < 32; j++)
-      {
-	#pragma acc atomic
-	arr[j] += 1;
-      }
-
-    #pragma acc atomic
-    n++;
-  }
-
-  assert (n == ACTUAL_GANGS * 2);
-
-  for (i = 0; i < 32; i++)
-    assert (arr[i] == ACTUAL_GANGS);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-fortran/asyncwait-1.f90 libgomp/testsuite/libgomp.oacc-fortran/asyncwait-1.f90
index b6e637b..01728bd 100644
--- libgomp/testsuite/libgomp.oacc-fortran/asyncwait-1.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/asyncwait-1.f90
@@ -132,4 +132,126 @@ program asyncwait
      if (d(i) .ne. 1.0) call abort
      if (e(i) .ne. 11.0) call abort
   end do
+
+  a(:) = 3.0
+  b(:) = 0.0
+
+  !$acc data copy (a(1:N)) copy (b(1:N))
+
+  !$acc kernels async
+  !$acc loop
+  do i = 1, N
+     b(i) = a(i)
+  end do
+  !$acc end kernels
+
+  !$acc wait
+  !$acc end data
+
+  do i = 1, N
+     if (a(i) .ne. 3.0) call abort
+     if (b(i) .ne. 3.0) call abort
+  end do
+
+  a(:) = 2.0
+  b(:) = 0.0
+
+  !$acc data copy (a(1:N)) copy (b(1:N))
+
+  !$acc kernels async (1)
+  !$acc loop
+  do i = 1, N
+     b(i) = a(i)
+  end do
+  !$acc end kernels
+
+  !$acc wait (1)
+  !$acc end data
+
+  do i = 1, N
+     if (a(i) .ne. 2.0) call abort
+     if (b(i) .ne. 2.0) call abort
+  end do
+
+  a(:) = 3.0
+  b(:) = 0.0
+  c(:) = 0.0
+  d(:) = 0.0
+
+  !$acc data copy (a(1:N)) copy (b(1:N)) copy (c(1:N)) copy (d(1:N))
+
+  !$acc kernels async (1)
+  do i = 1, N
+     b(i) = (a(i) * a(i) * a(i)) / a(i)
+  end do
+  !$acc end kernels
+
+  !$acc kernels async (1)
+  do i = 1, N
+     c(i) = (a(i) * 4) / a(i)
+  end do
+  !$acc end kernels
+
+  !$acc kernels async (1)
+  !$acc loop
+  do i = 1, N
+     d(i) = ((a(i) * a(i) + a(i)) / a(i)) - a(i)
+  end do
+  !$acc end kernels
+
+  !$acc wait (1)
+  !$acc end data
+
+  do i = 1, N
+     if (a(i) .ne. 3.0) call abort
+     if (b(i) .ne. 9.0) call abort
+     if (c(i) .ne. 4.0) call abort
+     if (d(i) .ne. 1.0) call abort
+  end do
+
+  a(:) = 2.0
+  b(:) = 0.0
+  c(:) = 0.0
+  d(:) = 0.0
+  e(:) = 0.0
+
+  !$acc data copy (a(1:N), b(1:N), c(1:N), d(1:N), e(1:N))
+
+  !$acc kernels async (1)
+  do i = 1, N
+     b(i) = (a(i) * a(i) * a(i)) / a(i)
+  end do
+  !$acc end kernels
+
+  !$acc kernels async (1)
+  !$acc loop
+  do i = 1, N
+     c(i) = (a(i) * 4) / a(i)
+  end do
+  !$acc end kernels
+
+  !$acc kernels async (1)
+  !$acc loop
+  do i = 1, N
+     d(i) = ((a(i) * a(i) + a(i)) / a(i)) - a(i)
+  end do
+  !$acc end kernels
+
+  !$acc kernels wait (1) async (1)
+  !$acc loop
+  do i = 1, N
+     e(i) = a(i) + b(i) + c(i) + d(i)
+  end do
+  !$acc end kernels
+
+  !$acc wait (1)
+  !$acc end data
+
+  do i = 1, N
+     if (a(i) .ne. 2.0) call abort
+     if (b(i) .ne. 4.0) call abort
+     if (c(i) .ne. 4.0) call abort
+     if (d(i) .ne. 1.0) call abort
+     if (e(i) .ne. 11.0) call abort
+  end do
 end program asyncwait
diff --git libgomp/testsuite/libgomp.oacc-fortran/asyncwait-2.f90 libgomp/testsuite/libgomp.oacc-fortran/asyncwait-2.f90
index bade52b..fe131b6 100644
--- libgomp/testsuite/libgomp.oacc-fortran/asyncwait-2.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/asyncwait-2.f90
@@ -1,6 +1,6 @@
 ! { dg-do run }
 
-program parallel_wait
+program asyncwait
   integer, parameter :: N = 64
   real, allocatable :: a(:), b(:), c(:)
   integer i
@@ -33,8 +33,33 @@ program parallel_wait
   do i = 1, N
     if (c(i) .ne. 2.0) call abort
   end do
+
+  !$acc kernels async (0)
+  !$acc loop
+  do i = 1, N
+    a(i) = 1
+  end do
+  !$acc end kernels
+
+  !$acc kernels async (1)
+  !$acc loop
+  do i = 1, N
+    b(i) = 1
+  end do
+  !$acc end kernels
+
+  !$acc kernels wait (0, 1)
+  !$acc loop
+  do i = 1, N
+    c(i) = a(i) + b(i)
+  end do
+  !$acc end kernels
+
+  do i = 1, N
+    if (c(i) .ne. 2.0) call abort
+  end do
   
   deallocate (a)
   deallocate (b)
   deallocate (c)
-end program parallel_wait
+end program asyncwait
diff --git libgomp/testsuite/libgomp.oacc-fortran/asyncwait-3.f90 libgomp/testsuite/libgomp.oacc-fortran/asyncwait-3.f90
index d48dc11..fa96a01 100644
--- libgomp/testsuite/libgomp.oacc-fortran/asyncwait-3.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/asyncwait-3.f90
@@ -1,6 +1,6 @@
 ! { dg-do run }
 
-program parallel_wait
+program asyncwait
   integer, parameter :: N = 64
   real, allocatable :: a(:), b(:), c(:)
   integer i
@@ -35,8 +35,35 @@ program parallel_wait
   do i = 1, N
     if (c(i) .ne. 2.0) call abort
   end do
+
+  !$acc kernels async (0)
+  !$acc loop
+  do i = 1, N
+    a(i) = 1
+  end do
+  !$acc end kernels
+
+  !$acc kernels async (1)
+  !$acc loop
+  do i = 1, N
+    b(i) = 1
+  end do
+  !$acc end kernels
+
+  !$acc wait (0, 1)
+
+  !$acc kernels
+  !$acc loop
+  do i = 1, N
+    c(i) = a(i) + b(i)
+  end do
+  !$acc end kernels
+
+  do i = 1, N
+    if (c(i) .ne. 2.0) call abort
+  end do
   
   deallocate (a)
   deallocate (b)
   deallocate (c)
-end program parallel_wait
+end program asyncwait
diff --git libgomp/testsuite/libgomp.oacc-fortran/clauses-1.f90 libgomp/testsuite/libgomp.oacc-fortran/clauses-1.f90
new file mode 100644
index 0000000..e6ab78d
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/clauses-1.f90
@@ -0,0 +1,290 @@
+! { dg-do run }
+! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } }
+
+program main
+  use openacc
+  implicit none
+
+  integer, parameter :: N = 32
+  real, allocatable :: a(:), b(:), c(:)
+  integer i
+
+  i = 0
+
+  allocate (a(N))
+  allocate (b(N))
+  allocate (c(N))
+
+  a(:) = 3.0
+  b(:) = 0.0
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N))
+    do i = 1, N
+      b(i) = a(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 3.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 5.0
+  b(:) = 1.0
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N))
+    do i = 1, N
+      b(i) = a(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 5.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 6.0
+  b(:) = 0.0
+
+  call acc_copyin (a, sizeof (a))
+
+  a(:) = 9.0
+
+  !$acc parallel present_or_copyin (a(1:N)) copyout (b(1:N))
+    do i = 1, N
+      b(i) = a(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 6.0) call abort
+  end do
+
+  call acc_copyout (a, sizeof (a))
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 6.0
+  b(:) = 0.0
+
+  !$acc parallel copyin (a(1:N)) present_or_copyout (b(1:N))
+     do i = 1, N
+       b(i) = a(i)
+     end do
+  !$acc end parallel
+
+  do i = 1, N
+     if (b(i) .ne. 6.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 5.0
+  b(:) = 2.0
+
+  call acc_copyin (b, sizeof (b))
+
+  !$acc parallel copyin (a(1:N)) present_or_copyout (b(1:N))
+     do i = 1, N
+       b(i) = a(i)
+     end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 5.0) call abort
+    if (b(i) .ne. 2.0) call abort
+  end do
+
+  call acc_copyout (b, sizeof (b))
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 3.0;
+  b(:) = 4.0;
+
+  !$acc parallel copy (a(1:N)) copyout (b(1:N))
+    do i = 1, N
+      a(i) = a(i) + 1
+      b(i) = a(i) + 2
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 4.0) call abort
+    if (b(i) .ne. 6.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 4.0
+  b(:) = 7.0
+
+  !$acc parallel present_or_copy (a(1:N)) present_or_copy (b(1:N))
+    do i = 1, N
+      a(i) = a(i) + 1
+      b(i) = b(i) + 2
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 5.0) call abort
+    if (b(i) .ne. 9.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 3.0
+  b(:) = 7.0
+
+  call acc_copyin (a, sizeof (a))
+  call acc_copyin (b, sizeof (b))
+
+  !$acc parallel present_or_copy (a(1:N)) present_or_copy (b(1:N))
+    do i = 1, N
+      a(i) = a(i) + 1
+      b(i) = b(i) + 2
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 3.0) call abort
+    if (b(i) .ne. 7.0) call abort
+  end do
+
+  call acc_copyout (a, sizeof (a))
+  call acc_copyout (b, sizeof (b))
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 3.0
+  b(:) = 7.0
+
+  !$acc parallel copyin (a(1:N)) create (c(1:N)) copyout (b(1:N))
+    do i = 1, N
+      c(i) = a(i)
+      b(i) = c(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 3.0) call abort
+    if (b(i) .ne. 3.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+  if (acc_is_present (c) .eqv. .TRUE.) call abort
+
+  a(:) = 4.0
+  b(:) = 8.0
+
+  !$acc parallel copyin (a(1:N)) present_or_create (c(1:N)) copyout (b(1:N))
+    do i = 1, N
+      c(i) = a(i)
+      b(i) = c(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 4.0) call abort
+    if (b(i) .ne. 4.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+  if (acc_is_present (c) .eqv. .TRUE.) call abort
+
+  a(:) = 4.0
+
+  call acc_copyin (a, sizeof (a))
+  call acc_copyin (b, sizeof (b))
+  call acc_copyin (c, sizeof (c))
+
+  !$acc parallel present (a(1:N)) present (c(1:N)) present (b(1:N))
+    do i = 1, N
+      c(i) = a(i)
+      b(i) = c(i)
+    end do
+  !$acc end parallel
+
+  call acc_copyout (a, sizeof (a))
+  call acc_copyout (b, sizeof (b))
+  call acc_copyout (c, sizeof (c))
+  
+  do i = 1, N
+    if (a(i) .ne. 4.0) call abort
+    if (b(i) .ne. 4.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+  if (acc_is_present (c) .eqv. .TRUE.) call abort
+
+  a(:) = 6.0
+  b(:) = 0.0
+
+  call acc_copyin (a, sizeof (a))
+
+  a(:) = 9.0
+
+  !$acc parallel pcopyin (a(1:N)) copyout (b(1:N))
+    do i = 1, N
+      b(i) = a(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 6.0) call abort
+  end do
+  
+  call acc_copyout (a, sizeof (a))
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 6.0
+  b(:) = 0.0
+
+  !$acc parallel copyin (a(1:N)) pcopyout (b(1:N))
+   do i = 1, N
+     b(i) = a(i)
+   end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 6.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 5.0
+  b(:) = 7.0
+
+  !$acc parallel copyin (a(1:N)) pcreate (c(1:N)) copyout (b(1:N))
+    do i = 1, N
+      c(i) = a(i)
+      b(i) = c(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 5.0) call abort
+    if (b(i) .ne. 5.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+  if (acc_is_present (c) .eqv. .TRUE.) call abort
+
+end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90 libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
index f717d1b..2d4b707 100644
--- libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
@@ -1,29 +1,22 @@
 ! { dg-do run  { target openacc_nvidia_accel_selected } }
 
+! Tests to exercise the declare directive along with
+! the clauses: copy
+!              copyin
+!              copyout
+!              create
+!              present
+!              present_or_copy
+!              present_or_copyin
+!              present_or_copyout
+!              present_or_create
+
 module vars
   implicit none
   integer z
   !$acc declare create (z)
 end module vars
 
-subroutine subr6 (a, d)
-  implicit none
-  integer, parameter :: N = 8
-  integer :: i
-  integer :: a(N)
-  !$acc declare deviceptr (a)
-  integer :: d(N)
-
-  i = 0
-
-  !$acc parallel copy (d)
-    do i = 1, N
-      d(i) = a(i) + a(i)
-    end do
-  !$acc end parallel
-
-end subroutine
-
 subroutine subr5 (a, b, c, d)
   implicit none
   integer, parameter :: N = 8
@@ -201,15 +194,6 @@ subroutine subr0 (a, b, c, d)
     if (d(i) .ne. 13) call abort
   end do
 
-  call subr6 (a, d)
-
-  call test (a, .true.)
-  call test (d, .false.)
-
-  do i = 1, N
-    if (d(i) .ne. 16) call abort
-  end do
-
 end subroutine
 
 program main
@@ -241,8 +225,7 @@ program main
     if (a(i) .ne. 8) call abort
     if (b(i) .ne. 8) call abort
     if (c(i) .ne. 8) call abort
-    if (d(i) .ne. 16) call abort
+    if (d(i) .ne. 13) call abort
   end do
 
-
 end program
diff --git libgomp/testsuite/libgomp.oacc-fortran/default-1.f90 libgomp/testsuite/libgomp.oacc-fortran/default-1.f90
new file mode 100644
index 0000000..1059089
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/default-1.f90
@@ -0,0 +1,54 @@
+! { dg-do run }
+
+program main
+  implicit none
+  real a, b
+  real c
+  !$acc declare create (c)
+
+  a = 2.0
+  b = 0.0
+
+  !$acc parallel copy (a) create (b) default (none)
+    b = a
+    a = 1.0
+    a = a + b
+  !$acc end parallel
+
+  if (a .ne. 3.0) call abort
+
+  !$acc kernels copy (a) create (b) default (none)
+    b = a
+    a = 1.0
+    a = a + b
+  !$acc end kernels
+
+  if (a .ne. 4.0) call abort
+
+  !$acc parallel default (none) copy (a) create (b)
+    b = a
+    a = 1.0
+    a = a + b
+  !$acc end parallel
+
+  if (a .ne. 5.0) call abort
+
+  !$acc parallel default (none) copy (a)
+    c = a
+    a = 1.0
+    a = a + c
+  !$acc end parallel
+
+  if (a .ne. 6.0) call abort
+
+  !$acc data copy (a)
+  !$acc parallel default (none)
+    c = a
+    a = 1.0
+    a = a + c
+  !$acc end parallel
+  !$acc end data
+
+  if (a .ne. 7.0) call abort
+
+end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/firstprivate-1.f90 libgomp/testsuite/libgomp.oacc-fortran/firstprivate-1.f90
new file mode 100644
index 0000000..d3f9093
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/firstprivate-1.f90
@@ -0,0 +1,42 @@
+! { dg-do run }
+
+program firstprivate
+  integer, parameter :: Nupper=100
+  integer :: a, b(Nupper), c, d, n
+  include "openacc_lib.h"
+
+  if (acc_get_device_type () .eq. acc_device_nvidia) then
+     n = Nupper
+  else
+     n = 1
+  end if
+
+  b(:) = -1
+  a = 5
+
+  !$acc parallel firstprivate (a) num_gangs (n)
+  !$acc loop gang
+  do i = 1, n
+     a = a + i
+     b(i) = a
+  end do
+  !$acc end parallel
+
+  do i = 1, n
+     if (b(i) .ne. i + a) call abort ()
+  end do
+
+  !$acc data copy (a)
+  !$acc parallel firstprivate (a) copyout (c)
+  a = 10
+  c = a
+  !$acc end parallel
+
+  !$acc parallel copyout (d) present (a)
+  d = a
+  !$acc end parallel
+  !$acc end data
+
+  if (c .ne. 10) call abort ()
+  if (d .ne. 5) call abort ()
+end program firstprivate
diff --git libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90 libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
new file mode 100644
index 0000000..7d56060
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
@@ -0,0 +1,79 @@
+! { dg-do run }
+
+program main
+  integer, parameter :: n = 100
+  integer i, a(n), b(n)
+  integer x
+
+  do i = 1, n
+     b(i) = i
+  end do
+
+  !$acc parallel loop gang (static:*) num_gangs (10)
+  do i = 1, n
+     a(i) = b(i) + 0
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 0, n)
+
+  !$acc parallel loop gang (static:1) num_gangs (10)
+  do i = 1, n
+     a(i) = b(i) + 1
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 1, n)
+
+  !$acc parallel loop gang (static:2) num_gangs (10)
+  do i = 1, n
+     a(i) = b(i) + 2
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 2, n)
+
+  !$acc parallel loop gang (static:5) num_gangs (10)
+  do i = 1, n
+     a(i) = b(i) + 5
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 5, n)
+
+  !$acc parallel loop gang (static:20) num_gangs (10)
+  do i = 1, n
+     a(i) = b(i) + 20
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 20, n)
+
+  x = 5
+  !$acc parallel loop gang (static:0+x) num_gangs (10)
+  do i = 1, n
+     a(i) = b(i) + 5
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 5, n)
+
+  x = 10
+  !$acc parallel loop gang (static:x) num_gangs (10)
+  do i = 1, n
+     a(i) = b(i) + 10
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 10, n)
+end program main
+
+subroutine test (a, b, sarg, n)
+  integer n
+  integer a (n), b(n), sarg
+  integer i
+
+  do i = 1, n
+     if (a(i) .ne. b(i) + sarg) call abort ()
+  end do
+end subroutine test
diff --git libgomp/testsuite/libgomp.oacc-fortran/if-1.f90 libgomp/testsuite/libgomp.oacc-fortran/if-1.f90
new file mode 100644
index 0000000..44055e1
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/if-1.f90
@@ -0,0 +1,886 @@
+! { dg-do run }
+! { dg-additional-options "-cpp" }
+
+program main
+  use openacc
+  implicit none
+
+  integer, parameter :: N = 8
+  integer, parameter :: one = 1
+  integer, parameter :: zero = 0
+  integer i, nn
+  real, allocatable :: a(:), b(:)
+  real exp, exp2
+
+  i = 0
+
+  allocate (a(N))
+  allocate (b(N))
+
+  a(:) = 4.0
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N)) if (1 == 1)
+     do i = 1, N
+        if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+          b(i) = a(i) + 1
+        else
+          b(i) = a(i)
+        end if
+     end do
+  !$acc end parallel
+
+#if ACC_MEM_SHARED
+  exp = 5.0
+#else
+  exp = 4.0
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 16.0
+
+  !$acc parallel if (0 == 1)
+     do i = 1, N
+       if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+         b(i) = a(i) + 1
+       else
+         b(i) = a(i)
+       end if
+     end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 17.0) call abort
+  end do
+
+  a(:) = 8.0
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N)) if (one == 1)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+#if ACC_MEM_SHARED
+  exp = 9.0
+#else
+  exp = 8.0
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 22.0
+
+  !$acc parallel if (zero == 1)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 23.0) call abort
+  end do
+
+  a(:) = 16.0
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N)) if (.TRUE.)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+#if ACC_MEM_SHARED
+  exp = 17.0;
+#else
+  exp = 16.0;
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 76.0
+
+  !$acc parallel if (.FALSE.)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 77.0) call abort
+  end do
+
+  a(:) = 22.0
+
+  nn = 1
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N)) if (nn == 1)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+#if ACC_MEM_SHARED
+  exp = 23.0;
+#else
+  exp = 22.0;
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 18.0
+
+  nn = 0
+
+  !$acc parallel if (nn == 1)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 19.0) call abort
+  end do
+
+  a(:) = 49.0
+
+  nn = 1
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N)) if ((nn + nn) > 0)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+#if ACC_MEM_SHARED
+  exp = 50.0
+#else
+  exp = 49.0
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 38.0
+
+  nn = 0;
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N)) if ((nn + nn) > 0)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 39.0) call abort
+  end do
+
+  a(:) = 91.0
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N)) if (-2 > 0)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 92.0) call abort
+  end do
+
+  a(:) = 43.0
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N)) if (one == 1)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+#if ACC_MEM_SHARED
+  exp = 44.0
+#else
+  exp = 43.0
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 87.0
+
+  !$acc parallel if (one == 0)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 88.0) call abort
+  end do
+
+  a(:) = 3.0
+  b(:) = 9.0
+
+#if ACC_MEM_SHARED
+  exp = 0.0
+  exp2 = 0.0
+#else
+  call acc_copyin (a, sizeof (a))
+  call acc_copyin (b, sizeof (b))
+  exp = 3.0;
+  exp2 = 9.0;
+#endif
+
+  !$acc update device (a(1:N), b(1:N)) if (1 == 1)
+
+  a(:) = 0.0
+  b(:) = 0.0
+
+  !$acc update host (a(1:N), b(1:N)) if (1 == 1)
+
+  do i = 1, N
+    if (a(i) .ne. exp) call abort
+    if (b(i) .ne. exp2) call abort
+  end do
+
+  a(:) = 6.0
+  b(:) = 12.0
+
+  !$acc update device (a(1:N), b(1:N)) if (0 == 1)
+
+  a(:) = 0.0
+  b(:) = 0.0
+
+  !$acc update host (a(1:N), b(1:N)) if (1 == 1)
+
+  do i = 1, N
+    if (a(i) .ne. exp) call abort
+    if (b(i) .ne. exp2) call abort
+  end do
+
+  a(:) = 26.0
+  b(:) = 21.0
+
+  !$acc update device (a(1:N), b(1:N)) if (1 == 1)
+
+  a(:) = 0.0
+  b(:) = 0.0
+
+  !$acc update host (a(1:N), b(1:N)) if (0 == 1)
+
+  do i = 1, N
+    if (a(i) .ne. 0.0) call abort
+    if (b(i) .ne. 0.0) call abort
+  end do
+
+#if !ACC_MEM_SHARED
+  call acc_copyout (a, sizeof (a))
+  call acc_copyout (b, sizeof (b))
+#endif
+
+  a(:) = 4.0
+  b(:) = 0.0
+
+  !$acc data copyin (a(1:N)) copyout (b(1:N)) if (1 == 1)
+
+    !$acc parallel present (a(1:N))
+       do i = 1, N
+           b(i) = a(i)
+       end do
+    !$acc end parallel
+  !$acc end data
+
+  do i = 1, N
+    if (b(i) .ne. 4.0) call abort
+  end do
+
+  a(:) = 8.0
+  b(:) = 1.0
+
+  !$acc data copyin (a(1:N)) copyout (b(1:N)) if (0 == 1)
+
+#if !ACC_MEM_SHARED
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc end data
+
+  a(:) = 18.0
+  b(:) = 21.0
+
+  !$acc data copyin (a(1:N)) if (1 == 1)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (a) .eqv. .FALSE.) call abort
+#endif
+
+    !$acc data copyout (b(1:N)) if (0 == 1)
+#if !ACC_MEM_SHARED
+      if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+        !$acc data copyout (b(1:N)) if (1 == 1)
+
+        !$acc parallel present (a(1:N)) present (b(1:N))
+          do i = 1, N
+            b(i) = a(i)
+          end do
+      !$acc end parallel
+
+    !$acc end data
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+    !$acc end data
+  !$acc end data
+
+  do i = 1, N
+   if (b(i) .ne. 18.0) call abort
+  end do
+
+  !$acc enter data copyin (b(1:N)) if (0 == 1)
+
+#if !ACC_MEM_SHARED
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (0 == 1)
+
+  !$acc enter data copyin (b(1:N)) if (1 == 1)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .FALSE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (1 == 1)
+
+#if !ACC_MEM_SHARED
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc enter data copyin (b(1:N)) if (zero == 1)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (zero == 1)
+
+  !$acc enter data copyin (b(1:N)) if (one == 1)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .FALSE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (one == 1)
+
+#if !ACC_MEM_SHARED
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc enter data copyin (b(1:N)) if (one == 0)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (one == 0)
+
+  !$acc enter data copyin (b(1:N)) if (one == 1)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .FALSE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (one == 1)
+
+#if !ACC_MEM_SHARED
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  a(:) = 4.0
+
+  !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if (1 == 1)
+     do i = 1, N
+        if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+          b(i) = a(i) + 1
+        else
+          b(i) = a(i)
+        end if
+     end do
+  !$acc end kernels
+
+#if ACC_MEM_SHARED
+  exp = 5.0
+#else
+  exp = 4.0
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 16.0
+
+  !$acc kernels if (0 == 1)
+     do i = 1, N
+       if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+         b(i) = a(i) + 1
+       else
+         b(i) = a(i)
+       end if
+     end do
+  !$acc end kernels
+
+  do i = 1, N
+    if (b(i) .ne. 17.0) call abort
+  end do
+
+  a(:) = 8.0
+
+  !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if (one == 1)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+#if ACC_MEM_SHARED
+  exp = 9.0
+#else
+  exp = 8.0
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 22.0
+
+  !$acc kernels if (zero == 1)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+  do i = 1, N
+    if (b(i) .ne. 23.0) call abort
+  end do
+
+  a(:) = 16.0
+
+  !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if (.TRUE.)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+#if ACC_MEM_SHARED
+  exp = 17.0;
+#else
+  exp = 16.0;
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 76.0
+
+  !$acc kernels if (.FALSE.)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+  do i = 1, N
+    if (b(i) .ne. 77.0) call abort
+  end do
+
+  a(:) = 22.0
+
+  nn = 1
+
+  !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if (nn == 1)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+#if ACC_MEM_SHARED
+  exp = 23.0;
+#else
+  exp = 22.0;
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 18.0
+
+  nn = 0
+
+  !$acc kernels if (nn == 1)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+  do i = 1, N
+    if (b(i) .ne. 19.0) call abort
+  end do
+
+  a(:) = 49.0
+
+  nn = 1
+
+  !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if ((nn + nn) > 0)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+#if ACC_MEM_SHARED
+  exp = 50.0
+#else
+  exp = 49.0
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 38.0
+
+  nn = 0;
+
+  !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if ((nn + nn) > 0)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+  do i = 1, N
+    if (b(i) .ne. 39.0) call abort
+  end do
+
+  a(:) = 91.0
+
+  !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if (-2 > 0)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+  do i = 1, N
+    if (b(i) .ne. 92.0) call abort
+  end do
+
+  a(:) = 43.0
+
+  !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if (one == 1)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+#if ACC_MEM_SHARED
+  exp = 44.0
+#else
+  exp = 43.0
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 87.0
+
+  !$acc kernels if (one == 0)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+  do i = 1, N
+    if (b(i) .ne. 88.0) call abort
+  end do
+
+  a(:) = 3.0
+  b(:) = 9.0
+
+#if ACC_MEM_SHARED
+  exp = 0.0
+  exp2 = 0.0
+#else
+  call acc_copyin (a, sizeof (a))
+  call acc_copyin (b, sizeof (b))
+  exp = 3.0;
+  exp2 = 9.0;
+#endif
+
+  !$acc update device (a(1:N), b(1:N)) if (1 == 1)
+
+  a(:) = 0.0
+  b(:) = 0.0
+
+  !$acc update host (a(1:N), b(1:N)) if (1 == 1)
+
+  do i = 1, N
+    if (a(i) .ne. exp) call abort
+    if (b(i) .ne. exp2) call abort
+  end do
+
+  a(:) = 6.0
+  b(:) = 12.0
+
+  !$acc update device (a(1:N), b(1:N)) if (0 == 1)
+
+  a(:) = 0.0
+  b(:) = 0.0
+
+  !$acc update host (a(1:N), b(1:N)) if (1 == 1)
+
+  do i = 1, N
+    if (a(i) .ne. exp) call abort
+    if (b(i) .ne. exp2) call abort
+  end do
+
+  a(:) = 26.0
+  b(:) = 21.0
+
+  !$acc update device (a(1:N), b(1:N)) if (1 == 1)
+
+  a(:) = 0.0
+  b(:) = 0.0
+
+  !$acc update host (a(1:N), b(1:N)) if (0 == 1)
+
+  do i = 1, N
+    if (a(i) .ne. 0.0) call abort
+    if (b(i) .ne. 0.0) call abort
+  end do
+
+#if !ACC_MEM_SHARED
+  call acc_copyout (a, sizeof (a))
+  call acc_copyout (b, sizeof (b))
+#endif
+
+  a(:) = 4.0
+  b(:) = 0.0
+
+  !$acc data copyin (a(1:N)) copyout (b(1:N)) if (1 == 1)
+
+    !$acc kernels present (a(1:N))
+       do i = 1, N
+           b(i) = a(i)
+       end do
+    !$acc end kernels
+  !$acc end data
+
+  do i = 1, N
+    if (b(i) .ne. 4.0) call abort
+  end do
+
+  a(:) = 8.0
+  b(:) = 1.0
+
+  !$acc data copyin (a(1:N)) copyout (b(1:N)) if (0 == 1)
+
+#if !ACC_MEM_SHARED
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc end data
+
+  a(:) = 18.0
+  b(:) = 21.0
+
+  !$acc data copyin (a(1:N)) if (1 == 1)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (a) .eqv. .FALSE.) call abort
+#endif
+
+    !$acc data copyout (b(1:N)) if (0 == 1)
+#if !ACC_MEM_SHARED
+      if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+        !$acc data copyout (b(1:N)) if (1 == 1)
+
+        !$acc kernels present (a(1:N)) present (b(1:N))
+          do i = 1, N
+            b(i) = a(i)
+          end do
+      !$acc end kernels
+
+    !$acc end data
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+    !$acc end data
+  !$acc end data
+
+  do i = 1, N
+   if (b(i) .ne. 18.0) call abort
+  end do
+
+  !$acc enter data copyin (b(1:N)) if (0 == 1)
+
+#if !ACC_MEM_SHARED
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (0 == 1)
+
+  !$acc enter data copyin (b(1:N)) if (1 == 1)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .FALSE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (1 == 1)
+
+#if !ACC_MEM_SHARED
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc enter data copyin (b(1:N)) if (zero == 1)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (zero == 1)
+
+  !$acc enter data copyin (b(1:N)) if (one == 1)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .FALSE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (one == 1)
+
+#if !ACC_MEM_SHARED
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc enter data copyin (b(1:N)) if (one == 0)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (one == 0)
+
+  !$acc enter data copyin (b(1:N)) if (one == 1)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .FALSE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (one == 1)
+
+#if !ACC_MEM_SHARED
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/implicit-firstprivate-ref.f90 libgomp/testsuite/libgomp.oacc-fortran/implicit-firstprivate-ref.f90
new file mode 100644
index 0000000..a5f3840
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/implicit-firstprivate-ref.f90
@@ -0,0 +1,42 @@
+! This test checks if the runtime can properly handle implicit
+! firstprivate variables inside subroutines in modules.
+
+! { dg-do run }
+
+module test_mod
+  contains
+    subroutine test(x)
+
+      IMPLICIT NONE
+
+      INTEGER      :: x, y, j
+
+      x = 5
+
+      !$ACC PARALLEL LOOP copyout (y)
+      DO j=1,10
+         y=x
+      ENDDO
+      !$ACC END PARALLEL LOOP
+
+      y = -1;
+
+      !$ACC PARALLEL LOOP firstprivate (y) copyout (x)
+      DO j=1,10
+         x=y
+      ENDDO
+      !$ACC END PARALLEL LOOP
+    end subroutine test
+end module test_mod
+
+program t
+  use test_mod
+
+  INTEGER      :: x_min
+
+  x_min = 8
+
+  CALL test(x_min)
+
+  if (x_min .ne. -1) call abort
+end program t
diff --git libgomp/testsuite/libgomp.oacc-fortran/pr68813.f90 libgomp/testsuite/libgomp.oacc-fortran/pr68813.f90
new file mode 100644
index 0000000..735350f
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/pr68813.f90
@@ -0,0 +1,19 @@
+program foo
+  implicit none
+  integer, parameter :: n = 100
+  integer, dimension(n,n) :: a
+  integer :: i, j, sum = 0
+
+  a = 1
+
+  !$acc parallel copyin(a(1:n,1:n)) firstprivate (sum)
+  !$acc loop gang reduction(+:sum)
+  do i=1, n
+     !$acc loop vector reduction(+:sum)
+     do j=1, n
+        sum = sum + a(i, j)
+     enddo
+  enddo
+  !$acc end parallel
+
+end program foo
diff --git libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90 libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90
new file mode 100644
index 0000000..3c1940b
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90
@@ -0,0 +1,544 @@
+! Miscellaneous tests for private variables.
+
+! { dg-do run }
+
+
+! Test of gang-private variables declared on loop directive.
+
+subroutine t1()
+  integer :: x, i, arr(32)
+
+  do i = 1, 32
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang private(x)
+  do i = 1, 32
+     x = i * 2;
+     arr(i) = arr(i) + x
+  end do
+  !$acc end parallel
+
+  do i = 1, 32
+     if (arr(i) .ne. i * 3) call abort
+  end do
+end subroutine t1
+
+
+! Test of gang-private variables declared on loop directive, with broadcasting
+! to partitioned workers.
+
+subroutine t2()
+  integer :: x, i, j, arr(0:32*32)
+
+  do i = 0, 32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang private(x)
+  do i = 0, 31
+     x = i * 2;
+
+     !$acc loop worker
+     do j = 0, 31
+        arr(i * 32 + j) = arr(i * 32 + j) + x
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 * 32 - 1
+     if (arr(i) .ne. i + (i / 32) * 2) call abort
+  end do
+end subroutine t2
+
+
+! Test of gang-private variables declared on loop directive, with broadcasting
+! to partitioned vectors.
+
+subroutine t3()
+  integer :: x, i, j, arr(0:32*32)
+
+  do i = 0, 32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang private(x)
+  do i = 0, 31
+     x = i * 2;
+
+     !$acc loop vector
+     do j = 0, 31
+        arr(i * 32 + j) = arr(i * 32 + j) + x
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 * 32 - 1
+     if (arr(i) .ne. i + (i / 32) * 2) call abort
+  end do
+end subroutine t3
+
+
+! Test of gang-private addressable variable declared on loop directive, with
+! broadcasting to partitioned workers.
+
+subroutine t4()
+  type vec3
+     integer x, y, z, attr(13)
+  end type vec3
+
+  integer i, j, arr(0:32*32)
+  type(vec3) pt
+  
+  do i = 0, 32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang private(pt)
+  do i = 0, 31
+     pt%x = i
+     pt%y = i * 2
+     pt%z = i * 4
+     pt%attr(5) = i * 6
+
+     !$acc loop vector
+     do j = 0, 31
+        arr(i * 32 + j) = arr(i * 32 + j) + pt%x + pt%y + pt%z + pt%attr(5);
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 * 32 - 1
+     if (arr(i) .ne. i + (i / 32) * 13) call abort
+  end do
+end subroutine t4
+
+
+! Test of vector-private variables declared on loop directive.
+
+subroutine t5()
+  integer :: x, i, j, k, idx, arr(0:32*32*32)
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker
+     do j = 0, 31
+        !$acc loop vector private(x)
+        do k = 0, 31
+           x = ieor(i, j * 3)
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+        !$acc loop vector private(x)
+        do k = 0, 31
+           x = ior(i, j * 5)
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t5
+
+
+! Test of vector-private variables declared on loop directive. Array type.
+
+subroutine t6()
+  integer :: i, j, k, idx, arr(0:32*32*32), pt(2)
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker
+     do j = 0, 31
+        !$acc loop vector private(x, pt)
+        do k = 0, 31
+           pt(1) = ieor(i, j * 3)
+           pt(2) = ior(i, j * 5)
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt(1) * k
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt(2) * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t6
+
+
+! Test of worker-private variables declared on a loop directive.
+
+subroutine t7()
+  integer :: x, i, j, arr(0:32*32)
+  common x
+
+  do i = 0, 32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang private(x)
+  do i = 0, 31
+     !$acc loop worker private(x)
+     do j = 0, 31
+        x = ieor(i, j * 3)
+        arr(i * 32 + j) = arr(i * 32 + j) + x
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 * 32 - 1
+     if (arr(i) .ne. i + ieor(i / 32, mod(i, 32) * 3)) call abort
+  end do
+end subroutine t7
+
+
+! Test of worker-private variables declared on a loop directive, broadcasting
+! to vector-partitioned mode.
+
+subroutine t8()
+  integer :: x, i, j, k, idx, arr(0:32*32*32)
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker private(x)
+     do j = 0, 31
+        x = ieor(i, j * 3)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k) call abort
+        end do
+     end do
+  end do
+end subroutine t8
+
+
+! Test of worker-private variables declared on a loop directive, broadcasting
+! to vector-partitioned mode.  Back-to-back worker loops.
+
+subroutine t9()
+  integer :: x, i, j, k, idx, arr(0:32*32*32)
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker private(x)
+     do j = 0, 31
+        x = ieor(i, j * 3)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+     end do
+
+     !$acc loop worker private(x)
+     do j = 0, 31
+        x = ior(i, j * 5)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t9
+
+
+! Test of worker-private variables declared on a loop directive, broadcasting
+! to vector-partitioned mode.  Successive vector loops.
+
+subroutine t10()
+  integer :: x, i, j, k, idx, arr(0:32*32*32)
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker private(x)
+     do j = 0, 31
+        x = ieor(i, j * 3)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+
+        x = ior(i, j * 5)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t10
+
+
+! Test of worker-private variables declared on a loop directive, broadcasting
+! to vector-partitioned mode.  Addressable worker variable.
+
+subroutine t11()
+  integer :: i, j, k, idx, arr(0:32*32*32)
+  integer, target :: x
+  integer, pointer :: p
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker private(x, p)
+     do j = 0, 31
+        p => x
+        x = ieor(i, j * 3)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+
+        p = ior(i, j * 5)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t11
+
+
+! Test of worker-private variables declared on a loop directive, broadcasting
+! to vector-partitioned mode.  Aggregate worker variable.
+
+subroutine t12()
+  type vec2
+     integer x, y
+  end type vec2
+  
+  integer :: i, j, k, idx, arr(0:32*32*32)
+  type(vec2) :: pt
+  
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker private(pt)
+     do j = 0, 31
+        pt%x = ieor(i, j * 3)
+        pt%y = ior(i, j * 5)
+        
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt%x * k
+        end do
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt%y * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t12
+
+
+! Test of worker-private variables declared on loop directive, broadcasting
+! to vector-partitioned mode.  Array worker variable.
+
+subroutine t13()
+  integer :: i, j, k, idx, arr(0:32*32*32), pt(2)
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker private(pt)
+     do j = 0, 31
+        pt(1) = ieor(i, j * 3)
+        pt(2) = ior(i, j * 5)
+        
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt(1) * k
+        end do
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt(2) * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t13
+
+
+! Test of gang-private variables declared on the parallel directive.
+
+subroutine t14()
+  use openacc
+  integer :: x = 5
+  integer, parameter :: n = 32
+  integer :: arr(n)
+
+  do i = 1, n
+    arr(i) = 3
+  end do
+
+  !$acc parallel private(x) copy(arr) num_gangs(n) num_workers(8) vector_length(32)
+    !$acc loop gang(static:1)
+    do i = 1, n
+      x = i * 2;
+    end do
+
+   !$acc loop gang(static:1)
+    do i = 1, n
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) x = i * 2
+      arr(i) = arr(i) + x
+    end do
+  !$acc end parallel
+
+  do i = 1, n
+    if (arr(i) .ne. (3 + i * 2)) call abort
+  end do
+
+end subroutine t14
+
+
+program main
+  call t1()
+  call t2()
+  call t3()
+  call t4()
+  call t5()
+  call t6()
+  call t7()
+  call t8()
+  call t9()
+  call t10()
+  call t11()
+  call t12()
+  call t13()
+  call t14()
+end program main
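
For readers following the private-variables tests, the check made by subroutine t2 (a gang-private `x` that has to be visible to every worker of its gang) can be sketched in C. This is only an illustrative sketch and not part of the patch: the names `gang_private_mismatches` and `run_gang_private_check` are hypothetical, and the array has exactly 32*32 elements. Built without OpenACC support, the pragmas are ignored and the loops run sequentially on the host, which should give the same expected values.

```c
#include <assert.h>

#define GANGS 32   /* outer (gang) iterations, as in t2 */
#define INNER 32   /* inner (worker) iterations, as in t2 */

/* Hypothetical helper mirroring Fortran subroutine t2.  Returns the
   number of array elements that differ from the expected value.  */
int gang_private_mismatches(void)
{
  int arr[GANGS * INNER];
  int x, i, j, mismatches = 0;

  for (i = 0; i < GANGS * INNER; i++)
    arr[i] = i;

#pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
  {
#pragma acc loop gang private(x)
    for (i = 0; i < GANGS; i++)
      {
        x = i * 2;   /* written once per gang iteration */

#pragma acc loop worker
        for (j = 0; j < INNER; j++)
          arr[i * INNER + j] += x;   /* each worker reads its gang's x */
      }
  }

  /* Same expectation as the Fortran check: i + (i / 32) * 2.  */
  for (i = 0; i < GANGS * INNER; i++)
    if (arr[i] != i + (i / INNER) * 2)
      mismatches++;

  return mismatches;
}

/* Hypothetical wrapper: aborts on mismatch, like "call abort" in t2.  */
void run_gang_private_check(void)
{
  assert(gang_private_mismatches() == 0);
}
```

Calling `run_gang_private_check()` from a test driver aborts if any element was updated with the wrong gang's `x`. Returning a count rather than aborting at the first mismatch is a choice made for this sketch only, so that a failure shows how many elements are wrong.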


Regards
 Thomas

* Re: Update OpenACC test cases
  2016-03-30 14:22 Update OpenACC test cases Thomas Schwinge
@ 2016-03-30 14:38 ` Jakub Jelinek
  2016-03-30 15:55   ` Thomas Schwinge
  0 siblings, 1 reply; 5+ messages in thread
From: Jakub Jelinek @ 2016-03-30 14:38 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches

On Wed, Mar 30, 2016 at 04:06:30PM +0200, Thomas Schwinge wrote:
> This is to integrate into trunk a large amount of the test case updates
> that we have accumulated on gomp-4_0-branch.  OK to commit?

Ok.

	Jakub

* Re: Update OpenACC test cases
  2016-03-30 14:38 ` Jakub Jelinek
@ 2016-03-30 15:55   ` Thomas Schwinge
  2016-04-04 10:40     ` [gomp4] " Thomas Schwinge
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Schwinge @ 2016-03-30 15:55 UTC (permalink / raw)
  To: Jakub Jelinek, gcc-patches
  Cc: Julian Brown, Chung-Lin Tang, Cesar Philippidis, James Norris,
	Tom de Vries, Nathan Sidwell

Hi!

On Wed, 30 Mar 2016 16:13:32 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Mar 30, 2016 at 04:06:30PM +0200, Thomas Schwinge wrote:
> > This is to integrate into trunk a large amount of the test case updates
> > that we have accumulated on gomp-4_0-branch.  OK to commit?
> 
> Ok.

Thanks for the quick approval.  Committed in r234575, as posted:

commit 6a5dcab3f9ae651d06c23f81fa2457c3b604da8e
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Wed Mar 30 15:08:47 2016 +0000

    Update OpenACC test cases
    
    	gcc/testsuite/
    	* c-c++-common/goacc/combined-directives.c: Clean up dg-*
    	directives.
    	* c-c++-common/goacc/loop-clauses.c: Likewise.
    	* g++.dg/goacc/template.C: Likewise.
    	* gfortran.dg/goacc/combined-directives.f90: Likewise.
    	* gfortran.dg/goacc/loop-1.f95: Likewise.
    	* gfortran.dg/goacc/loop-5.f95: Likewise.
    	* gfortran.dg/goacc/loop-6.f95: Likewise.
    	* gfortran.dg/goacc/loop-tree-1.f90: Likewise.
    	* c-c++-common/goacc-gomp/nesting-1.c: Update.
    	* c-c++-common/goacc-gomp/nesting-fail-1.c: Likewise.
    	* c-c++-common/goacc/clauses-fail.c: Likewise.
    	* c-c++-common/goacc/parallel-1.c: Likewise.
    	* c-c++-common/goacc/reduction-1.c: Likewise.
    	* c-c++-common/goacc/reduction-2.c: Likewise.
    	* c-c++-common/goacc/reduction-3.c: Likewise.
    	* c-c++-common/goacc/reduction-4.c: Likewise.
    	* c-c++-common/goacc/routine-3.c: Likewise.
    	* c-c++-common/goacc/routine-4.c: Likewise.
    	* c-c++-common/goacc/routine-5.c: Likewise.
    	* c-c++-common/goacc/tile.c: Likewise.
    	* g++.dg/goacc/template.C: Likewise.
    	* gfortran.dg/goacc/combined-directives.f90: Likewise.
    	* c-c++-common/goacc/nesting-1.c: Move dg-error test cases into...
    	* c-c++-common/goacc/nesting-fail-1.c: ... this file.  Update.
    	* c-c++-common/goacc/kernels-1.c: Update.  Incorporate...
    	* c-c++-common/goacc/kernels-empty.c: ... this file, and...
    	* c-c++-common/goacc/kernels-eternal.c: ... this file, and...
    	* c-c++-common/goacc/kernels-noreturn.c: ... this file.
    	* c-c++-common/goacc/host_data-1.c: New file.  Incorporate...
    	* c-c++-common/goacc/use_device-1.c: ... this file.
    	* c-c++-common/goacc/host_data-2.c: New file.  Incorporate...
    	* c-c++-common/goacc/host_data-5.c: ... this file, and...
    	* c-c++-common/goacc/host_data-6.c: ... this file.
    	* c-c++-common/goacc/loop-2-kernels.c: New file.
    	* c-c++-common/goacc/loop-2-parallel.c: Likewise.
    	* c-c++-common/goacc/loop-3.c: Likewise.
    	* g++.dg/goacc/reference.C: Likewise.
    	* g++.dg/goacc/routine-1.C: Likewise.
    	* g++.dg/goacc/routine-2.C: Likewise.
    	libgomp/
    	* testsuite/libgomp.oacc-c-c++-common/clauses-1.c: Update.
    	* testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/vector-loop.c: Likewise.
    	* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/asyncwait-2.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/asyncwait-3.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/declare-1.f90: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Likewise.
    	XFAIL.
    	* testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: Update.
    	Incorporate...
    	* testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c: ... this
    	file.
    	* testsuite/libgomp.oacc-c++/template-reduction.C: New file.
    	* testsuite/libgomp.oacc-c-c++-common/gang-static-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/gang-static-2.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-clauses.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/private-variables.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/routine-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/routine-4.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c: Likewise.
    	* testsuite/libgomp.oacc-fortran/clauses-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/default-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/firstprivate-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/gang-static-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/implicit-firstprivate-ref.f90:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/pr68813.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/private-variables.f90: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Merge this
    	file...
    	* testsuite/libgomp.oacc-c-c++-common/parallel-1.c: ..., and this
    	file into...
    	* testsuite/libgomp.oacc-c-c++-common/data-clauses.h: ... this new
    	file.  Update.
    	* testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c: New
    	file.
    	* testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-2.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c:
    	... this new file.  Update.
    	* testsuite/libgomp.oacc-c-c++-common/parallel-2.c: Rename to...
    	* testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c:
    	... this new file.  Update.
    	* testsuite/libgomp.oacc-c-c++-common/mode-transitions.c: New
    	file.  Incorporate...
    	* testsuite/libgomp.oacc-c-c++-common/worker-single-1a.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/worker-single-4.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/worker-single-6.c: ... this
    	file.
    	* testsuite/libgomp.oacc-c-c++-common/update-1-2.c: Remove file.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@234575 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog                            |  49 ++
 gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c  |   2 +-
 .../c-c++-common/goacc-gomp/nesting-fail-1.c       |  36 +-
 gcc/testsuite/c-c++-common/goacc/clauses-fail.c    |  12 +
 .../c-c++-common/goacc/combined-directives.c       |   7 +-
 .../goacc/{use_device-1.c => host_data-1.c}        |  12 +-
 gcc/testsuite/c-c++-common/goacc/host_data-2.c     |  78 ++
 gcc/testsuite/c-c++-common/goacc/host_data-5.c     |  23 -
 gcc/testsuite/c-c++-common/goacc/host_data-6.c     |  25 -
 gcc/testsuite/c-c++-common/goacc/kernels-1.c       |  43 +-
 gcc/testsuite/c-c++-common/goacc/kernels-empty.c   |   6 -
 gcc/testsuite/c-c++-common/goacc/kernels-eternal.c |  11 -
 .../c-c++-common/goacc/kernels-noreturn.c          |  12 -
 gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c  | 189 ++++
 gcc/testsuite/c-c++-common/goacc/loop-2-parallel.c | 162 ++++
 gcc/testsuite/c-c++-common/goacc/loop-3.c          |  58 ++
 gcc/testsuite/c-c++-common/goacc/loop-clauses.c    |   4 -
 gcc/testsuite/c-c++-common/goacc/nesting-1.c       |   8 -
 gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c  |  29 +
 gcc/testsuite/c-c++-common/goacc/parallel-1.c      |  36 +-
 gcc/testsuite/c-c++-common/goacc/reduction-1.c     |  57 +-
 gcc/testsuite/c-c++-common/goacc/reduction-2.c     |  42 +-
 gcc/testsuite/c-c++-common/goacc/reduction-3.c     |  42 +-
 gcc/testsuite/c-c++-common/goacc/reduction-4.c     |  40 +-
 gcc/testsuite/c-c++-common/goacc/routine-3.c       | 128 ++-
 gcc/testsuite/c-c++-common/goacc/routine-4.c       |  73 ++
 gcc/testsuite/c-c++-common/goacc/routine-5.c       |  15 +
 gcc/testsuite/c-c++-common/goacc/tile.c            | 258 +++++-
 gcc/testsuite/g++.dg/goacc/reference.C             |  39 +
 gcc/testsuite/g++.dg/goacc/routine-1.C             |  13 +
 gcc/testsuite/g++.dg/goacc/routine-2.C             |  42 +
 gcc/testsuite/g++.dg/goacc/template.C              |  81 +-
 .../gfortran.dg/goacc/combined-directives.f90      |  29 +-
 gcc/testsuite/gfortran.dg/goacc/loop-1.f95         |  15 +-
 gcc/testsuite/gfortran.dg/goacc/loop-5.f95         |   6 -
 gcc/testsuite/gfortran.dg/goacc/loop-6.f95         |   8 -
 gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90    |   6 -
 libgomp/ChangeLog                                  |  68 ++
 .../libgomp.oacc-c++/template-reduction.C          |  98 +++
 .../libgomp.oacc-c-c++-common/asyncwait-1.c        | 434 ++++++++++
 .../libgomp.oacc-c-c++-common/clauses-1.c          |  26 +
 ...parallel-2.c => data-clauses-kernels-ipa-pta.c} |   2 +-
 .../data-clauses-kernels.c                         |   2 +
 ...kernels-2.c => data-clauses-parallel-ipa-pta.c} |   2 +-
 .../data-clauses-parallel.c                        |   2 +
 .../{parallel-1.c => data-clauses.h}               |  92 +-
 .../libgomp.oacc-c-c++-common/deviceptr-1.c        |  23 +-
 .../libgomp.oacc-c-c++-common/firstprivate-1.c     | 114 ++-
 .../libgomp.oacc-c-c++-common/firstprivate-2.c     |  31 -
 .../libgomp.oacc-c-c++-common/gang-static-1.c      |  48 ++
 .../libgomp.oacc-c-c++-common/gang-static-2.c      | 100 +++
 libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c | 354 +++++++-
 .../libgomp.oacc-c-c++-common/kernels-1.c          | 184 ----
 .../kernels-loop-clauses.c                         |  62 ++
 .../libgomp.oacc-c-c++-common/mode-transitions.c   | 895 +++++++++++++++++++
 .../libgomp.oacc-c-c++-common/private-variables.c  | 953 +++++++++++++++++++++
 .../libgomp.oacc-c-c++-common/reduction-7.c        | 129 +++
 .../libgomp.oacc-c-c++-common/routine-1.c          |  88 ++
 .../libgomp.oacc-c-c++-common/routine-4.c          | 123 +++
 .../libgomp.oacc-c-c++-common/routine-wv-2.c       |  76 ++
 .../libgomp.oacc-c-c++-common/update-1-2.c         | 361 --------
 .../libgomp.oacc-c-c++-common/vector-loop.c        |   2 +-
 .../libgomp.oacc-c-c++-common/worker-single-1a.c   |  28 -
 .../libgomp.oacc-c-c++-common/worker-single-4.c    |  28 -
 .../libgomp.oacc-c-c++-common/worker-single-6.c    |  46 -
 .../testsuite/libgomp.oacc-fortran/asyncwait-1.f90 | 122 +++
 .../testsuite/libgomp.oacc-fortran/asyncwait-2.f90 |  29 +-
 .../testsuite/libgomp.oacc-fortran/asyncwait-3.f90 |  31 +-
 .../testsuite/libgomp.oacc-fortran/clauses-1.f90   | 290 +++++++
 .../testsuite/libgomp.oacc-fortran/declare-1.f90   |  41 +-
 .../testsuite/libgomp.oacc-fortran/default-1.f90   |  54 ++
 .../libgomp.oacc-fortran/firstprivate-1.f90        |  42 +
 .../libgomp.oacc-fortran/gang-static-1.f90         |  79 ++
 libgomp/testsuite/libgomp.oacc-fortran/if-1.f90    | 886 +++++++++++++++++++
 .../implicit-firstprivate-ref.f90                  |  42 +
 libgomp/testsuite/libgomp.oacc-fortran/pr68813.f90 |  19 +
 .../libgomp.oacc-fortran/private-variables.f90     | 544 ++++++++++++
 77 files changed, 7134 insertions(+), 1112 deletions(-)

diff --git gcc/testsuite/ChangeLog gcc/testsuite/ChangeLog
index 658e6c5..f4a73a7 100644
--- gcc/testsuite/ChangeLog
+++ gcc/testsuite/ChangeLog
@@ -1,3 +1,52 @@
+2016-03-30  Thomas Schwinge  <thomas@codesourcery.com>
+	    Julian Brown  <julian@codesourcery.com>
+	    Chung-Lin Tang  <cltang@codesourcery.com>
+	    Cesar Philippidis  <cesar@codesourcery.com>
+	    James Norris  <jnorris@codesourcery.com>
+	    Tom de Vries  <tom@codesourcery.com>
+	    Nathan Sidwell  <nathan@codesourcery.com>
+
+	* c-c++-common/goacc/combined-directives.c: Clean up dg-*
+	directives.
+	* c-c++-common/goacc/loop-clauses.c: Likewise.
+	* g++.dg/goacc/template.C: Likewise.
+	* gfortran.dg/goacc/combined-directives.f90: Likewise.
+	* gfortran.dg/goacc/loop-1.f95: Likewise.
+	* gfortran.dg/goacc/loop-5.f95: Likewise.
+	* gfortran.dg/goacc/loop-6.f95: Likewise.
+	* gfortran.dg/goacc/loop-tree-1.f90: Likewise.
+	* c-c++-common/goacc-gomp/nesting-1.c: Update.
+	* c-c++-common/goacc-gomp/nesting-fail-1.c: Likewise.
+	* c-c++-common/goacc/clauses-fail.c: Likewise.
+	* c-c++-common/goacc/parallel-1.c: Likewise.
+	* c-c++-common/goacc/reduction-1.c: Likewise.
+	* c-c++-common/goacc/reduction-2.c: Likewise.
+	* c-c++-common/goacc/reduction-3.c: Likewise.
+	* c-c++-common/goacc/reduction-4.c: Likewise.
+	* c-c++-common/goacc/routine-3.c: Likewise.
+	* c-c++-common/goacc/routine-4.c: Likewise.
+	* c-c++-common/goacc/routine-5.c: Likewise.
+	* c-c++-common/goacc/tile.c: Likewise.
+	* g++.dg/goacc/template.C: Likewise.
+	* gfortran.dg/goacc/combined-directives.f90: Likewise.
+	* c-c++-common/goacc/nesting-1.c: Move dg-error test cases into...
+	* c-c++-common/goacc/nesting-fail-1.c: ... this file.  Update.
+	* c-c++-common/goacc/kernels-1.c: Update.  Incorporate...
+	* c-c++-common/goacc/kernels-empty.c: ... this file, and...
+	* c-c++-common/goacc/kernels-eternal.c: ... this file, and...
+	* c-c++-common/goacc/kernels-noreturn.c: ... this file.
+	* c-c++-common/goacc/host_data-1.c: New file.  Incorporate...
+	* c-c++-common/goacc/use_device-1.c: ... this file.
+	* c-c++-common/goacc/host_data-2.c: New file.  Incorporate...
+	* c-c++-common/goacc/host_data-5.c: ... this file, and...
+	* c-c++-common/goacc/host_data-6.c: ... this file.
+	* c-c++-common/goacc/loop-2-kernels.c: New file.
+	* c-c++-common/goacc/loop-2-parallel.c: Likewise.
+	* c-c++-common/goacc/loop-3.c: Likewise.
+	* g++.dg/goacc/reference.C: Likewise.
+	* g++.dg/goacc/routine-1.C: Likewise.
+	* g++.dg/goacc/routine-2.C: Likewise.
+
 2016-03-30  Richard Biener  <rguenther@suse.de>
 
 	PR middle-end/70450
diff --git gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
index dabba8c..aaf0e7a 100644
--- gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
+++ gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
@@ -20,12 +20,12 @@ f_acc_kernels (void)
   }
 }
 
+#pragma acc routine vector
 void
 f_acc_loop (void)
 {
   int i;
 
-#pragma acc parallel
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
diff --git gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
index 5e3f183..1a33242 100644
--- gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
+++ gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
@@ -1,4 +1,5 @@
 extern int i;
+#pragma acc declare create(i)
 
 void
 f_omp (void)
@@ -14,6 +15,9 @@ f_omp (void)
 #pragma acc update host(i) /* { dg-error "OpenACC construct inside of non-OpenACC region" } */
 #pragma acc enter data copyin(i) /* { dg-error "OpenACC construct inside of non-OpenACC region" } */
 #pragma acc exit data delete(i) /* { dg-error "OpenACC construct inside of non-OpenACC region" } */
+#pragma acc loop /* { dg-error "loop directive must be associated with an OpenACC compute region" } */
+    for (i = 0; i < 2; ++i)
+      ;
   }
 
 #pragma omp for
@@ -358,85 +362,77 @@ f_acc_data (void)
   }
 }
 
+#pragma acc routine
 void
 f_acc_loop (void)
 {
-#pragma acc parallel
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
-#pragma omp parallel /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp parallel /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
       ;
     }
 
-#pragma acc parallel
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
-#pragma omp for /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp for /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
       for (i = 0; i < 3; i++)
 	;
     }
 
-#pragma acc parallel
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
-#pragma omp sections /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp sections /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
       {
 	;
       }
     }
 
-#pragma acc parallel
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
-#pragma omp single /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp single /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
       ;
     }
 
-#pragma acc parallel
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
-#pragma omp task /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp task /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
       ;
     }
 
-#pragma acc parallel
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
-#pragma omp master /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp master /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
       ;
     }
 
-#pragma acc parallel
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
-#pragma omp critical /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp critical /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
       ;
     }
 
-#pragma acc parallel
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
-#pragma omp ordered /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp ordered /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
       ;
     }
 
-#pragma acc parallel
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
-#pragma omp target /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp target /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
       ;
-#pragma omp target data map(i) /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp target data map(i) /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
       ;
-#pragma omp target update to(i) /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
+#pragma omp target update to(i) /* { dg-error "non-OpenACC construct inside of OpenACC routine" } */
     }
 }
 
diff --git gcc/testsuite/c-c++-common/goacc/clauses-fail.c gcc/testsuite/c-c++-common/goacc/clauses-fail.c
index 661d364..853d010 100644
--- gcc/testsuite/c-c++-common/goacc/clauses-fail.c
+++ gcc/testsuite/c-c++-common/goacc/clauses-fail.c
@@ -1,3 +1,5 @@
+/* Miscellaneous tests where clause parsing is expected to fail.  */
+
 void
 f (void)
 {
@@ -17,3 +19,13 @@ f (void)
   for (i = 0; i < 2; ++i)
     ;
 }
+
+
+void
+f2 (void)
+{
+  int a, b[100];
+
+#pragma acc parallel firstprivate (b[10:20]) /* { dg-error "expected ... before ... token" } */
+  ;
+}
diff --git gcc/testsuite/c-c++-common/goacc/combined-directives.c gcc/testsuite/c-c++-common/goacc/combined-directives.c
index c387285..c2a3c57 100644
--- gcc/testsuite/c-c++-common/goacc/combined-directives.c
+++ gcc/testsuite/c-c++-common/goacc/combined-directives.c
@@ -1,10 +1,7 @@
-// { dg-do compile }
-// { dg-options "-fopenacc -fdump-tree-gimple" }
+// { dg-additional-options "-fdump-tree-gimple" }
 
-// This error is temporary.  Remove when support is added for these clauses
-// in the middle end.  Also remove the comments from the reduction test
+// Remove the comments from the reduction test
 // after the FE learns that reduction variables may appear in data clauses too.
-// { dg-prune-output "sorry, unimplemented" }
 
 void
 test ()
diff --git gcc/testsuite/c-c++-common/goacc/use_device-1.c gcc/testsuite/c-c++-common/goacc/host_data-1.c
similarity index 61%
rename from gcc/testsuite/c-c++-common/goacc/use_device-1.c
rename to gcc/testsuite/c-c++-common/goacc/host_data-1.c
index 9a4f6d0..0c7a857 100644
--- gcc/testsuite/c-c++-common/goacc/use_device-1.c
+++ gcc/testsuite/c-c++-common/goacc/host_data-1.c
@@ -1,4 +1,14 @@
-/* { dg-do compile } */
+/* Test valid use of host_data directive.  */
+
+int v1[3][3];
+
+void
+f (void)
+{
+#pragma acc host_data use_device(v1)
+  ;
+}
+
 
 void bar (float *, float *);
 
diff --git gcc/testsuite/c-c++-common/goacc/host_data-2.c gcc/testsuite/c-c++-common/goacc/host_data-2.c
new file mode 100644
index 0000000..bdce424
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/host_data-2.c
@@ -0,0 +1,78 @@
+/* Test invalid use of host_data directive.  */
+
+int v0;
+#pragma acc host_data use_device(v0) /* { dg-error "expected declaration specifiers before" } */
+
+
+void
+f (void)
+{
+  int v2 = 3;
+#pragma acc host_data copy(v2) /* { dg-error ".copy. is not valid for ..pragma acc host_data." } */
+  ;
+
+#pragma acc host_data use_device(v2)
+  ;
+  /* { dg-error ".use_device_ptr. variable is neither a pointer nor an array" "" { target c } 14 } */
+  /* { dg-error ".use_device_ptr. variable is neither a pointer, nor an arraynor reference to pointer or array" "" { target c++ } 14 } */
+  
+#pragma acc host_data use_device(v0)
+  ;
+  /* { dg-error ".use_device_ptr. variable is neither a pointer nor an array" "" { target c } 19 } */
+  /* { dg-error ".use_device_ptr. variable is neither a pointer, nor an arraynor reference to pointer or array" "" { target c++ } 19 } */
+}
+
+
+void
+f2 (void)
+{
+  int x[100];
+
+#pragma acc enter data copyin (x)
+  /* Specifying an array index is not valid for host_data/use_device.  */
+#pragma acc host_data use_device (x[4]) /* { dg-error "expected '\\\)' before '\\\[' token" } */
+  ;
+#pragma acc exit data delete (x)
+}
+
+
+void
+f3 (void)
+{
+  int x[100];
+
+#pragma acc data copyin (x[25:50])
+  {
+    int *xp;
+#pragma acc host_data use_device (x)
+    {
+      /* This use of the present clause is undefined behavior for OpenACC.  */
+#pragma acc parallel present (x) copyout (xp) /* { dg-error "variable .x. declared in enclosing .host_data. region" } */
+      {
+        xp = x;
+      }
+    }
+  }
+}
+
+
+void
+f4 (void)
+{
+  int x[50];
+
+#pragma acc data copyin (x[10:30])
+  {
+    int *xp;
+#pragma acc host_data use_device (x)
+    {
+      /* Here 'x' being implicitly firstprivate for the parallel region
+	 conflicts with it being declared as use_device in the enclosing
+	 host_data region.  */
+#pragma acc parallel copyout (xp)
+      {
+        xp = x; /* { dg-error "variable .x. declared in enclosing .host_data. region" } */
+      }
+    }
+  }
+}
diff --git gcc/testsuite/c-c++-common/goacc/host_data-5.c gcc/testsuite/c-c++-common/goacc/host_data-5.c
deleted file mode 100644
index a4206c8..0000000
--- gcc/testsuite/c-c++-common/goacc/host_data-5.c
+++ /dev/null
@@ -1,23 +0,0 @@
-/* { dg-do compile } */
-
-#define N 1024
-
-int main (int argc, char* argv[])
-{
-  int x[N];
-
-#pragma acc data copyin (x[0:N])
-  {
-    int *xp;
-#pragma acc host_data use_device (x)
-    {
-      /* This use of the present clause is undefined behavior for OpenACC.  */
-#pragma acc parallel present (x) copyout (xp) /* { dg-error "variable 'x' declared in enclosing 'host_data' region" } */
-      {
-        xp = x;
-      }
-    }
-  }
-
-  return 0;
-}
diff --git gcc/testsuite/c-c++-common/goacc/host_data-6.c gcc/testsuite/c-c++-common/goacc/host_data-6.c
deleted file mode 100644
index 8be7912..0000000
--- gcc/testsuite/c-c++-common/goacc/host_data-6.c
+++ /dev/null
@@ -1,25 +0,0 @@
-/* { dg-do compile } */
-
-#define N 1024
-
-int main (int argc, char* argv[])
-{
-  int x[N];
-
-#pragma acc data copyin (x[0:N])
-  {
-    int *xp;
-#pragma acc host_data use_device (x)
-    {
-      /* Here 'x' being implicitly firstprivate for the parallel region
-	 conflicts with it being declared as use_device in the enclosing
-	 host_data region.  */
-#pragma acc parallel copyout (xp)
-      {
-        xp = x; /* { dg-error "variable 'x' declared in enclosing 'host_data' region" } */
-      }
-    }
-  }
-
-  return 0;
-}
diff --git gcc/testsuite/c-c++-common/goacc/kernels-1.c gcc/testsuite/c-c++-common/goacc/kernels-1.c
index e91b81c..4fcf86e 100644
--- gcc/testsuite/c-c++-common/goacc/kernels-1.c
+++ gcc/testsuite/c-c++-common/goacc/kernels-1.c
@@ -1,6 +1,45 @@
-void
-foo (void)
+int
+kernels_empty (void)
 {
 #pragma acc kernels
   ;
+
+  return 0;
+}
+
+int
+kernels_eternal (void)
+{
+#pragma acc kernels
+  {
+    while (1)
+      ;
+  }
+
+  return 0;
+}
+
+int
+kernels_noreturn (void)
+{
+#pragma acc kernels
+  __builtin_abort ();
+
+  return 0;
+}
+
+
+float b[10][15][10];
+
+void
+kernels_loop_ptr_it (void)
+{
+  float *i;
+
+#pragma acc kernels
+  {
+#pragma acc loop
+    for (i = &b[0][0][0]; i < &b[0][0][10]; i++)
+      ;
+  }
 }
diff --git gcc/testsuite/c-c++-common/goacc/kernels-empty.c gcc/testsuite/c-c++-common/goacc/kernels-empty.c
deleted file mode 100644
index e91b81c..0000000
--- gcc/testsuite/c-c++-common/goacc/kernels-empty.c
+++ /dev/null
@@ -1,6 +0,0 @@
-void
-foo (void)
-{
-#pragma acc kernels
-  ;
-}
diff --git gcc/testsuite/c-c++-common/goacc/kernels-eternal.c gcc/testsuite/c-c++-common/goacc/kernels-eternal.c
deleted file mode 100644
index edc17d2..0000000
--- gcc/testsuite/c-c++-common/goacc/kernels-eternal.c
+++ /dev/null
@@ -1,11 +0,0 @@
-int
-main (void)
-{
-#pragma acc kernels
-  {
-    while (1)
-      ;
-  }
-
-  return 0;
-}
diff --git gcc/testsuite/c-c++-common/goacc/kernels-noreturn.c gcc/testsuite/c-c++-common/goacc/kernels-noreturn.c
deleted file mode 100644
index 1a8cc67..0000000
--- gcc/testsuite/c-c++-common/goacc/kernels-noreturn.c
+++ /dev/null
@@ -1,12 +0,0 @@
-int
-main (void)
-{
-
-#pragma acc kernels
-  {
-    __builtin_abort ();
-  }
-
-  return 0;
-}
-
diff --git gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
new file mode 100644
index 0000000..01ad32d
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
@@ -0,0 +1,189 @@
+void K(void)
+{
+  int i, j;
+
+#pragma acc kernels
+  {
+#pragma acc loop auto
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang(5)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang(num:5)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang(static:5)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang(static:*)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang
+    for (i = 0; i < 10; i++)
+      {
+#pragma acc loop vector 
+	for (j = 0; j < 10; j++)
+	  { }
+#pragma acc loop worker 
+	for (j = 0; j < 10; j++)
+	  { }
+#pragma acc loop gang // { dg-error "inner loop uses same" }
+	for (j = 0; j < 10; j++)
+	  { }
+      }
+#pragma acc loop seq gang // { dg-error "'seq' overrides" }
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop worker
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker(5)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker(num:5)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker
+    for (i = 0; i < 10; i++)
+      {
+#pragma acc loop vector 
+	for (j = 0; j < 10; j++)
+	  { }
+#pragma acc loop worker // { dg-error "inner loop uses same" }
+	for (j = 0; j < 10; j++)
+	  { }
+#pragma acc loop gang
+	for (j = 0; j < 10; j++)
+	  { }
+      }
+#pragma acc loop seq worker // { dg-error "'seq' overrides" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang worker
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop vector
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector(5)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector(length:5)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector
+    for (i = 0; i < 10; i++)
+      {
+#pragma acc loop vector // { dg-error "inner loop uses same" }
+	for (j = 1; j < 10; j++)
+	  { }
+#pragma acc loop worker
+	for (j = 1; j < 10; j++)
+	  { }
+#pragma acc loop gang
+	for (j = 1; j < 10; j++)
+	  { }
+      }
+#pragma acc loop seq vector // { dg-error "'seq' overrides" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang vector
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker vector
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop auto
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop seq auto // { dg-error "'seq' overrides" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang auto // { dg-error "'auto' conflicts" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker auto // { dg-error "'auto' conflicts" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector auto // { dg-error "'auto' conflicts" }
+    for (i = 0; i < 10; i++)
+      { }
+  }
+
+#pragma acc kernels loop auto
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop gang
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop gang(5)
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop gang(num:5)
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop gang(static:5)
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop gang(static:*)
+  for (i = 0; i < 10; i++)
+    { }
+
+#pragma acc kernels loop worker
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop worker(5)
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop worker(num:5)
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop seq worker // { dg-error "'seq' overrides" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'seq' overrides" "" { target c++ } }
+    { }
+#pragma acc kernels loop gang worker
+  for (i = 0; i < 10; i++)
+    { }
+
+#pragma acc kernels loop vector
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop vector(5)
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop vector(length:5)
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop seq vector // { dg-error "'seq' overrides" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'seq' overrides" "" { target c++ } }
+    { }
+#pragma acc kernels loop gang vector
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop worker vector
+  for (i = 0; i < 10; i++)
+    { }
+
+#pragma acc kernels loop auto
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop seq auto // { dg-error "'seq' overrides" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'seq' overrides" "" { target c++ } }
+    { }
+#pragma acc kernels loop gang auto // { dg-error "'auto' conflicts" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'auto' conflicts" "" { target c++ } }
+    { }
+#pragma acc kernels loop worker auto // { dg-error "'auto' conflicts" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'auto' conflicts" "" { target c++ } }
+    { }
+#pragma acc kernels loop vector auto // { dg-error "'auto' conflicts" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'auto' conflicts" "" { target c++ } }
+    { }
+}
diff --git gcc/testsuite/c-c++-common/goacc/loop-2-parallel.c gcc/testsuite/c-c++-common/goacc/loop-2-parallel.c
new file mode 100644
index 0000000..0ef5741
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/loop-2-parallel.c
@@ -0,0 +1,162 @@
+void P(void)
+{
+  int i, j;
+
+#pragma acc parallel
+  {
+#pragma acc loop auto
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang(static:5)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang(static:*)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang // { dg-message "containing loop" }
+    for (i = 0; i < 10; i++)
+      {
+#pragma acc loop vector
+	for (j = 1; j < 10; j++)
+	  { }
+#pragma acc loop worker 
+	for (j = 1; j < 10; j++)
+	  { }
+#pragma acc loop gang // { dg-error "inner loop uses same" }
+	for (j = 1; j < 10; j++)
+	  { }
+      }
+#pragma acc loop seq gang // { dg-error "'seq' overrides" }
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop worker
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker // { dg-message "containing loop" 2 }
+    for (i = 0; i < 10; i++)
+      {
+#pragma acc loop vector 
+	for (j = 1; j < 10; j++)
+	  { }
+#pragma acc loop worker // { dg-error "inner loop uses same" }
+	for (j = 1; j < 10; j++)
+	  { }
+#pragma acc loop gang // { dg-error "incorrectly nested" }
+	for (j = 1; j < 10; j++)
+	  { }
+      }
+#pragma acc loop seq worker // { dg-error "'seq' overrides" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang worker
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop vector
+    for (i = 0; i < 10; i++)
+      { }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector // { dg-message "containing loop" 3 }
+    for (i = 0; i < 10; i++)
+      {
+#pragma acc loop vector // { dg-error "inner loop uses same" }
+	for (j = 1; j < 10; j++)
+	  { }
+#pragma acc loop worker // { dg-error "incorrectly nested" }
+	for (j = 1; j < 10; j++)
+	  { }
+#pragma acc loop gang // { dg-error "incorrectly nested" }
+	for (j = 1; j < 10; j++)
+	  { }
+      }
+#pragma acc loop seq vector // { dg-error "'seq' overrides" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang vector
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker vector
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop auto
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop seq auto // { dg-error "'seq' overrides" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang auto // { dg-error "'auto' conflicts" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker auto // { dg-error "'auto' conflicts" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector auto // { dg-error "'auto' conflicts" }
+    for (i = 0; i < 10; i++)
+      { }
+
+  }
+
+#pragma acc parallel loop auto
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop gang
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop gang(static:5)
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop gang(static:*)
+  for (i = 0; i < 10; i++)
+    { }
+
+#pragma acc parallel loop seq gang // { dg-error "'seq' overrides" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'seq' overrides" "" { target c++ } }
+    { }
+
+#pragma acc parallel loop worker
+  for (i = 0; i < 10; i++)
+    { }
+
+#pragma acc parallel loop seq worker // { dg-error "'seq' overrides" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'seq' overrides" "" { target c++ } }
+    { }
+#pragma acc parallel loop gang worker
+  for (i = 0; i < 10; i++)
+    { }
+
+#pragma acc parallel loop vector
+  for (i = 0; i < 10; i++)
+    { }
+
+#pragma acc parallel loop seq vector // { dg-error "'seq' overrides" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'seq' overrides" "" { target c++ } }
+    { }
+#pragma acc parallel loop gang vector
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop worker vector
+  for (i = 0; i < 10; i++)
+    { }
+
+#pragma acc parallel loop auto
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop seq auto // { dg-error "'seq' overrides" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'seq' overrides" "" { target c++ } }
+    { }
+#pragma acc parallel loop gang auto // { dg-error "'auto' conflicts" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'auto' conflicts" "" { target c++ } }
+    { }
+#pragma acc parallel loop worker auto // { dg-error "'auto' conflicts" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'auto' conflicts" "" { target c++ } }
+    { }
+#pragma acc parallel loop vector auto // { dg-error "'auto' conflicts" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "'auto' conflicts" "" { target c++ } }
+    { }
+}
diff --git gcc/testsuite/c-c++-common/goacc/loop-3.c gcc/testsuite/c-c++-common/goacc/loop-3.c
new file mode 100644
index 0000000..44b65a8
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/loop-3.c
@@ -0,0 +1,58 @@
+void par1 (void)
+{
+  int i, j;
+
+#pragma acc parallel
+  {
+#pragma acc loop gang(5) // { dg-error "argument not permitted" }
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop gang(num:5) // { dg-error "argument not permitted" }
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop worker(5) // { dg-error "argument not permitted" }
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop worker(num:5) // { dg-error "argument not permitted" }
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop vector(5) // { dg-error "argument not permitted" }
+    for (i = 0; i < 10; i++)
+      { }
+
+#pragma acc loop vector(length:5) // { dg-error "argument not permitted" }
+    for (i = 0; i < 10; i++)
+      { }
+
+   }
+}
+
+void p2 (void)
+{
+  int i, j;
+
+#pragma acc parallel loop gang(5) // { dg-error "argument not permitted" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "argument not permitted" "" { target c++ } }
+    { }
+#pragma acc parallel loop gang(num:5) // { dg-error "argument not permitted" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "argument not permitted" "" { target c++ } }
+    { }
+
+#pragma acc parallel loop worker(5) // { dg-error "argument not permitted" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "argument not permitted" "" { target c++ } }
+    { }
+#pragma acc parallel loop worker(num:5) // { dg-error "argument not permitted" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "argument not permitted" "" { target c++ } }
+    { }
+
+#pragma acc parallel loop vector(5) // { dg-error "argument not permitted" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "argument not permitted" "" { target c++ } }
+    { }
+#pragma acc parallel loop vector(length:5) // { dg-error "argument not permitted" "" { target c } }
+  for (i = 0; i < 10; i++) // { dg-error "argument not permitted" "" { target c++ } }
+    { }
+}
diff --git gcc/testsuite/c-c++-common/goacc/loop-clauses.c gcc/testsuite/c-c++-common/goacc/loop-clauses.c
index 97b8786..f3c7207 100644
--- gcc/testsuite/c-c++-common/goacc/loop-clauses.c
+++ gcc/testsuite/c-c++-common/goacc/loop-clauses.c
@@ -1,7 +1,3 @@
-/* { dg-do compile } */
-
-/* { dg-prune-output "sorry, unimplemented" } */
-
 int
 main ()
 {
diff --git gcc/testsuite/c-c++-common/goacc/nesting-1.c gcc/testsuite/c-c++-common/goacc/nesting-1.c
index 3a8f838..cab4f98 100644
--- gcc/testsuite/c-c++-common/goacc/nesting-1.c
+++ gcc/testsuite/c-c++-common/goacc/nesting-1.c
@@ -58,10 +58,6 @@ f_acc_data (void)
 
 #pragma acc exit data delete(i)
 
-#pragma acc loop /* { dg-error "loop directive must be associated with an OpenACC compute region" } */
-    for (i = 0; i < 2; ++i)
-      ;
-
 #pragma acc data
     {
 #pragma acc parallel
@@ -92,10 +88,6 @@ f_acc_data (void)
 #pragma acc enter data copyin(i)
 
 #pragma acc exit data delete(i)
-
-#pragma acc loop /* { dg-error "loop directive must be associated with an OpenACC compute region" } */
-      for (i = 0; i < 2; ++i)
-	;
     }
   }
 }
diff --git gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c
index 506a1ae..93a9111 100644
--- gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c
+++ gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c
@@ -38,6 +38,25 @@ f_acc_kernels (void)
   }
 }
 
+void
+f_acc_data (void)
+{
+  unsigned int i;
+#pragma acc data
+  {
+#pragma acc loop /* { dg-error "loop directive must be associated with an OpenACC compute region" } */
+    for (i = 0; i < 2; ++i)
+      ;
+
+#pragma acc data
+    {
+#pragma acc loop /* { dg-error "loop directive must be associated with an OpenACC compute region" } */
+      for (i = 0; i < 2; ++i)
+	;
+    }
+  }
+}
+
 #pragma acc routine
 void
 f_acc_routine (void)
@@ -45,3 +64,13 @@ f_acc_routine (void)
 #pragma acc parallel /* { dg-error "OpenACC region inside of OpenACC routine, nested parallelism not supported yet" } */
   ;
 }
+
+void
+f (void)
+{
+  int i, v = 0;
+
+#pragma acc loop gang reduction (+:v) /* { dg-error "loop directive must be associated with an OpenACC compute region" } */
+  for (i = 0; i < 10; i++)
+    v++;
+}
diff --git gcc/testsuite/c-c++-common/goacc/parallel-1.c gcc/testsuite/c-c++-common/goacc/parallel-1.c
index a860526..6c6cc88 100644
--- gcc/testsuite/c-c++-common/goacc/parallel-1.c
+++ gcc/testsuite/c-c++-common/goacc/parallel-1.c
@@ -1,6 +1,38 @@
-void
-foo (void)
+int
+parallel_empty (void)
 {
 #pragma acc parallel
   ;
+
+  return 0;
+}
+
+int
+parallel_eternal (void)
+{
+#pragma acc parallel
+  {
+    while (1)
+      ;
+  }
+
+  return 0;
+}
+
+int
+parallel_noreturn (void)
+{
+#pragma acc parallel
+  __builtin_abort ();
+
+  return 0;
+}
+
+int
+parallel_clauses (void)
+{
+  int a, b[100];
+
+#pragma acc parallel firstprivate (a, b)
+  ;
 }
diff --git gcc/testsuite/c-c++-common/goacc/reduction-1.c gcc/testsuite/c-c++-common/goacc/reduction-1.c
index de97125..3c1c2dd 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-1.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-1.c
@@ -1,70 +1,65 @@
-/* { dg-require-effective-target alloca } */
 /* Integer reductions.  */
 
-#define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   int result, array[n];
   int lresult;
 
   /* '+' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
   /* '*' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (*:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-//   result = 0;
-//   vresult = 0;
-// 
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-//
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* 'max' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&:result)
   for (i = 0; i < n; i++)
     result &= array[i];
 
   /* '|' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (|:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (|:result)
   for (i = 0; i < n; i++)
     result |= array[i];
 
   /* '^' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (^:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (^:result)
   for (i = 0; i < n; i++)
     result ^= array[i];
 
   /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
 
   /* '||' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
 
diff --git gcc/testsuite/c-c++-common/goacc/reduction-2.c gcc/testsuite/c-c++-common/goacc/reduction-2.c
index 2964236..c3105a2 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-2.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-2.c
@@ -1,49 +1,47 @@
-/* { dg-require-effective-target alloca } */
 /* float reductions.  */
 
-#define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   float result, array[n];
   int lresult;
 
   /* '+' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
   /* '*' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (*:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-// 
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* 'max' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
 
   /* '||' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
 
diff --git gcc/testsuite/c-c++-common/goacc/reduction-3.c gcc/testsuite/c-c++-common/goacc/reduction-3.c
index 34c51c2..4dbde04 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-3.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-3.c
@@ -1,49 +1,47 @@
-/* { dg-require-effective-target alloca } */
 /* double reductions.  */
 
-#define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   double result, array[n];
   int lresult;
 
   /* '+' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
   /* '*' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (*:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-// 
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* 'max' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
 
   /* '||' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
 
diff --git gcc/testsuite/c-c++-common/goacc/reduction-4.c gcc/testsuite/c-c++-common/goacc/reduction-4.c
index 328c0d4..c4572b9 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-4.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-4.c
@@ -1,51 +1,35 @@
-/* { dg-require-effective-target alloca } */
 /* complex reductions.  */
 
-#define vl 32
+#define n 1000
 
 int
 main(void)
 {
-  const int n = 1000;
   int i;
   __complex__ double result, array[n];
   int lresult;
 
   /* '+' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (+:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
-  /* Needs support for complex multiplication.  */
-
-//   /* '*' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (*:result)
-//   for (i = 0; i < n; i++)
-//     result *= array[i];
-//
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-// 
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* '*' reductions.  */
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
+  for (i = 0; i < n; i++)
+    result *= array[i];
 
   /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (&&:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (__real__(result) > __real__(array[i]));
 
   /* '||' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop reduction (||:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (__real__(result) > __real__(array[i]));
 
diff --git gcc/testsuite/c-c++-common/goacc/routine-3.c gcc/testsuite/c-c++-common/goacc/routine-3.c
index e6f83bd..b322d26 100644
--- gcc/testsuite/c-c++-common/goacc/routine-3.c
+++ gcc/testsuite/c-c++-common/goacc/routine-3.c
@@ -1,52 +1,118 @@
+/* Test invalid calls to routines.  */
+
 #pragma acc routine gang
-void gang (void) /* { dg-message "declared here" 3 } */
+int
+gang () /* { dg-message "declared here" 3 } */
 {
+  #pragma acc loop gang worker vector
+  for (int i = 0; i < 10; i++)
+    {
+    }
+
+  return 1;
 }
 
 #pragma acc routine worker
-void worker (void) /* { dg-message "declared here" 2 } */
+int
+worker () /* { dg-message "declared here" 2 } */
 {
+  #pragma acc loop worker vector
+  for (int i = 0; i < 10; i++)
+    {
+    }
+
+  return 1;
 }
 
 #pragma acc routine vector
-void vector (void) /* { dg-message "declared here" 1 } */
+int
+vector () /* { dg-message "declared here" } */
 {
+  #pragma acc loop vector
+  for (int i = 0; i < 10; i++)
+    {
+    }
+
+  return 1;
 }
 
 #pragma acc routine seq
-void seq (void)
+int
+seq ()
 {
+  return 1;
 }
 
-int main ()
+int
+main ()
 {
-
-#pragma acc parallel num_gangs (32) num_workers (32) vector_length (32)
+  int red = 0;
+#pragma acc parallel copy (red)
   {
-    #pragma acc loop gang /* { dg-message "loop here" 1 } */
-    for (int i = 0; i < 10; i++)
-      {
-	gang (); /*  { dg-error "routine call uses same" } */
-	worker ();
-	vector ();
-	seq ();
-      }
-    #pragma acc loop worker /* { dg-message "loop here" 2 } */
-    for (int i = 0; i < 10; i++)
-      {
-	gang (); /*  { dg-error "routine call uses same" } */
-	worker (); /*  { dg-error "routine call uses same" } */
-	vector ();
-	seq ();
-      }
-    #pragma acc loop vector /* { dg-message "loop here" 3 } */
-    for (int i = 0; i < 10; i++)
-      {
-	gang (); /*  { dg-error "routine call uses same" } */
-	worker (); /*  { dg-error "routine call uses same" } */
-	vector (); /*  { dg-error "routine call uses same" } */
-	seq ();
-      }
+    /* Independent/seq loop tests.  */
+#pragma acc loop reduction (+:red) // { dg-warning "insufficient partitioning" }
+    for (int i = 0; i < 10; i++)
+      red += gang ();
+
+#pragma acc loop reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += worker ();
+
+#pragma acc loop reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += vector ();
+
+    /* Gang routine tests.  */
+#pragma acc loop gang reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += gang (); // { dg-error "routine call uses same" }
+
+#pragma acc loop worker reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += gang (); // { dg-error "routine call uses same" }
+
+#pragma acc loop vector reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += gang (); // { dg-error "routine call uses same" }
+
+    /* Worker routine tests.  */
+#pragma acc loop gang reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += worker ();
+
+#pragma acc loop worker reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += worker (); // { dg-error "routine call uses same" }
+
+#pragma acc loop vector reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += worker (); // { dg-error "routine call uses same" }
+
+    /* Vector routine tests.  */
+#pragma acc loop gang reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += vector ();
+
+#pragma acc loop worker reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += vector ();
+
+#pragma acc loop vector reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += vector (); // { dg-error "routine call uses same" }
+
+    /* Seq routine tests.  */
+#pragma acc loop gang reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += seq ();
+
+#pragma acc loop worker reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += seq ();
+
+#pragma acc loop vector reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += seq ();
   }
 
   return 0;
diff --git gcc/testsuite/c-c++-common/goacc/routine-4.c gcc/testsuite/c-c++-common/goacc/routine-4.c
index 004d713..3e5fc4f 100644
--- gcc/testsuite/c-c++-common/goacc/routine-4.c
+++ gcc/testsuite/c-c++-common/goacc/routine-4.c
@@ -1,3 +1,4 @@
+/* Test invalid intra-routine parallelism.  */
 
 void gang (void);
 void worker (void);
@@ -14,6 +15,24 @@ void seq (void)
   worker ();  /* { dg-error "routine call uses" } */
   vector ();  /* { dg-error "routine call uses" } */
   seq ();
+
+  int red;
+
+#pragma acc loop reduction (+:red) // { dg-warning "insufficient partitioning" }
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop gang reduction (+:red) // { dg-error "disallowed by containing routine" }
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop worker reduction (+:red) // { dg-error "disallowed by containing routine" }
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop vector reduction (+:red) // { dg-error "disallowed by containing routine" }
+  for (int i = 0; i < 10; i++)
+    red ++;
 }
 
 void vector (void) /* { dg-message "declared here" 1 } */
@@ -22,6 +41,24 @@ void vector (void) /* { dg-message "declared here" 1 } */
   worker ();  /* { dg-error "routine call uses" } */
   vector ();
   seq ();
+
+  int red;
+
+#pragma acc loop reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop gang reduction (+:red) // { dg-error "disallowed by containing routine" }
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop worker reduction (+:red) // { dg-error "disallowed by containing routine" }
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop vector reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
 }
 
 void worker (void) /* { dg-message "declared here" 2 } */
@@ -30,6 +67,24 @@ void worker (void) /* { dg-message "declared here" 2 } */
   worker ();
   vector ();
   seq ();
+
+  int red;
+
+#pragma acc loop reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop gang reduction (+:red) // { dg-error "disallowed by containing routine" }
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop worker reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop vector reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
 }
 
 void gang (void) /* { dg-message "declared here" 3 } */
@@ -38,4 +93,22 @@ void gang (void) /* { dg-message "declared here" 3 } */
   worker ();
   vector ();
   seq ();
+
+  int red;
+
+#pragma acc loop reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop gang reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop worker reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop vector reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
 }
diff --git gcc/testsuite/c-c++-common/goacc/routine-5.c gcc/testsuite/c-c++-common/goacc/routine-5.c
index c34838f..2a9db90 100644
--- gcc/testsuite/c-c++-common/goacc/routine-5.c
+++ gcc/testsuite/c-c++-common/goacc/routine-5.c
@@ -46,6 +46,21 @@ using namespace g;
   
 #pragma acc routine (c) /* { dg-error "does not refer to" } */
 
+
+void Bar ();
+
+void Foo ()
+{
+  Bar ();
+}
+
+#pragma acc routine (Bar) // { dg-error "must be applied before use" }
+
+#pragma acc routine (Foo) gang // { dg-error "must be applied before definition" }
+
+#pragma acc routine (Baz) // { dg-error "not been declared" }
+
+
 int vb1;		/* { dg-error "directive for use" } */
 extern int vb2;		/* { dg-error "directive for use" } */
 static int vb3;		/* { dg-error "directive for use" } */
diff --git gcc/testsuite/c-c++-common/goacc/tile.c gcc/testsuite/c-c++-common/goacc/tile.c
index 2a81427..8e70e71 100644
--- gcc/testsuite/c-c++-common/goacc/tile.c
+++ gcc/testsuite/c-c++-common/goacc/tile.c
@@ -1,5 +1,3 @@
-/* { dg-do compile } */
-
 int
 main ()
 {
@@ -71,3 +69,259 @@ main ()
 
   return 0;
 }
+
+
+void par (void)
+{
+  int i, j;
+
+#pragma acc parallel
+  {
+#pragma acc loop tile // { dg-error "expected" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile() // { dg-error "expected" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(1) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(2) 
+    for (i = 0; i < 10; i++)
+      {
+	for (j = 1; j < 10; j++)
+	  { }
+      }
+#pragma acc loop tile(-2) // { dg-warning "'tile' value must be positive" }
+    for (i = 1; i < 10; i++)
+      { }
+#pragma acc loop tile(i)
+    for (i = 1; i < 10; i++)
+      { }
+#pragma acc loop tile(2, 2, 1)
+    for (i = 1; i < 3; i++)
+      {
+	for (j = 4; j < 6; j++)
+	  { }
+      } 
+#pragma acc loop tile(2, 2)
+    for (i = 1; i < 5; i+=2)
+      {
+	for (j = i + 1; j < 7; j+=i)
+	  { }
+      }
+#pragma acc loop vector tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector gang tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector worker tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang worker tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+  }
+}
+void p3 (void)
+{
+  int i, j;
+
+  
+#pragma acc parallel loop tile // { dg-error "expected" }
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop tile() // { dg-error "expected" }
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop tile(1) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop tile(*, 1) 
+  for (i = 0; i < 10; i++)
+    {
+      for (j = 1; j < 10; j++)
+	{ }
+    }
+#pragma acc parallel loop tile(-2) // { dg-warning "'tile' value must be positive" }
+  for (i = 1; i < 10; i++)
+    { }
+#pragma acc parallel loop tile(i)
+  for (i = 1; i < 10; i++)
+    { }
+#pragma acc parallel loop tile(2, 2, 1)
+  for (i = 1; i < 3; i++)
+    {
+      for (j = 4; j < 6; j++)
+        { }
+    }    
+#pragma acc parallel loop tile(2, 2)
+  for (i = 1; i < 5; i+=2)
+    {
+      for (j = i + 1; j < 7; j++)
+        { }
+    }
+#pragma acc parallel loop vector tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop worker tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop gang tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop vector gang tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop vector worker tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop gang worker tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+
+}
+
+
+void
+kern (void)
+{
+  int i, j;
+
+#pragma acc kernels
+  {
+#pragma acc loop tile // { dg-error "expected" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile() // { dg-error "expected" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(1)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(2)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(6-2) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(6+2) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(*, 1) 
+    for (i = 0; i < 10; i++)
+      {
+	for (j = 0; j < 10; i++)
+	  { }
+      }
+#pragma acc loop tile(-2) // { dg-warning "'tile' value must be positive" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(i)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(2, 2, 1)
+    for (i = 2; i < 4; i++)
+      for (i = 4; i < 6; i++)
+	{ }
+#pragma acc loop tile(2, 2)
+    for (i = 1; i < 5; i+=2)
+      for (j = i+1; j < 7; i++)
+	{ }
+#pragma acc loop vector tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector gang tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector worker tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang worker tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+   }
+}
+
+
+void k3 (void)
+{
+  int i, j;
+
+#pragma acc kernels loop tile // { dg-error "expected" }
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop tile() // { dg-error "expected" }
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop tile(1) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop tile(*, 1) 
+  for (i = 0; i < 10; i++)
+    {
+      for (j = 1; j < 10; j++)
+	{ }
+    }
+#pragma acc kernels loop tile(-2) // { dg-warning "'tile' value must be positive" }
+  for (i = 1; i < 10; i++)
+    { }
+#pragma acc kernels loop tile(i)
+  for (i = 1; i < 10; i++)
+    { }
+#pragma acc kernels loop tile(2, 2, 1)
+  for (i = 1; i < 3; i++)
+    {
+      for (j = 4; j < 6; j++)
+	{ }
+    }    
+#pragma acc kernels loop tile(2, 2)
+  for (i = 1; i < 5; i++)
+    {
+      for (j = i + 1; j < 7; j += i)
+	{ }
+    }
+#pragma acc kernels loop vector tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop worker tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop gang tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop vector gang tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop vector worker tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop gang worker tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+}
diff --git gcc/testsuite/g++.dg/goacc/reference.C gcc/testsuite/g++.dg/goacc/reference.C
new file mode 100644
index 0000000..b000668
--- /dev/null
+++ gcc/testsuite/g++.dg/goacc/reference.C
@@ -0,0 +1,39 @@
+int
+test1 (int &ref)
+{
+#pragma acc kernels copy (ref)
+  {
+    ref = 10;
+  }
+}
+
+int
+test2 (int &ref)
+{
+  int b;
+#pragma acc kernels copyout (b)
+  {
+    b = ref + 10;
+  }
+
+#pragma acc parallel copyout (b)
+  {
+    b = ref + 10;
+  }
+
+  ref = b;
+}
+
+int
+main()
+{
+  int a = 0;
+  int &ref_a = a;
+
+  #pragma acc parallel copy (a, ref_a)
+  {
+    ref_a = 5;
+  }
+
+  return a;
+}
diff --git gcc/testsuite/g++.dg/goacc/routine-1.C gcc/testsuite/g++.dg/goacc/routine-1.C
new file mode 100644
index 0000000..a73a73d
--- /dev/null
+++ gcc/testsuite/g++.dg/goacc/routine-1.C
@@ -0,0 +1,13 @@
+/* Test valid use of the routine directive.  */
+
+namespace N
+{
+  extern void foo1();
+  extern void foo2();
+#pragma acc routine (foo1)
+#pragma acc routine
+  void foo3()
+  {
+  }
+}
+#pragma acc routine (N::foo2)
diff --git gcc/testsuite/g++.dg/goacc/routine-2.C gcc/testsuite/g++.dg/goacc/routine-2.C
new file mode 100644
index 0000000..2d16466
--- /dev/null
+++ gcc/testsuite/g++.dg/goacc/routine-2.C
@@ -0,0 +1,42 @@
+/* Test invalid use of the routine directive.  */
+
+template <typename T>
+extern T one_d();
+#pragma acc routine (one_d) /* { dg-error "names a set of overloads" } */
+
+template <typename T>
+T
+one()
+{
+  return 1;
+}
+#pragma acc routine (one) /* { dg-error "names a set of overloads" } */
+
+int incr (int);
+float incr (float);
+int inc;
+
+#pragma acc routine (incr) /* { dg-error "names a set of overloads" } */
+
+#pragma acc routine (increment) /* { dg-error "has not been declared" } */
+
+#pragma acc routine (inc) /* { dg-error "does not refer to a function" } */
+
+#pragma acc routine (+) /* { dg-error "expected unqualified-id before '.' token" } */
+
+int sum (int, int);
+
+namespace foo {
+#pragma acc routine (sum)
+  int sub (int, int);
+}
+
+#pragma acc routine (foo::sub)
+
+/* It's strange to apply a routine directive to a subset of overloaded
+   functions, but that is permissible in OpenACC 2.x.  */
+
+int decr (int a);
+
+#pragma acc routine
+float decr (float a);
diff --git gcc/testsuite/g++.dg/goacc/template.C gcc/testsuite/g++.dg/goacc/template.C
index f7a717b..f139dc2 100644
--- gcc/testsuite/g++.dg/goacc/template.C
+++ gcc/testsuite/g++.dg/goacc/template.C
@@ -1,8 +1,3 @@
-// This error is temporary.  Remove when support is added for these clauses
-// in the middle end.  Also remove the comments from the reduction test
-// after the FE learns that reduction variables may appear in data clauses too.
-// { dg-prune-output "sorry, unimplemented" }
-
 #pragma acc routine
 template <typename T> T
 accDouble(int val)
@@ -20,55 +15,62 @@ oacc_parallel_copy (T a)
   double z = 4;
 
 #pragma acc parallel num_gangs (a) num_workers (a) vector_length (a) default (none) copyout (b) copyin (a)
-  {
+#pragma acc loop gang worker vector
+  for (int i = 0; i < 1; i++)
     b = a;
-  }
 
 #pragma acc parallel num_gangs (a) copy (w, x, y, z)
-  {
-    w = accDouble<char>(w);
-    x = accDouble<int>(x);
-    y = accDouble<float>(y);
-    z = accDouble<double>(z);
-  }
+#pragma acc loop
+  for (int i = 0; i < 1; i++)
+    {
+      w = accDouble<char>(w);
+      x = accDouble<int>(x);
+      y = accDouble<float>(y);
+      z = accDouble<double>(z);
+    }
 
 #pragma acc parallel num_gangs (a) if (1)
   {
+#pragma acc loop independent collapse (2) gang
+    for (int i = 0; i < a; i++)
+      for (int j = 0; j < 5; j++)
+	b = a;
+
 #pragma acc loop auto tile (a, 3)
-  for (int i = 0; i < a; i++)
-    for (int j = 0; j < 5; j++)
-      b = a;
+    for (int i = 0; i < a; i++)
+      for (int j = 0; j < 5; j++)
+	b = a;
 
 #pragma acc loop seq
-  for (int i = 0; i < a; i++)
-    b = a;
+    for (int i = 0; i < a; i++)
+      b = a;
   }
 
   T c;
 
 #pragma acc parallel num_workers (10)
-  {
+#pragma acc loop worker
+  for (int i = 0; i < 1; i++)
+    {
 #pragma acc atomic capture
-    c = b++;
+      c = b++;
 
 #pragma atomic update
-    c++;
+      c++;
 
 #pragma acc atomic read
-    b = a;
+      b = a;
 
 #pragma acc atomic write
-    b = a;
-  }
+      b = a;
+    }
 
-//#pragma acc parallel reduction (+:c)
-//  {
-//    c = 1;
-//  }
+#pragma acc parallel reduction (+:c)
+  c = 1;
 
 #pragma acc data if (1) copy (b)
   {
-    #pragma acc parallel
+#pragma acc parallel
     {
       b = a;
     }
@@ -76,9 +78,9 @@ oacc_parallel_copy (T a)
 
 #pragma acc enter data copyin (b)
 #pragma acc parallel present (b)
-    {
-      b = a;
-    }
+  {
+    b = a;
+  }
 
 #pragma acc update host (b)
 #pragma acc update self (b)
@@ -109,11 +111,9 @@ oacc_kernels_copy (T a)
 #pragma acc kernels copyout (b) copyin (a)
   b = a;
 
-//#pragma acc kernels loop reduction (+:c)
-//  for (int i = 0; i < 10; i++)
-//    {
-//      c = 1;
-//    }
+#pragma acc kernels loop reduction (+:c)
+  for (int i = 0; i < 10; i++)
+    c = 1;
 
 #pragma acc data if (1) copy (b)
   {
@@ -125,9 +125,10 @@ oacc_kernels_copy (T a)
 
 #pragma acc enter data copyin (b)
 #pragma acc kernels present (b)
-    {
-      b = a;
-    }
+  {
+    b = a;
+  }
+
   return b;
 }
 
diff --git gcc/testsuite/gfortran.dg/goacc/combined-directives.f90 gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
index 6977525..42a447a 100644
--- gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
+++ gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
@@ -1,17 +1,10 @@
 ! Exercise combined OpenACC directives.
 
-! { dg-do compile }
-! { dg-options "-fopenacc -fdump-tree-gimple" }
-
-! This error is temporary.  Remove when support is added for these clauses
-! in the middle end.
-! { dg-prune-output "sorry, unimplemented" }
-
-! Update the reduction tests.
+! { dg-additional-options "-fdump-tree-gimple" }
 
 subroutine test
   implicit none
-  integer a(100), i, j, z
+  integer a(100), i, j, y, z
 
   ! PARALLEL
   
@@ -73,10 +66,10 @@ subroutine test
   end do
   !$acc end parallel loop
 
-!  !$acc parallel loop reduction (+:z) copy (z)
-!  do i = 1, 100
-!  end do
-!  !$acc end parallel loop
+  !$acc parallel loop reduction (+:y) copy (y)
+  do i = 1, 100
+  end do
+  !$acc end parallel loop
 
   ! KERNELS
 
@@ -138,10 +131,10 @@ subroutine test
   end do
   !$acc end kernels loop
 
-!  !$acc kernels loop reduction (+:z) copy (z)
-!  do i = 1, 100
-!  end do
-!  !$acc end kernels loop
+  !$acc kernels loop reduction (+:y) copy (y)
+  do i = 1, 100
+  end do
+  !$acc end kernels loop
 end subroutine test
 
 ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. collapse.2." 2 "gimple" } }
@@ -153,3 +146,5 @@ end subroutine test
 ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. tile.2, 3" 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "acc loop private.i. independent" 2 "gimple" } }
 ! { dg-final { scan-tree-dump-times "private.z" 2 "gimple" } }
+! { dg-final { scan-tree-dump-times "omp target oacc_\[^ \]+ map.force_tofrom:y" 2 "gimple" } }
+! { dg-final { scan-tree-dump-times "acc loop private.i. reduction..:y." 2 "gimple" } }
diff --git gcc/testsuite/gfortran.dg/goacc/loop-1.f95 gcc/testsuite/gfortran.dg/goacc/loop-1.f95
index 817039f..b5f9e03 100644
--- gcc/testsuite/gfortran.dg/goacc/loop-1.f95
+++ gcc/testsuite/gfortran.dg/goacc/loop-1.f95
@@ -1,5 +1,3 @@
-! { dg-do compile } 
-! { dg-additional-options "-fmax-errors=100" } 
 module test
   implicit none
 contains
@@ -29,14 +27,18 @@ subroutine test1
        i = i + 1
   end do
   !$acc loop
-  do 300 d = 1, 30, 6 ! { dg-error "integer" }
+  do 300 d = 1, 30, 6
       i = d
   300 a(i) = 1
+  ! { dg-warning "Deleted feature: Loop variable at .1. must be integer" "" { target *-*-* } 30 }
+  ! { dg-error "ACC LOOP iteration variable must be of type integer" "" { target *-*-* } 30 }
   !$acc loop
-  do d = 1, 30, 5 ! { dg-error "integer" }
+  do d = 1, 30, 5
        i = d
       a(i) = 2
   end do
+  ! { dg-warning "Deleted feature: Loop variable at .1. must be integer" "" { target *-*-* } 36 }
+  ! { dg-error "ACC LOOP iteration variable must be of type integer" "" { target *-*-* } 36 }
   !$acc loop
   do i = 1, 30
       if (i .eq. 16) exit ! { dg-error "EXIT statement" }
@@ -144,8 +146,10 @@ subroutine test1
     end do
     !$acc parallel loop collapse(2)
     do i = 1, 3
-        do r = 4, 6    ! { dg-error "integer" }
+        do r = 4, 6
         end do
+        ! { dg-warning "Deleted feature: Loop variable at .1. must be integer" "" { target *-*-* } 149 }
+        ! { dg-error "ACC LOOP iteration variable must be of type integer" "" { target *-*-* } 149 }
     end do
 
     ! Both seq and independent are not allowed
@@ -167,4 +171,3 @@ subroutine test1
 
 end subroutine test1
 end module test
-! { dg-prune-output "Deleted" }
diff --git gcc/testsuite/gfortran.dg/goacc/loop-5.f95 gcc/testsuite/gfortran.dg/goacc/loop-5.f95
index 5cbd975..d059cf7 100644
--- gcc/testsuite/gfortran.dg/goacc/loop-5.f95
+++ gcc/testsuite/gfortran.dg/goacc/loop-5.f95
@@ -1,9 +1,3 @@
-! { dg-do compile }
-! { dg-additional-options "-fmax-errors=100" }
-
-! { dg-prune-output "sorry, unimplemented" }
-! { dg-prune-output "Error: work-sharing region" }
-
 program test
   implicit none
   integer :: i, j
diff --git gcc/testsuite/gfortran.dg/goacc/loop-6.f95 gcc/testsuite/gfortran.dg/goacc/loop-6.f95
index e844468..d0855b4 100644
--- gcc/testsuite/gfortran.dg/goacc/loop-6.f95
+++ gcc/testsuite/gfortran.dg/goacc/loop-6.f95
@@ -1,11 +1,3 @@
-! { dg-do compile }
-! { dg-additional-options "-fmax-errors=100" }
-
-! This error is temporary.  Remove when support is added for these clauses
-! in the middle end.
-! { dg-prune-output "sorry, unimplemented" }
-! { dg-prune-output "Error: work-sharing region" }
-
 program test
   implicit none
   integer :: i, j
diff --git gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90 gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90
index 6cfd715..81bdc23 100644
--- gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90
+++ gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90
@@ -1,13 +1,7 @@
-! { dg-do compile } 
 ! { dg-additional-options "-fdump-tree-original -std=f2008" } 
 
 ! test for tree-dump-original and spaces-commas
 
-! This error is temporary.  Remove when support is added for these clauses
-! in the middle end.
-! { dg-prune-output "sorry, unimplemented" }
-! { dg-prune-output "Error: work-sharing region" }
-
 program test
   implicit none
   integer :: i, j, k, m, sum
diff --git libgomp/ChangeLog libgomp/ChangeLog
index f4f30fb..a1763b6 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,71 @@
+2016-03-30  Thomas Schwinge  <thomas@codesourcery.com>
+	    James Norris  <jnorris@codesourcery.com>
+	    Nathan Sidwell  <nathan@codesourcery.com>
+	    Julian Brown  <julian@codesourcery.com>
+	    Cesar Philippidis  <cesar@codesourcery.com>
+	    Chung-Lin Tang  <cltang@codesourcery.com>
+	    Tom de Vries  <tom@codesourcery.com>
+
+	* testsuite/libgomp.oacc-c-c++-common/clauses-1.c: Update.
+	* testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/vector-loop.c: Likewise.
+	* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/asyncwait-2.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/asyncwait-3.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/declare-1.f90: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Likewise.
+	XFAIL.
+	* testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: Update.
+	Incorporate...
+	* testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c: ... this
+	file.
+	* testsuite/libgomp.oacc-c++/template-reduction.C: New file.
+	* testsuite/libgomp.oacc-c-c++-common/gang-static-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/gang-static-2.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-clauses.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/private-variables.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/routine-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/routine-4.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c: Likewise.
+	* testsuite/libgomp.oacc-fortran/clauses-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/default-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/firstprivate-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/gang-static-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/implicit-firstprivate-ref.f90:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/pr68813.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/private-variables.f90: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Merge this
+	file...
+	* testsuite/libgomp.oacc-c-c++-common/parallel-1.c: ..., and this
+	file into...
+	* testsuite/libgomp.oacc-c-c++-common/data-clauses.h: ... this new
+	file.  Update.
+	* testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c: New
+	file.
+	* testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-2.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c:
+	... this new file.  Update.
+	* testsuite/libgomp.oacc-c-c++-common/parallel-2.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c:
+	... this new file.  Update.
+	* testsuite/libgomp.oacc-c-c++-common/mode-transitions.c: New
+	file.  Incorporate...
+	* testsuite/libgomp.oacc-c-c++-common/worker-single-1a.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/worker-single-4.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/worker-single-6.c: ... this
+	file.
+	* testsuite/libgomp.oacc-c-c++-common/update-1-2.c: Remove file.
+
 2016-03-29  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* testsuite/libgomp.oacc-c++/c++.exp [!lang_test_file_found]: Call
diff --git libgomp/testsuite/libgomp.oacc-c++/template-reduction.C libgomp/testsuite/libgomp.oacc-c++/template-reduction.C
new file mode 100644
index 0000000..fb5924c
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c++/template-reduction.C
@@ -0,0 +1,98 @@
+const int n = 100;
+
+// Check explicit template copy map
+
+template<typename T> T
+sum (T array[])
+{
+  T s = 0;
+
+#pragma acc parallel loop num_gangs (10) gang reduction (+:s) copy (s, array[0:n])
+  for (int i = 0; i < n; i++)
+    s += array[i];
+
+  return s;
+}
+
+// Check implicit template copy map
+
+template<typename T> T
+sum ()
+{
+  T s = 0;
+  T array[n];
+
+  for (int i = 0; i < n; i++)
+    array[i] = i+1;
+
+#pragma acc parallel loop num_gangs (10) gang reduction (+:s) copy (s)
+  for (int i = 0; i < n; i++)
+    s += array[i];
+
+  return s;
+}
+
+// Check present and async
+
+template<typename T> T
+async_sum (T array[])
+{
+  T s = 0;
+
+#pragma acc parallel loop num_gangs (10) gang async (1) present (array[0:n])
+  for (int i = 0; i < n; i++)
+    array[i] = i+1;
+
+#pragma acc parallel loop num_gangs (10) gang reduction (+:s) present (array[0:n]) copy (s) async wait (1)
+  for (int i = 0; i < n; i++)
+    s += array[i];
+
+#pragma acc wait
+
+  return s;
+}
+
+// Check present and async and an explicit firstprivate
+
+template<typename T> T
+async_sum (int c)
+{
+  T s = 0;
+
+#pragma acc parallel loop num_gangs (10) gang reduction (+:s) copy(s) firstprivate (c) async wait (1)
+  for (int i = 0; i < n; i++)
+    s += i+c;
+
+#pragma acc wait
+
+  return s;
+}
+
+int
+main()
+{
+  int a[n];
+  int result = 0;
+
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = i+1;
+      result += i+1;
+    }
+
+  if (sum (a) != result)
+    __builtin_abort ();
+
+  if (sum<int> () != result)
+    __builtin_abort ();
+
+#pragma acc enter data copyin (a)
+  if (async_sum (a) != result)
+    __builtin_abort ();
+
+  if (async_sum<int> (1) != result)
+    __builtin_abort ();
+#pragma acc exit data delete (a)
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
index 22cef6d..f3b490a 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
@@ -1,4 +1,6 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* <http://news.gmane.org/find-root.php?message_id=%3C87pp0aaksc.fsf%40kepler.schwinge.homeip.net%3E>.
+   { dg-xfail-run-if "TODO" { *-*-* } } */
 /* { dg-additional-options "-lcuda" } */
 
 #include <openacc.h>
@@ -460,6 +462,438 @@ main (int argc, char **argv)
             abort ();
     }
 
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 0.0;
+    }
+
+#pragma acc data copy (a[0:N]) copy (b[0:N]) copyin (N)
+    {
+
+#pragma acc kernels async
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+#pragma acc wait
+
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 3.0)
+            abort ();
+
+        if (b[i] != 3.0)
+            abort ();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 2.0;
+        b[i] = 0.0;
+    }
+
+#pragma acc data copy (a[0:N]) copy (b[0:N]) copyin (N)
+    {
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+#pragma acc wait (1)
+
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 2.0)
+            abort ();
+
+        if (b[i] != 2.0)
+            abort ();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 0.0;
+        c[i] = 0.0;
+        d[i] = 0.0;
+    }
+
+#pragma acc data copy (a[0:N]) copy (b[0:N]) copy (c[0:N]) copy (d[0:N]) copyin (N)
+    {
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii];
+    }
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii];
+    }
+
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii];
+    }
+
+#pragma acc wait (1)
+
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 3.0)
+            abort ();
+
+        if (b[i] != 9.0)
+            abort ();
+
+        if (c[i] != 4.0)
+            abort ();
+
+        if (d[i] != 1.0)
+            abort ();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 2.0;
+        b[i] = 0.0;
+        c[i] = 0.0;
+        d[i] = 0.0;
+        e[i] = 0.0;
+    }
+
+#pragma acc data copy (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N]) copyin (N)
+    {
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii];
+    }
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii];
+    }
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii];
+    }
+
+#pragma acc kernels wait (1) async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            e[ii] = a[ii] + b[ii] + c[ii] + d[ii];
+    }
+
+#pragma acc wait (1)
+
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 2.0)
+            abort ();
+
+        if (b[i] != 4.0)
+            abort ();
+
+        if (c[i] != 4.0)
+            abort ();
+
+        if (d[i] != 1.0)
+            abort ();
+
+        if (e[i] != 11.0)
+            abort ();
+    }
+
+
+    r = cuStreamCreate (&stream1, CU_STREAM_NON_BLOCKING);
+    if (r != CUDA_SUCCESS)
+    {
+        fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+        abort ();
+    }
+
+    acc_set_cuda_stream (1, stream1);
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 5.0;
+        b[i] = 0.0;
+    }
+
+#pragma acc data copy (a[0:N], b[0:N]) copyin (N)
+    {
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+#pragma acc wait (1)
+
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 5.0)
+            abort ();
+
+        if (b[i] != 5.0)
+            abort ();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 7.0;
+        b[i] = 0.0;
+        c[i] = 0.0;
+        d[i] = 0.0;
+    }
+
+#pragma acc data copy (a[0:N]) copy (b[0:N]) copy (c[0:N]) copy (d[0:N]) copyin (N)
+    {
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii];
+    }
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii];
+    }
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii];
+    }
+
+#pragma acc wait (1)
+
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 7.0)
+            abort ();
+
+        if (b[i] != 49.0)
+            abort ();
+
+        if (c[i] != 4.0)
+            abort ();
+
+        if (d[i] != 1.0)
+            abort ();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 0.0;
+        c[i] = 0.0;
+        d[i] = 0.0;
+        e[i] = 0.0;
+    }
+
+#pragma acc data copy (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N]) copyin (N)
+    {
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii];
+    }
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii];
+    }
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii];
+    }
+
+#pragma acc kernels wait (1) async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            e[ii] = a[ii] + b[ii] + c[ii] + d[ii];
+    }
+
+#pragma acc wait (1)
+
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 3.0)
+            abort ();
+
+        if (b[i] != 9.0)
+            abort ();
+
+        if (c[i] != 4.0)
+            abort ();
+
+        if (d[i] != 1.0)
+            abort ();
+
+        if (e[i] != 17.0)
+            abort ();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 4.0;
+        b[i] = 0.0;
+        c[i] = 0.0;
+        d[i] = 0.0;
+        e[i] = 0.0;
+    }
+
+#pragma acc data copyin (a[0:N], b[0:N], c[0:N]) copyin (N)
+    {
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii];
+    }
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii];
+    }
+
+#pragma acc update host (a[0:N], b[0:N], c[0:N]) wait (1)
+
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 4.0)
+            abort ();
+
+        if (b[i] != 16.0)
+            abort ();
+
+        if (c[i] != 4.0)
+            abort ();
+    }
+
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 5.0;
+        b[i] = 0.0;
+        c[i] = 0.0;
+        d[i] = 0.0;
+        e[i] = 0.0;
+    }
+
+#pragma acc data copyin (a[0:N], b[0:N], c[0:N]) copyin (N)
+    {
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii];
+    }
+
+#pragma acc kernels async (1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii];
+    }
+
+#pragma acc update host (a[0:N], b[0:N], c[0:N]) async (1)
+
+#pragma acc wait (1)
+
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (a[i] != 5.0)
+            abort ();
+
+        if (b[i] != 25.0)
+            abort ();
+
+        if (c[i] != 4.0)
+            abort ();
+    }
+
     acc_shutdown (acc_device_nvidia);
 
     return 0;
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-1.c
index 51c0cf5..410c46c 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-1.c
@@ -586,6 +586,32 @@ main (int argc, char **argv)
 
     for (i = 0; i < N; i++)
     {
+        a[i] = 6.0;
+        b[i] = 0.0;
+    }
+
+#pragma acc parallel pcopy (a[0:N], b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 6.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
         a[i] = 5.0;
         b[i] = 7.0;
     }
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
similarity index 75%
rename from libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-2.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
index d9fff6f..2cd98bd 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
@@ -1,4 +1,4 @@
 /* { dg-do run { target lto } } */
 /* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
 
-#include "parallel-1.c"
+#include "data-clauses-kernels.c"
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
new file mode 100644
index 0000000..f7f2d1c
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
@@ -0,0 +1,2 @@
+#define CONSTRUCT kernels
+#include "data-clauses.h"
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
similarity index 75%
rename from libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-2.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
index f76c926..ddcf4e3 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
@@ -1,4 +1,4 @@
 /* { dg-do run { target lto } } */
 /* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
 
-#include "kernels-1.c"
+#include "data-clauses-parallel.c"
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
new file mode 100644
index 0000000..e734b2f
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
@@ -0,0 +1,2 @@
+#define CONSTRUCT parallel
+#include "data-clauses.h"
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h
similarity index 56%
rename from libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h
index fd9df33..d557bef 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h
@@ -1,7 +1,3 @@
-/* { dg-do run } */
-
-#include <stdlib.h>
-
 int i;
 
 int main(void)
@@ -11,145 +7,145 @@ int main(void)
   i = -1;
   j = -2;
   v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) copyin (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) copyin (i, j)
   {
     if (i != -1 || j != -2)
-      abort ();
+      __builtin_abort ();
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
 #if ACC_MEM_SHARED
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 #else
   if (v != 1 || i != -1 || j != -2)
-    abort ();
+    __builtin_abort ();
 #endif
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) copyout (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) copyout (i, j)
   {
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) copy (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) copy (i, j)
   {
     if (i != -1 || j != -2)
-      abort ();
+      __builtin_abort ();
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) create (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) create (i, j)
   {
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
 #if ACC_MEM_SHARED
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 #else
   if (v != 1 || i != -1 || j != -2)
-    abort ();
+    __builtin_abort ();
 #endif
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copyin (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) present_or_copyin (i, j)
   {
     if (i != -1 || j != -2)
-      abort ();
+      __builtin_abort ();
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
   if (v != 1)
-    abort ();
+    __builtin_abort ();
 #if ACC_MEM_SHARED
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 #else
   if (v != 1 || i != -1 || j != -2)
-    abort ();
+    __builtin_abort ();
 #endif
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copyout (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) present_or_copyout (i, j)
   {
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copy (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) present_or_copy (i, j)
   {
     if (i != -1 || j != -2)
-      abort ();
+      __builtin_abort ();
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_create (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) present_or_create (i, j)
   {
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
   if (v != 1)
-    abort ();
+    __builtin_abort ();
 #if ACC_MEM_SHARED
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 #else
   if (v != 1 || i != -1 || j != -2)
-    abort ();
+    __builtin_abort ();
 #endif
 
   i = -1;
@@ -158,23 +154,23 @@ int main(void)
 
 #pragma acc data copyin (i, j)
   {
-#pragma acc parallel /* copyout */ present_or_copyout (v) present (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) present (i, j)
     {
       if (i != -1 || j != -2)
-        abort ();
+	__builtin_abort ();
       i = 2;
       j = 1;
       if (i != 2 || j != 1)
-        abort ();
+	__builtin_abort ();
       v = 1;
     }
   }
 #if ACC_MEM_SHARED
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 #else
   if (v != 1 || i != -1 || j != -2)
-    abort ();
+    __builtin_abort ();
 #endif
 
   i = -1;
@@ -183,23 +179,23 @@ int main(void)
 
 #pragma acc data copyin(i, j)
   {
-#pragma acc parallel /* copyout */ present_or_copyout (v)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v)
     {
       if (i != -1 || j != -2)
-        abort ();
+	__builtin_abort ();
       i = 2;
       j = 1;
       if (i != 2 || j != 1)
-        abort ();
+	__builtin_abort ();
       v = 1;
     }
   }
 #if ACC_MEM_SHARED
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 #else
   if (v != 1 || i != -1 || j != -2)
-    abort ();
+    __builtin_abort ();
 #endif
 
   return 0;
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c
index e271a37..8247e7b 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c
@@ -1,5 +1,3 @@
-/* { dg-do run } */
-
 #include <stdlib.h>
 
 int main (void)
@@ -28,5 +26,26 @@ int main (void)
     abort ();
 #endif
 
+  a_1 = a_2 = 0;
+
+#pragma acc data deviceptr (a)
+#pragma acc parallel copyout (a_1, a_2)
+  {
+    a_1 = a;
+    a_2 = &a;
+  }
+
+  if (a != A)
+    abort ();
+  if (a_1 != a)
+    abort ();
+#if ACC_MEM_SHARED
+  if (a_2 != &a)
+    abort ();
+#else
+  if (a_2 == &a)
+    abort ();
+#endif
+
   return 0;
 }
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c
index 7f5d3d3..689a443 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c
@@ -1,8 +1,7 @@
-/* { dg-do run } */
-
 #include  <openacc.h>
 
-int main ()
+
+void t1 ()
 {
   int ok = 1;
   int val = 2;
@@ -28,14 +27,115 @@ int main ()
   if (ondev)
     {
       if (!ok)
-	return 1;
+	__builtin_abort ();
       if (val != 2)
-	return 1;
+	__builtin_abort ();
 
       for (int i = 0; i < 32; i++)
 	if (ary[i] != 2 + i)
-	  return 1;
+	  __builtin_abort ();
     }
-  
+}
+
+
+void t2 ()
+{
+  int ok = 1;
+  int val = 2;
+
+#pragma acc data copy(val)
+  {
+#pragma acc parallel present (val)
+    {
+      val = 7;
+    }
+
+#pragma acc parallel firstprivate (val) copy(ok)
+    {
+      ok  = val == 7;
+      val = 9;
+    }
+  }
+
+  if (!ok)
+    __builtin_abort ();
+  if (val != 7)
+    __builtin_abort ();
+}
+
+
+#define N 100
+void t3 ()
+{
+  int a, b[N], c, d, i;
+  int n = acc_get_device_type () == acc_device_nvidia ? N : 1;
+
+  a = 5;
+  for (i = 0; i < n; i++)
+    b[i] = -1;
+
+  #pragma acc parallel num_gangs (n) firstprivate (a)
+  #pragma acc loop gang
+  for (i = 0; i < n; i++)
+    {
+      a = a + i;
+      b[i] = a;
+    }
+
+  for (i = 0; i < n; i++)
+    if (a + i != b[i])
+      __builtin_abort ();
+
+  #pragma acc data copy (a)
+  {
+    #pragma acc parallel firstprivate (a) copyout (c)
+    {
+      a = 10;
+      c = a;
+    }
+
+    /* This version of 'a' should still be 5.  */
+    #pragma acc parallel copyout (d) present (a)
+    {
+      d = a;
+    }
+  }
+
+  if (c != 10)
+    __builtin_abort ();
+  if (d != 5)
+    __builtin_abort ();
+}
+#undef N
+
+
+void t4 ()
+{
+  int x = 5, i, arr[32];
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 3;
+
+#pragma acc parallel firstprivate(x) copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  {
+#pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      arr[i] += x;
+  }
+
+  for (i = 0; i < 32; i++)
+    if (arr[i] != 8)
+      __builtin_abort ();
+}
+
+
+int
+main()
+{
+  t1 ();
+  t2 ();
+  t3 ();
+  t4 ();
+
   return 0;
 }
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c
deleted file mode 100644
index 9666542..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c
+++ /dev/null
@@ -1,31 +0,0 @@
-/* { dg-do run } */
-
-#include  <openacc.h>
-
-int main ()
-{
-  int ok = 1;
-  int val = 2;
-
-#pragma acc data copy(val)
-  {
-#pragma acc parallel present (val)
-    {
-      val = 7;
-    }
-
-#pragma acc parallel firstprivate (val) copy(ok)
-    {
-      ok  = val == 7;
-      val = 9;
-    }
-
-  }
-
-  if (!ok)
-    return 1;
-  if(val != 7)
-    return 1;
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-1.c
new file mode 100644
index 0000000..d8ab958
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-1.c
@@ -0,0 +1,48 @@
+#include <assert.h>
+
+#define N 100
+
+void
+test (int *a, int *b, int sarg)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    assert (a[i] == b[i] + sarg);
+}
+
+int
+main ()
+{
+  int a[N], b[N];
+  int i;
+
+  for (i = 0; i < N; i++)
+    b[i] = i+1;
+
+#pragma acc parallel loop gang (static:*) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = b[i] + 0;
+
+  test (a, b, 0);
+
+#pragma acc parallel loop gang (static:1) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = b[i] + 1;
+
+  test (a, b, 1);
+
+#pragma acc parallel loop gang (static:5) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = b[i] + 5;
+
+  test (a, b, 5);
+
+#pragma acc parallel loop gang (static:20) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = b[i] + 20;
+
+  test (a, b, 20);
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
new file mode 100644
index 0000000..ce9632c
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/gang-static-2.c
@@ -0,0 +1,100 @@
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* This code uses nvptx inline assembly guarded with acc_on_device, which is
+   not optimized away at -O0, and then confuses the target assembler.
+   { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
+
+#include <assert.h>
+#include <openacc.h>
+
+#define N 100
+
+#define GANG_ID(I)						\
+  (acc_on_device (acc_device_nvidia)				\
+   ? ({unsigned __r;						\
+       __asm__ volatile ("mov.u32 %0,%%ctaid.x;" : "=r" (__r));	\
+       __r; }) : (I))
+
+void
+test_static (int *a, int num_gangs, int sarg)
+{
+  int i, j;
+
+  if (sarg == 0)
+    sarg = 1;
+
+  for (i = 0; i < N / sarg; i++)
+    for (j = 0; j < sarg; j++)
+      assert (a[i*sarg+j] == i % num_gangs);
+}
+
+void
+test_nonstatic (int *a, int gangs)
+{
+  int i, j;
+
+  for (i = 0; i < N; i+=gangs)
+    for (j = 0; j < gangs; j++)
+      assert (a[i+j] == i/gangs);
+}
+
+int
+main ()
+{
+  int a[N];
+  int i, x;
+
+#pragma acc parallel loop gang (static:*) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = GANG_ID (i);
+
+  test_nonstatic (a, 10);
+
+#pragma acc parallel loop gang (static:1) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = GANG_ID (i);
+
+  test_static (a, 10, 1);
+
+#pragma acc parallel loop gang (static:2) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = GANG_ID (i);
+
+  test_static (a, 10, 2);
+
+#pragma acc parallel loop gang (static:5) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = GANG_ID (i);
+
+  test_static (a, 10, 5);
+
+#pragma acc parallel loop gang (static:20) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = GANG_ID (i);
+
+  test_static (a, 10, 20);
+
+  /* Non-static gang.  */
+#pragma acc parallel loop gang num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = GANG_ID (i);
+
+  test_nonstatic (a, 10);
+
+  /* Static arguments with a variable expression.  */
+
+  x = 20;
+#pragma acc parallel loop gang (static:0+x) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = GANG_ID (i);
+
+  test_static (a, 10, 20);
+
+  x = 20;
+#pragma acc parallel loop gang (static:x) num_gangs (10)
+  for (i = 0; i < 100; i++)
+    a[i] = GANG_ID (i);
+
+  test_static (a, 10, 20);
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
index 6aa3bb7..5398905 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
@@ -1,5 +1,3 @@
-/* { dg-do run } */
-
 #include <openacc.h>
 #include <stdlib.h>
 #include <stdbool.h>
@@ -608,5 +606,357 @@ main(int argc, char **argv)
 	abort ();
 #endif
 
+    for (i = 0; i < N; i++)
+        a[i] = 4.0;
+
+#pragma acc kernels copyin(a[0:N]) copyout(b[0:N]) if(1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 5.0;
+#else
+    exp = 4.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 16.0;
+
+#pragma acc kernels if(0)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 17.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 8.0;
+
+#pragma acc kernels copyin(a[0:N]) copyout(b[0:N]) if(one)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 9.0;
+#else
+    exp = 8.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 22.0;
+
+#pragma acc kernels if(zero)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 23.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 16.0;
+
+#pragma acc kernels copyin(a[0:N]) copyout(b[0:N]) if(true)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 17.0;
+#else
+    exp = 16.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 76.0;
+
+#pragma acc kernels if(false)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 77.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 22.0;
+
+    n = 1;
+
+#pragma acc kernels copyin(a[0:N]) copyout(b[0:N]) if(n)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 23.0;
+#else
+    exp = 22.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 18.0;
+
+    n = 0;
+
+#pragma acc kernels if(n)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 19.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 49.0;
+
+    n = 1;
+
+#pragma acc kernels copyin(a[0:N]) copyout(b[0:N]) if(n + n)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 50.0;
+#else
+    exp = 49.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 38.0;
+
+    n = 0;
+
+#pragma acc kernels if(n + n)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 39.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 91.0;
+
+#pragma acc kernels copyin(a[0:N]) copyout(b[0:N]) if(-2)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 92.0;
+#else
+    exp = 91.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 43.0;
+
+#pragma acc kernels copyin(a[0:N]) copyout(b[0:N]) if(one == 1)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+#if ACC_MEM_SHARED
+    exp = 44.0;
+#else
+    exp = 43.0;
+#endif
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != exp)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+        a[i] = 87.0;
+
+#pragma acc kernels if(one == 0)
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+        {
+            if (acc_on_device (acc_device_host))
+                b[ii] = a[ii] + 1;
+            else
+                b[ii] = a[ii];
+        }
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 88.0)
+            abort();
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        a[i] = 3.0;
+        b[i] = 9.0;
+    }
+
+#if ACC_MEM_SHARED
+    exp = 0.0;
+    exp2 = 0.0;
+#else
+    acc_map_data (a, d_a, N * sizeof (float));
+    acc_map_data (b, d_b, N * sizeof (float));
+    exp = 3.0;
+    exp2 = 9.0;
+#endif
+
     return 0;
 }
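The expected values in the kernels if-clause tests above all follow one
rule: the region adds 1 when it runs on the host, which happens when the if
condition is false or when memory is shared.  A minimal sketch of that rule,
assuming a hypothetical helper expected_b that is not part of the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the value the tests expect in b[] after a kernels
   region guarded by "if (IF_COND)".  MEM_SHARED stands in for the
   ACC_MEM_SHARED macro used by the test.  */
static float
expected_b (float a, bool if_cond, bool mem_shared)
{
  /* The region runs on the host if offloading is disabled by the if
     clause, or if host and device share memory.  */
  bool on_host = !if_cond || mem_shared;

  return on_host ? a + 1 : a;
}
```

For a[i] == 4.0 and if(1) this gives 5.0 with shared memory and 4.0
otherwise; for a[i] == 16.0 and if(0) it gives 17.0 in both configurations.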
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
deleted file mode 100644
index 3acfdf5..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
+++ /dev/null
@@ -1,184 +0,0 @@
-/* { dg-do run } */
-
-#include <stdlib.h>
-
-int i;
-
-int main (void)
-{
-  int j, v;
-
-#if 0
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) copyin (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) copyout (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) copy (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) create (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copyin (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1)
-    abort ();
-#if ACC_MEM_SHARED
-  if (i != 2 || j != 1)
-    abort ();
-#else
-  if (i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copyout (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copy (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_create (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1)
-    abort ();
-#if ACC_MEM_SHARED
-  if (i != 2 || j != 1)
-    abort ();
-#else
-  if (i != -1 || j != -2)
-    abort ();
-#endif
-
-#if 0
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) present (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#endif
-
-#if 0
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#endif
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-clauses.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-clauses.c
new file mode 100644
index 0000000..2c42497
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-clauses.c
@@ -0,0 +1,62 @@
+/* Exercise the auto, independent, seq and tile loop clauses inside
+   kernels regions.  */
+
+#include <assert.h>
+
+#define N 100
+
+void
+check (int *a, int *b)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    assert (a[i] == b[i]);
+}
+
+int
+main ()
+{
+  int i, a[N], b[N];
+
+#pragma acc kernels copy(a)
+  {
+#pragma acc loop auto
+    for (i = 0; i < N; i++)
+      a[i] = i;
+  }
+
+  for (i = 0; i < N; i++)
+    b[i] = i;
+
+  check (a, b);
+
+#pragma acc kernels copyout(a)
+  {
+#pragma acc loop independent
+    for (i = 0; i < N; i++)
+      a[i] = i;
+  }
+
+  check (a, b);
+
+#pragma acc kernels present_or_copy(a)
+  {
+#pragma acc loop seq
+    for (i = 0; i < N; i++)
+      a[i] = i;
+  }
+
+  check (a, b);
+
+#pragma acc kernels pcopyout(a) present_or_copyin(b)
+  {
+#pragma acc loop seq
+    for (i = 0; i < N; i++)
+      a[i] = b[i];
+  }
+
+  check (a, b);
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/mode-transitions.c libgomp/testsuite/libgomp.oacc-c-c++-common/mode-transitions.c
new file mode 100644
index 0000000..2394ac8
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/mode-transitions.c
@@ -0,0 +1,895 @@
+/* Miscellaneous test cases for gang/worker/vector mode transitions.  */
+
+#include <assert.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <math.h>
+#include <openacc.h>
+
+
+/* Test basic vector-partitioned mode transitions.  */
+
+void t1()
+{
+  int n = 0, arr[32], i;
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 0;
+
+  #pragma acc parallel copy(n, arr) \
+		       num_gangs(1) num_workers(1) vector_length(32)
+  {
+    int j;
+    n++;
+    #pragma acc loop vector
+    for (j = 0; j < 32; j++)
+      arr[j]++;
+    n++;
+  }
+
+  assert (n == 2);
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == 1);
+}
+
+
+/* Test vector-partitioned, gang-partitioned mode.  */
+
+void t2()
+{
+  int n[32], arr[1024], i;
+  
+  for (i = 0; i < 1024; i++)
+    arr[i] = 0;
+
+  for (i = 0; i < 32; i++)
+    n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) \
+		       num_gangs(32) num_workers(1) vector_length(32)
+  {
+    int j, k;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      #pragma acc loop vector
+      for (k = 0; k < 32; k++)
+	arr[j * 32 + k]++;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (n[i] == 2);
+
+  for (i = 0; i < 1024; i++)
+    assert (arr[i] == 1);
+}
+
+
+/* Test conditions inside vector-partitioned loops.  */
+
+void t4()
+{
+  int n[32], arr[1024], i;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  for (i = 0; i < 32; i++)
+    n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) \
+		       num_gangs(32) num_workers(1) vector_length(32)
+  {
+    int j, k;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop vector
+	for (k = 0; k < 32; k++)
+	  if ((arr[j * 32 + k] % 2) != 0)
+	    arr[j * 32 + k] *= 2;
+      }
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (n[i] == 2);
+
+  for (i = 0; i < 1024; i++)
+    assert (arr[i] == ((i % 2) == 0 ? i : i * 2));
+}
+
+
+/* Test conditions inside gang-partitioned/vector-partitioned loops.  */
+
+void t5()
+{
+  int n[32], arr[1024], i;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  for (i = 0; i < 32; i++)
+    n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) \
+		       num_gangs(32) num_workers(1) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+
+    #pragma acc loop gang vector
+    for (j = 0; j < 1024; j++)
+      if ((arr[j] % 2) != 0)
+	arr[j] *= 2;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (n[i] == 2);
+
+  for (i = 0; i < 1024; i++)
+    assert (arr[i] == ((i % 2) == 0 ? i : i * 2));
+}
+
+
+/* Test trivial operation of vector-single mode.  */
+
+void t7()
+{
+  int n = 0;
+  #pragma acc parallel copy(n) \
+		       num_gangs(1) num_workers(1) vector_length(32)
+  {
+    n++;
+  }
+  assert (n == 1);
+}
+
+
+/* Test vector-single, gang-partitioned mode.  */
+
+void t8()
+{
+  int arr[1024];
+  int gangs;
+
+  for (gangs = 1; gangs <= 1024; gangs <<= 1)
+    {
+      int i;
+
+      for (i = 0; i < 1024; i++)
+	arr[i] = 0;
+
+      #pragma acc parallel copy(arr) \
+			   num_gangs(gangs) num_workers(1) vector_length(32)
+      {
+	int j;
+	#pragma acc loop gang
+	for (j = 0; j < 1024; j++)
+	  arr[j]++;
+      }
+
+      for (i = 0; i < 1024; i++)
+	assert (arr[i] == 1);
+    }
+}
+
+
+/* Test conditions in vector-single mode.  */
+
+void t9()
+{
+  int arr[1024];
+  int gangs;
+
+  for (gangs = 1; gangs <= 1024; gangs <<= 1)
+    {
+      int i;
+
+      for (i = 0; i < 1024; i++)
+	arr[i] = 0;
+
+      #pragma acc parallel copy(arr) \
+			   num_gangs(gangs) num_workers(1) vector_length(32)
+      {
+	int j;
+	#pragma acc loop gang
+	for (j = 0; j < 1024; j++)
+	  if ((j % 3) == 0)
+	    arr[j]++;
+	  else
+	    arr[j] += 2;
+      }
+
+      for (i = 0; i < 1024; i++)
+	assert (arr[i] == (((i % 3) == 0) ? 1 : 2));
+    }
+}
+
+
+/* Test switch in vector-single mode.  */
+
+void t10()
+{
+  int arr[1024];
+  int gangs;
+
+  for (gangs = 1; gangs <= 1024; gangs <<= 1)
+    {
+      int i;
+
+      for (i = 0; i < 1024; i++)
+	arr[i] = 0;
+
+      #pragma acc parallel copy(arr) \
+			   num_gangs(gangs) num_workers(1) vector_length(32)
+      {
+	int j;
+	#pragma acc loop gang
+	for (j = 0; j < 1024; j++)
+	  switch (j % 5)
+	    {
+	    case 0: arr[j] += 1; break;
+	    case 1: arr[j] += 2; break;
+	    case 2: arr[j] += 3; break;
+	    case 3: arr[j] += 4; break;
+	    case 4: arr[j] += 5; break;
+	    default: arr[j] += 99;
+	    }
+      }
+
+      for (i = 0; i < 1024; i++)
+	assert (arr[i] == (i % 5) + 1);
+    }
+}
+
+
+/* Test switch in vector-single mode, initialise array on device.  */
+
+void t11()
+{
+  int arr[1024];
+  int i;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = 99;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(1024) num_workers(1) vector_length(32)
+  {
+    int j;
+
+    /* This loop and the one following must be distributed to available gangs
+       in the same way to ensure data dependencies are not violated (hence the
+       "static" clauses).  */
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 1024; j++)
+      arr[j] = 0;
+    
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 1024; j++)
+      switch (j % 5)
+	{
+	case 0: arr[j] += 1; break;
+	case 1: arr[j] += 2; break;
+	case 2: arr[j] += 3; break;
+	case 3: arr[j] += 4; break;
+	case 4: arr[j] += 5; break;
+	default: arr[j] += 99;
+	}
+  }
+
+  for (i = 0; i < 1024; i++)
+    assert (arr[i] == (i % 5) + 1);
+}
+
+
+/* Test multiple conditions in vector-single mode.  */
+
+#define NUM_GANGS 4096
+void t12()
+{
+  bool fizz[NUM_GANGS], buzz[NUM_GANGS], fizzbuzz[NUM_GANGS];
+  int i;
+
+  #pragma acc parallel copyout(fizz, buzz, fizzbuzz) \
+		       num_gangs(NUM_GANGS) num_workers(1) vector_length(32)
+  {
+    int j;
+    
+    /* This loop and the one following must be distributed to available gangs
+       in the same way to ensure data dependencies are not violated (hence the
+       "static" clauses).  */
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < NUM_GANGS; j++)
+      fizz[j] = buzz[j] = fizzbuzz[j] = 0;
+    
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < NUM_GANGS; j++)
+      {
+	if ((j % 3) == 0 && (j % 5) == 0)
+	  fizzbuzz[j] = 1;
+	else
+	  {
+	    if ((j % 3) == 0)
+	      fizz[j] = 1;
+	    else if ((j % 5) == 0)
+	      buzz[j] = 1;
+	  }
+      }
+  }
+
+  for (i = 0; i < NUM_GANGS; i++)
+    {
+      assert (fizzbuzz[i] == ((i % 3) == 0 && (i % 5) == 0));
+      assert (fizz[i] == ((i % 3) == 0 && (i % 5) != 0));
+      assert (buzz[i] == ((i % 3) != 0 && (i % 5) == 0));
+    }
+}
+#undef NUM_GANGS
+
+
+/* Test worker-partitioned/vector-single mode.  */
+
+void t13()
+{
+  int arr[32 * 8], i;
+
+  for (i = 0; i < 32 * 8; i++)
+    arr[i] = 0;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	int k;
+	#pragma acc loop worker
+	for (k = 0; k < 8; k++)
+          arr[j * 8 + k] += j * 8 + k;
+      }
+  }
+
+  for (i = 0; i < 32 * 8; i++)
+    assert (arr[i] == i);
+}
+
+
+/* Test worker-single/worker-partitioned transitions.  */
+
+void t16()
+{
+  int n[32], arr[32 * 32], i;
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = 0;
+
+  for (i = 0; i < 32; i++)
+    n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) \
+		       num_gangs(8) num_workers(16) vector_length(32)
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	int k;
+
+	n[j]++;
+
+	#pragma acc loop worker
+	for (k = 0; k < 32; k++)
+          arr[j * 32 + k]++;
+
+	n[j]++;
+
+	#pragma acc loop worker
+	for (k = 0; k < 32; k++)
+          arr[j * 32 + k]++;
+
+	n[j]++;
+
+	#pragma acc loop worker
+	for (k = 0; k < 32; k++)
+          arr[j * 32 + k]++;
+
+	n[j]++;
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (n[i] == 4);
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == 3);
+}
+
+
+/* Test correct synchronisation between worker-partitioned loops.  */
+
+void t17()
+{
+  int arr_a[32 * 32], arr_b[32 * 32], i;
+  int num_workers, num_gangs;
+
+  for (num_workers = 1; num_workers <= 32; num_workers <<= 1)
+    for (num_gangs = 1; num_gangs <= 32; num_gangs <<= 1)
+      {
+	for (i = 0; i < 32 * 32; i++)
+	  arr_a[i] = i;
+
+	#pragma acc parallel copyin(arr_a) copyout(arr_b) \
+			     num_gangs(num_gangs) num_workers(num_workers) vector_length(32)
+	{
+	  int j;
+	  #pragma acc loop gang
+	  for (j = 0; j < 32; j++)
+	    {
+	      int k;
+
+	      #pragma acc loop worker
+	      for (k = 0; k < 32; k++)
+        	arr_b[j * 32 + (31 - k)] = arr_a[j * 32 + k] * 2;
+
+	      #pragma acc loop worker
+	      for (k = 0; k < 32; k++)
+        	arr_a[j * 32 + (31 - k)] = arr_b[j * 32 + k] * 2;
+
+	      #pragma acc loop worker
+	      for (k = 0; k < 32; k++)
+        	arr_b[j * 32 + (31 - k)] = arr_a[j * 32 + k] * 2;
+	    }
+	}
+
+	for (i = 0; i < 32 * 32; i++)
+	  assert (arr_b[i] == (i ^ 31) * 8);
+      }
+}
+
+
+/* Test correct synchronisation between worker+vector-partitioned loops.  */
+
+void t18()
+{
+  int arr_a[32 * 32 * 32], arr_b[32 * 32 * 32], i;
+  int num_workers, num_gangs;
+
+  for (num_workers = 1; num_workers <= 32; num_workers <<= 1)
+    for (num_gangs = 1; num_gangs <= 32; num_gangs <<= 1)
+      {
+	for (i = 0; i < 32 * 32 * 32; i++)
+	  arr_a[i] = i;
+
+	#pragma acc parallel copyin(arr_a) copyout(arr_b) \
+			     num_gangs(num_gangs) num_workers(num_workers) vector_length(32)
+	{
+	  int j;
+	  #pragma acc loop gang
+	  for (j = 0; j < 32; j++)
+	    {
+	      int k;
+
+	      #pragma acc loop worker vector
+	      for (k = 0; k < 32 * 32; k++)
+        	arr_b[j * 32 * 32 + (1023 - k)] = arr_a[j * 32 * 32 + k] * 2;
+
+	      #pragma acc loop worker vector
+	      for (k = 0; k < 32 * 32; k++)
+        	arr_a[j * 32 * 32 + (1023 - k)] = arr_b[j * 32 * 32 + k] * 2;
+
+	      #pragma acc loop worker vector
+	      for (k = 0; k < 32 * 32; k++)
+        	arr_b[j * 32 * 32 + (1023 - k)] = arr_a[j * 32 * 32 + k] * 2;
+	    }
+	}
+
+	for (i = 0; i < 32 * 32 * 32; i++)
+	  assert (arr_b[i] == (i ^ 1023) * 8);
+      }
+}
+
+
+/* Test correct synchronisation between vector-partitioned loops in
+   worker-partitioned mode.  */
+
+void t19()
+{
+  int n[32 * 32], arr_a[32 * 32 * 32], arr_b[32 * 32 * 32], i;
+  int num_workers, num_gangs;
+
+  for (num_workers = 1; num_workers <= 32; num_workers <<= 1)
+    for (num_gangs = 1; num_gangs <= 32; num_gangs <<= 1)
+      {
+	for (i = 0; i < 32 * 32 * 32; i++)
+	  arr_a[i] = i;
+
+	for (i = 0; i < 32 * 32; i++)
+          n[i] = 0;
+
+	#pragma acc parallel copy (n) copyin(arr_a) copyout(arr_b) \
+			     num_gangs(num_gangs) num_workers(num_workers) vector_length(32)
+	{
+	  int j;
+	  #pragma acc loop gang
+	  for (j = 0; j < 32; j++)
+	    {
+	      int k;
+
+	      #pragma acc loop worker
+	      for (k = 0; k < 32; k++)
+		{
+		  int m;
+
+		  n[j * 32 + k]++;
+
+		  #pragma acc loop vector
+		  for (m = 0; m < 32; m++)
+		    {
+	              if (((j * 1024 + k * 32 + m) % 2) == 0)
+			arr_b[j * 1024 + k * 32 + (31 - m)]
+			  = arr_a[j * 1024 + k * 32 + m] * 2;
+		      else
+			arr_b[j * 1024 + k * 32 + (31 - m)]
+			  = arr_a[j * 1024 + k * 32 + m] * 3;
+		    }
+
+		  /* Test returning to vector-single mode...  */
+		  n[j * 32 + k]++;
+
+		  #pragma acc loop vector
+		  for (m = 0; m < 32; m++)
+		    {
+	              if (((j * 1024 + k * 32 + m) % 3) == 0)
+			arr_a[j * 1024 + k * 32 + (31 - m)]
+			  = arr_b[j * 1024 + k * 32 + m] * 5;
+		      else
+			arr_a[j * 1024 + k * 32 + (31 - m)]
+			  = arr_b[j * 1024 + k * 32 + m] * 7;
+		    }
+
+		  /* ...and back-to-back vector loops.  */
+
+		  #pragma acc loop vector
+		  for (m = 0; m < 32; m++)
+		    {
+	              if (((j * 1024 + k * 32 + m) % 2) == 0)
+			arr_b[j * 1024 + k * 32 + (31 - m)]
+			  = arr_a[j * 1024 + k * 32 + m] * 3;
+		      else
+			arr_b[j * 1024 + k * 32 + (31 - m)]
+			  = arr_a[j * 1024 + k * 32 + m] * 2;
+		    }
+		}
+	    }
+	}
+
+	for (i = 0; i < 32 * 32; i++)
+          assert (n[i] == 2);
+
+	for (i = 0; i < 32 * 32 * 32; i++)
+          {
+	    int m = 6 * ((i % 3) == 0 ? 5 : 7);
+	    assert (arr_b[i] == (i ^ 31) * m);
+	  }
+      }
+}
+
+
+/* With -O0, variables are on the stack, not in registers.  Check that worker
+   state propagation handles the stack frame.  */
+
+void t20()
+{
+  int w0 = 0;
+  int w1 = 0;
+  int w2 = 0;
+  int w3 = 0;
+  int w4 = 0;
+  int w5 = 0;
+  int w6 = 0;
+  int w7 = 0;
+
+  int i;
+
+#pragma acc parallel copy (w0, w1, w2, w3, w4, w5, w6, w7) \
+		     num_gangs (1) num_workers (8)
+  {
+    int internal = 100;
+
+#pragma acc loop worker
+    for (i = 0; i < 8; i++)
+      {
+	switch (i)
+	  {
+	  case 0: w0 = internal; break;
+	  case 1: w1 = internal; break;
+	  case 2: w2 = internal; break;
+	  case 3: w3 = internal; break;
+	  case 4: w4 = internal; break;
+	  case 5: w5 = internal; break;
+	  case 6: w6 = internal; break;
+	  case 7: w7 = internal; break;
+	  default: break;
+	  }
+      }
+  }
+
+  if (w0 != 100
+      || w1 != 100
+      || w2 != 100
+      || w3 != 100
+      || w4 != 100
+      || w5 != 100
+      || w6 != 100
+      || w7 != 100)
+    __builtin_abort ();
+}
+
+
+/* Test worker-single/vector-single mode.  */
+
+void t21()
+{
+  int arr[32], i;
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 0;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      arr[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == 1);
+}
+
+
+/* Test worker-single/vector-single mode with an atomic update.  */
+
+void t22()
+{
+  int arr[32], i;
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 0;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc atomic
+	arr[j]++;
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == 1);
+}
+
+
+/* Test condition in worker-single/vector-single mode.  */
+
+void t23()
+{
+  int arr[32], i;
+
+  for (i = 0; i < 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      if ((arr[j] % 2) != 0)
+	arr[j]++;
+      else
+	arr[j] += 2;
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == (((i % 2) != 0) ? i + 1 : i + 2));
+}
+
+
+/* Test switch in worker-single/vector-single mode.  */
+
+void t24()
+{
+  int arr[32], i;
+
+  for (i = 0; i < 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      switch (arr[j] % 5)
+	{
+	case 0: arr[j] += 1; break;
+	case 1: arr[j] += 2; break;
+	case 2: arr[j] += 3; break;
+	case 3: arr[j] += 4; break;
+	case 4: arr[j] += 5; break;
+	default: arr[j] += 99;
+	}
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == i + (i % 5) + 1);
+}
+
+
+/* Test worker-single/vector-partitioned mode.  */
+
+void t25()
+{
+  int arr[32 * 32], i;
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	int k;
+	#pragma acc loop vector
+	for (k = 0; k < 32; k++)
+	  {
+	    #pragma acc atomic
+	    arr[j * 32 + k]++;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + 1);
+}
+
+
+/* Test worker-single, vector-partitioned, gang-redundant mode.  */
+
+#define ACTUAL_GANGS 8
+void t27()
+{
+  int n, arr[32], i;
+  int ondev;
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 0;
+
+  n = 0;
+
+  #pragma acc parallel copy(n, arr) copyout(ondev) \
+	  num_gangs(ACTUAL_GANGS) num_workers(8) vector_length(32)
+  {
+    int j;
+
+    ondev = acc_on_device (acc_device_not_host);
+
+    #pragma acc atomic
+    n++;
+
+    #pragma acc loop vector
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc atomic
+	arr[j] += 1;
+      }
+
+    #pragma acc atomic
+    n++;
+  }
+
+  int m = ondev ? ACTUAL_GANGS : 1;
+  
+  assert (n == m * 2);
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == m);
+}
+#undef ACTUAL_GANGS
+
+
+/* Check if worker-single variables get broadcast to vectors.  */
+
+#pragma acc routine
+float t28_routine ()
+{
+  return 2.71;
+}
+
+#define N 32
+void t28()
+{
+  float threads[N], v1 = 3.14;
+
+  for (int i = 0; i < N; i++)
+    threads[i] = -1;
+
+#pragma acc parallel num_gangs (1) vector_length (32) copy (v1)
+  {
+    float val = t28_routine ();
+
+#pragma acc loop vector
+    for (int i = 0; i < N; i++)
+      threads[i] = val + v1*i;
+  }
+
+  for (int i = 0; i < N; i++)
+    assert (fabs (threads[i] - (t28_routine () + v1*i)) < 0.0001);
+}
+#undef N
+
+
+int main()
+{
+  t1();
+  t2();
+  t4();
+  t5();
+  t7();
+  t8();
+  t9();
+  t10();
+  t11();
+  t12();
+  t13();
+  t16();
+  t17();
+  t18();
+  t19();
+  t20();
+  t21();
+  t22();
+  t23();
+  t24();
+  t25();
+  t27();
+  t28();
+
+  return 0;
+}
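A note on the assertions in t9 and t23 that compare against a conditional
expression: in C, == binds tighter than ?:, so the whole conditional has to
be parenthesized on the right-hand side of the comparison for the check to
test anything.  A minimal sketch of the difference, using hypothetical
helper names that are not part of the patch:

```c
#include <assert.h>

/* Without the outer parentheses this parses as (v == c) ? 1 : 2, which
   yields 1 or 2 and is therefore never zero.  */
static int
check_unparenthesized (int v, int c)
{
  return v == c ? 1 : 2;
}

/* With the conditional parenthesized, V is compared against 1 or 2,
   depending on C, and the result is 0 on a mismatch.  */
static int
check_parenthesized (int v, int c)
{
  return v == (c ? 1 : 2);
}
```

An assert wrapped around the first form can never fire, whatever value the
array element holds; the second form fails as soon as the element differs
from the selected value.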
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-variables.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-variables.c
new file mode 100644
index 0000000..53f03d1
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/private-variables.c
@@ -0,0 +1,953 @@
+#include <assert.h>
+#include <openacc.h>
+
+typedef struct {
+  int x, y;
+} vec2;
+
+typedef struct {
+  int x, y, z;
+  int attr[13];
+} vec3_attr;
+
+
+/* Test of gang-private variables declared in local scope with parallel
+   directive.  */
+
+void local_g_1()
+{
+  int i, arr[32];
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 3;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  {
+    int x;
+
+    #pragma acc loop gang(static:1)
+    for (i = 0; i < 32; i++)
+      x = i * 2;
+
+    #pragma acc loop gang(static:1)
+    for (i = 0; i < 32; i++)
+      {
+	if (acc_on_device (acc_device_host))
+	  x = i * 2;
+	arr[i] += x;
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == 3 + i * 2);
+}
+
+
+/* Test of worker-private variables declared in a local scope, broadcasting
+   to vector-partitioned mode.  Back-to-back worker loops.  */
+
+void local_w_1()
+{
+  int i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    int x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+
+	#pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    int x = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared in a local scope, broadcasting
+   to vector-partitioned mode.  Successive vector loops.  */
+
+void local_w_2()
+{
+  int i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    int x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	    
+	    x = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared in a local scope, broadcasting
+   to vector-partitioned mode.  Aggregate worker variable.  */
+
+void local_w_3()
+{
+  int i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    vec2 pt;
+	    
+	    pt.x = i ^ j * 3;
+	    pt.y = i | j * 5;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt.x * k;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt.y * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared in a local scope, broadcasting
+   to vector-partitioned mode.  Addressable worker variable.  */
+
+void local_w_4()
+{
+  int i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    vec2 pt, *ptp;
+	    
+	    ptp = &pt;
+	    
+	    pt.x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += ptp->x * k;
+
+	    ptp->y = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt.y * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared in a local scope, broadcasting
+   to vector-partitioned mode.  Array worker variable.  */
+
+void local_w_5()
+{
+  int i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    int pt[2];
+	    
+	    pt[0] = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt[0] * k;
+
+	    pt[1] = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt[1] * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of gang-private variables declared on loop directive.  */
+
+void loop_g_1()
+{
+  int x = 5, i, arr[32];
+
+  for (i = 0; i < 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  {
+    #pragma acc loop gang private(x)
+    for (i = 0; i < 32; i++)
+      {
+	x = i * 2;
+	arr[i] += x;
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == i * 3);
+}
+
+
+/* Test of gang-private variables declared on loop directive, with broadcasting
+   to partitioned workers.  */
+
+void loop_g_2()
+{
+  int x = 5, i, arr[32 * 32];
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  {
+    #pragma acc loop gang private(x)
+    for (i = 0; i < 32; i++)
+      {
+	x = i * 2;
+
+	#pragma acc loop worker
+	for (int j = 0; j < 32; j++)
+	  arr[i * 32 + j] += x;
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + (i / 32) * 2);
+}
+
+
+/* Test of gang-private variables declared on loop directive, with broadcasting
+   to partitioned vectors.  */
+
+void loop_g_3()
+{
+  int x = 5, i, arr[32 * 32];
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  {
+    #pragma acc loop gang private(x)
+    for (i = 0; i < 32; i++)
+      {
+	x = i * 2;
+
+	#pragma acc loop vector
+	for (int j = 0; j < 32; j++)
+	  arr[i * 32 + j] += x;
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + (i / 32) * 2);
+}
+
+
+/* Test of gang-private addressable variable declared on loop directive, with
+   broadcasting to partitioned workers.  */
+
+void loop_g_4()
+{
+  int x = 5, i, arr[32 * 32];
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  {
+    #pragma acc loop gang private(x)
+    for (i = 0; i < 32; i++)
+      {
+        int *p = &x;
+
+	x = i * 2;
+
+	#pragma acc loop worker
+	for (int j = 0; j < 32; j++)
+	  arr[i * 32 + j] += x;
+
+	(*p)--;
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + (i / 32) * 2);
+}
+
+
+/* Test of gang-private array variable declared on loop directive, with
+   broadcasting to partitioned workers.  */
+
+void loop_g_5()
+{
+  int x[8], i, arr[32 * 32];
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  {
+    #pragma acc loop gang private(x)
+    for (i = 0; i < 32; i++)
+      {
+        for (int j = 0; j < 8; j++)
+	  x[j] = j * 2;
+
+	#pragma acc loop worker
+	for (int j = 0; j < 32; j++)
+	  arr[i * 32 + j] += x[j % 8];
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + (i % 8) * 2);
+}
+
+
+/* Test of gang-private aggregate variable declared on loop directive, with
+   broadcasting to partitioned workers.  */
+
+void loop_g_6()
+{
+  int i, arr[32 * 32];
+  vec3_attr pt;
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  {
+    #pragma acc loop gang private(pt)
+    for (i = 0; i < 32; i++)
+      {
+        pt.x = i;
+	pt.y = i * 2;
+	pt.z = i * 4;
+	pt.attr[5] = i * 6;
+
+	#pragma acc loop worker
+	for (int j = 0; j < 32; j++)
+	  arr[i * 32 + j] += pt.x + pt.y + pt.z + pt.attr[5];
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + (i / 32) * 13);
+}
+
+
+/* Test of vector-private variables declared on loop directive.  */
+
+void loop_v_1()
+{
+  int x, i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+
+	    #pragma acc loop vector private(x)
+	    for (k = 0; k < 32; k++)
+	      {
+		x = i ^ j * 3;
+		arr[i * 1024 + j * 32 + k] += x * k;
+	      }
+
+	    #pragma acc loop vector private(x)
+	    for (k = 0; k < 32; k++)
+	      {
+		x = i | j * 5;
+		arr[i * 1024 + j * 32 + k] += x * k;
+	      }
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of vector-private variables declared on loop directive.  Array type.  */
+
+void loop_v_2()
+{
+  int pt[2], i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+
+	    #pragma acc loop vector private(pt)
+	    for (k = 0; k < 32; k++)
+	      {
+	        pt[0] = i ^ j * 3;
+		pt[1] = i | j * 5;
+		arr[i * 1024 + j * 32 + k] += pt[0] * k;
+		arr[i * 1024 + j * 32 + k] += pt[1] * k;
+	      }
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared on a loop directive.  */
+
+void loop_w_1()
+{
+  int x = 5, i, arr[32 * 32];
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker private(x)
+	for (j = 0; j < 32; j++)
+	  {
+	    x = i ^ j * 3;
+	    /* Try to ensure accesses to 'x' don't get optimized into a
+	       temporary.  */
+	    __asm__ __volatile__ ("");
+	    arr[i * 32 + j] += x;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + ((i / 32) ^ (i % 32) * 3));
+}
+
+
+/* Test of worker-private variables declared on a loop directive, broadcasting
+   to vector-partitioned mode.  */
+
+void loop_w_2()
+{
+  int x = 5, i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker private(x)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared on a loop directive, broadcasting
+   to vector-partitioned mode.  Back-to-back worker loops.  */
+
+void loop_w_3()
+{
+  int x = 5, i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker private(x)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+
+	#pragma acc loop worker private(x)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    x = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared on a loop directive, broadcasting
+   to vector-partitioned mode.  Successive vector loops.  */
+
+void loop_w_4()
+{
+  int x = 5, i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker private(x)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	    
+	    x = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared on a loop directive, broadcasting
+   to vector-partitioned mode.  Addressable worker variable.  */
+
+void loop_w_5()
+{
+  int x = 5, i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker private(x)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    int *p = &x;
+	    
+	    x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	    
+	    *p = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared on a loop directive, broadcasting
+   to vector-partitioned mode.  Aggregate worker variable.  */
+
+void loop_w_6()
+{
+  int i, arr[32 * 32 * 32];
+  vec2 pt;
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker private(pt)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    
+	    pt.x = i ^ j * 3;
+	    pt.y = i | j * 5;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt.x * k;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt.y * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared on a loop directive, broadcasting
+   to vector-partitioned mode.  Array worker variable.  */
+
+void loop_w_7()
+{
+  int i, arr[32 * 32 * 32];
+  int pt[2];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  /* "pt" is treated as "present_or_copy" on the parallel directive because it
+     is an array variable.  */
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        /* But here, it is made private per-worker.  */
+        #pragma acc loop worker private(pt)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    
+	    pt[0] = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt[0] * k;
+
+	    pt[1] = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt[1] * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of gang-private variables declared on the parallel directive.  */
+
+void parallel_g_1()
+{
+  int x = 5, i, arr[32];
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 3;
+
+  #pragma acc parallel private(x) copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  {
+    #pragma acc loop gang(static:1)
+    for (i = 0; i < 32; i++)
+      x = i * 2;
+
+    #pragma acc loop gang(static:1)
+    for (i = 0; i < 32; i++)
+      {
+	if (acc_on_device (acc_device_host))
+	  x = i * 2;
+	arr[i] += x;
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == 3 + i * 2);
+}
+
+
+/* Test of gang-private array variable declared on the parallel directive.  */
+
+void parallel_g_2()
+{
+  int x[32], i, arr[32 * 32];
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel private(x) copy(arr) num_gangs(32) num_workers(2) vector_length(32)
+  {
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        int j;
+	for (j = 0; j < 32; j++)
+	  x[j] = j * 2;
+	
+	#pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  arr[i * 32 + j] += x[31 - j];
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + (31 - (i % 32)) * 2);
+}
+
+
+int main ()
+{
+  local_g_1();
+  local_w_1();
+  local_w_2();
+  local_w_3();
+  local_w_4();
+  local_w_5();
+  loop_g_1();
+  loop_g_2();
+  loop_g_3();
+  loop_g_4();
+  loop_g_5();
+  loop_g_6();
+  loop_v_1();
+  loop_v_2();
+  loop_w_1();
+  loop_w_2();
+  loop_w_3();
+  loop_w_4();
+  loop_w_5();
+  loop_w_6();
+  loop_w_7();
+  parallel_g_1();
+  parallel_g_2();
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c
new file mode 100644
index 0000000..b23c758
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c
@@ -0,0 +1,129 @@
+/* Tests of reduction on loop directive.  */
+
+#include <assert.h>
+
+
+/* Test of reduction on loop directive (gangs, non-private reduction
+   variable).  */
+
+void g_np_1()
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  res = hres = 1;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang reduction(*:res)
+    for (i = 0; i < 12; i++)
+      res *= arr[i];
+  }
+
+  for (i = 0; i < 12; i++)
+    hres *= arr[i];
+
+  assert (res == hres);
+}
+
+
+/* Test of reduction on loop directive (gangs and vectors, non-private
+   reduction variable).  */
+
+void gv_np_1()
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang vector reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+}
+
+
+/* Test of reduction on loop directive (gangs and workers, non-private
+   reduction variable).  */
+
+void gw_np_1()
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang worker reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+}
+
+
+/* Test of reduction on loop directive (gangs, workers and vectors, non-private
+   reduction variable).  */
+
+void gwv_np_1()
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang worker vector reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+}
+
+
+int main()
+{
+  g_np_1();
+  gv_np_1();
+  gw_np_1();
+  gwv_np_1();
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/routine-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/routine-1.c
new file mode 100644
index 0000000..f112457
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-1.c
@@ -0,0 +1,88 @@
+// { dg-additional-options "-fno-exceptions" }
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#pragma acc routine
+int fact(int n)
+{
+  if (n == 0 || n == 1)
+    return 1;
+  else
+    return n * fact (n - 1);
+}
+
+int main()
+{
+  int *s, *g, *w, *v, *gw, *gv, *wv, *gwv, i, n = 10;
+
+  s = (int *) malloc (sizeof (int) * n);
+  g = (int *) malloc (sizeof (int) * n);
+  w = (int *) malloc (sizeof (int) * n);
+  v = (int *) malloc (sizeof (int) * n);
+  gw = (int *) malloc (sizeof (int) * n);
+  gv = (int *) malloc (sizeof (int) * n);
+  wv = (int *) malloc (sizeof (int) * n);
+  gwv = (int *) malloc (sizeof (int) * n);
+
+#pragma acc parallel loop async copyout(s[0:n]) seq
+  for (i = 0; i < n; i++)
+    s[i] = fact (i);
+
+#pragma acc parallel loop async copyout(g[0:n]) gang
+  for (i = 0; i < n; i++)
+    g[i] = fact (i);
+
+#pragma acc parallel loop async copyout(w[0:n]) worker
+  for (i = 0; i < n; i++)
+    w[i] = fact (i);
+
+#pragma acc parallel loop async copyout(v[0:n]) vector
+  for (i = 0; i < n; i++)
+    v[i] = fact (i);
+
+#pragma acc parallel loop async copyout(gw[0:n]) gang worker
+  for (i = 0; i < n; i++)
+    gw[i] = fact (i);
+
+#pragma acc parallel loop async copyout(gv[0:n]) gang vector
+  for (i = 0; i < n; i++)
+    gv[i] = fact (i);
+
+#pragma acc parallel loop async copyout(wv[0:n]) worker vector
+  for (i = 0; i < n; i++)
+    wv[i] = fact (i);
+
+#pragma acc parallel loop async copyout(gwv[0:n]) gang worker vector
+  for (i = 0; i < n; i++)
+    gwv[i] = fact (i);
+
+#pragma acc wait
+
+  for (i = 0; i < n; i++)
+    if (s[i] != fact (i))
+      abort ();
+  for (i = 0; i < n; i++)
+    if (g[i] != s[i])
+      abort ();
+  for (i = 0; i < n; i++)
+    if (w[i] != s[i])
+      abort ();
+  for (i = 0; i < n; i++)
+    if (v[i] != s[i])
+      abort ();
+  for (i = 0; i < n; i++)
+    if (gw[i] != s[i])
+      abort ();
+  for (i = 0; i < n; i++)
+    if (gv[i] != s[i])
+      abort ();
+  for (i = 0; i < n; i++)
+    if (wv[i] != s[i])
+      abort ();
+  for (i = 0; i < n; i++)
+    if (gwv[i] != s[i])
+      abort ();
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c
new file mode 100644
index 0000000..d6ff44d
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c
@@ -0,0 +1,123 @@
+#include <stdlib.h>
+#include <stdio.h>
+
+#define M 8
+#define N 32
+
+#pragma acc routine vector
+void
+vector (int *a)
+{
+  int i;
+
+#pragma acc loop vector
+  for (i = 0; i < N; i++)
+    a[i] -= a[i]; 
+}
+
+#pragma acc routine worker
+void
+worker (int *b)
+{
+  int i, j;
+
+#pragma acc loop worker
+  for (i = 0; i < N; i++)
+    {
+#pragma acc loop vector
+      for (j = 0; j < M; j++)
+        b[i * M + j] += b[i * M + j];
+    }
+}
+
+#pragma acc routine gang
+void
+gang (int *a)
+{
+  int i;
+
+#pragma acc loop gang worker vector
+  for (i = 0; i < N; i++)
+    a[i] -= i; 
+}
+
+#pragma acc routine seq
+void
+seq (int *a)
+{
+  int i;
+
+  for (i = 0; i < N; i++)
+    a[i] += 1;
+}
+
+int
+main(int argc, char **argv)
+{
+  int i;
+  int a[N];
+  int b[M * N];
+
+  i = 0;
+
+  for (i = 0; i < N; i++)
+    a[i] = 0;
+
+#pragma acc parallel copy (a[0:N])
+  {
+#pragma acc loop seq
+    for (i = 0; i < N; i++)
+      seq (&a[0]);
+  }
+
+  for (i = 0; i < N; i++)
+    {
+      if (a[i] != N)
+	abort ();
+    }
+
+#pragma acc parallel copy (a[0:N])
+  {
+#pragma acc loop seq
+    for (i = 0; i < N; i++)
+      gang (&a[0]);
+  }
+
+  for (i = 0; i < N; i++)
+    {
+      if (a[i] != N + (N * (-1 * i)))
+	abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    a[i] = i;
+
+#pragma acc parallel copy (b[0:M*N])
+  {
+    worker (&b[0]);
+  }
+
+  for (i = 0; i < N; i++)
+    {
+      if (a[i] != i)
+	abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    a[i] = i;
+
+#pragma acc parallel copy (a[0:N])
+  {
+#pragma acc loop
+    for (i = 0; i < N; i++)
+      vector (&a[0]);
+  }
+
+  for (i = 0; i < N; i++)
+    {
+      if (a[i] != 0)
+	abort ();
+    }
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c
new file mode 100644
index 0000000..b5cbc90
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c
@@ -0,0 +1,76 @@
+/* This code uses nvptx inline assembly guarded with acc_on_device, which is
+   not optimized away at -O0, and then confuses the target assembler.
+   { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
+
+#include <stdio.h>
+#include <openacc.h>
+
+#define NUM_WORKERS 16
+#define NUM_VECTORS 32
+#define WIDTH 64
+#define HEIGHT 32
+
+#define WORK_ID(I,N)						\
+  (acc_on_device (acc_device_nvidia)				\
+   ? ({unsigned __r;						\
+       __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (__r));	\
+       __r; }) : ((I) % (N)))
+#define VEC_ID(I,N)						\
+  (acc_on_device (acc_device_nvidia)				\
+   ? ({unsigned __r;						\
+       __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (__r));	\
+       __r; }) : ((I) % (N)))
+
+#pragma acc routine worker
+void __attribute__ ((noinline))
+  WorkVec (int *ptr, int w, int h, int nw, int nv)
+{
+#pragma acc loop worker
+  for (int i = 0; i < h; i++)
+#pragma acc loop vector
+    for (int j = 0; j < w; j++)
+      ptr[i*w + j] = (WORK_ID (i, nw) << 8) | VEC_ID (j, nv);
+}
+
+int DoWorkVec (int nw)
+{
+  int ary[HEIGHT][WIDTH];
+  int err = 0;
+
+  for (int ix = 0; ix != HEIGHT; ix++)
+    for (int jx = 0; jx != WIDTH; jx++)
+      ary[ix][jx] = 0xdeadbeef;
+
+  printf ("spawning %d ...", nw); fflush (stdout);
+  
+#pragma acc parallel num_workers(nw) vector_length (NUM_VECTORS) copy (ary)
+  {
+    WorkVec ((int *)ary, WIDTH, HEIGHT, nw, NUM_VECTORS);
+  }
+
+  for (int ix = 0; ix != HEIGHT; ix++)
+    for (int jx = 0; jx != WIDTH; jx++)
+      {
+	int exp = ((ix % nw) << 8) | (jx % NUM_VECTORS);
+	
+	if (ary[ix][jx] != exp)
+	  {
+	    printf ("\nary[%d][%d] = %#x expected %#x", ix, jx,
+		    ary[ix][jx], exp);
+	    err = 1;
+	  }
+      }
+  printf (err ? " failed\n" : " ok\n");
+  
+  return err;
+}
+
+int main ()
+{
+  int err = 0;
+
+  for (int W = 1; W <= NUM_WORKERS; W <<= 1)
+    err |= DoWorkVec (W);
+
+  return err;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/update-1-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/update-1-2.c
deleted file mode 100644
index 82c3192..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/update-1-2.c
+++ /dev/null
@@ -1,361 +0,0 @@
-/* Copy of update-1.c with self exchanged with host for #pragma acc update.  */
-
-/* { dg-do run } */
-/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
-
-#include <openacc.h>
-#include <string.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <stdbool.h>
-
-int
-main (int argc, char **argv)
-{
-    int N = 8;
-    int NDIV2 = N / 2;
-    float *a, *b, *c;
-    float *d_a, *d_b, *d_c;
-    int i;
-
-    a = (float *) malloc (N * sizeof (float));
-    b = (float *) malloc (N * sizeof (float));
-    c = (float *) malloc (N * sizeof (float));
-
-    d_a = (float *) acc_malloc (N * sizeof (float));
-    d_b = (float *) acc_malloc (N * sizeof (float));
-    d_c = (float *) acc_malloc (N * sizeof (float));
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 3.0;
-        b[i] = 0.0;
-    }
-
-    acc_map_data (a, d_a, N * sizeof (float));
-    acc_map_data (b, d_b, N * sizeof (float));
-    acc_map_data (c, d_c, N * sizeof (float));
-
-#pragma acc update device (a[0:N], b[0:N])
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update self (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        if (a[i] != 3.0)
-            abort ();
-
-        if (b[i] != 3.0)
-            abort ();
-    }
-
-    if (!acc_is_present (&a[0], (N * sizeof (float))))
-      abort ();
-
-    if (!acc_is_present (&b[0], (N * sizeof (float))))
-      abort ();
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 5.0;
-        b[i] = 1.0;
-    }
-
-#pragma acc update device (a[0:N], b[0:N])
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update self (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        if (a[i] != 5.0)
-            abort ();
-
-        if (b[i] != 5.0)
-            abort ();
-    }
-
-    if (!acc_is_present (&a[0], (N * sizeof (float))))
-      abort ();
-
-    if (!acc_is_present (&b[0], (N * sizeof (float))))
-      abort ();
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 5.0;
-        b[i] = 1.0;
-    }
-
-#pragma acc update device (a[0:N], b[0:N])
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update host (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        if (a[i] != 5.0)
-            abort ();
-
-        if (b[i] != 5.0)
-            abort ();
-    }
-
-    if (!acc_is_present (&a[0], (N * sizeof (float))))
-      abort ();
-
-    if (!acc_is_present (&b[0], (N * sizeof (float))))
-      abort ();
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 6.0;
-        b[i] = 0.0;
-    }
-
-#pragma acc update device (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 9.0;
-    }
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update self (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        if (a[i] != 6.0)
-            abort ();
-
-        if (b[i] != 6.0)
-            abort ();
-    }
-
-    if (!acc_is_present (&a[0], (N * sizeof (float))))
-      abort ();
-
-    if (!acc_is_present (&b[0], (N * sizeof (float))))
-      abort ();
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 7.0;
-        b[i] = 2.0;
-    }
-
-#pragma acc update device (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 9.0;
-    }
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update self (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        if (a[i] != 7.0)
-            abort ();
-
-        if (b[i] != 7.0)
-            abort ();
-    }
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 9.0;
-    }
-
-#pragma acc update device (a[0:N])
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update self (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        if (a[i] != 9.0)
-            abort ();
-
-        if (b[i] != 9.0)
-            abort ();
-    }
-
-    if (!acc_is_present (&a[0], (N * sizeof (float))))
-      abort ();
-
-    if (!acc_is_present (&b[0], (N * sizeof (float))))
-      abort ();
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 5.0;
-    }
-
-#pragma acc update device (a[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 6.0;
-    }
-
-#pragma acc update device (a[0:NDIV2])
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update self (a[0:N], b[0:N])
-
-    for (i = 0; i < NDIV2; i++)
-    {
-        if (a[i] != 6.0)
-            abort ();
-
-        if (b[i] != 6.0)
-            abort ();
-    }
-
-    for (i = NDIV2; i < N; i++)
-    {
-        if (a[i] != 5.0)
-            abort ();
-
-        if (b[i] != 5.0)
-            abort ();
-    }
-
-    if (!acc_is_present (&a[0], (N * sizeof (float))))
-      abort ();
-
-    if (!acc_is_present (&b[0], (N * sizeof (float))))
-      abort ();
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 0.0;
-    }
-
-#pragma acc update device (a[0:4])
-
-#pragma acc parallel present (a[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            a[ii] = a[ii] + 1.0;
-    }
-
-#pragma acc update self (a[4:4])
-
-    for (i = 0; i < NDIV2; i++)
-    {
-        if (a[i] != 0.0)
-            abort ();
-    }
-
-    for (i = NDIV2; i < N; i++)
-    {
-        if (a[i] != 6.0)
-            abort ();
-    }
-
-#pragma acc update self (a[0:4])
-
-    for (i = 0; i < NDIV2; i++)
-    {
-        if (a[i] != 1.0)
-            abort ();
-    }
-
-    for (i = NDIV2; i < N; i++)
-    {
-        if (a[i] != 6.0)
-            abort ();
-    }
-
-    a[2] = 9;
-    a[3] = 9;
-    a[4] = 9;
-    a[5] = 9;
-
-#pragma acc update device (a[2:4])
-
-#pragma acc parallel present (a[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            a[ii] = a[ii] + 1.0;
-    }
-
-#pragma acc update self (a[2:4])
-
-    for (i = 0; i < 2; i++)
-    {
-      if (a[i] != 1.0)
-	abort ();
-    }
-
-    for (i = 2; i < 6; i++)
-    {
-      if (a[i] != 10.0)
-	abort ();
-    }
-
-    for (i = 6; i < N; i++)
-    {
-        if (a[i] != 6.0)
-            abort ();
-    }
-
-    return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c
index 8a51ee3..807347f 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/vector-loop.c
@@ -20,7 +20,7 @@ main (void)
 
 #pragma acc parallel vector_length (32) copyin (a,b) copyout (c)
   {
-#pragma acc loop /* vector clause is missing, since it's not yet supported.  */
+#pragma acc loop vector
     for (unsigned int i = 0; i < n; i++)
       c[i] = a[i] + b[i];
   }
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-1a.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-1a.c
deleted file mode 100644
index 99c6dfb..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-1a.c
+++ /dev/null
@@ -1,28 +0,0 @@
-#include <assert.h>
-
-/* Test worker-single/vector-single mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int arr[32], i;
-
-  for (i = 0; i < 32; i++)
-    arr[i] = 0;
-
-  #pragma acc parallel copy(arr) num_gangs(8) num_workers(8) vector_length(32)
-  {
-    int j;
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      {
-	#pragma acc atomic
-	arr[j]++;
-      }
-  }
-
-  for (i = 0; i < 32; i++)
-    assert (arr[i] == 1);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-4.c
deleted file mode 100644
index 84080d0..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-4.c
+++ /dev/null
@@ -1,28 +0,0 @@
-#include <assert.h>
-
-/* Test worker-single/vector-partitioned mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int arr[32], i;
-
-  for (i = 0; i < 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(1) num_workers(8) vector_length(32)
-      {
-	int k;
-	#pragma acc loop vector
-	for (k = 0; k < 32; k++)
-	  {
-	    #pragma acc atomic
-	    arr[k]++;
-	  }
-      }
-
-  for (i = 0; i < 32; i++)
-    assert (arr[i] == i + 1);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-6.c
deleted file mode 100644
index cbc3e37..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-6.c
+++ /dev/null
@@ -1,46 +0,0 @@
-#include <assert.h>
-
-#if defined(ACC_DEVICE_TYPE_host)
-#define ACTUAL_GANGS 1
-#else
-#define ACTUAL_GANGS 8
-#endif
-
-/* Test worker-single, vector-partitioned, gang-redundant mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int n, arr[32], i;
-
-  for (i = 0; i < 32; i++)
-    arr[i] = 0;
-
-  n = 0;
-
-  #pragma acc parallel copy(n, arr) num_gangs(ACTUAL_GANGS) num_workers(8) \
-	  vector_length(32)
-  {
-    int j;
-
-    #pragma acc atomic
-    n++;
-
-    #pragma acc loop vector
-    for (j = 0; j < 32; j++)
-      {
-	#pragma acc atomic
-	arr[j] += 1;
-      }
-
-    #pragma acc atomic
-    n++;
-  }
-
-  assert (n == ACTUAL_GANGS * 2);
-
-  for (i = 0; i < 32; i++)
-    assert (arr[i] == ACTUAL_GANGS);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-fortran/asyncwait-1.f90 libgomp/testsuite/libgomp.oacc-fortran/asyncwait-1.f90
index b6e637b..01728bd 100644
--- libgomp/testsuite/libgomp.oacc-fortran/asyncwait-1.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/asyncwait-1.f90
@@ -132,4 +132,126 @@ program asyncwait
      if (d(i) .ne. 1.0) call abort
      if (e(i) .ne. 11.0) call abort
   end do
+
+  a(:) = 3.0
+  b(:) = 0.0
+
+  !$acc data copy (a(1:N)) copy (b(1:N))
+
+  !$acc kernels async
+  !$acc loop
+  do i = 1, N
+     b(i) = a(i)
+  end do
+  !$acc end kernels
+
+  !$acc wait
+  !$acc end data
+
+  do i = 1, N
+     if (a(i) .ne. 3.0) call abort
+     if (b(i) .ne. 3.0) call abort
+  end do
+
+  a(:) = 2.0
+  b(:) = 0.0
+
+  !$acc data copy (a(1:N)) copy (b(1:N))
+
+  !$acc kernels async (1)
+  !$acc loop
+  do i = 1, N
+     b(i) = a(i)
+  end do
+  !$acc end kernels
+
+  !$acc wait (1)
+  !$acc end data
+
+  do i = 1, N
+     if (a(i) .ne. 2.0) call abort
+     if (b(i) .ne. 2.0) call abort
+  end do
+
+  a(:) = 3.0
+  b(:) = 0.0
+  c(:) = 0.0
+  d(:) = 0.0
+
+  !$acc data copy (a(1:N)) copy (b(1:N)) copy (c(1:N)) copy (d(1:N))
+
+  !$acc kernels async (1)
+  do i = 1, N
+     b(i) = (a(i) * a(i) * a(i)) / a(i)
+  end do
+  !$acc end kernels
+
+  !$acc kernels async (1)
+  do i = 1, N
+     c(i) = (a(i) * 4) / a(i)
+  end do
+  !$acc end kernels
+
+  !$acc kernels async (1)
+  !$acc loop
+  do i = 1, N
+     d(i) = ((a(i) * a(i) + a(i)) / a(i)) - a(i)
+  end do
+  !$acc end kernels
+
+  !$acc wait (1)
+  !$acc end data
+
+  do i = 1, N
+     if (a(i) .ne. 3.0) call abort
+     if (b(i) .ne. 9.0) call abort
+     if (c(i) .ne. 4.0) call abort
+     if (d(i) .ne. 1.0) call abort
+  end do
+
+  a(:) = 2.0
+  b(:) = 0.0
+  c(:) = 0.0
+  d(:) = 0.0
+  e(:) = 0.0
+
+  !$acc data copy (a(1:N), b(1:N), c(1:N), d(1:N), e(1:N))
+
+  !$acc kernels async (1)
+  do i = 1, N
+     b(i) = (a(i) * a(i) * a(i)) / a(i)
+  end do
+  !$acc end kernels
+
+  !$acc kernels async (1)
+  !$acc loop
+  do i = 1, N
+     c(i) = (a(i) * 4) / a(i)
+  end do
+  !$acc end kernels
+
+  !$acc kernels async (1)
+  !$acc loop
+  do i = 1, N
+     d(i) = ((a(i) * a(i) + a(i)) / a(i)) - a(i)
+  end do
+  !$acc end kernels
+
+  !$acc kernels wait (1) async (1)
+  !$acc loop
+  do i = 1, N
+     e(i) = a(i) + b(i) + c(i) + d(i)
+  end do
+  !$acc end kernels
+
+  !$acc wait (1)
+  !$acc end data
+
+  do i = 1, N
+     if (a(i) .ne. 2.0) call abort
+     if (b(i) .ne. 4.0) call abort
+     if (c(i) .ne. 4.0) call abort
+     if (d(i) .ne. 1.0) call abort
+     if (e(i) .ne. 11.0) call abort
+  end do
 end program asyncwait
diff --git libgomp/testsuite/libgomp.oacc-fortran/asyncwait-2.f90 libgomp/testsuite/libgomp.oacc-fortran/asyncwait-2.f90
index bade52b..fe131b6 100644
--- libgomp/testsuite/libgomp.oacc-fortran/asyncwait-2.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/asyncwait-2.f90
@@ -1,6 +1,6 @@
 ! { dg-do run }
 
-program parallel_wait
+program asyncwait
   integer, parameter :: N = 64
   real, allocatable :: a(:), b(:), c(:)
   integer i
@@ -33,8 +33,33 @@ program parallel_wait
   do i = 1, N
     if (c(i) .ne. 2.0) call abort
   end do
+
+  !$acc kernels async (0)
+  !$acc loop
+  do i = 1, N
+    a(i) = 1
+  end do
+  !$acc end kernels
+
+  !$acc kernels async (1)
+  !$acc loop
+  do i = 1, N
+    b(i) = 1
+  end do
+  !$acc end kernels
+
+  !$acc kernels wait (0, 1)
+  !$acc loop
+  do i = 1, N
+    c(i) = a(i) + b(i)
+  end do
+  !$acc end kernels
+
+  do i = 1, N
+    if (c(i) .ne. 2.0) call abort
+  end do
   
   deallocate (a)
   deallocate (b)
   deallocate (c)
-end program parallel_wait
+end program asyncwait
diff --git libgomp/testsuite/libgomp.oacc-fortran/asyncwait-3.f90 libgomp/testsuite/libgomp.oacc-fortran/asyncwait-3.f90
index d48dc11..fa96a01 100644
--- libgomp/testsuite/libgomp.oacc-fortran/asyncwait-3.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/asyncwait-3.f90
@@ -1,6 +1,6 @@
 ! { dg-do run }
 
-program parallel_wait
+program asyncwait
   integer, parameter :: N = 64
   real, allocatable :: a(:), b(:), c(:)
   integer i
@@ -35,8 +35,35 @@ program parallel_wait
   do i = 1, N
     if (c(i) .ne. 2.0) call abort
   end do
+
+  !$acc kernels async (0)
+  !$acc loop
+  do i = 1, N
+    a(i) = 1
+  end do
+  !$acc end kernels
+
+  !$acc kernels async (1)
+  !$acc loop
+  do i = 1, N
+    b(i) = 1
+  end do
+  !$acc end kernels
+
+  !$acc wait (0, 1)
+
+  !$acc kernels
+  !$acc loop
+  do i = 1, N
+    c(i) = a(i) + b(i)
+  end do
+  !$acc end kernels
+
+  do i = 1, N
+    if (c(i) .ne. 2.0) call abort
+  end do
   
   deallocate (a)
   deallocate (b)
   deallocate (c)
-end program parallel_wait
+end program asyncwait
diff --git libgomp/testsuite/libgomp.oacc-fortran/clauses-1.f90 libgomp/testsuite/libgomp.oacc-fortran/clauses-1.f90
new file mode 100644
index 0000000..e6ab78d
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/clauses-1.f90
@@ -0,0 +1,290 @@
+! { dg-do run }
+! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } }
+
+program main
+  use openacc
+  implicit none
+
+  integer, parameter :: N = 32
+  real, allocatable :: a(:), b(:), c(:)
+  integer i
+
+  i = 0
+
+  allocate (a(N))
+  allocate (b(N))
+  allocate (c(N))
+
+  a(:) = 3.0
+  b(:) = 0.0
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N))
+    do i = 1, N
+      b(i) = a(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 3.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 5.0
+  b(:) = 1.0
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N))
+    do i = 1, N
+      b(i) = a(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 5.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 6.0
+  b(:) = 0.0
+
+  call acc_copyin (a, sizeof (a))
+
+  a(:) = 9.0
+
+  !$acc parallel present_or_copyin (a(1:N)) copyout (b(1:N))
+    do i = 1, N
+      b(i) = a(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 6.0) call abort
+  end do
+
+  call acc_copyout (a, sizeof (a))
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 6.0
+  b(:) = 0.0
+
+  !$acc parallel copyin (a(1:N)) present_or_copyout (b(1:N))
+     do i = 1, N
+       b(i) = a(i)
+     end do
+  !$acc end parallel
+
+  do i = 1, N
+     if (b(i) .ne. 6.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 5.0
+  b(:) = 2.0
+
+  call acc_copyin (b, sizeof (b))
+
+  !$acc parallel copyin (a(1:N)) present_or_copyout (b(1:N))
+     do i = 1, N
+       b(i) = a(i)
+     end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 5.0) call abort
+    if (b(i) .ne. 2.0) call abort
+  end do
+
+  call acc_copyout (b, sizeof (b))
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 3.0;
+  b(:) = 4.0;
+
+  !$acc parallel copy (a(1:N)) copyout (b(1:N))
+    do i = 1, N
+      a(i) = a(i) + 1
+      b(i) = a(i) + 2
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 4.0) call abort
+    if (b(i) .ne. 6.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 4.0
+  b(:) = 7.0
+
+  !$acc parallel present_or_copy (a(1:N)) present_or_copy (b(1:N))
+    do i = 1, N
+      a(i) = a(i) + 1
+      b(i) = b(i) + 2
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 5.0) call abort
+    if (b(i) .ne. 9.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 3.0
+  b(:) = 7.0
+
+  call acc_copyin (a, sizeof (a))
+  call acc_copyin (b, sizeof (b))
+
+  !$acc parallel present_or_copy (a(1:N)) present_or_copy (b(1:N))
+    do i = 1, N
+      a(i) = a(i) + 1
+      b(i) = b(i) + 2
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 3.0) call abort
+    if (b(i) .ne. 7.0) call abort
+  end do
+
+  call acc_copyout (a, sizeof (a))
+  call acc_copyout (b, sizeof (b))
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 3.0
+  b(:) = 7.0
+
+  !$acc parallel copyin (a(1:N)) create (c(1:N)) copyout (b(1:N))
+    do i = 1, N
+      c(i) = a(i)
+      b(i) = c(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 3.0) call abort
+    if (b(i) .ne. 3.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+  if (acc_is_present (c) .eqv. .TRUE.) call abort
+
+  a(:) = 4.0
+  b(:) = 8.0
+
+  !$acc parallel copyin (a(1:N)) present_or_create (c(1:N)) copyout (b(1:N))
+    do i = 1, N
+      c(i) = a(i)
+      b(i) = c(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 4.0) call abort
+    if (b(i) .ne. 4.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+  if (acc_is_present (c) .eqv. .TRUE.) call abort
+
+  a(:) = 4.0
+
+  call acc_copyin (a, sizeof (a))
+  call acc_copyin (b, sizeof (b))
+  call acc_copyin (c, sizeof (c))
+
+  !$acc parallel present (a(1:N)) present (c(1:N)) present (b(1:N))
+    do i = 1, N
+      c(i) = a(i)
+      b(i) = c(i)
+    end do
+  !$acc end parallel
+
+  call acc_copyout (a, sizeof (a))
+  call acc_copyout (b, sizeof (b))
+  call acc_copyout (c, sizeof (c))
+  
+  do i = 1, N
+    if (a(i) .ne. 4.0) call abort
+    if (b(i) .ne. 4.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+  if (acc_is_present (c) .eqv. .TRUE.) call abort
+
+  a(:) = 6.0
+  b(:) = 0.0
+
+  call acc_copyin (a, sizeof (a))
+
+  a(:) = 9.0
+
+  !$acc parallel pcopyin (a(1:N)) copyout (b(1:N))
+    do i = 1, N
+      b(i) = a(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 6.0) call abort
+  end do
+  
+  call acc_copyout (a, sizeof (a))
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 6.0
+  b(:) = 0.0
+
+  !$acc parallel copyin (a(1:N)) pcopyout (b(1:N))
+   do i = 1, N
+     b(i) = a(i)
+   end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 6.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 5.0
+  b(:) = 7.0
+
+  !$acc parallel copyin (a(1:N)) pcreate (c(1:N)) copyout (b(1:N))
+    do i = 1, N
+      c(i) = a(i)
+      b(i) = c(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 5.0) call abort
+    if (b(i) .ne. 5.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+  if (acc_is_present (c) .eqv. .TRUE.) call abort
+
+end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90 libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
index f717d1b..2d4b707 100644
--- libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
@@ -1,29 +1,22 @@
 ! { dg-do run  { target openacc_nvidia_accel_selected } }
 
+! Tests to exercise the declare directive along with
+! the clauses: copy
+!              copyin
+!              copyout
+!              create
+!              present
+!              present_or_copy
+!              present_or_copyin
+!              present_or_copyout
+!              present_or_create
+
 module vars
   implicit none
   integer z
   !$acc declare create (z)
 end module vars
 
-subroutine subr6 (a, d)
-  implicit none
-  integer, parameter :: N = 8
-  integer :: i
-  integer :: a(N)
-  !$acc declare deviceptr (a)
-  integer :: d(N)
-
-  i = 0
-
-  !$acc parallel copy (d)
-    do i = 1, N
-      d(i) = a(i) + a(i)
-    end do
-  !$acc end parallel
-
-end subroutine
-
 subroutine subr5 (a, b, c, d)
   implicit none
   integer, parameter :: N = 8
@@ -201,15 +194,6 @@ subroutine subr0 (a, b, c, d)
     if (d(i) .ne. 13) call abort
   end do
 
-  call subr6 (a, d)
-
-  call test (a, .true.)
-  call test (d, .false.)
-
-  do i = 1, N
-    if (d(i) .ne. 16) call abort
-  end do
-
 end subroutine
 
 program main
@@ -241,8 +225,7 @@ program main
     if (a(i) .ne. 8) call abort
     if (b(i) .ne. 8) call abort
     if (c(i) .ne. 8) call abort
-    if (d(i) .ne. 16) call abort
+    if (d(i) .ne. 13) call abort
   end do
 
-
 end program
diff --git libgomp/testsuite/libgomp.oacc-fortran/default-1.f90 libgomp/testsuite/libgomp.oacc-fortran/default-1.f90
new file mode 100644
index 0000000..1059089
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/default-1.f90
@@ -0,0 +1,54 @@
+! { dg-do run }
+
+program main
+  implicit none
+  real a, b
+  real c
+  !$acc declare create (c)
+
+  a = 2.0
+  b = 0.0
+
+  !$acc parallel copy (a) create (b) default (none)
+    b = a
+    a = 1.0
+    a = a + b
+  !$acc end parallel
+
+  if (a .ne. 3.0) call abort
+
+  !$acc kernels copy (a) create (b) default (none)
+    b = a
+    a = 1.0
+    a = a + b
+  !$acc end kernels
+
+  if (a .ne. 4.0) call abort
+
+  !$acc parallel default (none) copy (a) create (b)
+    b = a
+    a = 1.0
+    a = a + b
+  !$acc end parallel
+
+  if (a .ne. 5.0) call abort
+
+  !$acc parallel default (none) copy (a)
+    c = a
+    a = 1.0
+    a = a + c
+  !$acc end parallel
+
+  if (a .ne. 6.0) call abort
+
+  !$acc data copy (a)
+  !$acc parallel default (none)
+    c = a
+    a = 1.0
+    a = a + c
+  !$acc end parallel
+  !$acc end data
+
+  if (a .ne. 7.0) call abort
+
+end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/firstprivate-1.f90 libgomp/testsuite/libgomp.oacc-fortran/firstprivate-1.f90
new file mode 100644
index 0000000..d3f9093
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/firstprivate-1.f90
@@ -0,0 +1,42 @@
+! { dg-do run }
+
+program firstprivate
+  integer, parameter :: Nupper=100
+  integer :: a, b(Nupper), c, d, n
+  include "openacc_lib.h"
+
+  if (acc_get_device_type () .eq. acc_device_nvidia) then
+     n = Nupper
+  else
+     n = 1
+  end if
+
+  b(:) = -1
+  a = 5
+
+  !$acc parallel firstprivate (a) num_gangs (n)
+  !$acc loop gang
+  do i = 1, n
+     a = a + i
+     b(i) = a
+  end do
+  !$acc end parallel
+
+  do i = 1, n
+     if (b(i) .ne. i + a) call abort ()
+  end do
+
+  !$acc data copy (a)
+  !$acc parallel firstprivate (a) copyout (c)
+  a = 10
+  c = a
+  !$acc end parallel
+
+  !$acc parallel copyout (d) present (a)
+  d = a
+  !$acc end parallel
+  !$acc end data
+
+  if (c .ne. 10) call abort ()
+  if (d .ne. 5) call abort ()
+end program firstprivate
diff --git libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90 libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
new file mode 100644
index 0000000..7d56060
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/gang-static-1.f90
@@ -0,0 +1,79 @@
+! { dg-do run }
+
+program main
+  integer, parameter :: n = 100
+  integer i, a(n), b(n)
+  integer x
+
+  do i = 1, n
+     b(i) = i
+  end do
+
+  !$acc parallel loop gang (static:*) num_gangs (10)
+  do i = 1, n
+     a(i) = b(i) + 0
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 0, n)
+
+  !$acc parallel loop gang (static:1) num_gangs (10)
+  do i = 1, n
+     a(i) = b(i) + 1
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 1, n)
+
+  !$acc parallel loop gang (static:2) num_gangs (10)
+  do i = 1, n
+     a(i) = b(i) + 2
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 2, n)
+
+  !$acc parallel loop gang (static:5) num_gangs (10)
+  do i = 1, n
+     a(i) = b(i) + 5
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 5, n)
+
+  !$acc parallel loop gang (static:20) num_gangs (10)
+  do i = 1, n
+     a(i) = b(i) + 20
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 20, n)
+
+  x = 5
+  !$acc parallel loop gang (static:0+x) num_gangs (10)
+  do i = 1, n
+     a(i) = b(i) + 5
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 5, n)
+
+  x = 10
+  !$acc parallel loop gang (static:x) num_gangs (10)
+  do i = 1, n
+     a(i) = b(i) + 10
+  end do
+  !$acc end parallel loop
+
+  call test (a, b, 10, n)
+end program main
+
+subroutine test (a, b, sarg, n)
+  integer n
+  integer a (n), b(n), sarg
+  integer i
+
+  do i = 1, n
+     if (a(i) .ne. b(i) + sarg) call abort ()
+  end do
+end subroutine test
diff --git libgomp/testsuite/libgomp.oacc-fortran/if-1.f90 libgomp/testsuite/libgomp.oacc-fortran/if-1.f90
new file mode 100644
index 0000000..44055e1
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/if-1.f90
@@ -0,0 +1,886 @@
+! { dg-do run }
+! { dg-additional-options "-cpp" }
+
+program main
+  use openacc
+  implicit none
+
+  integer, parameter :: N = 8
+  integer, parameter :: one = 1
+  integer, parameter :: zero = 0
+  integer i, nn
+  real, allocatable :: a(:), b(:)
+  real exp, exp2
+
+  i = 0
+
+  allocate (a(N))
+  allocate (b(N))
+
+  a(:) = 4.0
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N)) if (1 == 1)
+     do i = 1, N
+        if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+          b(i) = a(i) + 1
+        else
+          b(i) = a(i)
+        end if
+     end do
+  !$acc end parallel
+
+#if ACC_MEM_SHARED
+  exp = 5.0
+#else
+  exp = 4.0
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 16.0
+
+  !$acc parallel if (0 == 1)
+     do i = 1, N
+       if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+         b(i) = a(i) + 1
+       else
+         b(i) = a(i)
+       end if
+     end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 17.0) call abort
+  end do
+
+  a(:) = 8.0
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N)) if (one == 1)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+#if ACC_MEM_SHARED
+  exp = 9.0
+#else
+  exp = 8.0
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 22.0
+
+  !$acc parallel if (zero == 1)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 23.0) call abort
+  end do
+
+  a(:) = 16.0
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N)) if (.TRUE.)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+#if ACC_MEM_SHARED
+  exp = 17.0;
+#else
+  exp = 16.0;
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 76.0
+
+  !$acc parallel if (.FALSE.)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 77.0) call abort
+  end do
+
+  a(:) = 22.0
+
+  nn = 1
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N)) if (nn == 1)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+#if ACC_MEM_SHARED
+  exp = 23.0;
+#else
+  exp = 22.0;
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 18.0
+
+  nn = 0
+
+  !$acc parallel if (nn == 1)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 19.0) call abort
+  end do
+
+  a(:) = 49.0
+
+  nn = 1
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N)) if ((nn + nn) > 0)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+#if ACC_MEM_SHARED
+  exp = 50.0
+#else
+  exp = 49.0
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 38.0
+
+  nn = 0;
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N)) if ((nn + nn) > 0)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 39.0) call abort
+  end do
+
+  a(:) = 91.0
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N)) if (-2 > 0)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 92.0) call abort
+  end do
+
+  a(:) = 43.0
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N)) if (one == 1)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+#if ACC_MEM_SHARED
+  exp = 44.0
+#else
+  exp = 43.0
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 87.0
+
+  !$acc parallel if (one == 0)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 88.0) call abort
+  end do
+
+  a(:) = 3.0
+  b(:) = 9.0
+
+#if ACC_MEM_SHARED
+  exp = 0.0
+  exp2 = 0.0
+#else
+  call acc_copyin (a, sizeof (a))
+  call acc_copyin (b, sizeof (b))
+  exp = 3.0;
+  exp2 = 9.0;
+#endif
+
+  !$acc update device (a(1:N), b(1:N)) if (1 == 1)
+
+  a(:) = 0.0
+  b(:) = 0.0
+
+  !$acc update host (a(1:N), b(1:N)) if (1 == 1)
+
+  do i = 1, N
+    if (a(i) .ne. exp) call abort
+    if (b(i) .ne. exp2) call abort
+  end do
+
+  a(:) = 6.0
+  b(:) = 12.0
+
+  !$acc update device (a(1:N), b(1:N)) if (0 == 1)
+
+  a(:) = 0.0
+  b(:) = 0.0
+
+  !$acc update host (a(1:N), b(1:N)) if (1 == 1)
+
+  do i = 1, N
+    if (a(i) .ne. exp) call abort
+    if (b(i) .ne. exp2) call abort
+  end do
+
+  a(:) = 26.0
+  b(:) = 21.0
+
+  !$acc update device (a(1:N), b(1:N)) if (1 == 1)
+
+  a(:) = 0.0
+  b(:) = 0.0
+
+  !$acc update host (a(1:N), b(1:N)) if (0 == 1)
+
+  do i = 1, N
+    if (a(i) .ne. 0.0) call abort
+    if (b(i) .ne. 0.0) call abort
+  end do
+
+#if !ACC_MEM_SHARED
+  call acc_copyout (a, sizeof (a))
+  call acc_copyout (b, sizeof (b))
+#endif
+
+  a(:) = 4.0
+  b(:) = 0.0
+
+  !$acc data copyin (a(1:N)) copyout (b(1:N)) if (1 == 1)
+
+    !$acc parallel present (a(1:N))
+       do i = 1, N
+           b(i) = a(i)
+       end do
+    !$acc end parallel
+  !$acc end data
+
+  do i = 1, N
+    if (b(i) .ne. 4.0) call abort
+  end do
+
+  a(:) = 8.0
+  b(:) = 1.0
+
+  !$acc data copyin (a(1:N)) copyout (b(1:N)) if (0 == 1)
+
+#if !ACC_MEM_SHARED
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc end data
+
+  a(:) = 18.0
+  b(:) = 21.0
+
+  !$acc data copyin (a(1:N)) if (1 == 1)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (a) .eqv. .FALSE.) call abort
+#endif
+
+    !$acc data copyout (b(1:N)) if (0 == 1)
+#if !ACC_MEM_SHARED
+      if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+        !$acc data copyout (b(1:N)) if (1 == 1)
+
+        !$acc parallel present (a(1:N)) present (b(1:N))
+          do i = 1, N
+            b(i) = a(i)
+          end do
+      !$acc end parallel
+
+    !$acc end data
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+    !$acc end data
+  !$acc end data
+
+  do i = 1, N
+   if (b(i) .ne. 18.0) call abort
+  end do
+
+  !$acc enter data copyin (b(1:N)) if (0 == 1)
+
+#if !ACC_MEM_SHARED
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (0 == 1)
+
+  !$acc enter data copyin (b(1:N)) if (1 == 1)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .FALSE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (1 == 1)
+
+#if !ACC_MEM_SHARED
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc enter data copyin (b(1:N)) if (zero == 1)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (zero == 1)
+
+  !$acc enter data copyin (b(1:N)) if (one == 1)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .FALSE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (one == 1)
+
+#if !ACC_MEM_SHARED
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc enter data copyin (b(1:N)) if (one == 0)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (one == 0)
+
+  !$acc enter data copyin (b(1:N)) if (one == 1)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .FALSE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (one == 1)
+
+#if !ACC_MEM_SHARED
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  a(:) = 4.0
+
+  !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if (1 == 1)
+     do i = 1, N
+        if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+          b(i) = a(i) + 1
+        else
+          b(i) = a(i)
+        end if
+     end do
+  !$acc end kernels
+
+#if ACC_MEM_SHARED
+  exp = 5.0
+#else
+  exp = 4.0
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 16.0
+
+  !$acc kernels if (0 == 1)
+     do i = 1, N
+       if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+         b(i) = a(i) + 1
+       else
+         b(i) = a(i)
+       end if
+     end do
+  !$acc end kernels
+
+  do i = 1, N
+    if (b(i) .ne. 17.0) call abort
+  end do
+
+  a(:) = 8.0
+
+  !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if (one == 1)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+#if ACC_MEM_SHARED
+  exp = 9.0
+#else
+  exp = 8.0
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 22.0
+
+  !$acc kernels if (zero == 1)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+  do i = 1, N
+    if (b(i) .ne. 23.0) call abort
+  end do
+
+  a(:) = 16.0
+
+  !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if (.TRUE.)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+#if ACC_MEM_SHARED
+  exp = 17.0;
+#else
+  exp = 16.0;
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 76.0
+
+  !$acc kernels if (.FALSE.)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+  do i = 1, N
+    if (b(i) .ne. 77.0) call abort
+  end do
+
+  a(:) = 22.0
+
+  nn = 1
+
+  !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if (nn == 1)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+#if ACC_MEM_SHARED
+  exp = 23.0;
+#else
+  exp = 22.0;
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 18.0
+
+  nn = 0
+
+  !$acc kernels if (nn == 1)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+  do i = 1, N
+    if (b(i) .ne. 19.0) call abort
+  end do
+
+  a(:) = 49.0
+
+  nn = 1
+
+  !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if ((nn + nn) > 0)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+#if ACC_MEM_SHARED
+  exp = 50.0
+#else
+  exp = 49.0
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 38.0
+
+  nn = 0;
+
+  !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if ((nn + nn) > 0)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+  do i = 1, N
+    if (b(i) .ne. 39.0) call abort
+  end do
+
+  a(:) = 91.0
+
+  !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if (-2 > 0)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+  do i = 1, N
+    if (b(i) .ne. 92.0) call abort
+  end do
+
+  a(:) = 43.0
+
+  !$acc kernels copyin (a(1:N)) copyout (b(1:N)) if (one == 1)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+#if ACC_MEM_SHARED
+  exp = 44.0
+#else
+  exp = 43.0
+#endif
+
+  do i = 1, N
+    if (b(i) .ne. exp) call abort
+  end do
+
+  a(:) = 87.0
+
+  !$acc kernels if (one == 0)
+    do i = 1, N
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) then
+        b(i) = a(i) + 1
+      else
+        b(i) = a(i)
+      end if
+    end do
+  !$acc end kernels
+
+  do i = 1, N
+    if (b(i) .ne. 88.0) call abort
+  end do
+
+  a(:) = 3.0
+  b(:) = 9.0
+
+#if ACC_MEM_SHARED
+  exp = 0.0
+  exp2 = 0.0
+#else
+  call acc_copyin (a, sizeof (a))
+  call acc_copyin (b, sizeof (b))
+  exp = 3.0;
+  exp2 = 9.0;
+#endif
+
+  !$acc update device (a(1:N), b(1:N)) if (1 == 1)
+
+  a(:) = 0.0
+  b(:) = 0.0
+
+  !$acc update host (a(1:N), b(1:N)) if (1 == 1)
+
+  do i = 1, N
+    if (a(i) .ne. exp) call abort
+    if (b(i) .ne. exp2) call abort
+  end do
+
+  a(:) = 6.0
+  b(:) = 12.0
+
+  !$acc update device (a(1:N), b(1:N)) if (0 == 1)
+
+  a(:) = 0.0
+  b(:) = 0.0
+
+  !$acc update host (a(1:N), b(1:N)) if (1 == 1)
+
+  do i = 1, N
+    if (a(i) .ne. exp) call abort
+    if (b(i) .ne. exp2) call abort
+  end do
+
+  a(:) = 26.0
+  b(:) = 21.0
+
+  !$acc update device (a(1:N), b(1:N)) if (1 == 1)
+
+  a(:) = 0.0
+  b(:) = 0.0
+
+  !$acc update host (a(1:N), b(1:N)) if (0 == 1)
+
+  do i = 1, N
+    if (a(i) .ne. 0.0) call abort
+    if (b(i) .ne. 0.0) call abort
+  end do
+
+#if !ACC_MEM_SHARED
+  call acc_copyout (a, sizeof (a))
+  call acc_copyout (b, sizeof (b))
+#endif
+
+  a(:) = 4.0
+  b(:) = 0.0
+
+  !$acc data copyin (a(1:N)) copyout (b(1:N)) if (1 == 1)
+
+    !$acc kernels present (a(1:N))
+       do i = 1, N
+           b(i) = a(i)
+       end do
+    !$acc end kernels
+  !$acc end data
+
+  do i = 1, N
+    if (b(i) .ne. 4.0) call abort
+  end do
+
+  a(:) = 8.0
+  b(:) = 1.0
+
+  !$acc data copyin (a(1:N)) copyout (b(1:N)) if (0 == 1)
+
+#if !ACC_MEM_SHARED
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc end data
+
+  a(:) = 18.0
+  b(:) = 21.0
+
+  !$acc data copyin (a(1:N)) if (1 == 1)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (a) .eqv. .FALSE.) call abort
+#endif
+
+    !$acc data copyout (b(1:N)) if (0 == 1)
+#if !ACC_MEM_SHARED
+      if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+        !$acc data copyout (b(1:N)) if (1 == 1)
+
+        !$acc kernels present (a(1:N)) present (b(1:N))
+          do i = 1, N
+            b(i) = a(i)
+          end do
+      !$acc end kernels
+
+    !$acc end data
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+    !$acc end data
+  !$acc end data
+
+  do i = 1, N
+   if (b(i) .ne. 18.0) call abort
+  end do
+
+  !$acc enter data copyin (b(1:N)) if (0 == 1)
+
+#if !ACC_MEM_SHARED
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (0 == 1)
+
+  !$acc enter data copyin (b(1:N)) if (1 == 1)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .FALSE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (1 == 1)
+
+#if !ACC_MEM_SHARED
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc enter data copyin (b(1:N)) if (zero == 1)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (zero == 1)
+
+  !$acc enter data copyin (b(1:N)) if (one == 1)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .FALSE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (one == 1)
+
+#if !ACC_MEM_SHARED
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc enter data copyin (b(1:N)) if (one == 0)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (one == 0)
+
+  !$acc enter data copyin (b(1:N)) if (one == 1)
+
+#if !ACC_MEM_SHARED
+    if (acc_is_present (b) .eqv. .FALSE.) call abort
+#endif
+
+  !$acc exit data delete (b(1:N)) if (one == 1)
+
+#if !ACC_MEM_SHARED
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+#endif
+
+end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/implicit-firstprivate-ref.f90 libgomp/testsuite/libgomp.oacc-fortran/implicit-firstprivate-ref.f90
new file mode 100644
index 0000000..a5f3840
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/implicit-firstprivate-ref.f90
@@ -0,0 +1,42 @@
+! This test checks if the runtime can properly handle implicit
+! firstprivate variables inside subroutines in modules.
+
+! { dg-do run }
+
+module test_mod
+  contains
+    subroutine test(x)
+
+      IMPLICIT NONE
+
+      INTEGER      :: x, y, j
+
+      x = 5
+
+      !$ACC PARALLEL LOOP copyout (y)
+      DO j=1,10
+         y=x
+      ENDDO
+      !$ACC END PARALLEL LOOP
+
+      y = -1;
+
+      !$ACC PARALLEL LOOP firstprivate (y) copyout (x)
+      DO j=1,10
+         x=y
+      ENDDO
+      !$ACC END PARALLEL LOOP
+    end subroutine test
+end module test_mod
+
+program t
+  use test_mod
+
+  INTEGER      :: x_min
+
+  x_min = 8
+
+  CALL test(x_min)
+
+  if (x_min .ne. -1) call abort
+end program t
diff --git libgomp/testsuite/libgomp.oacc-fortran/pr68813.f90 libgomp/testsuite/libgomp.oacc-fortran/pr68813.f90
new file mode 100644
index 0000000..735350f
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/pr68813.f90
@@ -0,0 +1,19 @@
+program foo
+  implicit none
+  integer, parameter :: n = 100
+  integer, dimension(n,n) :: a
+  integer :: i, j, sum = 0
+
+  a = 1
+
+  !$acc parallel copyin(a(1:n,1:n)) firstprivate (sum)
+  !$acc loop gang reduction(+:sum)
+  do i=1, n
+     !$acc loop vector reduction(+:sum)
+     do j=1, n
+        sum = sum + a(i, j)
+     enddo
+  enddo
+  !$acc end parallel
+
+end program foo
diff --git libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90 libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90
new file mode 100644
index 0000000..3c1940b
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90
@@ -0,0 +1,544 @@
+! Miscellaneous tests for private variables.
+
+! { dg-do run }
+
+
+! Test of gang-private variables declared on loop directive.
+
+subroutine t1()
+  integer :: x, i, arr(32)
+
+  do i = 1, 32
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang private(x)
+  do i = 1, 32
+     x = i * 2;
+     arr(i) = arr(i) + x
+  end do
+  !$acc end parallel
+
+  do i = 1, 32
+     if (arr(i) .ne. i * 3) call abort
+  end do
+end subroutine t1
+
+
+! Test of gang-private variables declared on loop directive, with broadcasting
+! to partitioned workers.
+
+subroutine t2()
+  integer :: x, i, j, arr(0:32*32)
+
+  do i = 0, 32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang private(x)
+  do i = 0, 31
+     x = i * 2;
+
+     !$acc loop worker
+     do j = 0, 31
+        arr(i * 32 + j) = arr(i * 32 + j) + x
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 * 32 - 1
+     if (arr(i) .ne. i + (i / 32) * 2) call abort
+  end do
+end subroutine t2
+
+
+! Test of gang-private variables declared on loop directive, with broadcasting
+! to partitioned vectors.
+
+subroutine t3()
+  integer :: x, i, j, arr(0:32*32)
+
+  do i = 0, 32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang private(x)
+  do i = 0, 31
+     x = i * 2;
+
+     !$acc loop vector
+     do j = 0, 31
+        arr(i * 32 + j) = arr(i * 32 + j) + x
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 * 32 - 1
+     if (arr(i) .ne. i + (i / 32) * 2) call abort
+  end do
+end subroutine t3
+
+
+! Test of gang-private addressable variable declared on loop directive, with
+! broadcasting to partitioned workers.
+
+subroutine t4()
+  type vec3
+     integer x, y, z, attr(13)
+  end type vec3
+
+  integer i, j, arr(0:32*32)
+  type(vec3) pt
+  
+  do i = 0, 32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang private(pt)
+  do i = 0, 31
+     pt%x = i
+     pt%y = i * 2
+     pt%z = i * 4
+     pt%attr(5) = i * 6
+
+     !$acc loop vector
+     do j = 0, 31
+        arr(i * 32 + j) = arr(i * 32 + j) + pt%x + pt%y + pt%z + pt%attr(5);
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 * 32 - 1
+     if (arr(i) .ne. i + (i / 32) * 13) call abort
+  end do
+end subroutine t4
+
+
+! Test of vector-private variables declared on loop directive.
+
+subroutine t5()
+  integer :: x, i, j, k, idx, arr(0:32*32*32)
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker
+     do j = 0, 31
+        !$acc loop vector private(x)
+        do k = 0, 31
+           x = ieor(i, j * 3)
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+        !$acc loop vector private(x)
+        do k = 0, 31
+           x = ior(i, j * 5)
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t5
+
+
+! Test of vector-private variables declared on loop directive. Array type.
+
+subroutine t6()
+  integer :: i, j, k, idx, arr(0:32*32*32), pt(2)
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker
+     do j = 0, 31
+        !$acc loop vector private(x, pt)
+        do k = 0, 31
+           pt(1) = ieor(i, j * 3)
+           pt(2) = ior(i, j * 5)
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt(1) * k
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt(2) * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t6
+
+
+! Test of worker-private variables declared on a loop directive.
+
+subroutine t7()
+  integer :: x, i, j, arr(0:32*32)
+  common x
+
+  do i = 0, 32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang private(x)
+  do i = 0, 31
+     !$acc loop worker private(x)
+     do j = 0, 31
+        x = ieor(i, j * 3)
+        arr(i * 32 + j) = arr(i * 32 + j) + x
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 * 32 - 1
+     if (arr(i) .ne. i + ieor(i / 32, mod(i, 32) * 3)) call abort
+  end do
+end subroutine t7
+
+
+! Test of worker-private variables declared on a loop directive, broadcasting
+! to vector-partitioned mode.
+
+subroutine t8()
+  integer :: x, i, j, k, idx, arr(0:32*32*32)
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker private(x)
+     do j = 0, 31
+        x = ieor(i, j * 3)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k) call abort
+        end do
+     end do
+  end do
+end subroutine t8
+
+
+! Test of worker-private variables declared on a loop directive, broadcasting
+! to vector-partitioned mode.  Back-to-back worker loops.
+
+subroutine t9()
+  integer :: x, i, j, k, idx, arr(0:32*32*32)
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker private(x)
+     do j = 0, 31
+        x = ieor(i, j * 3)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+     end do
+
+     !$acc loop worker private(x)
+     do j = 0, 31
+        x = ior(i, j * 5)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t9
+
+
+! Test of worker-private variables declared on a loop directive, broadcasting
+! to vector-partitioned mode.  Successive vector loops.
+
+subroutine t10()
+  integer :: x, i, j, k, idx, arr(0:32*32*32)
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker private(x)
+     do j = 0, 31
+        x = ieor(i, j * 3)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+
+        x = ior(i, j * 5)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t10
+
+
+! Test of worker-private variables declared on a loop directive, broadcasting
+! to vector-partitioned mode.  Addressable worker variable.
+
+subroutine t11()
+  integer :: i, j, k, idx, arr(0:32*32*32)
+  integer, target :: x
+  integer, pointer :: p
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker private(x, p)
+     do j = 0, 31
+        p => x
+        x = ieor(i, j * 3)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+
+        p = ior(i, j * 5)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t11
+
+
+! Test of worker-private variables declared on a loop directive, broadcasting
+! to vector-partitioned mode.  Aggregate worker variable.
+
+subroutine t12()
+  type vec2
+     integer x, y
+  end type vec2
+  
+  integer :: i, j, k, idx, arr(0:32*32*32)
+  type(vec2) :: pt
+  
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker private(pt)
+     do j = 0, 31
+        pt%x = ieor(i, j * 3)
+        pt%y = ior(i, j * 5)
+        
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt%x * k
+        end do
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt%y * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t12
+
+
+! Test of worker-private variables declared on loop directive, broadcasting
+! to vector-partitioned mode.  Array worker variable.
+
+subroutine t13()
+  integer :: i, j, k, idx, arr(0:32*32*32), pt(2)
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker private(pt)
+     do j = 0, 31
+        pt(1) = ieor(i, j * 3)
+        pt(2) = ior(i, j * 5)
+        
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt(1) * k
+        end do
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt(2) * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t13
+
+
+! Test of gang-private variables declared on the parallel directive.
+
+subroutine t14()
+  use openacc
+  integer :: x = 5
+  integer, parameter :: n = 32
+  integer :: arr(n)
+
+  do i = 1, n
+    arr(i) = 3
+  end do
+
+  !$acc parallel private(x) copy(arr) num_gangs(n) num_workers(8) vector_length(32)
+    !$acc loop gang(static:1)
+    do i = 1, n
+      x = i * 2;
+    end do
+
+   !$acc loop gang(static:1)
+    do i = 1, n
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) x = i * 2
+      arr(i) = arr(i) + x
+    end do
+  !$acc end parallel
+
+  do i = 1, n
+    if (arr(i) .ne. (3 + i * 2)) call abort
+  end do
+
+end subroutine t14
+
+
+program main
+  call t1()
+  call t2()
+  call t3()
+  call t4()
+  call t5()
+  call t6()
+  call t7()
+  call t8()
+  call t9()
+  call t10()
+  call t11()
+  call t12()
+  call t13()
+  call t14()
+end program main


Regards
 Thomas


* [gomp4] Update OpenACC test cases
  2016-03-30 15:55   ` Thomas Schwinge
@ 2016-04-04 10:40     ` Thomas Schwinge
  2016-04-12 11:08       ` Merge libgomp.oacc-c-c++-common/loop-reduction-*.c into libgomp.oacc-c-c++-common/reduction-7.c (was: [gomp4] Update OpenACC test cases) Thomas Schwinge
  0 siblings, 1 reply; 5+ messages in thread
From: Thomas Schwinge @ 2016-04-04 10:40 UTC (permalink / raw)
  To: gcc-patches
  Cc: Jakub Jelinek, Julian Brown, Chung-Lin Tang, Cesar Philippidis,
	James Norris, Tom de Vries, Nathan Sidwell

Hi!

On Wed, 30 Mar 2016 17:14:35 +0200, I wrote:
> On Wed, 30 Mar 2016 16:13:32 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Wed, Mar 30, 2016 at 04:06:30PM +0200, Thomas Schwinge wrote:
> > > This is to integrate into trunk a large amount of the test case updates
> > > that we have accumulated on gomp-4_0-branch.  OK to commit?

> Committed in r234575, as posted:

In r234713 merged into gomp-4_0-branch, with additional (cleanup) changes
(conceptually to be applied before the r234575 trunk merge itself):

commit 182c875b73f6a69ad62239e8510c17b566acfd9b
Merge: e7e9a60 6a5dcab
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Apr 4 10:09:03 2016 +0000

    svn merge -r 234572:234575 svn+ssh://gcc.gnu.org/svn/gcc/trunk
    
    Additional changes:
    
    	gcc/testsuite/
    	* c-c++-common/goacc/firstprivate.c: Remove file, moving its
    	content into...
    	* c-c++-common/goacc/clauses-fail.c: ... this file, and...
    	* c-c++-common/goacc/parallel-1.c: ... this file.
    	* c-c++-common/goacc/host_data-3.c: Remove file.
    	* c-c++-common/goacc/host_data-4.c: Remove file, moving its
    	content into...
    	* c-c++-common/goacc/host_data-2.c: ... this file.
    	* c-c++-common/goacc/kernels-loop-acc-loop-ptr-it.c: Remove file,
    	moving its content into...
    	* c-c++-common/goacc/kernels-1.c: ... this file.
    	* c-c++-common/goacc/loop-2.c: Rename to...
    	* c-c++-common/goacc/loop-2-parallel.c: ... this file.
    	* c-c++-common/goacc/loop-3.c: Remove (invalid) nesting tests
    	covered elsewhere.
    	* c-c++-common/goacc/loop-4.c: Rename to...
    	* c-c++-common/goacc/loop-2-kernels.c: ... this file.
    	* c-c++-common/goacc/loop-nest-1.c: Remove file.
    	* c-c++-common/goacc/loop-tile-k1.c: Remove file, moving its
    	content into...
    	* c-c++-common/goacc/tile.c: ... this file.
    	* c-c++-common/goacc/loop-tile-p1.c: Remove file, moving its
    	content into...
    	* c-c++-common/goacc/tile.c: ... this file.
    	* c-c++-common/goacc/non-routine.c: Remove file, moving its
    	content into...
    	* c-c++-common/goacc/nesting-fail-1.c: ... this file.
    	* c-c++-common/goacc/parallel-empty.c: Remove file, moving its
    	content into...
    	* c-c++-common/goacc/parallel-1.c: ... this file.
    	* c-c++-common/goacc/parallel-eternal.c: Remove file, moving its
    	content into...
    	* c-c++-common/goacc/parallel-1.c: ... this file.
    	* c-c++-common/goacc/parallel-noreturn.c: Remove file, moving its
    	content into...
    	* c-c++-common/goacc/parallel-1.c: ... this file.
    	* c-c++-common/goacc/routine-6.c: Remove file, moving its
    	content into...
    	* c-c++-common/goacc/routine-3.c: ... this file.
    	* c-c++-common/goacc/routine-7.c: Remove file, moving its
    	content into...
    	* c-c++-common/goacc/routine-4.c: ... this file.
    	* g++.dg/goacc/template-reduction.C: Remove file.
    	* gfortran.dg/goacc/parallel-tree.f95: Use dg-warning directives
    	instead of specifying the -w compiler option.
    	libgomp/
    	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Don't XFAIL.
    	* testsuite/libgomp.oacc-c-c++-common/routine-1.c: Extend testing
    	to cover more parallelism levels, and asynchronous kernel
    	launches.
    	* testsuite/libgomp.oacc-c-c++-common/data-3.c: Remove OpenACC
    	present directives.
    	* testsuite/libgomp.oacc-fortran/collapse-5.f90: Remove OpenACC
    	copy directives.
    	* testsuite/libgomp.oacc-fortran/collapse-6.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/collapse-7.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/collapse-8.f90: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/firstprivate-3.c: Remove
    	file.
    	* testsuite/libgomp.oacc-c-c++-common/firstprivate-4.c: Remove
    	file, moving its content into...
    	* testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: ... this
    	file.
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-par-gang-1.c:
    	Remove file, moving its content into...
    	* testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: ... this
    	file.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-4.c: Rename
    	file to...
    	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-clauses.c:
    	... this file.  Clean up dg-* directives.
    	* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Use
    	dg-warning directives instead of specifying the -w compiler
    	option.
    	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/routine-g-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/routine-4.c: Change
    	parallelism used instead of specifying the -w compiler option.
    	* testsuite/libgomp.oacc-fortran/routine-7.f90: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c:
    	Merge this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c:
    	... this file into...
    	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: ... this new
    	file.  Use dg-warning directives instead of specifying the -w
    	compiler option.
    	* testsuite/libgomp.oacc-c-c++-common/vec-partn-1.c: Merge this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/vec-partn-2.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/vec-partn-3.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/vec-partn-4.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/vec-partn-5.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/vec-partn-6.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/vec-single-1.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/vec-single-2.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/vec-single-3.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/vec-single-4.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/vec-single-5.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/vec-single-6.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/vector-broadcast.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/worker-partn-1.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/worker-partn-2.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/worker-partn-3.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/worker-partn-4.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/worker-partn-5.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/worker-partn-6.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/worker-partn-7.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/worker-partn-8.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/worker-single-1.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/worker-single-1a.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/worker-single-2.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/worker-single-3.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/worker-single-4.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/worker-single-5.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/worker-single-6.c: ... this
    	file, and...
    	* testsuite/libgomp.oacc-c-c++-common/mode-transitions.c: ... this
    	new file.  Use dg-warning directives instead of specifying the -w
    	compiler option.
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-local-gang-1.c:
    	Merge this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-1.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-2.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-3.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-4.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-5.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-1.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-2.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-3.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-4.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-5.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-6.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-vector-1.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-vector-2.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-1.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-2.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-3.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-4.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-5.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-6.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-7.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-par-gang-2.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/private-vars-par-gang-3.c:
    	... this file into...
    	* testsuite/libgomp.oacc-c-c++-common/private-variables.c:
    	... this new file.  Use dg-warning directives instead of
    	specifying the -w compiler option.
    	* testsuite/libgomp.oacc-fortran/private-vars-loop-gang-1.f90:
    	Merge this file, and...
    	* testsuite/libgomp.oacc-fortran/private-vars-loop-gang-2.f90:
    	... this file, and...
    	* testsuite/libgomp.oacc-fortran/private-vars-loop-gang-3.f90:
    	... this file, and...
    	* testsuite/libgomp.oacc-fortran/private-vars-loop-gang-6.f90:
    	... this file, and...
    	* testsuite/libgomp.oacc-fortran/private-vars-loop-vector-1.f90:
    	... this file, and...
    	* testsuite/libgomp.oacc-fortran/private-vars-loop-vector-2.f90:
    	... this file, and...
    	* testsuite/libgomp.oacc-fortran/private-vars-loop-worker-1.f90:
    	... this file, and...
    	* testsuite/libgomp.oacc-fortran/private-vars-loop-worker-2.f90:
    	... this file, and...
    	* testsuite/libgomp.oacc-fortran/private-vars-loop-worker-3.f90:
    	... this file, and...
    	* testsuite/libgomp.oacc-fortran/private-vars-loop-worker-4.f90:
    	... this file, and...
    	* testsuite/libgomp.oacc-fortran/private-vars-loop-worker-5.f90:
    	... this file, and...
    	* testsuite/libgomp.oacc-fortran/private-vars-loop-worker-6.f90:
    	... this file, and...
    	* testsuite/libgomp.oacc-fortran/private-vars-loop-worker-7.f90:
    	... this file, and...
    	* testsuite/libgomp.oacc-fortran/private-vars-par-gang-2.f90:
    	... this file into...
    	* testsuite/libgomp.oacc-fortran/private-variables.f90: ... this
    	new file.  Use dg-warning directives instead of specifying the -w
    	compiler option.
    	* testsuite/libgomp.oacc-c-c++-common/routine-2.c: Remove file.
    	* testsuite/libgomp.oacc-c-c++-common/routine-vec-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/routine-work-1.c: Likewise.
    	* testsuite/libgomp.oacc-fortran/update-1-2.f90: Likewise.
    
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@234713 138bc75d-0d04-0410-961f-82ee72b054a4

 gcc/testsuite/ChangeLog                            |   49 +
 gcc/testsuite/ChangeLog.gomp                       |   48 +
 gcc/testsuite/c-c++-common/goacc/clauses-fail.c    |   12 +
 .../c-c++-common/goacc/combined-directives.c       |    5 +-
 gcc/testsuite/c-c++-common/goacc/firstprivate.c    |    9 -
 gcc/testsuite/c-c++-common/goacc/host_data-1.c     |   15 +-
 gcc/testsuite/c-c++-common/goacc/host_data-2.c     |   61 +-
 gcc/testsuite/c-c++-common/goacc/host_data-3.c     |   18 -
 gcc/testsuite/c-c++-common/goacc/host_data-4.c     |   14 -
 gcc/testsuite/c-c++-common/goacc/host_data-5.c     |   23 -
 gcc/testsuite/c-c++-common/goacc/host_data-6.c     |   25 -
 gcc/testsuite/c-c++-common/goacc/if-clause-2.c     |    2 -
 gcc/testsuite/c-c++-common/goacc/kernels-1.c       |   45 +
 gcc/testsuite/c-c++-common/goacc/kernels-empty.c   |    6 -
 gcc/testsuite/c-c++-common/goacc/kernels-eternal.c |   11 -
 .../goacc/kernels-loop-acc-loop-ptr-it.c           |   14 -
 .../c-c++-common/goacc/kernels-noreturn.c          |   12 -
 .../goacc/{loop-4.c => loop-2-kernels.c}           |   10 +-
 .../goacc/{loop-2.c => loop-2-parallel.c}          |   10 +-
 gcc/testsuite/c-c++-common/goacc/loop-3.c          |   74 --
 gcc/testsuite/c-c++-common/goacc/loop-clauses.c    |    2 -
 gcc/testsuite/c-c++-common/goacc/loop-nest-1.c     |   16 -
 gcc/testsuite/c-c++-common/goacc/loop-tile-k1.c    |  132 ---
 gcc/testsuite/c-c++-common/goacc/loop-tile-p1.c    |  128 ---
 gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c  |   18 +-
 gcc/testsuite/c-c++-common/goacc/non-routine.c     |   16 -
 gcc/testsuite/c-c++-common/goacc/parallel-1.c      |   38 +
 gcc/testsuite/c-c++-common/goacc/parallel-empty.c  |    6 -
 .../c-c++-common/goacc/parallel-eternal.c          |   11 -
 .../c-c++-common/goacc/parallel-noreturn.c         |   12 -
 gcc/testsuite/c-c++-common/goacc/reduction-1.c     |   37 +-
 gcc/testsuite/c-c++-common/goacc/reduction-2.c     |   25 +-
 gcc/testsuite/c-c++-common/goacc/reduction-3.c     |   25 +-
 gcc/testsuite/c-c++-common/goacc/reduction-4.c     |   35 +-
 gcc/testsuite/c-c++-common/goacc/routine-3.c       |  110 +-
 gcc/testsuite/c-c++-common/goacc/routine-4.c       |   73 ++
 gcc/testsuite/c-c++-common/goacc/routine-6.c       |  120 --
 gcc/testsuite/c-c++-common/goacc/routine-7.c       |   94 --
 gcc/testsuite/c-c++-common/goacc/tile.c            |  258 ++++-
 gcc/testsuite/c-c++-common/goacc/use_device-1.c    |   14 -
 gcc/testsuite/g++.dg/goacc/template-reduction.C    |  100 --
 gcc/testsuite/g++.dg/goacc/template.C              |   73 +-
 .../gfortran.dg/goacc/combined-directives.f90      |   20 +-
 gcc/testsuite/gfortran.dg/goacc/loop-1.f95         |   15 +-
 gcc/testsuite/gfortran.dg/goacc/loop-5.f95         |    3 -
 gcc/testsuite/gfortran.dg/goacc/loop-6.f95         |    8 -
 gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90    |    1 -
 gcc/testsuite/gfortran.dg/goacc/parallel-tree.f95  |    7 +-
 libgomp/ChangeLog                                  |   68 ++
 libgomp/ChangeLog.gomp                             |  209 ++++
 .../libgomp.oacc-c++/template-reduction.C          |   12 +-
 .../libgomp.oacc-c-c++-common/asyncwait-1.c        |    2 -
 .../combined-directives-1.c                        |    2 +
 .../testsuite/libgomp.oacc-c-c++-common/data-3.c   |   18 +-
 ...{kernels-2.c => data-clauses-kernels-ipa-pta.c} |    2 +-
 .../{kernels-1.c => data-clauses-kernels.c}        |    4 +-
 ...arallel-2.c => data-clauses-parallel-ipa-pta.c} |    2 +-
 .../data-clauses-parallel.c                        |    2 +
 .../libgomp.oacc-c-c++-common/data-clauses.h       |   88 +-
 .../libgomp.oacc-c-c++-common/firstprivate-1.c     |  116 +-
 .../libgomp.oacc-c-c++-common/firstprivate-2.c     |   29 -
 .../libgomp.oacc-c-c++-common/firstprivate-3.c     |   31 -
 .../libgomp.oacc-c-c++-common/firstprivate-4.c     |   54 -
 .../{kernels-loop-4.c => kernels-loop-clauses.c}   |    2 -
 .../libgomp.oacc-c-c++-common/loop-auto-1.c        |    4 +-
 .../loop-reduction-gang-np-1.c                     |   45 -
 .../loop-reduction-gv-np-1.c                       |   30 -
 .../loop-reduction-gw-np-1.c                       |   30 -
 .../loop-reduction-gwv-np-1.c                      |   28 -
 .../loop-reduction-gwv-np-2.c                      |   34 -
 .../loop-reduction-gwv-np-3.c                      |   33 -
 .../loop-reduction-gwv-np-4.c                      |   55 -
 .../loop-reduction-vector-p-1.c                    |   43 -
 .../loop-reduction-vector-p-2.c                    |   41 -
 .../loop-reduction-worker-p-1.c                    |   43 -
 .../loop-reduction-wv-p-1.c                        |   41 -
 .../loop-reduction-wv-p-2.c                        |   45 -
 .../loop-reduction-wv-p-3.c                        |   38 -
 .../testsuite/libgomp.oacc-c-c++-common/loop-w-1.c |    2 +-
 .../libgomp.oacc-c-c++-common/mode-transitions.c   | 1186 ++++++++++++++++++++
 .../testsuite/libgomp.oacc-c-c++-common/nested-2.c |    2 +
 .../libgomp.oacc-c-c++-common/parallel-1.c         |    6 -
 .../libgomp.oacc-c-c++-common/private-variables.c  |  966 ++++++++++++++++
 .../private-vars-local-gang-1.c                    |   41 -
 .../private-vars-local-worker-1.c                  |   54 -
 .../private-vars-local-worker-2.c                  |   49 -
 .../private-vars-local-worker-3.c                  |   55 -
 .../private-vars-local-worker-4.c                  |   58 -
 .../private-vars-local-worker-5.c                  |   51 -
 .../private-vars-loop-gang-1.c                     |   29 -
 .../private-vars-loop-gang-2.c                     |   33 -
 .../private-vars-loop-gang-3.c                     |   33 -
 .../private-vars-loop-gang-4.c                     |   37 -
 .../private-vars-loop-gang-5.c                     |   34 -
 .../private-vars-loop-gang-6.c                     |   42 -
 .../private-vars-loop-vector-1.c                   |   51 -
 .../private-vars-loop-vector-2.c                   |   46 -
 .../private-vars-loop-worker-1.c                   |   38 -
 .../private-vars-loop-worker-2.c                   |   43 -
 .../private-vars-loop-worker-3.c                   |   54 -
 .../private-vars-loop-worker-4.c                   |   49 -
 .../private-vars-loop-worker-5.c                   |   51 -
 .../private-vars-loop-worker-6.c                   |   55 -
 .../private-vars-loop-worker-7.c                   |   54 -
 .../private-vars-par-gang-1.c                      |   27 -
 .../private-vars-par-gang-2.c                      |   38 -
 .../private-vars-par-gang-3.c                      |   35 -
 .../libgomp.oacc-c-c++-common/reduction-7.c        |  488 ++++++++
 .../libgomp.oacc-c-c++-common/routine-1.c          |   90 +-
 .../libgomp.oacc-c-c++-common/routine-2.c          |   42 -
 .../libgomp.oacc-c-c++-common/routine-4.c          |    8 +-
 .../libgomp.oacc-c-c++-common/routine-g-1.c        |    3 +-
 .../libgomp.oacc-c-c++-common/routine-vec-1.c      |   47 -
 .../libgomp.oacc-c-c++-common/routine-w-1.c        |    2 +-
 .../libgomp.oacc-c-c++-common/routine-work-1.c     |   55 -
 .../libgomp.oacc-c-c++-common/update-1-2.c         |  361 ------
 .../libgomp.oacc-c-c++-common/vec-partn-1.c        |   30 -
 .../libgomp.oacc-c-c++-common/vec-partn-2.c        |   43 -
 .../libgomp.oacc-c-c++-common/vec-partn-3.c        |   54 -
 .../libgomp.oacc-c-c++-common/vec-partn-4.c        |   46 -
 .../libgomp.oacc-c-c++-common/vec-partn-5.c        |   42 -
 .../libgomp.oacc-c-c++-common/vec-partn-6.c        |   77 --
 .../libgomp.oacc-c-c++-common/vec-single-1.c       |   17 -
 .../libgomp.oacc-c-c++-common/vec-single-2.c       |   34 -
 .../libgomp.oacc-c-c++-common/vec-single-3.c       |   37 -
 .../libgomp.oacc-c-c++-common/vec-single-4.c       |   42 -
 .../libgomp.oacc-c-c++-common/vec-single-5.c       |   45 -
 .../libgomp.oacc-c-c++-common/vec-single-6.c       |   51 -
 .../libgomp.oacc-c-c++-common/vector-broadcast.c   |   38 -
 .../libgomp.oacc-c-c++-common/worker-partn-1.c     |   32 -
 .../libgomp.oacc-c-c++-common/worker-partn-2.c     |   44 -
 .../libgomp.oacc-c-c++-common/worker-partn-3.c     |   54 -
 .../libgomp.oacc-c-c++-common/worker-partn-4.c     |   56 -
 .../libgomp.oacc-c-c++-common/worker-partn-5.c     |   47 -
 .../libgomp.oacc-c-c++-common/worker-partn-6.c     |   45 -
 .../libgomp.oacc-c-c++-common/worker-partn-7.c     |   90 --
 .../libgomp.oacc-c-c++-common/worker-partn-8.c     |   51 -
 .../libgomp.oacc-c-c++-common/worker-single-1.c    |   27 -
 .../libgomp.oacc-c-c++-common/worker-single-1a.c   |   30 -
 .../libgomp.oacc-c-c++-common/worker-single-2.c    |   30 -
 .../libgomp.oacc-c-c++-common/worker-single-3.c    |   35 -
 .../libgomp.oacc-c-c++-common/worker-single-4.c    |   35 -
 .../libgomp.oacc-c-c++-common/worker-single-5.c    |   51 -
 .../libgomp.oacc-c-c++-common/worker-single-6.c    |   50 -
 .../testsuite/libgomp.oacc-fortran/collapse-5.f90  |    2 +-
 .../testsuite/libgomp.oacc-fortran/collapse-6.f90  |    2 +-
 .../testsuite/libgomp.oacc-fortran/collapse-7.f90  |    2 +-
 .../testsuite/libgomp.oacc-fortran/collapse-8.f90  |    2 +-
 .../libgomp.oacc-fortran/private-variables.f90     |  552 +++++++++
 .../private-vars-loop-gang-1.f90                   |   23 -
 .../private-vars-loop-gang-2.f90                   |   28 -
 .../private-vars-loop-gang-3.f90                   |   28 -
 .../private-vars-loop-gang-6.f90                   |   36 -
 .../private-vars-loop-vector-1.f90                 |   39 -
 .../private-vars-loop-vector-2.f90                 |   36 -
 .../private-vars-loop-worker-1.f90                 |   27 -
 .../private-vars-loop-worker-2.f90                 |   34 -
 .../private-vars-loop-worker-3.f90                 |   46 -
 .../private-vars-loop-worker-4.f90                 |   43 -
 .../private-vars-loop-worker-5.f90                 |   46 -
 .../private-vars-loop-worker-6.f90                 |   47 -
 .../private-vars-loop-worker-7.f90                 |   42 -
 .../private-vars-par-gang-2.f90                    |   32 -
 .../testsuite/libgomp.oacc-fortran/routine-7.f90   |    4 +-
 .../testsuite/libgomp.oacc-fortran/update-1-2.f90  |  239 ----
 165 files changed, 4533 insertions(+), 5336 deletions(-)
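
For readers unfamiliar with the recurring ChangeLog note "Use dg-warning directives instead of specifying the -w compiler option", here is a minimal sketch of the idea. It is not taken from this patch: the function name, the choice of -Wshadow, and the matched message are assumptions picked for illustration only, and the merged test files expect their own OpenACC-specific diagnostics.

```c
/* Hypothetical test file, not part of this patch.  Rather than passing
   the -w option, which silences every diagnostic, the one warning that
   is expected gets matched on the line that triggers it.  */

/* { dg-additional-options "-Wshadow" } */

int
sum_first_n (int n)
{
  int i, sum = 0;

  /* The loop bound still refers to the parameter 'n'; only the loop
     body sees the inner declaration.  */
  for (i = 0; i < n; i++)
    {
      int n = i; /* { dg-warning "shadows a parameter" } */
      sum += n;
    }

  return sum;
}
```

The practical difference is that with a blanket -w a new, unrelated diagnostic would go unnoticed, whereas with a dg-warning directive any warning that is not explicitly matched shows up as an excess-errors failure.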

diff --git gcc/testsuite/ChangeLog gcc/testsuite/ChangeLog
index 658e6c5..f4a73a7 100644
--- gcc/testsuite/ChangeLog
+++ gcc/testsuite/ChangeLog
@@ -1,3 +1,52 @@
+2016-03-30  Thomas Schwinge  <thomas@codesourcery.com>
+	    Julian Brown  <julian@codesourcery.com>
+	    Chung-Lin Tang  <cltang@codesourcery.com>
+	    Cesar Philippidis  <cesar@codesourcery.com>
+	    James Norris  <jnorris@codesourcery.com>
+	    Tom de Vries  <tom@codesourcery.com>
+	    Nathan Sidwell  <nathan@codesourcery.com>
+
+	* c-c++-common/goacc/combined-directives.c: Clean up dg-*
+	directives.
+	* c-c++-common/goacc/loop-clauses.c: Likewise.
+	* g++.dg/goacc/template.C: Likewise.
+	* gfortran.dg/goacc/combined-directives.f90: Likewise.
+	* gfortran.dg/goacc/loop-1.f95: Likewise.
+	* gfortran.dg/goacc/loop-5.f95: Likewise.
+	* gfortran.dg/goacc/loop-6.f95: Likewise.
+	* gfortran.dg/goacc/loop-tree-1.f90: Likewise.
+	* c-c++-common/goacc-gomp/nesting-1.c: Update.
+	* c-c++-common/goacc-gomp/nesting-fail-1.c: Likewise.
+	* c-c++-common/goacc/clauses-fail.c: Likewise.
+	* c-c++-common/goacc/parallel-1.c: Likewise.
+	* c-c++-common/goacc/reduction-1.c: Likewise.
+	* c-c++-common/goacc/reduction-2.c: Likewise.
+	* c-c++-common/goacc/reduction-3.c: Likewise.
+	* c-c++-common/goacc/reduction-4.c: Likewise.
+	* c-c++-common/goacc/routine-3.c: Likewise.
+	* c-c++-common/goacc/routine-4.c: Likewise.
+	* c-c++-common/goacc/routine-5.c: Likewise.
+	* c-c++-common/goacc/tile.c: Likewise.
+	* g++.dg/goacc/template.C: Likewise.
+	* gfortran.dg/goacc/combined-directives.f90: Likewise.
+	* c-c++-common/goacc/nesting-1.c: Move dg-error test cases into...
+	* c-c++-common/goacc/nesting-fail-1.c: ... this file.  Update.
+	* c-c++-common/goacc/kernels-1.c: Update.  Incorporate...
+	* c-c++-common/goacc/kernels-empty.c: ... this file, and...
+	* c-c++-common/goacc/kernels-eternal.c: ... this file, and...
+	* c-c++-common/goacc/kernels-noreturn.c: ... this file.
+	* c-c++-common/goacc/host_data-1.c: New file.  Incorporate...
+	* c-c++-common/goacc/use_device-1.c: ... this file.
+	* c-c++-common/goacc/host_data-2.c: New file.  Incorporate...
+	* c-c++-common/goacc/host_data-5.c: ... this file, and...
+	* c-c++-common/goacc/host_data-6.c: ... this file.
+	* c-c++-common/goacc/loop-2-kernels.c: New file.
+	* c-c++-common/goacc/loop-2-parallel.c: Likewise.
+	* c-c++-common/goacc/loop-3.c: Likewise.
+	* g++.dg/goacc/reference.C: Likewise.
+	* g++.dg/goacc/routine-1.C: Likewise.
+	* g++.dg/goacc/routine-2.C: Likewise.
+
 2016-03-30  Richard Biener  <rguenther@suse.de>
 
 	PR middle-end/70450
diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp
index 4e5f866..d845a6c 100644
--- gcc/testsuite/ChangeLog.gomp
+++ gcc/testsuite/ChangeLog.gomp
@@ -1,3 +1,51 @@
+2016-04-04  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* c-c++-common/goacc/firstprivate.c: Remove file, moving its
+	content into...
+	* c-c++-common/goacc/clauses-fail.c: ... this file, and...
+	* c-c++-common/goacc/parallel-1.c: ... this file.
+	* c-c++-common/goacc/host_data-3.c: Remove file.
+	* c-c++-common/goacc/host_data-4.c: Remove file, moving its
+	content into...
+	* c-c++-common/goacc/host_data-2.c: ... this file.
+	* c-c++-common/goacc/kernels-loop-acc-loop-ptr-it.c: Remove file,
+	moving its content into...
+	* c-c++-common/goacc/kernels-1.c: ... this file.
+	* c-c++-common/goacc/loop-2.c: Rename to...
+	* c-c++-common/goacc/loop-2-parallel.c: ... this file.
+	* c-c++-common/goacc/loop-3.c: Remove (invalid) nesting tests
+	covered elsewhere.
+	* c-c++-common/goacc/loop-4.c: Rename to...
+	* c-c++-common/goacc/loop-2-kernels.c: ... this file.
+	* c-c++-common/goacc/loop-nest-1.c: Remove file.
+	* c-c++-common/goacc/loop-tile-k1.c: Remove file, moving its
+	content into...
+	* c-c++-common/goacc/tile.c: ... this file.
+	* c-c++-common/goacc/loop-tile-p1.c: Remove file, moving its
+	content into...
+	* c-c++-common/goacc/tile.c: ... this file.
+	* c-c++-common/goacc/non-routine.c: Remove file, moving its
+	content into...
+	* c-c++-common/goacc/nesting-fail-1.c: ... this file.
+	* c-c++-common/goacc/parallel-empty.c: Remove file, moving its
+	content into...
+	* c-c++-common/goacc/parallel-1.c: ... this file.
+	* c-c++-common/goacc/parallel-eternal.c: Remove file, moving its
+	content into...
+	* c-c++-common/goacc/parallel-1.c: ... this file.
+	* c-c++-common/goacc/parallel-noreturn.c: Remove file, moving its
+	content into...
+	* c-c++-common/goacc/parallel-1.c: ... this file.
+	* c-c++-common/goacc/routine-6.c: Remove file, moving its
+	content into...
+	* c-c++-common/goacc/routine-3.c: ... this file.
+	* c-c++-common/goacc/routine-7.c: Remove file, moving its
+	content into...
+	* c-c++-common/goacc/routine-4.c: ... this file.
+	* g++.dg/goacc/template-reduction.C: Remove file.
+	* gfortran.dg/goacc/parallel-tree.f95: Use dg-warning directives
+	instead of specifying the -w compiler option.
+
 2016-03-11  Cesar Philippidis  <cesar@codesourcery.com>
 
 	* c-c++-common/goacc/combined-directives-2.c: New test.
diff --git gcc/testsuite/c-c++-common/goacc/clauses-fail.c gcc/testsuite/c-c++-common/goacc/clauses-fail.c
index 661d364..853d010 100644
--- gcc/testsuite/c-c++-common/goacc/clauses-fail.c
+++ gcc/testsuite/c-c++-common/goacc/clauses-fail.c
@@ -1,3 +1,5 @@
+/* Miscellaneous tests where clause parsing is expected to fail.  */
+
 void
 f (void)
 {
@@ -17,3 +19,13 @@ f (void)
   for (i = 0; i < 2; ++i)
     ;
 }
+
+
+void
+f2 (void)
+{
+  int a, b[100];
+
+#pragma acc parallel firstprivate (b[10:20]) /* { dg-error "expected ... before ... token" } */
+  ;
+}
diff --git gcc/testsuite/c-c++-common/goacc/combined-directives.c gcc/testsuite/c-c++-common/goacc/combined-directives.c
index af2cfaa..2ef5b53 100644
--- gcc/testsuite/c-c++-common/goacc/combined-directives.c
+++ gcc/testsuite/c-c++-common/goacc/combined-directives.c
@@ -1,7 +1,8 @@
-// { dg-do compile }
-// { dg-options "-fopenacc -fdump-tree-gimple" }
+// { dg-additional-options "-fdump-tree-gimple" }
 
 // TODO
+// Remove the comments from the reduction test
+// after the FE learns that reduction variables may appear in data clauses too.
 // Enable and update tree scanning for reduction clauses.
 // Add device_type clauses and tree scanning.
 
diff --git gcc/testsuite/c-c++-common/goacc/firstprivate.c gcc/testsuite/c-c++-common/goacc/firstprivate.c
deleted file mode 100644
index 22eed82..0000000
--- gcc/testsuite/c-c++-common/goacc/firstprivate.c
+++ /dev/null
@@ -1,9 +0,0 @@
-void
-foo (void)
-{
-  int a, b[100];
-#pragma acc parallel firstprivate (a, b)
-    ;
-#pragma acc parallel firstprivate (b[10:20]) /* { dg-error "expected" } */
-    ;
-}
diff --git gcc/testsuite/c-c++-common/goacc/host_data-1.c gcc/testsuite/c-c++-common/goacc/host_data-1.c
index a8922df..0c7a857 100644
--- gcc/testsuite/c-c++-common/goacc/host_data-1.c
+++ gcc/testsuite/c-c++-common/goacc/host_data-1.c
@@ -1,5 +1,4 @@
 /* Test valid use of host_data directive.  */
-/* { dg-do compile } */
 
 int v1[3][3];
 
@@ -9,3 +8,17 @@ f (void)
 #pragma acc host_data use_device(v1)
   ;
 }
+
+
+void bar (float *, float *);
+
+void
+foo (float *x, float *y)
+{
+  int n = 1 << 10;
+#pragma acc data create(x[0:n]) copyout(y[0:n])
+  {
+#pragma acc host_data use_device(x,y)
+    bar (x, y);
+  }
+}
diff --git gcc/testsuite/c-c++-common/goacc/host_data-2.c gcc/testsuite/c-c++-common/goacc/host_data-2.c
index eb420ad..bdce424 100644
--- gcc/testsuite/c-c++-common/goacc/host_data-2.c
+++ gcc/testsuite/c-c++-common/goacc/host_data-2.c
@@ -1,14 +1,14 @@
 /* Test invalid use of host_data directive.  */
-/* { dg-do compile } */
 
 int v0;
-#pragma acc host_data use_device(v0) /* { dg-error "expected" } */
+#pragma acc host_data use_device(v0) /* { dg-error "expected declaration specifiers before" } */
+
 
 void
 f (void)
 {
   int v2 = 3;
-#pragma acc host_data copy(v2) /* { dg-error "not valid for" } */
+#pragma acc host_data copy(v2) /* { dg-error ".copy. is not valid for ..pragma acc host_data." } */
   ;
 
 #pragma acc host_data use_device(v2)
@@ -21,3 +21,58 @@ f (void)
   /* { dg-error ".use_device_ptr. variable is neither a pointer nor an array" "" { target c } 19 } */
   /* { dg-error ".use_device_ptr. variable is neither a pointer, nor an arraynor reference to pointer or array" "" { target c++ } 19 } */
 }
+
+
+void
+f2 (void)
+{
+  int x[100];
+
+#pragma acc enter data copyin (x)
+  /* Specifying an array index is not valid for host_data/use_device.  */
+#pragma acc host_data use_device (x[4]) /* { dg-error "expected '\\\)' before '\\\[' token" } */
+  ;
+#pragma acc exit data delete (x)
+}
+
+
+void
+f3 (void)
+{
+  int x[100];
+
+#pragma acc data copyin (x[25:50])
+  {
+    int *xp;
+#pragma acc host_data use_device (x)
+    {
+      /* This use of the present clause is undefined behavior for OpenACC.  */
+#pragma acc parallel present (x) copyout (xp) /* { dg-error "variable .x. declared in enclosing .host_data. region" } */
+      {
+        xp = x;
+      }
+    }
+  }
+}
+
+
+void
+f4 (void)
+{
+  int x[50];
+
+#pragma acc data copyin (x[10:30])
+  {
+    int *xp;
+#pragma acc host_data use_device (x)
+    {
+      /* Here 'x' being implicitly firstprivate for the parallel region
+	 conflicts with it being declared as use_device in the enclosing
+	 host_data region.  */
+#pragma acc parallel copyout (xp)
+      {
+        xp = x; /* { dg-error "variable .x. declared in enclosing .host_data. region" } */
+      }
+    }
+  }
+}
diff --git gcc/testsuite/c-c++-common/goacc/host_data-3.c gcc/testsuite/c-c++-common/goacc/host_data-3.c
deleted file mode 100644
index f9621c9..0000000
--- gcc/testsuite/c-c++-common/goacc/host_data-3.c
+++ /dev/null
@@ -1,18 +0,0 @@
-/* { dg-do compile } */
-
-int main (int argc, char* argv[])
-{
-  int x = 5, y;
-
-  #pragma acc enter data copyin (x)
-  /* It's not clear what attempts to use non-pointer variables "directly"
-     (rather than merely taking their address) should do in host_data regions. 
-     We choose to make it an error.  */
-  #pragma acc host_data use_device (x) /* TODO { dg-error "" } */
-  {
-    y = x;
-  }
-  #pragma acc exit data delete (x)
-
-  return y - 5;
-}
diff --git gcc/testsuite/c-c++-common/goacc/host_data-4.c gcc/testsuite/c-c++-common/goacc/host_data-4.c
deleted file mode 100644
index 3dac5f3..0000000
--- gcc/testsuite/c-c++-common/goacc/host_data-4.c
+++ /dev/null
@@ -1,14 +0,0 @@
-/* { dg-do compile } */
-
-int main (int argc, char* argv[])
-{
-  int x[100];
-
-  #pragma acc enter data copyin (x)
-  /* Specifying an array index is not valid for host_data/use_device.  */
-  #pragma acc host_data use_device (x[4]) /* { dg-error "expected '\\\)' before '\\\[' token" } */
-    ;
-  #pragma acc exit data delete (x)
-
-  return 0;
-}
diff --git gcc/testsuite/c-c++-common/goacc/host_data-5.c gcc/testsuite/c-c++-common/goacc/host_data-5.c
deleted file mode 100644
index a4206c8..0000000
--- gcc/testsuite/c-c++-common/goacc/host_data-5.c
+++ /dev/null
@@ -1,23 +0,0 @@
-/* { dg-do compile } */
-
-#define N 1024
-
-int main (int argc, char* argv[])
-{
-  int x[N];
-
-#pragma acc data copyin (x[0:N])
-  {
-    int *xp;
-#pragma acc host_data use_device (x)
-    {
-      /* This use of the present clause is undefined behavior for OpenACC.  */
-#pragma acc parallel present (x) copyout (xp) /* { dg-error "variable 'x' declared in enclosing 'host_data' region" } */
-      {
-        xp = x;
-      }
-    }
-  }
-
-  return 0;
-}
diff --git gcc/testsuite/c-c++-common/goacc/host_data-6.c gcc/testsuite/c-c++-common/goacc/host_data-6.c
deleted file mode 100644
index 8be7912..0000000
--- gcc/testsuite/c-c++-common/goacc/host_data-6.c
+++ /dev/null
@@ -1,25 +0,0 @@
-/* { dg-do compile } */
-
-#define N 1024
-
-int main (int argc, char* argv[])
-{
-  int x[N];
-
-#pragma acc data copyin (x[0:N])
-  {
-    int *xp;
-#pragma acc host_data use_device (x)
-    {
-      /* Here 'x' being implicitly firstprivate for the parallel region
-	 conflicts with it being declared as use_device in the enclosing
-	 host_data region.  */
-#pragma acc parallel copyout (xp)
-      {
-        xp = x; /* { dg-error "variable 'x' declared in enclosing 'host_data' region" } */
-      }
-    }
-  }
-
-  return 0;
-}
diff --git gcc/testsuite/c-c++-common/goacc/if-clause-2.c gcc/testsuite/c-c++-common/goacc/if-clause-2.c
index bf7d6ed..5ab8459 100644
--- gcc/testsuite/c-c++-common/goacc/if-clause-2.c
+++ gcc/testsuite/c-c++-common/goacc/if-clause-2.c
@@ -1,5 +1,3 @@
-/* { dg-additional-options "-Wall" } */
-
 void
 f (short c)
 {
diff --git gcc/testsuite/c-c++-common/goacc/kernels-1.c gcc/testsuite/c-c++-common/goacc/kernels-1.c
new file mode 100644
index 0000000..4fcf86e
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-1.c
@@ -0,0 +1,45 @@
+int
+kernels_empty (void)
+{
+#pragma acc kernels
+  ;
+
+  return 0;
+}
+
+int
+kernels_eternal (void)
+{
+#pragma acc kernels
+  {
+    while (1)
+      ;
+  }
+
+  return 0;
+}
+
+int
+kernels_noreturn (void)
+{
+#pragma acc kernels
+  __builtin_abort ();
+
+  return 0;
+}
+
+
+float b[10][15][10];
+
+void
+kernels_loop_ptr_it (void)
+{
+  float *i;
+
+#pragma acc kernels
+  {
+#pragma acc loop
+    for (i = &b[0][0][0]; i < &b[0][0][10]; i++)
+      ;
+  }
+}
diff --git gcc/testsuite/c-c++-common/goacc/kernels-empty.c gcc/testsuite/c-c++-common/goacc/kernels-empty.c
deleted file mode 100644
index e91b81c..0000000
--- gcc/testsuite/c-c++-common/goacc/kernels-empty.c
+++ /dev/null
@@ -1,6 +0,0 @@
-void
-foo (void)
-{
-#pragma acc kernels
-  ;
-}
diff --git gcc/testsuite/c-c++-common/goacc/kernels-eternal.c gcc/testsuite/c-c++-common/goacc/kernels-eternal.c
deleted file mode 100644
index edc17d2..0000000
--- gcc/testsuite/c-c++-common/goacc/kernels-eternal.c
+++ /dev/null
@@ -1,11 +0,0 @@
-int
-main (void)
-{
-#pragma acc kernels
-  {
-    while (1)
-      ;
-  }
-
-  return 0;
-}
diff --git gcc/testsuite/c-c++-common/goacc/kernels-loop-acc-loop-ptr-it.c gcc/testsuite/c-c++-common/goacc/kernels-loop-acc-loop-ptr-it.c
deleted file mode 100644
index d806a01..0000000
--- gcc/testsuite/c-c++-common/goacc/kernels-loop-acc-loop-ptr-it.c
+++ /dev/null
@@ -1,14 +0,0 @@
-float b[10][15][10];
-
-void
-foo (void)
-{
-  float *i;
-
-#pragma acc kernels
-  {
-#pragma acc loop
-    for (i = &b[0][0][0]; i < &b[0][0][10]; i++)
-      ;
-  }
-}
diff --git gcc/testsuite/c-c++-common/goacc/kernels-noreturn.c gcc/testsuite/c-c++-common/goacc/kernels-noreturn.c
deleted file mode 100644
index 1a8cc67..0000000
--- gcc/testsuite/c-c++-common/goacc/kernels-noreturn.c
+++ /dev/null
@@ -1,12 +0,0 @@
-int
-main (void)
-{
-
-#pragma acc kernels
-  {
-    __builtin_abort ();
-  }
-
-  return 0;
-}
-
diff --git gcc/testsuite/c-c++-common/goacc/loop-4.c gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
similarity index 98%
rename from gcc/testsuite/c-c++-common/goacc/loop-4.c
rename to gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
index efc28cd..01ad32d 100644
--- gcc/testsuite/c-c++-common/goacc/loop-4.c
+++ gcc/testsuite/c-c++-common/goacc/loop-2-kernels.c
@@ -1,7 +1,4 @@
-/* { dg-do compile } */
-
-int
-main ()
+void K(void)
 {
   int i, j;
 
@@ -120,7 +117,6 @@ main ()
       { }
   }
 
-
 #pragma acc kernels loop auto
   for (i = 0; i < 10; i++)
     { }
@@ -139,7 +135,7 @@ main ()
 #pragma acc kernels loop gang(static:*)
   for (i = 0; i < 10; i++)
     { }
-  
+
 #pragma acc kernels loop worker
   for (i = 0; i < 10; i++)
     { }
@@ -190,6 +186,4 @@ main ()
 #pragma acc kernels loop vector auto // { dg-error "'auto' conflicts" "" { target c } }
   for (i = 0; i < 10; i++) // { dg-error "'auto' conflicts" "" { target c++ } }
     { }
-
-  return 0;
 }
diff --git gcc/testsuite/c-c++-common/goacc/loop-2.c gcc/testsuite/c-c++-common/goacc/loop-2-parallel.c
similarity index 98%
rename from gcc/testsuite/c-c++-common/goacc/loop-2.c
rename to gcc/testsuite/c-c++-common/goacc/loop-2-parallel.c
index 37c1dec..0ef5741 100644
--- gcc/testsuite/c-c++-common/goacc/loop-2.c
+++ gcc/testsuite/c-c++-common/goacc/loop-2-parallel.c
@@ -1,11 +1,7 @@
-/* { dg-do compile } */
-
-int
-main ()
+void P(void)
 {
   int i, j;
 
-
 #pragma acc parallel
   {
 #pragma acc loop auto
@@ -163,8 +159,4 @@ main ()
 #pragma acc parallel loop vector auto // { dg-error "'auto' conflicts" "" { target c } }
   for (i = 0; i < 10; i++) // { dg-error "'auto' conflicts" "" { target c++ } }
     { }
-
-
-  return 0;
 }
-
diff --git gcc/testsuite/c-c++-common/goacc/loop-3.c gcc/testsuite/c-c++-common/goacc/loop-3.c
index f95b907..44b65a8 100644
--- gcc/testsuite/c-c++-common/goacc/loop-3.c
+++ gcc/testsuite/c-c++-common/goacc/loop-3.c
@@ -1,7 +1,3 @@
-/* { dg-do compile } */
-/* { dg-additional-options "-fmax-errors=200" } */
-
-
 void par1 (void)
 {
   int i, j;
@@ -35,44 +31,6 @@ void par1 (void)
    }
 }
 
-void k2 (void)
-{
-  int i, j;
-
-#pragma acc kernels loop gang
-  for (i = 0; i < 10; i++)
-    {
-#pragma acc kernels loop gang // { dg-error "OpenACC construct inside of non-OpenACC region" }
-      for (j = 1; j < 10; j++)
-	{ }
-    }
-
-#pragma acc kernels loop worker
-  for (i = 0; i < 10; i++)
-    {
-#pragma acc kernels loop worker // { dg-error "OpenACC construct inside of non-OpenACC region" }
-      for (j = 1; j < 10; j++)
-	{ }
-#pragma acc kernels loop gang // { dg-error "OpenACC construct inside of non-OpenACC region" }
-      for (j = 1; j < 10; j++)
-	{ }
-    }
-  
-#pragma acc kernels loop vector
-  for (i = 0; i < 10; i++)
-    {
-#pragma acc kernels loop vector // { dg-error "OpenACC construct inside of non-OpenACC region" }
-      for (j = 1; j < 10; j++)
-	{ }
-#pragma acc kernels loop worker // { dg-error "OpenACC construct inside of non-OpenACC region" }
-      for (j = 1; j < 10; j++)
-	{ }
-#pragma acc kernels loop gang // { dg-error "OpenACC construct inside of non-OpenACC region" }
-      for (j = 1; j < 10; j++)
-	{ }
-    }
-}
-
 void p2 (void)
 {
   int i, j;
@@ -84,30 +42,12 @@ void p2 (void)
   for (i = 0; i < 10; i++) // { dg-error "argument not permitted" "" { target c++ } }
     { }
 
-#pragma acc parallel loop gang
-  for (i = 0; i < 10; i++)
-    {
-#pragma acc parallel loop gang // { dg-error "OpenACC construct inside of non-OpenACC region" }
-    for (j = 1; j < 10; j++)
-      { }
-    }
-
 #pragma acc parallel loop worker(5) // { dg-error "argument not permitted" "" { target c } }
   for (i = 0; i < 10; i++) // { dg-error "argument not permitted" "" { target c++ } }
     { }
 #pragma acc parallel loop worker(num:5) // { dg-error "argument not permitted" "" { target c } }
   for (i = 0; i < 10; i++) // { dg-error "argument not permitted" "" { target c++ } }
     { }
-#pragma acc parallel loop worker
-  for (i = 0; i < 10; i++)
-    {
-#pragma acc parallel loop worker // { dg-error "OpenACC construct inside of non-OpenACC region" }
-      for (j = 1; j < 10; j++)
-	{ }
-#pragma acc parallel loop gang // { dg-error "OpenACC construct inside of non-OpenACC region" }
-      for (j = 1; j < 10; j++)
-	{ }
-    }
 
 #pragma acc parallel loop vector(5) // { dg-error "argument not permitted" "" { target c } }
   for (i = 0; i < 10; i++) // { dg-error "argument not permitted" "" { target c++ } }
@@ -115,18 +55,4 @@ void p2 (void)
 #pragma acc parallel loop vector(length:5) // { dg-error "argument not permitted" "" { target c } }
   for (i = 0; i < 10; i++) // { dg-error "argument not permitted" "" { target c++ } }
     { }
-#pragma acc parallel loop vector
-  for (i = 0; i < 10; i++)
-    {
-#pragma acc parallel loop vector // { dg-error "OpenACC construct inside of non-OpenACC region" }
-      for (j = 1; j < 10; j++)
-	{ }
-#pragma acc parallel loop worker // { dg-error "OpenACC construct inside of non-OpenACC region" }
-      for (j = 1; j < 10; j++)
-	{ }
-#pragma acc parallel loop gang // { dg-error "OpenACC construct inside of non-OpenACC region" }
-      for (j = 1; j < 10; j++)
-	{ }
-    }
 }
-
diff --git gcc/testsuite/c-c++-common/goacc/loop-clauses.c gcc/testsuite/c-c++-common/goacc/loop-clauses.c
index 4449776..f3c7207 100644
--- gcc/testsuite/c-c++-common/goacc/loop-clauses.c
+++ gcc/testsuite/c-c++-common/goacc/loop-clauses.c
@@ -1,5 +1,3 @@
-/* { dg-do compile } */
-
 int
 main ()
 {
diff --git gcc/testsuite/c-c++-common/goacc/loop-nest-1.c gcc/testsuite/c-c++-common/goacc/loop-nest-1.c
deleted file mode 100644
index d498d06..0000000
--- gcc/testsuite/c-c++-common/goacc/loop-nest-1.c
+++ /dev/null
@@ -1,16 +0,0 @@
-/* { dg-do compile } */
-
-int
-main ()
-{
-  int i, j;
-#pragma acc kernels loop gang
-  for (i = 0; i < 10; i++)
-    {
-#pragma acc kernels loop gang // { dg-error "OpenACC construct inside of" "" }
-      for (i = 0; i < 10; i++)
-	{ }
-    }
-
-  return 0;
-}
diff --git gcc/testsuite/c-c++-common/goacc/loop-tile-k1.c gcc/testsuite/c-c++-common/goacc/loop-tile-k1.c
deleted file mode 100644
index 45cbe2d..0000000
--- gcc/testsuite/c-c++-common/goacc/loop-tile-k1.c
+++ /dev/null
@@ -1,132 +0,0 @@
-/* { dg-do compile } */
-/* { dg-additional-options "-fmax-errors=200" } */
-
-void
-kern (void)
-{
-  int i, j;
-
-#pragma acc kernels
-  {
-#pragma acc loop tile // { dg-error "expected" }
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop tile() // { dg-error "expected" }
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop tile(1)
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop tile(2)
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop tile(6-2) 
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop tile(6+2) 
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop tile(*) 
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop tile(*, 1) 
-    for (i = 0; i < 10; i++)
-      {
-	for (j = 0; j < 10; i++)
-	  { }
-      }
-#pragma acc loop tile(-2) // { dg-warning "'tile' value must be positive" }
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop tile(i)
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop tile(2, 2, 1)
-    for (i = 2; i < 4; i++)
-      for (i = 4; i < 6; i++)
-	{ }
-#pragma acc loop tile(2, 2)
-    for (i = 1; i < 5; i+=2)
-      for (j = i+1; j < 7; i++)
-	{ }
-#pragma acc loop vector tile(*) 
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop worker tile(*) 
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop gang tile(*) 
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop vector gang tile(*) 
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop vector worker tile(*) 
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop gang worker tile(*) 
-    for (i = 0; i < 10; i++)
-      { }
-   }
-}
-
-
-void k3 (void)
-{
-  int i, j;
-
-#pragma acc kernels loop tile // { dg-error "expected" }
-  for (i = 0; i < 10; i++)
-    { }
-#pragma acc kernels loop tile() // { dg-error "expected" }
-  for (i = 0; i < 10; i++)
-    { }
-#pragma acc kernels loop tile(1) 
-  for (i = 0; i < 10; i++)
-    { }
-#pragma acc kernels loop tile(*) 
-  for (i = 0; i < 10; i++)
-    { }
-#pragma acc kernels loop tile(*, 1) 
-  for (i = 0; i < 10; i++)
-    {
-      for (j = 1; j < 10; j++)
-	{ }
-    }
-#pragma acc kernels loop tile(-2) // { dg-warning "'tile' value must be positive" }
-  for (i = 1; i < 10; i++)
-    { }
-#pragma acc kernels loop tile(i)
-  for (i = 1; i < 10; i++)
-    { }
-#pragma acc kernels loop tile(2, 2, 1)
-  for (i = 1; i < 3; i++)
-    {
-      for (j = 4; j < 6; j++)
-	{ }
-    }    
-#pragma acc kernels loop tile(2, 2)
-  for (i = 1; i < 5; i++)
-    {
-      for (j = i + 1; j < 7; j += i)
-	{ }
-    }
-#pragma acc kernels loop vector tile(*) 
-  for (i = 0; i < 10; i++)
-    { }
-#pragma acc kernels loop worker tile(*) 
-  for (i = 0; i < 10; i++)
-    { }
-#pragma acc kernels loop gang tile(*) 
-  for (i = 0; i < 10; i++)
-    { }
-#pragma acc kernels loop vector gang tile(*) 
-  for (i = 0; i < 10; i++)
-    { }
-#pragma acc kernels loop vector worker tile(*) 
-  for (i = 0; i < 10; i++)
-    { }
-#pragma acc kernels loop gang worker tile(*) 
-  for (i = 0; i < 10; i++)
-    { }
-}
diff --git gcc/testsuite/c-c++-common/goacc/loop-tile-p1.c gcc/testsuite/c-c++-common/goacc/loop-tile-p1.c
deleted file mode 100644
index 665bc15..0000000
--- gcc/testsuite/c-c++-common/goacc/loop-tile-p1.c
+++ /dev/null
@@ -1,128 +0,0 @@
-/* { dg-do compile } */
-/* { dg-additional-options "-fmax-errors=200" } */
-
-
-void par (void)
-{
-  int i, j;
-
-#pragma acc parallel
-  {
-#pragma acc loop tile // { dg-error "expected" }
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop tile() // { dg-error "expected" }
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop tile(1) 
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop tile(*) 
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop tile(2) 
-    for (i = 0; i < 10; i++)
-      {
-	for (j = 1; j < 10; j++)
-	  { }
-      }
-#pragma acc loop tile(-2) // { dg-warning "'tile' value must be positive" }
-    for (i = 1; i < 10; i++)
-      { }
-#pragma acc loop tile(i)
-    for (i = 1; i < 10; i++)
-      { }
-#pragma acc loop tile(2, 2, 1)
-    for (i = 1; i < 3; i++)
-      {
-	for (j = 4; j < 6; j++)
-	  { }
-      } 
-#pragma acc loop tile(2, 2)
-    for (i = 1; i < 5; i+=2)
-      {
-	for (j = i + 1; j < 7; j+=i)
-	  { }
-      }
-#pragma acc loop vector tile(*) 
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop worker tile(*) 
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop gang tile(*) 
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop vector gang tile(*) 
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop vector worker tile(*) 
-    for (i = 0; i < 10; i++)
-      { }
-#pragma acc loop gang worker tile(*) 
-    for (i = 0; i < 10; i++)
-      { }
-  }
-}
-void p3 (void)
-{
-  int i, j;
-
-  
-#pragma acc parallel loop tile // { dg-error "expected" }
-  for (i = 0; i < 10; i++)
-    { }
-#pragma acc parallel loop tile() // { dg-error "expected" }
-  for (i = 0; i < 10; i++)
-    { }
-#pragma acc parallel loop tile(1) 
-  for (i = 0; i < 10; i++)
-    { }
-#pragma acc parallel loop tile(*) 
-  for (i = 0; i < 10; i++)
-    { }
-#pragma acc parallel loop tile(*, 1) 
-  for (i = 0; i < 10; i++)
-    {
-      for (j = 1; j < 10; j++)
-	{ }
-    }
-#pragma acc parallel loop tile(-2) // { dg-warning "'tile' value must be positive" }
-  for (i = 1; i < 10; i++)
-    { }
-#pragma acc parallel loop tile(i)
-  for (i = 1; i < 10; i++)
-    { }
-#pragma acc parallel loop tile(2, 2, 1)
-  for (i = 1; i < 3; i++)
-    {
-      for (j = 4; j < 6; j++)
-        { }
-    }    
-#pragma acc parallel loop tile(2, 2)
-  for (i = 1; i < 5; i+=2)
-    {
-      for (j = i + 1; j < 7; j++)
-        { }
-    }
-#pragma acc parallel loop vector tile(*) 
-  for (i = 0; i < 10; i++)
-    { }
-#pragma acc parallel loop worker tile(*) 
-  for (i = 0; i < 10; i++)
-    { }
-#pragma acc parallel loop gang tile(*) 
-  for (i = 0; i < 10; i++)
-    { }
-#pragma acc parallel loop vector gang tile(*) 
-  for (i = 0; i < 10; i++)
-    { }
-#pragma acc parallel loop vector worker tile(*) 
-  for (i = 0; i < 10; i++)
-    { }
-#pragma acc parallel loop gang worker tile(*) 
-  for (i = 0; i < 10; i++)
-    { }
-
-}
-
diff --git gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c
index ac96d38..93a9111 100644
--- gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c
+++ gcc/testsuite/c-c++-common/goacc/nesting-fail-1.c
@@ -7,9 +7,9 @@ f_acc_parallel (void)
 {
 #pragma acc parallel
   {
-#pragma acc parallel /* { dg-error ".parallel. construct inside of .parallel. region" } */
+#pragma acc parallel /* { dg-bogus ".parallel. construct inside of .parallel. region" "not implemented" { xfail *-*-* } } */
     ;
-#pragma acc kernels /* { dg-error ".kernels. construct inside of .parallel. region" } */
+#pragma acc kernels /* { dg-bogus ".kernels. construct inside of .parallel. region" "not implemented" { xfail *-*-* } } */
     ;
 #pragma acc data /* { dg-error ".data. construct inside of .parallel. region" } */
     ;
@@ -26,9 +26,9 @@ f_acc_kernels (void)
 {
 #pragma acc kernels
   {
-#pragma acc parallel /* { dg-error ".parallel. construct inside of .kernels. region" } */
+#pragma acc parallel /* { dg-bogus ".parallel. construct inside of .kernels. region" "not implemented" { xfail *-*-* } } */
     ;
-#pragma acc kernels /* { dg-error ".kernels. construct inside of .kernels. region" } */
+#pragma acc kernels /* { dg-bogus ".kernels. construct inside of .kernels. region" "not implemented" { xfail *-*-* } } */
     ;
 #pragma acc data /* { dg-error ".data. construct inside of .kernels. region" } */
     ;
@@ -64,3 +64,13 @@ f_acc_routine (void)
 #pragma acc parallel /* { dg-error "OpenACC region inside of OpenACC routine, nested parallelism not supported yet" } */
   ;
 }
+
+void
+f (void)
+{
+  int i, v = 0;
+
+#pragma acc loop gang reduction (+:v) /* { dg-error "loop directive must be associated with an OpenACC compute region" } */
+  for (i = 0; i < 10; i++)
+    v++;
+}
diff --git gcc/testsuite/c-c++-common/goacc/non-routine.c gcc/testsuite/c-c++-common/goacc/non-routine.c
deleted file mode 100644
index 688ea02..0000000
--- gcc/testsuite/c-c++-common/goacc/non-routine.c
+++ /dev/null
@@ -1,16 +0,0 @@
-/* This program validates the behavior of acc loops which are
-   not associated with a parallel or kernles region or routine.  */
-
-/* { dg-do compile } */
-
-int
-main ()
-{
-  int i, v = 0;
-
-#pragma acc loop gang reduction (+:v) /* { dg-error "loop directive must be associated with an OpenACC compute region" } */
-  for (i = 0; i < 10; i++)
-    v++;
-
-  return v;
-}
diff --git gcc/testsuite/c-c++-common/goacc/parallel-1.c gcc/testsuite/c-c++-common/goacc/parallel-1.c
new file mode 100644
index 0000000..6c6cc88
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/parallel-1.c
@@ -0,0 +1,38 @@
+int
+parallel_empty (void)
+{
+#pragma acc parallel
+  ;
+
+  return 0;
+}
+
+int
+parallel_eternal (void)
+{
+#pragma acc parallel
+  {
+    while (1)
+      ;
+  }
+
+  return 0;
+}
+
+int
+parallel_noreturn (void)
+{
+#pragma acc parallel
+  __builtin_abort ();
+
+  return 0;
+}
+
+int
+parallel_clauses (void)
+{
+  int a, b[100];
+
+#pragma acc parallel firstprivate (a, b)
+  ;
+}
diff --git gcc/testsuite/c-c++-common/goacc/parallel-empty.c gcc/testsuite/c-c++-common/goacc/parallel-empty.c
deleted file mode 100644
index a860526..0000000
--- gcc/testsuite/c-c++-common/goacc/parallel-empty.c
+++ /dev/null
@@ -1,6 +0,0 @@
-void
-foo (void)
-{
-#pragma acc parallel
-  ;
-}
diff --git gcc/testsuite/c-c++-common/goacc/parallel-eternal.c gcc/testsuite/c-c++-common/goacc/parallel-eternal.c
deleted file mode 100644
index 51eac76..0000000
--- gcc/testsuite/c-c++-common/goacc/parallel-eternal.c
+++ /dev/null
@@ -1,11 +0,0 @@
-int
-main (void)
-{
-#pragma acc parallel
-  {
-    while (1)
-      ;
-  }
-
-  return 0;
-}
diff --git gcc/testsuite/c-c++-common/goacc/parallel-noreturn.c gcc/testsuite/c-c++-common/goacc/parallel-noreturn.c
deleted file mode 100644
index ec840bd..0000000
--- gcc/testsuite/c-c++-common/goacc/parallel-noreturn.c
+++ /dev/null
@@ -1,12 +0,0 @@
-int
-main (void)
-{
-
-#pragma acc parallel
-  {
-    __builtin_abort ();
-  }
-
-  return 0;
-}
-
diff --git gcc/testsuite/c-c++-common/goacc/reduction-1.c gcc/testsuite/c-c++-common/goacc/reduction-1.c
index 59cb6f4..3c1c2dd 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-1.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-1.c
@@ -1,6 +1,5 @@
 /* Integer reductions.  */
 
-#define vl 32
 #define n 1000
 
 int
@@ -11,56 +10,56 @@ main(void)
   int lresult;
 
   /* '+' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (+:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
   /* '*' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (*:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
 
   /* 'max' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (max:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (max:result)
   for (i = 0; i < n; i++)
     result = result > array[i] ? result : array[i];
 
   /* 'min' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (min:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (min:result)
   for (i = 0; i < n; i++)
     result = result < array[i] ? result : array[i];
 
   /* '&' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (&:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&:result)
   for (i = 0; i < n; i++)
     result &= array[i];
 
   /* '|' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (|:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (|:result)
   for (i = 0; i < n; i++)
     result |= array[i];
 
   /* '^' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (^:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (^:result)
   for (i = 0; i < n; i++)
     result ^= array[i];
 
   /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (&&:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
 
   /* '||' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (||:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
 
diff --git gcc/testsuite/c-c++-common/goacc/reduction-2.c gcc/testsuite/c-c++-common/goacc/reduction-2.c
index 4889241..c3105a2 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-2.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-2.c
@@ -1,6 +1,5 @@
 /* float reductions.  */
 
-#define vl 32
 #define n 1000
 
 int
@@ -11,38 +10,38 @@ main(void)
   int lresult;
 
   /* '+' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (+:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
   /* '*' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (*:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
 
   /* 'max' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (max:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (max:result)
   for (i = 0; i < n; i++)
     result = result > array[i] ? result : array[i];
 
   /* 'min' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (min:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (min:result)
   for (i = 0; i < n; i++)
     result = result < array[i] ? result : array[i];
 
   /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (&&:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
 
   /* '||' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (||:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
 
diff --git gcc/testsuite/c-c++-common/goacc/reduction-3.c gcc/testsuite/c-c++-common/goacc/reduction-3.c
index b19224e2..4dbde04 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-3.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-3.c
@@ -1,6 +1,5 @@
 /* double reductions.  */
 
-#define vl 32
 #define n 1000
 
 int
@@ -11,38 +10,38 @@ main(void)
   int lresult;
 
   /* '+' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (+:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
   /* '*' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (*:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
 
   /* 'max' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (max:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (max:result)
   for (i = 0; i < n; i++)
     result = result > array[i] ? result : array[i];
 
   /* 'min' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (min:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (min:result)
   for (i = 0; i < n; i++)
     result = result < array[i] ? result : array[i];
 
   /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (&&:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (result > array[i]);
 
   /* '||' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (||:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (result > array[i]);
 
diff --git gcc/testsuite/c-c++-common/goacc/reduction-4.c gcc/testsuite/c-c++-common/goacc/reduction-4.c
index 88d7f70..c4572b9 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-4.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-4.c
@@ -1,6 +1,5 @@
 /* complex reductions.  */
 
-#define vl 32
 #define n 1000
 
 int
@@ -11,44 +10,26 @@ main(void)
   int lresult;
 
   /* '+' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (+:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (+:result)
   for (i = 0; i < n; i++)
     result += array[i];
 
   /* '*' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (*:result)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (*:result)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-  /* 'max' reductions.  */
-#if 0
-  // error: 'result' has invalid type for 'reduction(max)'
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (max:result)
-  for (i = 0; i < n; i++)
-    result = result > array[i] ? result : array[i];
-#endif
-
-  /* 'min' reductions.  */
-#if 0
-  // error: 'result' has invalid type for 'reduction(min)'
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (min:result)
-  for (i = 0; i < n; i++)
-    result = result < array[i] ? result : array[i];
-#endif
-
   /* '&&' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (&&:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (&&:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult && (__real__(result) > __real__(array[i]));
 
   /* '||' reductions.  */
-#pragma acc parallel vector_length (vl)
-#pragma acc loop vector reduction (||:lresult)
+#pragma acc parallel
+#pragma acc loop gang worker vector reduction (||:lresult)
   for (i = 0; i < n; i++)
     lresult = lresult || (__real__(result) > __real__(array[i]));
 
diff --git gcc/testsuite/c-c++-common/goacc/routine-3.c gcc/testsuite/c-c++-common/goacc/routine-3.c
index e8e5f04..b322d26 100644
--- gcc/testsuite/c-c++-common/goacc/routine-3.c
+++ gcc/testsuite/c-c++-common/goacc/routine-3.c
@@ -1,64 +1,118 @@
+/* Test invalid calls to routines.  */
+
 #pragma acc routine gang
-void gang (void) /* { dg-message "declared here" 3 } */
+int
+gang () /* { dg-message "declared here" 3 } */
 {
   #pragma acc loop gang worker vector
   for (int i = 0; i < 10; i++)
     {
     }
+
+  return 1;
 }
 
 #pragma acc routine worker
-void worker (void) /* { dg-message "declared here" 2 } */
+int
+worker () /* { dg-message "declared here" 2 } */
 {
   #pragma acc loop worker vector
   for (int i = 0; i < 10; i++)
     {
     }
+
+  return 1;
 }
 
 #pragma acc routine vector
-void vector (void) /* { dg-message "declared here" 1 } */
+int
+vector () /* { dg-message "declared here" } */
 {
   #pragma acc loop vector
   for (int i = 0; i < 10; i++)
     {
     }
+
+  return 1;
 }
 
 #pragma acc routine seq
-void seq (void)
+int
+seq ()
 {
+  return 1;
 }
 
-int main ()
+int
+main ()
 {
-
-#pragma acc parallel num_gangs (32) num_workers (32) vector_length (32)
+  int red = 0;
+#pragma acc parallel copy (red)
   {
-    #pragma acc loop gang /* { dg-message "loop here" 1 } */
+    /* Independent/seq loop tests.  */
+#pragma acc loop reduction (+:red) // { dg-warning "insufficient partitioning" }
+    for (int i = 0; i < 10; i++)
+      red += gang ();
+
+#pragma acc loop reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += worker ();
+
+#pragma acc loop reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += vector ();
+
+    /* Gang routine tests.  */
+#pragma acc loop gang reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += gang (); // { dg-error "routine call uses same" }
+
+#pragma acc loop worker reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += gang (); // { dg-error "routine call uses same" }
+
+#pragma acc loop vector reduction (+:red)  /* { dg-message "containing loop" } */
     for (int i = 0; i < 10; i++)
-      {
-	gang (); /*  { dg-error "routine call uses same" } */
-	worker ();
-	vector ();
-	seq ();
-      }
-    #pragma acc loop worker /* { dg-message "loop here" 2 } */
+      red += gang (); // { dg-error "routine call uses same" }
+
+    /* Worker routine tests.  */
+#pragma acc loop gang reduction (+:red)
     for (int i = 0; i < 10; i++)
-      {
-	gang (); /*  { dg-error "routine call uses same" } */
-	worker (); /*  { dg-error "routine call uses same" } */
-	vector ();
-	seq ();
-      }
-    #pragma acc loop vector /* { dg-message "loop here" 3 } */
+      red += worker ();
+
+#pragma acc loop worker reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += worker (); // { dg-error "routine call uses same" }
+
+#pragma acc loop vector reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += worker (); // { dg-error "routine call uses same" }
+
+    /* Vector routine tests.  */
+#pragma acc loop gang reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += vector ();
+
+#pragma acc loop worker reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += vector ();
+
+#pragma acc loop vector reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += vector (); // { dg-error "routine call uses same" }
+
+    /* Seq routine tests.  */
+#pragma acc loop gang reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += seq ();
+
+#pragma acc loop worker reduction (+:red)
+    for (int i = 0; i < 10; i++)
+      red += seq ();
+
+#pragma acc loop vector reduction (+:red)
     for (int i = 0; i < 10; i++)
-      {
-	gang (); /*  { dg-error "routine call uses same" } */
-	worker (); /*  { dg-error "routine call uses same" } */
-	vector (); /*  { dg-error "routine call uses same" } */
-	seq ();
-      }
+      red += seq ();
   }
 
   return 0;
diff --git gcc/testsuite/c-c++-common/goacc/routine-4.c gcc/testsuite/c-c++-common/goacc/routine-4.c
index 004d713..3e5fc4f 100644
--- gcc/testsuite/c-c++-common/goacc/routine-4.c
+++ gcc/testsuite/c-c++-common/goacc/routine-4.c
@@ -1,3 +1,4 @@
+/* Test invalid intra-routine parallelism.  */
 
 void gang (void);
 void worker (void);
@@ -14,6 +15,24 @@ void seq (void)
   worker ();  /* { dg-error "routine call uses" } */
   vector ();  /* { dg-error "routine call uses" } */
   seq ();
+
+  int red;
+
+#pragma acc loop reduction (+:red) // { dg-warning "insufficient partitioning" }
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop gang reduction (+:red) // { dg-error "disallowed by containing routine" }
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop worker reduction (+:red) // { dg-error "disallowed by containing routine" }
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop vector reduction (+:red) // { dg-error "disallowed by containing routine" }
+  for (int i = 0; i < 10; i++)
+    red ++;
 }
 
 void vector (void) /* { dg-message "declared here" 1 } */
@@ -22,6 +41,24 @@ void vector (void) /* { dg-message "declared here" 1 } */
   worker ();  /* { dg-error "routine call uses" } */
   vector ();
   seq ();
+
+  int red;
+
+#pragma acc loop reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop gang reduction (+:red) // { dg-error "disallowed by containing routine" }
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop worker reduction (+:red) // { dg-error "disallowed by containing routine" }
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop vector reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
 }
 
 void worker (void) /* { dg-message "declared here" 2 } */
@@ -30,6 +67,24 @@ void worker (void) /* { dg-message "declared here" 2 } */
   worker ();
   vector ();
   seq ();
+
+  int red;
+
+#pragma acc loop reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop gang reduction (+:red) // { dg-error "disallowed by containing routine" }
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop worker reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop vector reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
 }
 
 void gang (void) /* { dg-message "declared here" 3 } */
@@ -38,4 +93,22 @@ void gang (void) /* { dg-message "declared here" 3 } */
   worker ();
   vector ();
   seq ();
+
+  int red;
+
+#pragma acc loop reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop gang reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop worker reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
+
+#pragma acc loop vector reduction (+:red)
+  for (int i = 0; i < 10; i++)
+    red ++;
 }
diff --git gcc/testsuite/c-c++-common/goacc/routine-6.c gcc/testsuite/c-c++-common/goacc/routine-6.c
deleted file mode 100644
index 778efb1..0000000
--- gcc/testsuite/c-c++-common/goacc/routine-6.c
+++ /dev/null
@@ -1,120 +0,0 @@
-/* Test invalid calls to routines.  */
-/* { dg-do compile } */
-
-#pragma acc routine gang
-int
-gang () /* { dg-message "declared here" 3 } */
-{
-  #pragma acc loop gang worker vector
-  for (int i = 0; i < 10; i++)
-    {
-    }
-
-  return 1;
-}
-
-#pragma acc routine worker
-int
-worker () /* { dg-message "declared here" 2 } */
-{
-  #pragma acc loop worker vector
-  for (int i = 0; i < 10; i++)
-    {
-    }
-
-  return 1;
-}
-
-#pragma acc routine vector
-int
-vector () /* { dg-message "declared here" } */
-{
-  #pragma acc loop vector
-  for (int i = 0; i < 10; i++)
-    {
-    }
-
-  return 1;
-}
-
-#pragma acc routine seq
-int
-seq ()
-{
-  return 1;
-}
-
-int
-main ()
-{
-  int red = 0;
-#pragma acc parallel copy (red)
-  {
-    /* Independent/seq loop tests.  */
-#pragma acc loop reduction (+:red) // { dg-warning "insufficient partitioning" }
-    for (int i = 0; i < 10; i++)
-      red += gang ();
-
-#pragma acc loop reduction (+:red)
-    for (int i = 0; i < 10; i++)
-      red += worker ();
-
-#pragma acc loop reduction (+:red)
-    for (int i = 0; i < 10; i++)
-      red += vector ();
-
-    /* Gang routine tests.  */
-#pragma acc loop gang reduction (+:red)  /* { dg-message "containing loop" } */
-    for (int i = 0; i < 10; i++)
-      red += gang (); // { dg-error "routine call uses same" }
-
-#pragma acc loop worker reduction (+:red)  /* { dg-message "containing loop" } */
-    for (int i = 0; i < 10; i++)
-      red += gang (); // { dg-error "routine call uses same" }
-
-#pragma acc loop vector reduction (+:red)  /* { dg-message "containing loop" } */
-    for (int i = 0; i < 10; i++)
-      red += gang (); // { dg-error "routine call uses same" }
-
-    /* Worker routine tests.  */
-#pragma acc loop gang reduction (+:red)
-    for (int i = 0; i < 10; i++)
-      red += worker ();
-
-#pragma acc loop worker reduction (+:red)  /* { dg-message "containing loop" } */
-    for (int i = 0; i < 10; i++)
-      red += worker (); // { dg-error "routine call uses same" }
-
-#pragma acc loop vector reduction (+:red)  /* { dg-message "containing loop" } */
-    for (int i = 0; i < 10; i++)
-      red += worker (); // { dg-error "routine call uses same" }
-
-    /* Vector routine tests.  */
-#pragma acc loop gang reduction (+:red)
-    for (int i = 0; i < 10; i++)
-      red += vector ();
-
-#pragma acc loop worker reduction (+:red)
-    for (int i = 0; i < 10; i++)
-      red += vector ();
-
-#pragma acc loop vector reduction (+:red)  /* { dg-message "containing loop" } */
-    for (int i = 0; i < 10; i++)
-      red += vector (); // { dg-error "routine call uses same" }
-
-    /* Seq routine tests.  */
-#pragma acc loop gang reduction (+:red)
-    for (int i = 0; i < 10; i++)
-      red += seq ();
-
-#pragma acc loop worker reduction (+:red)
-    for (int i = 0; i < 10; i++)
-      red += seq ();
-
-#pragma acc loop vector reduction (+:red)
-    for (int i = 0; i < 10; i++)
-      red += seq ();
-  }
-
-  return 0;
-}
diff --git gcc/testsuite/c-c++-common/goacc/routine-7.c gcc/testsuite/c-c++-common/goacc/routine-7.c
deleted file mode 100644
index 9cae140..0000000
--- gcc/testsuite/c-c++-common/goacc/routine-7.c
+++ /dev/null
@@ -1,94 +0,0 @@
-/* Test invalid intra-routine parallelism.  */
-/* { dg-do compile } */
-
-#pragma acc routine gang
-int
-gang (int red)
-{
-#pragma acc loop reduction (+:red)
-  for (int i = 0; i < 10; i++)
-    red ++;
-
-#pragma acc loop gang reduction (+:red)
-  for (int i = 0; i < 10; i++)
-    red ++;
-
-#pragma acc loop worker reduction (+:red)
-  for (int i = 0; i < 10; i++)
-    red ++;
-
-#pragma acc loop vector reduction (+:red)
-  for (int i = 0; i < 10; i++)
-    red ++;
-
-  return 1;
-}
-
-#pragma acc routine worker
-int
-worker (int red)
-{
-#pragma acc loop reduction (+:red)
-  for (int i = 0; i < 10; i++)
-    red ++;
-
-#pragma acc loop gang reduction (+:red) // { dg-error "disallowed by containing routine" }
-  for (int i = 0; i < 10; i++)
-    red ++;
-
-#pragma acc loop worker reduction (+:red)
-  for (int i = 0; i < 10; i++)
-    red ++;
-
-#pragma acc loop vector reduction (+:red)
-  for (int i = 0; i < 10; i++)
-    red ++;
-
-  return 1;
-}
-
-#pragma acc routine vector
-int
-vector (int red)
-{
-#pragma acc loop reduction (+:red)
-  for (int i = 0; i < 10; i++)
-    red ++;
-
-#pragma acc loop gang reduction (+:red) // { dg-error "disallowed by containing routine" }
-  for (int i = 0; i < 10; i++)
-    red ++;
-
-#pragma acc loop worker reduction (+:red) // { dg-error "disallowed by containing routine" }
-  for (int i = 0; i < 10; i++)
-    red ++;
-
-#pragma acc loop vector reduction (+:red)
-  for (int i = 0; i < 10; i++)
-    red ++;
-
-  return 1;
-}
-
-#pragma acc routine seq
-int
-seq (int red)
-{
-#pragma acc loop reduction (+:red) // { dg-warning "insufficient partitioning" }
-  for (int i = 0; i < 10; i++)
-    red ++;
-
-#pragma acc loop gang reduction (+:red) // { dg-error "disallowed by containing routine" }
-  for (int i = 0; i < 10; i++)
-    red ++;
-
-#pragma acc loop worker reduction (+:red) // { dg-error "disallowed by containing routine" }
-  for (int i = 0; i < 10; i++)
-    red ++;
-
-#pragma acc loop vector reduction (+:red) // { dg-error "disallowed by containing routine" }
-  for (int i = 0; i < 10; i++)
-    red ++;
-
-  return 1;
-}
diff --git gcc/testsuite/c-c++-common/goacc/tile.c gcc/testsuite/c-c++-common/goacc/tile.c
index 2a81427..8e70e71 100644
--- gcc/testsuite/c-c++-common/goacc/tile.c
+++ gcc/testsuite/c-c++-common/goacc/tile.c
@@ -1,5 +1,3 @@
-/* { dg-do compile } */
-
 int
 main ()
 {
@@ -71,3 +69,259 @@ main ()
 
   return 0;
 }
+
+
+void par (void)
+{
+  int i, j;
+
+#pragma acc parallel
+  {
+#pragma acc loop tile // { dg-error "expected" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile() // { dg-error "expected" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(1) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(2) 
+    for (i = 0; i < 10; i++)
+      {
+	for (j = 1; j < 10; j++)
+	  { }
+      }
+#pragma acc loop tile(-2) // { dg-warning "'tile' value must be positive" }
+    for (i = 1; i < 10; i++)
+      { }
+#pragma acc loop tile(i)
+    for (i = 1; i < 10; i++)
+      { }
+#pragma acc loop tile(2, 2, 1)
+    for (i = 1; i < 3; i++)
+      {
+	for (j = 4; j < 6; j++)
+	  { }
+      } 
+#pragma acc loop tile(2, 2)
+    for (i = 1; i < 5; i+=2)
+      {
+	for (j = i + 1; j < 7; j+=i)
+	  { }
+      }
+#pragma acc loop vector tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector gang tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector worker tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang worker tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+  }
+}
+void p3 (void)
+{
+  int i, j;
+
+  
+#pragma acc parallel loop tile // { dg-error "expected" }
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop tile() // { dg-error "expected" }
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop tile(1) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop tile(*, 1) 
+  for (i = 0; i < 10; i++)
+    {
+      for (j = 1; j < 10; j++)
+	{ }
+    }
+#pragma acc parallel loop tile(-2) // { dg-warning "'tile' value must be positive" }
+  for (i = 1; i < 10; i++)
+    { }
+#pragma acc parallel loop tile(i)
+  for (i = 1; i < 10; i++)
+    { }
+#pragma acc parallel loop tile(2, 2, 1)
+  for (i = 1; i < 3; i++)
+    {
+      for (j = 4; j < 6; j++)
+        { }
+    }    
+#pragma acc parallel loop tile(2, 2)
+  for (i = 1; i < 5; i+=2)
+    {
+      for (j = i + 1; j < 7; j++)
+        { }
+    }
+#pragma acc parallel loop vector tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop worker tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop gang tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop vector gang tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop vector worker tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc parallel loop gang worker tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+
+}
+
+
+void
+kern (void)
+{
+  int i, j;
+
+#pragma acc kernels
+  {
+#pragma acc loop tile // { dg-error "expected" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile() // { dg-error "expected" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(1)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(2)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(6-2) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(6+2) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(*, 1) 
+    for (i = 0; i < 10; i++)
+      {
+	for (j = 0; j < 10; i++)
+	  { }
+      }
+#pragma acc loop tile(-2) // { dg-warning "'tile' value must be positive" }
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(i)
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop tile(2, 2, 1)
+    for (i = 2; i < 4; i++)
+      for (i = 4; i < 6; i++)
+	{ }
+#pragma acc loop tile(2, 2)
+    for (i = 1; i < 5; i+=2)
+      for (j = i+1; j < 7; i++)
+	{ }
+#pragma acc loop vector tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop worker tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector gang tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop vector worker tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+#pragma acc loop gang worker tile(*) 
+    for (i = 0; i < 10; i++)
+      { }
+   }
+}
+
+
+void k3 (void)
+{
+  int i, j;
+
+#pragma acc kernels loop tile // { dg-error "expected" }
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop tile() // { dg-error "expected" }
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop tile(1) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop tile(*, 1) 
+  for (i = 0; i < 10; i++)
+    {
+      for (j = 1; j < 10; j++)
+	{ }
+    }
+#pragma acc kernels loop tile(-2) // { dg-warning "'tile' value must be positive" }
+  for (i = 1; i < 10; i++)
+    { }
+#pragma acc kernels loop tile(i)
+  for (i = 1; i < 10; i++)
+    { }
+#pragma acc kernels loop tile(2, 2, 1)
+  for (i = 1; i < 3; i++)
+    {
+      for (j = 4; j < 6; j++)
+	{ }
+    }    
+#pragma acc kernels loop tile(2, 2)
+  for (i = 1; i < 5; i++)
+    {
+      for (j = i + 1; j < 7; j += i)
+	{ }
+    }
+#pragma acc kernels loop vector tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop worker tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop gang tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop vector gang tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop vector worker tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+#pragma acc kernels loop gang worker tile(*) 
+  for (i = 0; i < 10; i++)
+    { }
+}
diff --git gcc/testsuite/c-c++-common/goacc/use_device-1.c gcc/testsuite/c-c++-common/goacc/use_device-1.c
deleted file mode 100644
index 9a4f6d0..0000000
--- gcc/testsuite/c-c++-common/goacc/use_device-1.c
+++ /dev/null
@@ -1,14 +0,0 @@
-/* { dg-do compile } */
-
-void bar (float *, float *);
-
-void
-foo (float *x, float *y)
-{
-  int n = 1 << 10;
-#pragma acc data create(x[0:n]) copyout(y[0:n])
-  {
-#pragma acc host_data use_device(x,y)
-    bar (x, y);
-  }
-}
diff --git gcc/testsuite/g++.dg/goacc/template-reduction.C gcc/testsuite/g++.dg/goacc/template-reduction.C
deleted file mode 100644
index facad81..0000000
--- gcc/testsuite/g++.dg/goacc/template-reduction.C
+++ /dev/null
@@ -1,100 +0,0 @@
-extern void abort ();
-
-const int n = 100;
-
-// Check explicit template copy map
-
-template<typename T> T
-sum (T array[])
-{
-   T s = 0;
-
-#pragma acc parallel loop num_gangs (10) gang reduction (+:s) copy (s, array[0:n])
-  for (int i = 0; i < n; i++)
-    s += array[i];
-
-  return s;
-}
-
-// Check implicit template copy map
-
-template<typename T> T
-sum ()
-{
-  T s = 0;
-  T array[n];
-
-  for (int i = 0; i < n; i++)
-    array[i] = i+1;
-
-#pragma acc parallel loop num_gangs (10) gang reduction (+:s) copy (s)
-  for (int i = 0; i < n; i++)
-    s += array[i];
-
-  return s;
-}
-
-// Check present and async
-
-template<typename T> T
-async_sum (T array[])
-{
-   T s = 0;
-
-#pragma acc parallel loop num_gangs (10) gang async (1) present (array[0:n])
-   for (int i = 0; i < n; i++)
-     array[i] = i+1;
-
-#pragma acc parallel loop num_gangs (10) gang reduction (+:s) present (array[0:n]) copy (s) async wait (1)
-  for (int i = 0; i < n; i++)
-    s += array[i];
-
-#pragma acc wait
-
-  return s;
-}
-
-// Check present and async and an explicit firstprivate
-
-template<typename T> T
-async_sum (int c)
-{
-   T s = 0;
-
-#pragma acc parallel loop num_gangs (10) gang reduction (+:s) copy(s) firstprivate (c) async wait (1)
-  for (int i = 0; i < n; i++)
-    s += i+c;
-
-#pragma acc wait
-
-  return s;
-}
-
-int
-main()
-{
-  int a[n];
-  int result = 0;
-
-  for (int i = 0; i < n; i++)
-    {
-      a[i] = i+1;
-      result += i+1;
-    }
-
-  if (sum (a) != result)
-    abort ();
-
-  if (sum<int> () != result)
-    abort ();
-
-#pragma acc enter data copyin (a)
-  if (async_sum (a) != result)
-    abort ();
-
-  if (async_sum<int> (1) != result)
-    abort ();
-#pragma acc exit data delete (a)
-
-  return 0;
-}
diff --git gcc/testsuite/g++.dg/goacc/template.C gcc/testsuite/g++.dg/goacc/template.C
index 8bae381..8dbba76 100644
--- gcc/testsuite/g++.dg/goacc/template.C
+++ gcc/testsuite/g++.dg/goacc/template.C
@@ -1,5 +1,3 @@
-/* { dg-additional-options "-w" } */
-
 #pragma acc routine
 template <typename T> T
 accDouble(int val)
@@ -17,60 +15,62 @@ oacc_parallel_copy (T a)
   double z = 4;
 
 #pragma acc parallel num_gangs (a) num_workers (a) vector_length (a) default (none) copyout (b) copyin (a)
-  {
+#pragma acc loop gang worker vector
+  for (int i = 0; i < 1; i++)
     b = a;
-  }
 
 #pragma acc parallel num_gangs (a) copy (w, x, y, z)
-  {
-    w = accDouble<char>(w);
-    x = accDouble<int>(x);
-    y = accDouble<float>(y);
-    z = accDouble<double>(z);
-  }
+#pragma acc loop
+  for (int i = 0; i < 1; i++)
+    {
+      w = accDouble<char>(w);
+      x = accDouble<int>(x);
+      y = accDouble<float>(y);
+      z = accDouble<double>(z);
+    }
 
 #pragma acc parallel num_gangs (a) if (1)
   {
 #pragma acc loop independent collapse (2) device_type (nvidia) gang
-  for (int i = 0; i < a; i++)
-    for (int j = 0; j < 5; j++)
-      b = a;
+    for (int i = 0; i < a; i++)
+      for (int j = 0; j < 5; j++)
+	b = a;
 
 #pragma acc loop auto tile (a, 3)
-  for (int i = 0; i < a; i++)
-    for (int j = 0; j < 5; j++)
-      b = a;
+    for (int i = 0; i < a; i++)
+      for (int j = 0; j < 5; j++)
+	b = a;
 
 #pragma acc loop seq
-  for (int i = 0; i < a; i++)
-    b = a;
+    for (int i = 0; i < a; i++)
+      b = a;
   }
 
   T c;
 
 #pragma acc parallel num_workers (10)
-  {
+#pragma acc loop worker
+  for (int i = 0; i < 1; i++)
+    {
 #pragma acc atomic capture
-    c = b++;
+      c = b++;
 
 #pragma atomic update
-    c++;
+      c++;
 
 #pragma acc atomic read
-    b = a;
+      b = a;
 
 #pragma acc atomic write
-    b = a;
-  }
+      b = a;
+    }
 
 #pragma acc parallel reduction (+:c)
-  {
-    c = 1;
-  }
+  c = 1;
 
 #pragma acc data if (1) copy (b)
   {
-    #pragma acc parallel
+#pragma acc parallel
     {
       b = a;
     }
@@ -78,9 +78,9 @@ oacc_parallel_copy (T a)
 
 #pragma acc enter data copyin (b)
 #pragma acc parallel present (b)
-    {
-      b = a;
-    }
+  {
+    b = a;
+  }
 
 #pragma acc update host (b)
 #pragma acc update self (b)
@@ -113,9 +113,7 @@ oacc_kernels_copy (T a)
 
 #pragma acc kernels loop reduction (+:c)
   for (int i = 0; i < 10; i++)
-    {
-      c = 1;
-    }
+    c = 1;
 
 #pragma acc data if (1) copy (b)
   {
@@ -127,9 +125,10 @@ oacc_kernels_copy (T a)
 
 #pragma acc enter data copyin (b)
 #pragma acc kernels present (b)
-    {
-      b = a;
-    }
+  {
+    b = a;
+  }
+
   return b;
 }
 
diff --git gcc/testsuite/gfortran.dg/goacc/combined-directives.f90 gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
index 875eb80..08d25ca 100644
--- gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
+++ gcc/testsuite/gfortran.dg/goacc/combined-directives.f90
@@ -11,7 +11,7 @@
 
 subroutine test
   implicit none
-  integer a(100), i, j, z
+  integer a(100), i, j, y, z
 
   ! PARALLEL
   
@@ -79,10 +79,10 @@ subroutine test
   end do
   !$acc end parallel loop
 
-!  !$acc parallel loop reduction (+:z) copy (z)
-!  do i = 1, 100
-!  end do
-!  !$acc end parallel loop
+  !$acc parallel loop reduction (+:y) copy (y)
+  do i = 1, 100
+  end do
+  !$acc end parallel loop
 
   ! KERNELS
 
@@ -150,10 +150,10 @@ subroutine test
   end do
   !$acc end kernels loop
 
-!  !$acc kernels loop reduction (+:z) copy (z)
-!  do i = 1, 100
-!  end do
-!  !$acc end kernels loop
+  !$acc kernels loop reduction (+:y) copy (y)
+  do i = 1, 100
+  end do
+  !$acc end kernels loop
 end subroutine test
 
 ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. collapse.2." 2 "gimple" { xfail *-*-* } } }
@@ -165,3 +165,5 @@ end subroutine test
 ! { dg-final { scan-tree-dump-times "acc loop private.i. private.j. tile.2, 3" 2 "gimple" { xfail *-*-* } } }
 ! { dg-final { scan-tree-dump-times "acc loop private.i. independent" 2 "gimple" { xfail *-*-* } } }
 ! { dg-final { scan-tree-dump-times "private.z" 2 "gimple" } }
+! { dg-final { scan-tree-dump-times "omp target oacc_\[^ \]+ map.force_tofrom:y" 2 "gimple" { xfail *-*-* } } }
+! { dg-final { scan-tree-dump-times "acc loop private.i. reduction..:y." 2 "gimple" { xfail *-*-* } } }
diff --git gcc/testsuite/gfortran.dg/goacc/loop-1.f95 gcc/testsuite/gfortran.dg/goacc/loop-1.f95
index 817039f..b5f9e03 100644
--- gcc/testsuite/gfortran.dg/goacc/loop-1.f95
+++ gcc/testsuite/gfortran.dg/goacc/loop-1.f95
@@ -1,5 +1,3 @@
-! { dg-do compile } 
-! { dg-additional-options "-fmax-errors=100" } 
 module test
   implicit none
 contains
@@ -29,14 +27,18 @@ subroutine test1
        i = i + 1
   end do
   !$acc loop
-  do 300 d = 1, 30, 6 ! { dg-error "integer" }
+  do 300 d = 1, 30, 6
       i = d
   300 a(i) = 1
+  ! { dg-warning "Deleted feature: Loop variable at .1. must be integer" "" { target *-*-* } 30 }
+  ! { dg-error "ACC LOOP iteration variable must be of type integer" "" { target *-*-* } 30 }
   !$acc loop
-  do d = 1, 30, 5 ! { dg-error "integer" }
+  do d = 1, 30, 5
        i = d
       a(i) = 2
   end do
+  ! { dg-warning "Deleted feature: Loop variable at .1. must be integer" "" { target *-*-* } 36 }
+  ! { dg-error "ACC LOOP iteration variable must be of type integer" "" { target *-*-* } 36 }
   !$acc loop
   do i = 1, 30
       if (i .eq. 16) exit ! { dg-error "EXIT statement" }
@@ -144,8 +146,10 @@ subroutine test1
     end do
     !$acc parallel loop collapse(2)
     do i = 1, 3
-        do r = 4, 6    ! { dg-error "integer" }
+        do r = 4, 6
         end do
+        ! { dg-warning "Deleted feature: Loop variable at .1. must be integer" "" { target *-*-* } 149 }
+        ! { dg-error "ACC LOOP iteration variable must be of type integer" "" { target *-*-* } 149 }
     end do
 
     ! Both seq and independent are not allowed
@@ -167,4 +171,3 @@ subroutine test1
 
 end subroutine test1
 end module test
-! { dg-prune-output "Deleted" }
diff --git gcc/testsuite/gfortran.dg/goacc/loop-5.f95 gcc/testsuite/gfortran.dg/goacc/loop-5.f95
index 557bd87..d059cf7 100644
--- gcc/testsuite/gfortran.dg/goacc/loop-5.f95
+++ gcc/testsuite/gfortran.dg/goacc/loop-5.f95
@@ -1,6 +1,3 @@
-! { dg-do compile }
-! { dg-additional-options "-fmax-errors=100" }
-
 program test
   implicit none
   integer :: i, j
diff --git gcc/testsuite/gfortran.dg/goacc/loop-6.f95 gcc/testsuite/gfortran.dg/goacc/loop-6.f95
index e844468..d0855b4 100644
--- gcc/testsuite/gfortran.dg/goacc/loop-6.f95
+++ gcc/testsuite/gfortran.dg/goacc/loop-6.f95
@@ -1,11 +1,3 @@
-! { dg-do compile }
-! { dg-additional-options "-fmax-errors=100" }
-
-! This error is temporary.  Remove when support is added for these clauses
-! in the middle end.
-! { dg-prune-output "sorry, unimplemented" }
-! { dg-prune-output "Error: work-sharing region" }
-
 program test
   implicit none
   integer :: i, j
diff --git gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90 gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90
index 1d0ebe2..81bdc23 100644
--- gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90
+++ gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90
@@ -1,4 +1,3 @@
-! { dg-do compile } 
 ! { dg-additional-options "-fdump-tree-original -std=f2008" } 
 
 ! test for tree-dump-original and spaces-commas
diff --git gcc/testsuite/gfortran.dg/goacc/parallel-tree.f95 gcc/testsuite/gfortran.dg/goacc/parallel-tree.f95
index 2a279ff..c664690 100644
--- gcc/testsuite/gfortran.dg/goacc/parallel-tree.f95
+++ gcc/testsuite/gfortran.dg/goacc/parallel-tree.f95
@@ -1,5 +1,4 @@
-! { dg-do compile } 
-! { dg-additional-options "-fdump-tree-original -w" } 
+! { dg-additional-options "-fdump-tree-original" }
 
 ! test for tree-dump-original and spaces-commas
 
@@ -12,9 +11,13 @@ program test
   !$acc reduction(max:q), copy(i), copyin(j), copyout(k), create(m) &
   !$acc present(o), pcopy(p), pcopyin(r), pcopyout(s), pcreate(t) &
   !$acc deviceptr(u), private(v), firstprivate(w)
+  ! { dg-warning "region is gang partitioned but does not contain gang partitioned code" "" { target *-*-* } 13 }
+  ! { dg-warning "region is worker partitioned but does not contain worker partitioned code" "" { target *-*-* } 13 }
+  ! { dg-warning "region is vector partitioned but does not contain vector partitioned code" "" { target *-*-* } 13 }
   !$acc end parallel
 
 end program test
+
 ! { dg-final { scan-tree-dump-times "pragma acc parallel" 1 "original" } } 
 
 ! { dg-final { scan-tree-dump-times "if" 1 "original" } }
diff --git libgomp/ChangeLog libgomp/ChangeLog
index f4f30fb..a1763b6 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,3 +1,71 @@
+2016-03-30  Thomas Schwinge  <thomas@codesourcery.com>
+	    James Norris  <jnorris@codesourcery.com>
+	    Nathan Sidwell  <nathan@codesourcery.com>
+	    Julian Brown  <julian@codesourcery.com>
+	    Cesar Philippidis  <cesar@codesourcery.com>
+	    Chung-Lin Tang  <cltang@codesourcery.com>
+	    Tom de Vries  <tom@codesourcery.com>
+
+	* testsuite/libgomp.oacc-c-c++-common/clauses-1.c: Update.
+	* testsuite/libgomp.oacc-c-c++-common/deviceptr-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/if-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/vector-loop.c: Likewise.
+	* testsuite/libgomp.oacc-fortran/asyncwait-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/asyncwait-2.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/asyncwait-3.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/declare-1.f90: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Likewise.
+	XFAIL.
+	* testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: Update.
+	Incorporate...
+	* testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c: ... this
+	file.
+	* testsuite/libgomp.oacc-c++/template-reduction.C: New file.
+	* testsuite/libgomp.oacc-c-c++-common/gang-static-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/gang-static-2.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-clauses.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/private-variables.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/routine-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/routine-4.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/routine-wv-2.c: Likewise.
+	* testsuite/libgomp.oacc-fortran/clauses-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/default-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/firstprivate-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/gang-static-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/implicit-firstprivate-ref.f90:
+	Likewise.
+	* testsuite/libgomp.oacc-fortran/pr68813.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/private-variables.f90: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Merge this
+	file...
+	* testsuite/libgomp.oacc-c-c++-common/parallel-1.c: ..., and this
+	file into...
+	* testsuite/libgomp.oacc-c-c++-common/data-clauses.h: ... this new
+	file.  Update.
+	* testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c: New
+	file.
+	* testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c:
+	Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-2.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c:
+	... this new file.  Update.
+	* testsuite/libgomp.oacc-c-c++-common/parallel-2.c: Rename to...
+	* testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c:
+	... this new file.  Update.
+	* testsuite/libgomp.oacc-c-c++-common/mode-transitions.c: New
+	file.  Incorporate...
+	* testsuite/libgomp.oacc-c-c++-common/worker-single-1a.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/worker-single-4.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/worker-single-6.c: ... this
+	file.
+	* testsuite/libgomp.oacc-c-c++-common/update-1-2.c: Remove file.
+
 2016-03-29  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* testsuite/libgomp.oacc-c++/c++.exp [!lang_test_file_found]: Call
diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index b10ae94..f54de33 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,212 @@
+2016-04-04  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c: Don't XFAIL.
+	* testsuite/libgomp.oacc-c-c++-common/routine-1.c: Extend testing
+	to cover more parallelism levels, and asynchronous kernel
+	launches.
+	* testsuite/libgomp.oacc-c-c++-common/data-3.c: Remove OpenACC
+	present directives.
+	* testsuite/libgomp.oacc-fortran/collapse-5.f90: Remove OpenACC
+	copy directives.
+	* testsuite/libgomp.oacc-fortran/collapse-6.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/collapse-7.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/collapse-8.f90: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/firstprivate-3.c: Remove
+	file.
+	* testsuite/libgomp.oacc-c-c++-common/firstprivate-4.c: Remove
+	file, moving its content into...
+	* testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: ... this
+	file.
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-par-gang-1.c:
+	Remove file, moving its content into...
+	* testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: ... this
+	file.
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-4.c: Rename
+	file to...
+	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-clauses.c:
+	... this file.  Clean up dg-* directives.
+	* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Use
+	dg-warning directives instead of specifying the -w compiler
+	option.
+	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/routine-g-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/routine-4.c: Change
+	parallelism used instead of specifying the -w compiler option.
+	* testsuite/libgomp.oacc-fortran/routine-7.f90: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c:
+	Merge this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c:
+	... this file into...
+	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: ... this new
+	file.  Use dg-warning directives instead of specifying the -w
+	compiler option.
+	* testsuite/libgomp.oacc-c-c++-common/vec-partn-1.c: Merge this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/vec-partn-2.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/vec-partn-3.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/vec-partn-4.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/vec-partn-5.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/vec-partn-6.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/vec-single-1.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/vec-single-2.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/vec-single-3.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/vec-single-4.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/vec-single-5.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/vec-single-6.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/vector-broadcast.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/worker-partn-1.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/worker-partn-2.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/worker-partn-3.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/worker-partn-4.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/worker-partn-5.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/worker-partn-6.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/worker-partn-7.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/worker-partn-8.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/worker-single-1.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/worker-single-1a.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/worker-single-2.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/worker-single-3.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/worker-single-4.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/worker-single-5.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/worker-single-6.c: ... this
+	file, and...
+	* testsuite/libgomp.oacc-c-c++-common/mode-transitions.c: ... this
+	new file.  Use dg-warning directives instead of specifying the -w
+	compiler option.
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-local-gang-1.c:
+	Merge this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-1.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-2.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-3.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-4.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-5.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-1.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-2.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-3.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-4.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-5.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-6.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-vector-1.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-vector-2.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-1.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-2.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-3.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-4.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-5.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-6.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-7.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-par-gang-2.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/private-vars-par-gang-3.c:
+	... this file into...
+	* testsuite/libgomp.oacc-c-c++-common/private-variables.c:
+	... this new file.  Use dg-warning directives instead of
+	specifying the -w compiler option.
+	* testsuite/libgomp.oacc-fortran/private-vars-loop-gang-1.f90:
+	Merge this file, and...
+	* testsuite/libgomp.oacc-fortran/private-vars-loop-gang-2.f90:
+	... this file, and...
+	* testsuite/libgomp.oacc-fortran/private-vars-loop-gang-3.f90:
+	... this file, and...
+	* testsuite/libgomp.oacc-fortran/private-vars-loop-gang-6.f90:
+	... this file, and...
+	* testsuite/libgomp.oacc-fortran/private-vars-loop-vector-1.f90:
+	... this file, and...
+	* testsuite/libgomp.oacc-fortran/private-vars-loop-vector-2.f90:
+	... this file, and...
+	* testsuite/libgomp.oacc-fortran/private-vars-loop-worker-1.f90:
+	... this file, and...
+	* testsuite/libgomp.oacc-fortran/private-vars-loop-worker-2.f90:
+	... this file, and...
+	* testsuite/libgomp.oacc-fortran/private-vars-loop-worker-3.f90:
+	... this file, and...
+	* testsuite/libgomp.oacc-fortran/private-vars-loop-worker-4.f90:
+	... this file, and...
+	* testsuite/libgomp.oacc-fortran/private-vars-loop-worker-5.f90:
+	... this file, and...
+	* testsuite/libgomp.oacc-fortran/private-vars-loop-worker-6.f90:
+	... this file, and...
+	* testsuite/libgomp.oacc-fortran/private-vars-loop-worker-7.f90:
+	... this file, and...
+	* testsuite/libgomp.oacc-fortran/private-vars-par-gang-2.f90:
+	... this file into...
+	* testsuite/libgomp.oacc-fortran/private-variables.f90: ... this
+	new file.  Use dg-warning directives instead of specifying the -w
+	compiler option.
+	* testsuite/libgomp.oacc-c-c++-common/routine-2.c: Remove file.
+	* testsuite/libgomp.oacc-c-c++-common/routine-vec-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/routine-work-1.c: Likewise.
+	* testsuite/libgomp.oacc-fortran/update-1-2.f90: Likewise.
+
 2016-03-24  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c:
diff --git libgomp/testsuite/libgomp.oacc-c++/template-reduction.C libgomp/testsuite/libgomp.oacc-c++/template-reduction.C
index 0150d99..fb5924c 100644
--- libgomp/testsuite/libgomp.oacc-c++/template-reduction.C
+++ libgomp/testsuite/libgomp.oacc-c++/template-reduction.C
@@ -1,7 +1,3 @@
-/* { dg-do run } */
-
-#include <cstdlib>
-
 const int n = 100;
 
 // Check explicit template copy map
@@ -85,17 +81,17 @@ main()
     }
 
   if (sum (a) != result)
-    abort ();
+    __builtin_abort ();
 
   if (sum<int> () != result)
-    abort ();
+    __builtin_abort ();
 
 #pragma acc enter data copyin (a)
   if (async_sum (a) != result)
-    abort ();
+    __builtin_abort ();
 
   if (async_sum<int> (1) != result)
-    abort ();
+    __builtin_abort ();
 #pragma acc exit data delete (a)
 
   return 0;
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
index f3b490a..d478ce2 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/asyncwait-1.c
@@ -1,6 +1,4 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
-/* <http://news.gmane.org/find-root.php?message_id=%3C87pp0aaksc.fsf%40kepler.schwinge.homeip.net%3E>.
-   { dg-xfail-run-if "TODO" { *-*-* } } */
 /* { dg-additional-options "-lcuda" } */
 
 #include <openacc.h>
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
index 45aa0bd..dad6d13 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/combined-directives-1.c
@@ -1,5 +1,7 @@
 /* This test exercises combined directives.  */
 
+/* { dg-do run } */
+
 #include <stdlib.h>
 
 int
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c
index 6e173d3..747109f 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c
@@ -25,7 +25,7 @@ main (int argc, char **argv)
     }
 
 #pragma acc enter data copyin (a[0:N]) copyin (b[0:N]) copyin (N) async
-#pragma acc parallel async wait present (a[0:N]) present (b[0:N]) present (N)
+#pragma acc parallel async wait
 #pragma acc loop
   for (i = 0; i < N; i++)
     b[i] = a[i];
@@ -49,7 +49,7 @@ main (int argc, char **argv)
     }
 
 #pragma acc update device (a[0:N], b[0:N]) async (1)
-#pragma acc parallel async (1) present (a[0:N]) present (b[0:N]) present (N)
+#pragma acc parallel async (1)
 #pragma acc loop
   for (i = 0; i < N; i++)
     b[i] = a[i];
@@ -78,17 +78,17 @@ main (int argc, char **argv)
 #pragma acc update device (b[0:N]) async (2)
 #pragma acc enter data copyin (c[0:N], d[0:N]) async (3)
 
-#pragma acc parallel async (1) wait (1,2) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (N)
+#pragma acc parallel async (1) wait (1,2)
 #pragma acc loop
   for (i = 0; i < N; i++)
     b[i] = (a[i] * a[i] * a[i]) / a[i];
 
-#pragma acc parallel async (2) wait (1,3) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (N)
+#pragma acc parallel async (2) wait (1,3)
 #pragma acc loop
   for (i = 0; i < N; i++)
     c[i] = (a[i] + a[i] + a[i] + a[i]) / a[i];
 
-#pragma acc parallel async (3) wait (1,3) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (N)
+#pragma acc parallel async (3) wait (1,3)
 #pragma acc loop
   for (i = 0; i < N; i++)
     d[i] = ((a[i] * a[i] + a[i]) / a[i]) - a[i];
@@ -123,19 +123,19 @@ main (int argc, char **argv)
 #pragma acc update device (a[0:N], b[0:N], c[0:N], d[0:N]) async (1)
 #pragma acc enter data copyin (e[0:N]) async (5)
 
-#pragma acc parallel async (1) wait (1) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (e[0:N]) present (N)
+#pragma acc parallel async (1) wait (1)
   for (int ii = 0; ii < N; ii++)
     b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii];
 
-#pragma acc parallel async (2) wait (1) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (e[0:N]) present (N)
+#pragma acc parallel async (2) wait (1)
   for (int ii = 0; ii < N; ii++)
     c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii];
 
-#pragma acc parallel async (3) wait (1) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (e[0:N]) present (N)
+#pragma acc parallel async (3) wait (1)
   for (int ii = 0; ii < N; ii++)
     d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii];
 
-#pragma acc parallel wait (1,5) async (4) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (e[0:N]) present (N)
+#pragma acc parallel wait (1,5) async (4)
   for (int ii = 0; ii < N; ii++)
     e[ii] = a[ii] + b[ii] + c[ii] + d[ii];
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
similarity index 86%
rename from libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-2.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
index ae92d0e..4044398 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
@@ -3,4 +3,4 @@
 /* Override the compiler's "avoid offloading" decision.
    { dg-additional-options "-foffload-force" } */
 
-#include "kernels-1.c"
+#include "data-clauses-kernels.c"
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
similarity index 71%
rename from libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
index ab3e496..2c9f7ae 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
@@ -1,7 +1,5 @@
 /* Override the compiler's "avoid offloading" decision.
    { dg-additional-options "-foffload-force" } */
 
-#include <stdlib.h>
-
-#define EXEC_DIRECTIVE kernels
+#define CONSTRUCT kernels
 #include "data-clauses.h"
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
similarity index 75%
rename from libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-2.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
index d9fff6f..ddcf4e3 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
@@ -1,4 +1,4 @@
 /* { dg-do run { target lto } } */
 /* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
 
-#include "parallel-1.c"
+#include "data-clauses-parallel.c"
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
new file mode 100644
index 0000000..e734b2f
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
@@ -0,0 +1,2 @@
+#define CONSTRUCT parallel
+#include "data-clauses.h"
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h
index 8341053..d557bef 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h
@@ -7,145 +7,145 @@ int main(void)
   i = -1;
   j = -2;
   v = 0;
-#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) copyin (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) copyin (i, j)
   {
     if (i != -1 || j != -2)
-      abort ();
+      __builtin_abort ();
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
 #if ACC_MEM_SHARED
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 #else
   if (v != 1 || i != -1 || j != -2)
-    abort ();
+    __builtin_abort ();
 #endif
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) copyout (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) copyout (i, j)
   {
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) copy (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) copy (i, j)
   {
     if (i != -1 || j != -2)
-      abort ();
+      __builtin_abort ();
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) create (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) create (i, j)
   {
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
 #if ACC_MEM_SHARED
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 #else
   if (v != 1 || i != -1 || j != -2)
-    abort ();
+    __builtin_abort ();
 #endif
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or_copyin (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) present_or_copyin (i, j)
   {
     if (i != -1 || j != -2)
-      abort ();
+      __builtin_abort ();
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
   if (v != 1)
-    abort ();
+    __builtin_abort ();
 #if ACC_MEM_SHARED
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 #else
   if (v != 1 || i != -1 || j != -2)
-    abort ();
+    __builtin_abort ();
 #endif
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or_copyout (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) present_or_copyout (i, j)
   {
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or_copy (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) present_or_copy (i, j)
   {
     if (i != -1 || j != -2)
-      abort ();
+      __builtin_abort ();
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 
   i = -1;
   j = -2;
   v = 0;
-#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or_create (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) present_or_create (i, j)
   {
     i = 2;
     j = 1;
     if (i != 2 || j != 1)
-      abort ();
+      __builtin_abort ();
     v = 1;
   }
   if (v != 1)
-    abort ();
+    __builtin_abort ();
 #if ACC_MEM_SHARED
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 #else
   if (v != 1 || i != -1 || j != -2)
-    abort ();
+    __builtin_abort ();
 #endif
 
   i = -1;
@@ -154,23 +154,23 @@ int main(void)
 
 #pragma acc data copyin (i, j)
   {
-#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present (i, j)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v) present (i, j)
     {
       if (i != -1 || j != -2)
-	abort ();
+	__builtin_abort ();
       i = 2;
       j = 1;
       if (i != 2 || j != 1)
-	abort ();
+	__builtin_abort ();
       v = 1;
     }
   }
 #if ACC_MEM_SHARED
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 #else
   if (v != 1 || i != -1 || j != -2)
-    abort ();
+    __builtin_abort ();
 #endif
 
   i = -1;
@@ -179,23 +179,23 @@ int main(void)
 
 #pragma acc data copyin(i, j)
   {
-#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v)
+#pragma acc CONSTRUCT /* copyout */ present_or_copyout (v)
     {
       if (i != -1 || j != -2)
-	abort ();
+	__builtin_abort ();
       i = 2;
       j = 1;
       if (i != 2 || j != 1)
-	abort ();
+	__builtin_abort ();
       v = 1;
     }
   }
 #if ACC_MEM_SHARED
   if (v != 1 || i != 2 || j != 1)
-    abort ();
+    __builtin_abort ();
 #else
   if (v != 1 || i != -1 || j != -2)
-    abort ();
+    __builtin_abort ();
 #endif
 
   return 0;
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c
index 7f5d3d3..14bc3af 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c
@@ -1,8 +1,7 @@
-/* { dg-do run } */
-
 #include  <openacc.h>
 
-int main ()
+
+void t1 ()
 {
   int ok = 1;
   int val = 2;
@@ -28,14 +27,117 @@ int main ()
   if (ondev)
     {
       if (!ok)
-	return 1;
+	__builtin_abort ();
       if (val != 2)
-	return 1;
+	__builtin_abort ();
 
       for (int i = 0; i < 32; i++)
 	if (ary[i] != 2 + i)
-	  return 1;
+	  __builtin_abort ();
     }
-  
+}
+
+
+void t2 ()
+{
+  int ok = 1;
+  int val = 2;
+
+#pragma acc data copy(val)
+  {
+#pragma acc parallel present (val)
+    {
+      val = 7;
+    }
+
+#pragma acc parallel firstprivate (val) copy(ok)
+    {
+      ok  = val == 7;
+      val = 9;
+    }
+  }
+
+  if (!ok)
+    __builtin_abort ();
+  if (val != 7)
+    __builtin_abort ();
+}
+
+
+#define N 100
+void t3 ()
+{
+  int a, b[N], c, d, i;
+  int n = acc_get_device_type () == acc_device_nvidia ? N : 1;
+
+  a = 5;
+  for (i = 0; i < n; i++)
+    b[i] = -1;
+
+  #pragma acc parallel num_gangs (n) firstprivate (a)
+  #pragma acc loop gang
+  for (i = 0; i < n; i++)
+    {
+      a = a + i;
+      b[i] = a;
+    }
+
+  for (i = 0; i < n; i++)
+    if (a + i != b[i])
+      __builtin_abort ();
+
+  #pragma acc data copy (a)
+  {
+    #pragma acc parallel firstprivate (a) copyout (c)
+    {
+      a = 10;
+      c = a;
+    }
+
+    /* This version of 'a' should still be 5.  */
+    #pragma acc parallel copyout (d) present (a)
+    {
+      d = a;
+    }
+  }
+
+  if (c != 10)
+    __builtin_abort ();
+  if (d != 5)
+    __builtin_abort ();
+}
+#undef N
+
+
+void t4 ()
+{
+  int x = 5, i, arr[32];
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 3;
+
+#pragma acc parallel firstprivate(x) copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 119 } */
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 119 } */
+  {
+#pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      arr[i] += x;
+  }
+
+  for (i = 0; i < 32; i++)
+    if (arr[i] != 8)
+      __builtin_abort ();
+}
+
+
+int
+main()
+{
+  t1 ();
+  t2 ();
+  t3 ();
+  t4 ();
+
   return 0;
 }
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c
deleted file mode 100644
index 672e412..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c
+++ /dev/null
@@ -1,29 +0,0 @@
-#include  <openacc.h>
-
-int main ()
-{
-  int ok = 1;
-  int val = 2;
-
-#pragma acc data copy(val)
-  {
-#pragma acc parallel present (val)
-    {
-      val = 7;
-    }
-
-#pragma acc parallel firstprivate (val) copy(ok)
-    {
-      ok  = val == 7;
-      val = 9;
-    }
-
-  }
-
-  if (!ok)
-    return 1;
-  if(val != 7)
-    return 1;
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-3.c
deleted file mode 100644
index 489d731..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-3.c
+++ /dev/null
@@ -1,31 +0,0 @@
-/* { dg-do run } */
-/* { dg-xfail-run-if "TODO" { openacc_host_selected } { "*" } { "" } } */
-
-#include <stdlib.h>
-
-#define n 100
-
-int
-main()
-{
-  int a, b[n], i;
-
-  a = 5;
-
-  for (i = 0; i < n; i++)
-    b[i] = -1;
-
-  #pragma acc parallel num_gangs (n) firstprivate (a)
-  #pragma acc loop gang
-  for (i = 0; i < n; i++)
-    {
-      a = a + i;
-      b[i] = a;
-    }
-
-  for (i = 0; i < n; i++)
-    if (a + i != b[i])
-      abort ();
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-4.c
deleted file mode 100644
index 69abb23..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-4.c
+++ /dev/null
@@ -1,54 +0,0 @@
-/* { dg-do run } */
-
-#include <stdlib.h>
-#include <openacc.h>
-
-#define N 100
-
-int
-main()
-{
-  int a, old_a,  b[N], c, d, i;
-  int n = acc_get_device_type () == acc_device_nvidia ? N : 1;
-
-  a = 5;
-
-  for (i = 0; i < n; i++)
-    b[i] = -1;
-
-  #pragma acc parallel num_gangs (n) firstprivate (a)
-  #pragma acc loop gang
-  for (i = 0; i < n; i++)
-    {
-      a = a + i;
-      b[i] = a;
-    }
-
-  for (i = 0; i < n; i++)
-    if (a + i != b[i])
-      abort ();
-
-  #pragma acc data copy (a)
-  {
-    #pragma acc parallel firstprivate (a) copyout (c)
-    {
-      a = 10;
-      c = a;
-    }
-
-    /* This version of 'a' should still be 5.  */
-    #pragma acc parallel copyout (d) present (a)
-    {
-      d = a;
-    }
-
-  }
-
-  if (c != 10)
-    abort ();
-
-  if (d != 5)
-    abort ();
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-clauses.c
similarity index 90%
rename from libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-4.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-clauses.c
index 1f03d47..2c42497 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-4.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-loop-clauses.c
@@ -1,8 +1,6 @@
 /* Exercise the auto, independent, seq and tile loop clauses inside
    kernels regions.  */
 
-/* { dg-prune-output "insufficient partitioning available to parallelize loop" } */
-
 #include <assert.h>
 
 #define N 100
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c
index 0248ad7..db39647 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c
@@ -1,4 +1,3 @@
-/* { dg-additional-options "-w" } */
 /* This code uses nvptx inline assembly guarded with acc_on_device, which is
    not optimized away at -O0, and then confuses the target assembler.
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
@@ -103,6 +102,7 @@ int vector_1 (int *ary, int size)
   clear (ary, size);
   
 #pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
+  /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 104 } */
   {
 #pragma acc loop gang
     for (int jx = 0; jx < 1; jx++)
@@ -153,6 +153,7 @@ int gang_1 (int *ary, int size)
   clear (ary, size);
   
 #pragma acc parallel num_gangs (32) num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 155 } */
   {
 #pragma acc loop auto
     for (int jx = 0; jx <  size  / 64; jx++)
@@ -187,6 +188,7 @@ int gang_3 (int *ary, int size)
   clear (ary, size);
   
 #pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) firstprivate (size)
+  /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 190 } */
   {
 #pragma acc loop auto
     for (int jx = 0; jx <  size  / 64; jx++)
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c
deleted file mode 100644
index 55ab3c9..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c
+++ /dev/null
@@ -1,45 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test of reduction on loop directive (gangs, non-private reduction
-   variable).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, arr[1024], res = 0, hres = 0;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res)
-  {
-    #pragma acc loop gang reduction(+:res)
-    for (i = 0; i < 1024; i++)
-      res += arr[i];
-  }
-
-  for (i = 0; i < 1024; i++)
-    hres += arr[i];
-
-  assert (res == hres);
-
-  res = hres = 1;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res)
-  {
-    #pragma acc loop gang reduction(*:res)
-    for (i = 0; i < 12; i++)
-      res *= arr[i];
-  }
-
-  for (i = 0; i < 12; i++)
-    hres *= arr[i];
-
-  assert (res == hres);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c
deleted file mode 100644
index d4341e9..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c
+++ /dev/null
@@ -1,30 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test of reduction on loop directive (gangs and vectors, non-private
-   reduction variable).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, arr[1024], res = 0, hres = 0;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res)
-  {
-    #pragma acc loop gang vector reduction(+:res)
-    for (i = 0; i < 1024; i++)
-      res += arr[i];
-  }
-
-  for (i = 0; i < 1024; i++)
-    hres += arr[i];
-
-  assert (res == hres);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c
deleted file mode 100644
index 2e5668b..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c
+++ /dev/null
@@ -1,30 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test of reduction on loop directive (gangs and workers, non-private
-   reduction variable).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, arr[1024], res = 0, hres = 0;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res)
-  {
-    #pragma acc loop gang worker reduction(+:res)
-    for (i = 0; i < 1024; i++)
-      res += arr[i];
-  }
-
-  for (i = 0; i < 1024; i++)
-    hres += arr[i];
-
-  assert (res == hres);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c
deleted file mode 100644
index d610373..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c
+++ /dev/null
@@ -1,28 +0,0 @@
-#include <assert.h>
-
-/* Test of reduction on loop directive (gangs, workers and vectors, non-private
-   reduction variable).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, arr[1024], res = 0, hres = 0;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res)
-  {
-    #pragma acc loop gang worker vector reduction(+:res)
-    for (i = 0; i < 1024; i++)
-      res += arr[i];
-  }
-
-  for (i = 0; i < 1024; i++)
-    hres += arr[i];
-
-  assert (res == hres);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c
deleted file mode 100644
index ea5c151..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c
+++ /dev/null
@@ -1,34 +0,0 @@
-#include <assert.h>
-
-/* Test of reduction on loop directive (gangs, workers and vectors, non-private
-   reduction variable: separate gang and worker/vector loops).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, j, arr[32768], res = 0, hres = 0;
-
-  for (i = 0; i < 32768; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res)
-  {
-    #pragma acc loop gang reduction(+:res)
-    for (j = 0; j < 32; j++)
-      {
-        #pragma acc loop worker vector reduction(+:res)
-        for (i = 0; i < 1024; i++)
-	  res += arr[j * 1024 + i];
-      }
-    /* "res" is non-private, and is not available until after the parallel
-       region.  */
-  }
-
-  for (i = 0; i < 32768; i++)
-    hres += arr[i];
-
-  assert (res == hres);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c
deleted file mode 100644
index 0056f3c..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c
+++ /dev/null
@@ -1,33 +0,0 @@
-#include <assert.h>
-
-/* Test of reduction on loop directive (gangs, workers and vectors, non-private
-   reduction variable: separate gang and worker/vector loops).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, j;
-  double arr[32768], res = 0, hres = 0;
-
-  for (i = 0; i < 32768; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copyin(arr) copy(res)
-  {
-    #pragma acc loop gang reduction(+:res)
-    for (j = 0; j < 32; j++)
-      {
-        #pragma acc loop worker vector reduction(+:res)
-        for (i = 0; i < 1024; i++)
-	  res += arr[j * 1024 + i];
-      }
-  }
-
-  for (i = 0; i < 32768; i++)
-    hres += arr[i];
-
-  assert (res == hres);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c
deleted file mode 100644
index e69d0ec..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c
+++ /dev/null
@@ -1,55 +0,0 @@
-#include <assert.h>
-
-/* Test of reduction on loop directive (gangs, workers and vectors, multiple
-   non-private reduction variables, float type).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, j;
-  float arr[32768];
-  float res = 0, mres = 0, hres = 0, hmres = 0;
-
-  for (i = 0; i < 32768; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res, mres)
-  {
-    #pragma acc loop gang reduction(+:res) reduction(max:mres)
-    for (j = 0; j < 32; j++)
-      {
-	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
-	for (i = 0; i < 1024; i++)
-	  {
-	    res += arr[j * 1024 + i];
-	    if (arr[j * 1024 + i] > mres)
-	      mres = arr[j * 1024 + i];
-	  }
-
-	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
-	for (i = 0; i < 1024; i++)
-	  {
-	    res += arr[j * 1024 + (1023 - i)];
-	    if (arr[j * 1024 + (1023 - i)] > mres)
-	      mres = arr[j * 1024 + (1023 - i)];
-	  }
-      }
-  }
-
-  for (j = 0; j < 32; j++)
-    for (i = 0; i < 1024; i++)
-      {
-        hres += arr[j * 1024 + i];
-	hres += arr[j * 1024 + (1023 - i)];
-	if (arr[j * 1024 + i] > hmres)
-	  hmres = arr[j * 1024 + i];
-	if (arr[j * 1024 + (1023 - i)] > hmres)
-	  hmres = arr[j * 1024 + (1023 - i)];
-      }
-
-  assert (res == hres);
-  assert (mres == hmres);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c
deleted file mode 100644
index dd181ef..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c
+++ /dev/null
@@ -1,43 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test of reduction on loop directive (vectors, private reduction
-   variable).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, j, arr[1024], out[32], res = 0, hres = 0;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       private(res) copyout(out)
-  {
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      {
-        res = 0;
-
-	#pragma acc loop vector reduction(+:res)
-	for (i = 0; i < 32; i++)
-	  res += arr[j * 32 + i];
-	
-	out[j] = res;
-      }
-  }
-
-  for (j = 0; j < 32; j++)
-    {
-      hres = 0;
-      
-      for (i = 0; i < 32; i++)
-	hres += arr[j * 32 + i];
-
-      assert (out[j] == hres);
-    }
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c
deleted file mode 100644
index 15f0053..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c
+++ /dev/null
@@ -1,41 +0,0 @@
-#include <assert.h>
-
-/* Test of reduction on loop directive (vector reduction in
-   gang-partitioned/worker-partitioned mode, private reduction variable).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, j, k;
-  double ina[1024], inb[1024], out[1024], acc;
-
-  for (j = 0; j < 32; j++)
-    for (i = 0; i < 32; i++)
-      {
-        ina[j * 32 + i] = (i == j) ? 2.0 : 0.0;
-	inb[j * 32 + i] = (double) (i + j);
-      }
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       private(acc) copyin(ina, inb) copyout(out)
-  {
-    #pragma acc loop gang worker
-    for (k = 0; k < 32; k++)
-      for (j = 0; j < 32; j++)
-        {
-	  acc = 0;
-
-	  #pragma acc loop vector reduction(+:acc)
-	  for (i = 0; i < 32; i++)
-	    acc += ina[k * 32 + i] * inb[i * 32 + j];
-
-	  out[k * 32 + j] = acc;
-	}
-  }
-
-  for (j = 0; j < 32; j++)
-    for (i = 0; i < 32; i++)
-      assert (out[j * 32 + i] == (i + j) * 2);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c
deleted file mode 100644
index 4864acd..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c
+++ /dev/null
@@ -1,43 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test of reduction on loop directive (workers, private reduction
-   variable).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, j, arr[1024], out[32], res = 0, hres = 0;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       private(res) copyout(out)
-  {
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      {
-        res = 0;
-
-	#pragma acc loop worker reduction(+:res)
-	for (i = 0; i < 32; i++)
-	  res += arr[j * 32 + i];
-	
-	out[j] = res;
-      }
-  }
-
-  for (j = 0; j < 32; j++)
-    {
-      hres = 0;
-      
-      for (i = 0; i < 32; i++)
-	hres += arr[j * 32 + i];
-
-      assert (out[j] == hres);
-    }
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c
deleted file mode 100644
index 2765908..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c
+++ /dev/null
@@ -1,41 +0,0 @@
-#include <assert.h>
-
-/* Test of reduction on loop directive (workers and vectors, private reduction
-   variable).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, j, arr[1024], out[32], res = 0, hres = 0;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       private(res) copyout(out)
-  {
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      {
-        res = 0;
-
-	#pragma acc loop worker vector reduction(+:res)
-	for (i = 0; i < 32; i++)
-	  res += arr[j * 32 + i];
-	
-	out[j] = res;
-      }
-  }
-
-  for (j = 0; j < 32; j++)
-    {
-      hres = 0;
-      
-      for (i = 0; i < 32; i++)
-	hres += arr[j * 32 + i];
-
-      assert (out[j] == hres);
-    }
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c
deleted file mode 100644
index c30b0e7..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c
+++ /dev/null
@@ -1,45 +0,0 @@
-#include <assert.h>
-
-/* Test of reduction on loop directive (workers and vectors, private reduction
-   variable).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, j, arr[32768], out[32], res = 0, hres = 0;
-
-  for (i = 0; i < 32768; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       private(res) copyout(out)
-  {
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      {
-        res = j;
-
-	#pragma acc loop worker reduction(+:res)
-	for (i = 0; i < 1024; i++)
-	  res += arr[j * 1024 + i];
-
-	#pragma acc loop vector reduction(+:res)
-	for (i = 1023; i >= 0; i--)
-	  res += arr[j * 1024 + i];
-
-	out[j] = res;
-      }
-  }
-
-  for (j = 0; j < 32; j++)
-    {
-      hres = j;
-      
-      for (i = 0; i < 1024; i++)
-	hres += arr[j * 1024 + i] * 2;
-
-      assert (out[j] == hres);
-    }
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c
deleted file mode 100644
index b5e28fb..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c
+++ /dev/null
@@ -1,38 +0,0 @@
-#include <assert.h>
-
-/* Test of reduction on loop directive (workers and vectors, private reduction
-   variable: gang-redundant mode).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, arr[1024], out[32], res = 0, hres = 0;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = i ^ 33;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       private(res) copyin(arr) copyout(out)
-  {
-    /* Private variables aren't initialized by default in openacc.  */
-    res = 0;
-
-    /* "res" should be available at the end of the following loop (and should
-       have the same value redundantly in each gang).  */
-    #pragma acc loop worker vector reduction(+:res)
-    for (i = 0; i < 1024; i++)
-      res += arr[i];
-    
-    #pragma acc loop gang (static: 1)
-    for (i = 0; i < 32; i++)
-      out[i] = res;
-  }
-
-  for (i = 0; i < 1024; i++)
-    hres += arr[i];
-
-  for (i = 0; i < 32; i++)
-    assert (out[i] == hres);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c
index 34c00a3..30e8e78 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c
@@ -1,4 +1,3 @@
-/* { dg-additional-options "-w" } */
 /* This code uses nvptx inline assembly guarded with acc_on_device, which is
    not optimized away at -O0, and then confuses the target assembler.
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
@@ -17,6 +16,7 @@ int main ()
     ary[ix] = -1;
   
 #pragma acc parallel num_workers(32) vector_length(32) copy(ary) copy(ondev)
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 18 } */
   {
 #pragma acc loop worker
     for (unsigned ix = 0; ix < N; ix++)
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/mode-transitions.c libgomp/testsuite/libgomp.oacc-c-c++-common/mode-transitions.c
new file mode 100644
index 0000000..f62daf0
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/mode-transitions.c
@@ -0,0 +1,1186 @@
+/* Miscellaneous test cases for gang/worker/vector mode transitions.  */
+
+#include <assert.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <math.h>
+#include <openacc.h>
+
+
+/* Test basic vector-partitioned mode transitions.  */
+
+void t1()
+{
+  int n = 0, arr[32], i;
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 0;
+
+  #pragma acc parallel copy(n, arr) \
+		       num_gangs(1) num_workers(1) vector_length(32)
+  {
+    int j;
+    n++;
+    #pragma acc loop vector
+    for (j = 0; j < 32; j++)
+      arr[j]++;
+    n++;
+  }
+
+  assert (n == 2);
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == 1);
+}
+
+
+/* Test vector-partitioned, gang-partitioned mode.  */
+
+void t2()
+{
+  int n[32], arr[1024], i;
+  
+  for (i = 0; i < 1024; i++)
+    arr[i] = 0;
+
+  for (i = 0; i < 32; i++)
+    n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) \
+		       num_gangs(32) num_workers(1) vector_length(32)
+  {
+    int j, k;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      #pragma acc loop vector
+      for (k = 0; k < 32; k++)
+	arr[j * 32 + k]++;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (n[i] == 2);
+
+  for (i = 0; i < 1024; i++)
+    assert (arr[i] == 1);
+}
+
+
+/* Test conditional vector-partitioned loops.  */
+
+void t3()
+{
+  int n[32], arr[1024], i;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = 0;
+
+  for (i = 0; i < 32; i++)
+    n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) \
+		       num_gangs(32) num_workers(1) vector_length(32)
+  {
+    int j, k;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	if ((j % 2) == 0)
+	  {
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[j * 32 + k]++;
+	  }
+	else
+	  {
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[j * 32 + k]--;
+	  }
+      }
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (n[i] == 2);
+
+  for (i = 0; i < 1024; i++)
+    assert (arr[i] == (((i % 64) < 32) ? 1 : -1));
+}
+
+
+/* Test conditions inside vector-partitioned loops.  */
+
+void t4()
+{
+  int n[32], arr[1024], i;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  for (i = 0; i < 32; i++)
+    n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) \
+		       num_gangs(32) num_workers(1) vector_length(32)
+  {
+    int j, k;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop vector
+	for (k = 0; k < 32; k++)
+	  if ((arr[j * 32 + k] % 2) != 0)
+	    arr[j * 32 + k] *= 2;
+      }
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (n[i] == 2);
+
+  for (i = 0; i < 1024; i++)
+    assert (arr[i] == ((i % 2) == 0 ? i : i * 2));
+}
+
+
+/* Test conditions inside gang-partitioned/vector-partitioned loops.  */
+
+void t5()
+{
+  int n[32], arr[1024], i;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  for (i = 0; i < 32; i++)
+    n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) \
+		       num_gangs(32) num_workers(1) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+
+    #pragma acc loop gang vector
+    for (j = 0; j < 1024; j++)
+      if ((arr[j] % 2) != 0)
+	arr[j] *= 2;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (n[i] == 2);
+
+  for (i = 0; i < 1024; i++)
+    assert (arr[i] == ((i % 2) == 0 ? i : i * 2));
+}
+
+
+/* Test switch containing vector-partitioned loops inside gang-partitioned
+   loops.  */
+
+void t6()
+{
+  int n[32], arr[1024], i;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = 0;
+
+  for (i = 0; i < 32; i++)
+    n[i] = i % 5;
+
+  #pragma acc parallel copy(n, arr) \
+		       num_gangs(32) num_workers(1) vector_length(32)
+  {
+    int j, k;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      switch (n[j])
+	{
+	case 1:
+	  #pragma acc loop vector
+	  for (k = 0; k < 32; k++)
+	    arr[j * 32 + k] += 1;
+	  break;
+
+	case 2:
+	  #pragma acc loop vector
+	  for (k = 0; k < 32; k++)
+	    arr[j * 32 + k] += 2;
+	  break;
+
+	case 3:
+	  #pragma acc loop vector
+	  for (k = 0; k < 32; k++)
+	    arr[j * 32 + k] += 3;
+	  break;
+
+	case 4:
+	  #pragma acc loop vector
+	  for (k = 0; k < 32; k++)
+	    arr[j * 32 + k] += 4;
+	  break;
+
+	case 5:
+	  #pragma acc loop vector
+	  for (k = 0; k < 32; k++)
+	    arr[j * 32 + k] += 5;
+	  break;
+
+	default:
+	  abort ();
+	}
+
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 32; j++)
+      n[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (n[i] == (i % 5) + 2);
+
+  for (i = 0; i < 1024; i++)
+    assert (arr[i] == ((i / 32) % 5) + 1);
+}
+
+
+/* Test trivial operation of vector-single mode.  */
+
+void t7()
+{
+  int n = 0;
+  #pragma acc parallel copy(n) \
+		       num_gangs(1) num_workers(1) vector_length(32)
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 288 } */
+  {
+    n++;
+  }
+  assert (n == 1);
+}
+
+
+/* Test vector-single, gang-partitioned mode.  */
+
+void t8()
+{
+  int arr[1024];
+  int gangs;
+
+  for (gangs = 1; gangs <= 1024; gangs <<= 1)
+    {
+      int i;
+
+      for (i = 0; i < 1024; i++)
+	arr[i] = 0;
+
+      #pragma acc parallel copy(arr) \
+			   num_gangs(gangs) num_workers(1) vector_length(32)
+      /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 312 } */
+      {
+	int j;
+	#pragma acc loop gang
+	for (j = 0; j < 1024; j++)
+	  arr[j]++;
+      }
+
+      for (i = 0; i < 1024; i++)
+	assert (arr[i] == 1);
+    }
+}
+
+
+/* Test conditions in vector-single mode.  */
+
+void t9()
+{
+  int arr[1024];
+  int gangs;
+
+  for (gangs = 1; gangs <= 1024; gangs <<= 1)
+    {
+      int i;
+
+      for (i = 0; i < 1024; i++)
+	arr[i] = 0;
+
+      #pragma acc parallel copy(arr) \
+			   num_gangs(gangs) num_workers(1) vector_length(32)
+      /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 342 } */
+      {
+	int j;
+	#pragma acc loop gang
+	for (j = 0; j < 1024; j++)
+	  if ((j % 3) == 0)
+	    arr[j]++;
+	  else
+	    arr[j] += 2;
+      }
+
+      for (i = 0; i < 1024; i++)
+	assert (arr[i] == (((i % 3) == 0) ? 1 : 2));
+    }
+}
+
+
+/* Test switch in vector-single mode.  */
+
+void t10()
+{
+  int arr[1024];
+  int gangs;
+
+  for (gangs = 1; gangs <= 1024; gangs <<= 1)
+    {
+      int i;
+
+      for (i = 0; i < 1024; i++)
+	arr[i] = 0;
+
+      #pragma acc parallel copy(arr) \
+			   num_gangs(gangs) num_workers(1) vector_length(32)
+      /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 375 } */
+      {
+	int j;
+	#pragma acc loop gang
+	for (j = 0; j < 1024; j++)
+	  switch (j % 5)
+	    {
+	    case 0: arr[j] += 1; break;
+	    case 1: arr[j] += 2; break;
+	    case 2: arr[j] += 3; break;
+	    case 3: arr[j] += 4; break;
+	    case 4: arr[j] += 5; break;
+	    default: arr[j] += 99;
+	    }
+      }
+
+      for (i = 0; i < 1024; i++)
+	assert (arr[i] == (i % 5) + 1);
+    }
+}
+
+
+/* Test switch in vector-single mode, initialise array on device.  */
+
+void t11()
+{
+  int arr[1024];
+  int i;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = 99;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(1024) num_workers(1) vector_length(32)
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 409 } */
+  {
+    int j;
+
+    /* This loop and the one following must be distributed to available gangs
+       in the same way to ensure data dependencies are not violated (hence the
+       "static" clauses).  */
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 1024; j++)
+      arr[j] = 0;
+    
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < 1024; j++)
+      switch (j % 5)
+	{
+	case 0: arr[j] += 1; break;
+	case 1: arr[j] += 2; break;
+	case 2: arr[j] += 3; break;
+	case 3: arr[j] += 4; break;
+	case 4: arr[j] += 5; break;
+	default: arr[j] += 99;
+	}
+  }
+
+  for (i = 0; i < 1024; i++)
+    assert (arr[i] == (i % 5) + 1);
+}
+
+
+/* Test multiple conditions in vector-single mode.  */
+
+#define NUM_GANGS 4096
+void t12()
+{
+  bool fizz[NUM_GANGS], buzz[NUM_GANGS], fizzbuzz[NUM_GANGS];
+  int i;
+
+  #pragma acc parallel copyout(fizz, buzz, fizzbuzz) \
+		       num_gangs(NUM_GANGS) num_workers(1) vector_length(32)
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 448 } */
+  {
+    int j;
+    
+    /* This loop and the one following must be distributed to available gangs
+       in the same way to ensure data dependencies are not violated (hence the
+       "static" clauses).  */
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < NUM_GANGS; j++)
+      fizz[j] = buzz[j] = fizzbuzz[j] = 0;
+    
+    #pragma acc loop gang(static:*)
+    for (j = 0; j < NUM_GANGS; j++)
+      {
+	if ((j % 3) == 0 && (j % 5) == 0)
+	  fizzbuzz[j] = 1;
+	else
+	  {
+	    if ((j % 3) == 0)
+	      fizz[j] = 1;
+	    else if ((j % 5) == 0)
+	      buzz[j] = 1;
+	  }
+      }
+  }
+
+  for (i = 0; i < NUM_GANGS; i++)
+    {
+      assert (fizzbuzz[i] == ((i % 3) == 0 && (i % 5) == 0));
+      assert (fizz[i] == ((i % 3) == 0 && (i % 5) != 0));
+      assert (buzz[i] == ((i % 3) != 0 && (i % 5) == 0));
+    }
+}
+#undef NUM_GANGS
+
+
+/* Test worker-partitioned/vector-single mode.  */
+
+void t13()
+{
+  int arr[32 * 8], i;
+
+  for (i = 0; i < 32 * 8; i++)
+    arr[i] = 0;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 495 } */
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	int k;
+	#pragma acc loop worker
+	for (k = 0; k < 8; k++)
+          arr[j * 8 + k] += j * 8 + k;
+      }
+  }
+
+  for (i = 0; i < 32 * 8; i++)
+    assert (arr[i] == i);
+}
+
+
+/* Test condition in worker-partitioned mode.  */
+
+void t14()
+{
+  int arr[32 * 32 * 8], i;
+
+  for (i = 0; i < 32 * 32 * 8; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	int k;
+	#pragma acc loop worker
+	for (k = 0; k < 8; k++)
+	  {
+	    int m;
+	    if ((k % 2) == 0)
+	      {
+		#pragma acc loop vector
+		for (m = 0; m < 32; m++)
+		  arr[j * 32 * 8 + k * 32 + m]++;
+	      }
+	    else
+	      {
+		#pragma acc loop vector
+		for (m = 0; m < 32; m++)
+		  arr[j * 32 * 8 + k * 32 + m] += 2;
+	      }
+	  }
+      }
+  }
+
+  for (i = 0; i < 32 * 32 * 8; i++)
+    assert (arr[i] == i + ((i / 32) % 2) + 1);
+}
+
+
+/* Test switch in worker-partitioned mode.  */
+
+void t15()
+{
+  int arr[32 * 32 * 8], i;
+
+  for (i = 0; i < 32 * 32 * 8; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	int k;
+	#pragma acc loop worker
+	for (k = 0; k < 8; k++)
+	  {
+	    int m;
+	    switch ((j * 32 + k) % 3)
+	    {
+	    case 0:
+	      #pragma acc loop vector
+	      for (m = 0; m < 32; m++)
+		arr[j * 32 * 8 + k * 32 + m]++;
+	      break;
+
+	    case 1:
+	      #pragma acc loop vector
+	      for (m = 0; m < 32; m++)
+		arr[j * 32 * 8 + k * 32 + m] += 2;
+	      break;
+
+	    case 2:
+	      #pragma acc loop vector
+	      for (m = 0; m < 32; m++)
+		arr[j * 32 * 8 + k * 32 + m] += 3;
+	      break;
+
+	    default: ;
+	    }
+	  }
+      }
+  }
+
+  for (i = 0; i < 32 * 32 * 8; i++)
+    assert (arr[i] == i + ((i / 32) % 3) + 1);
+}
+
+
+/* Test worker-single/worker-partitioned transitions.  */
+
+void t16()
+{
+  int n[32], arr[32 * 32], i;
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = 0;
+
+  for (i = 0; i < 32; i++)
+    n[i] = 0;
+
+  #pragma acc parallel copy(n, arr) \
+		       num_gangs(8) num_workers(16) vector_length(32)
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 621 } */
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	int k;
+
+	n[j]++;
+
+	#pragma acc loop worker
+	for (k = 0; k < 32; k++)
+          arr[j * 32 + k]++;
+
+	n[j]++;
+
+	#pragma acc loop worker
+	for (k = 0; k < 32; k++)
+          arr[j * 32 + k]++;
+
+	n[j]++;
+
+	#pragma acc loop worker
+	for (k = 0; k < 32; k++)
+          arr[j * 32 + k]++;
+
+	n[j]++;
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (n[i] == 4);
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == 3);
+}
+
+
+/* Test correct synchronisation between worker-partitioned loops.  */
+
+void t17()
+{
+  int arr_a[32 * 32], arr_b[32 * 32], i;
+  int num_workers, num_gangs;
+
+  for (num_workers = 1; num_workers <= 32; num_workers <<= 1)
+    for (num_gangs = 1; num_gangs <= 32; num_gangs <<= 1)
+      {
+	for (i = 0; i < 32 * 32; i++)
+	  arr_a[i] = i;
+
+	#pragma acc parallel copyin(arr_a) copyout(arr_b) \
+			     num_gangs(num_gangs) num_workers(num_workers) vector_length(32)
+	/* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 674 } */
+	{
+	  int j;
+	  #pragma acc loop gang
+	  for (j = 0; j < 32; j++)
+	    {
+	      int k;
+
+	      #pragma acc loop worker
+	      for (k = 0; k < 32; k++)
+        	arr_b[j * 32 + (31 - k)] = arr_a[j * 32 + k] * 2;
+
+	      #pragma acc loop worker
+	      for (k = 0; k < 32; k++)
+        	arr_a[j * 32 + (31 - k)] = arr_b[j * 32 + k] * 2;
+
+	      #pragma acc loop worker
+	      for (k = 0; k < 32; k++)
+        	arr_b[j * 32 + (31 - k)] = arr_a[j * 32 + k] * 2;
+	    }
+	}
+
+	for (i = 0; i < 32 * 32; i++)
+	  assert (arr_b[i] == (i ^ 31) * 8);
+      }
+}
+
+
+/* Test correct synchronisation between worker+vector-partitioned loops.  */
+
+void t18()
+{
+  int arr_a[32 * 32 * 32], arr_b[32 * 32 * 32], i;
+  int num_workers, num_gangs;
+
+  for (num_workers = 1; num_workers <= 32; num_workers <<= 1)
+    for (num_gangs = 1; num_gangs <= 32; num_gangs <<= 1)
+      {
+	for (i = 0; i < 32 * 32 * 32; i++)
+	  arr_a[i] = i;
+
+	#pragma acc parallel copyin(arr_a) copyout(arr_b) \
+			     num_gangs(num_gangs) num_workers(num_workers) vector_length(32)
+	{
+	  int j;
+	  #pragma acc loop gang
+	  for (j = 0; j < 32; j++)
+	    {
+	      int k;
+
+	      #pragma acc loop worker vector
+	      for (k = 0; k < 32 * 32; k++)
+        	arr_b[j * 32 * 32 + (1023 - k)] = arr_a[j * 32 * 32 + k] * 2;
+
+	      #pragma acc loop worker vector
+	      for (k = 0; k < 32 * 32; k++)
+        	arr_a[j * 32 * 32 + (1023 - k)] = arr_b[j * 32 * 32 + k] * 2;
+
+	      #pragma acc loop worker vector
+	      for (k = 0; k < 32 * 32; k++)
+        	arr_b[j * 32 * 32 + (1023 - k)] = arr_a[j * 32 * 32 + k] * 2;
+	    }
+	}
+
+	for (i = 0; i < 32 * 32 * 32; i++)
+	  assert (arr_b[i] == (i ^ 1023) * 8);
+      }
+}
+
+
+/* Test correct synchronisation between vector-partitioned loops in
+   worker-partitioned mode.  */
+
+void t19()
+{
+  int n[32 * 32], arr_a[32 * 32 * 32], arr_b[32 * 32 * 32], i;
+  int num_workers, num_gangs;
+
+  for (num_workers = 1; num_workers <= 32; num_workers <<= 1)
+    for (num_gangs = 1; num_gangs <= 32; num_gangs <<= 1)
+      {
+	for (i = 0; i < 32 * 32 * 32; i++)
+	  arr_a[i] = i;
+
+	for (i = 0; i < 32 * 32; i++)
+          n[i] = 0;
+
+	#pragma acc parallel copy (n) copyin(arr_a) copyout(arr_b) \
+			     num_gangs(num_gangs) num_workers(num_workers) vector_length(32)
+	{
+	  int j;
+	  #pragma acc loop gang
+	  for (j = 0; j < 32; j++)
+	    {
+	      int k;
+
+	      #pragma acc loop worker
+	      for (k = 0; k < 32; k++)
+		{
+		  int m;
+
+		  n[j * 32 + k]++;
+
+		  #pragma acc loop vector
+		  for (m = 0; m < 32; m++)
+		    {
+	              if (((j * 1024 + k * 32 + m) % 2) == 0)
+			arr_b[j * 1024 + k * 32 + (31 - m)]
+			  = arr_a[j * 1024 + k * 32 + m] * 2;
+		      else
+			arr_b[j * 1024 + k * 32 + (31 - m)]
+			  = arr_a[j * 1024 + k * 32 + m] * 3;
+		    }
+
+		  /* Test returning to vector-single mode...  */
+		  n[j * 32 + k]++;
+
+		  #pragma acc loop vector
+		  for (m = 0; m < 32; m++)
+		    {
+	              if (((j * 1024 + k * 32 + m) % 3) == 0)
+			arr_a[j * 1024 + k * 32 + (31 - m)]
+			  = arr_b[j * 1024 + k * 32 + m] * 5;
+		      else
+			arr_a[j * 1024 + k * 32 + (31 - m)]
+			  = arr_b[j * 1024 + k * 32 + m] * 7;
+		    }
+
+		  /* ...and back-to-back vector loops.  */
+
+		  #pragma acc loop vector
+		  for (m = 0; m < 32; m++)
+		    {
+	              if (((j * 1024 + k * 32 + m) % 2) == 0)
+			arr_b[j * 1024 + k * 32 + (31 - m)]
+			  = arr_a[j * 1024 + k * 32 + m] * 3;
+		      else
+			arr_b[j * 1024 + k * 32 + (31 - m)]
+			  = arr_a[j * 1024 + k * 32 + m] * 2;
+		    }
+		}
+	    }
+	}
+
+	for (i = 0; i < 32 * 32; i++)
+          assert (n[i] == 2);
+
+	for (i = 0; i < 32 * 32 * 32; i++)
+          {
+	    int m = 6 * ((i % 3) == 0 ? 5 : 7);
+	    assert (arr_b[i] == (i ^ 31) * m);
+	  }
+      }
+}
+
+
+/* With -O0, variables are on the stack, not in registers.  Check that worker
+   state propagation handles the stack frame.  */
+
+void t20()
+{
+  int w0 = 0;
+  int w1 = 0;
+  int w2 = 0;
+  int w3 = 0;
+  int w4 = 0;
+  int w5 = 0;
+  int w6 = 0;
+  int w7 = 0;
+
+  int i;
+
+#pragma acc parallel copy (w0, w1, w2, w3, w4, w5, w6, w7) \
+		     num_gangs (1) num_workers (8)
+  {
+    int internal = 100;
+
+#pragma acc loop worker
+    for (i = 0; i < 8; i++)
+      {
+	switch (i)
+	  {
+	  case 0: w0 = internal; break;
+	  case 1: w1 = internal; break;
+	  case 2: w2 = internal; break;
+	  case 3: w3 = internal; break;
+	  case 4: w4 = internal; break;
+	  case 5: w5 = internal; break;
+	  case 6: w6 = internal; break;
+	  case 7: w7 = internal; break;
+	  default: break;
+	  }
+      }
+  }
+
+  if (w0 != 100
+      || w1 != 100
+      || w2 != 100
+      || w3 != 100
+      || w4 != 100
+      || w5 != 100
+      || w6 != 100
+      || w7 != 100)
+    __builtin_abort ();
+}
+
+
+/* Test worker-single/vector-single mode.  */
+
+void t21()
+{
+  int arr[32], i;
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 0;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 892 } */
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 892 } */
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      arr[j]++;
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == 1);
+}
+
+
+/* Test atomic update in worker-single/vector-single mode.  */
+
+void t22()
+{
+  int arr[32], i;
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 0;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 917 } */
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 917 } */
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc atomic
+	arr[j]++;
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == 1);
+}
+
+
+/* Test condition in worker-single/vector-single mode.  */
+
+void t23()
+{
+  int arr[32], i;
+
+  for (i = 0; i < 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 945 } */
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 945 } */
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      if ((arr[j] % 2) != 0)
+	arr[j]++;
+      else
+	arr[j] += 2;
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == (((i % 2) != 0) ? i + 1 : i + 2));
+}
+
+
+/* Test switch in worker-single/vector-single mode.  */
+
+void t24()
+{
+  int arr[32], i;
+
+  for (i = 0; i < 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 973 } */
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 973 } */
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      switch (arr[j] % 5)
+	{
+	case 0: arr[j] += 1; break;
+	case 1: arr[j] += 2; break;
+	case 2: arr[j] += 3; break;
+	case 3: arr[j] += 4; break;
+	case 4: arr[j] += 5; break;
+	default: arr[j] += 99;
+	}
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == i + (i % 5) + 1);
+}
+
+
+/* Test worker-single/vector-partitioned mode.  */
+
+void t25()
+{
+  int arr[32 * 32], i;
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 1006 } */
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	int k;
+	#pragma acc loop vector
+	for (k = 0; k < 32; k++)
+	  {
+	    #pragma acc atomic
+	    arr[j * 32 + k]++;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + 1);
+}
+
+
+/* Test multiple conditional vector-partitioned loops in worker-single
+   mode.  */
+
+void t26()
+{
+  int arr[32 * 32], i;
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) \
+		       num_gangs(8) num_workers(8) vector_length(32)
+  /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 1039 } */
+  {
+    int j;
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+	int k;
+	if ((j % 3) == 0)
+	  {
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      {
+		#pragma acc atomic
+		arr[j * 32 + k] += 3;
+	      }
+	  }
+	else if ((j % 3) == 1)
+	  {
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      {
+		#pragma acc atomic
+		arr[j * 32 + k] += 7;
+	      }
+	  }
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    {
+      int j = (i / 32) % 3;
+      assert (arr[i] == i + ((j == 0) ? 3 : (j == 1) ? 7 : 0));
+    }
+}
+
+
+/* Test worker-single, vector-partitioned, gang-redundant mode.  */
+
+#define ACTUAL_GANGS 8
+void t27()
+{
+  int n, arr[32], i;
+  int ondev;
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 0;
+
+  n = 0;
+
+  #pragma acc parallel copy(n, arr) copyout(ondev) \
+	  num_gangs(ACTUAL_GANGS) num_workers(8) vector_length(32)
+  /* { dg-warning "region is gang partitioned but does not contain gang partitioned code" "gang" { target *-*-* } 1090 } */
+  /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 1090 } */
+  {
+    int j;
+
+    ondev = acc_on_device (acc_device_not_host);
+
+    #pragma acc atomic
+    n++;
+
+    #pragma acc loop vector
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc atomic
+	arr[j] += 1;
+      }
+
+    #pragma acc atomic
+    n++;
+  }
+
+  int m = ondev ? ACTUAL_GANGS : 1;
+  
+  assert (n == m * 2);
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == m);
+}
+#undef ACTUAL_GANGS
+
+
+/* Check if worker-single variables get broadcast to vectors.  */
+
+#pragma acc routine
+float t28_routine ()
+{
+  return 2.71;
+}
+
+#define N 32
+void t28()
+{
+  float threads[N], v1 = 3.14;
+
+  for (int i = 0; i < N; i++)
+    threads[i] = -1;
+
+#pragma acc parallel num_gangs (1) vector_length (32) copy (v1)
+  {
+    float val = t28_routine ();
+
+#pragma acc loop vector
+    for (int i = 0; i < N; i++)
+      threads[i] = val + v1*i;
+  }
+
+  for (int i = 0; i < N; i++)
+    assert (fabs (threads[i] - (t28_routine () + v1*i)) < 0.0001);
+}
+#undef N
+
+
+int main()
+{
+  t1();
+  t2();
+  t3();
+  t4();
+  t5();
+  t6();
+  t7();
+  t8();
+  t9();
+  t10();
+  t11();
+  t12();
+  t13();
+  t14();
+  t15();
+  t16();
+  t17();
+  t18();
+  t19();
+  t20();
+  t21();
+  t22();
+  t23();
+  t24();
+  t25();
+  t26();
+  t27();
+  t28();
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
index 83cddb5..c164598 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/nested-2.c
@@ -1,3 +1,5 @@
+/* { dg-do run } */
+
 #include <stdlib.h>
 
 int
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c
deleted file mode 100644
index 9a411fe..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c
+++ /dev/null
@@ -1,6 +0,0 @@
-/* { dg-do run } */
-
-#include <stdlib.h>
-
-#define EXEC_DIRECTIVE parallel
-#include "data-clauses.h"
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-variables.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-variables.c
new file mode 100644
index 0000000..f0c3447
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/private-variables.c
@@ -0,0 +1,966 @@
+#include <assert.h>
+#include <openacc.h>
+
+typedef struct {
+  int x, y;
+} vec2;
+
+typedef struct {
+  int x, y, z;
+  int attr[13];
+} vec3_attr;
+
+
+/* Test of gang-private variables declared in local scope with parallel
+   directive.  */
+
+void local_g_1()
+{
+  int i, arr[32];
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 3;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 24 } */
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 24 } */
+  {
+    int x;
+
+    #pragma acc loop gang(static:1)
+    for (i = 0; i < 32; i++)
+      x = i * 2;
+
+    #pragma acc loop gang(static:1)
+    for (i = 0; i < 32; i++)
+      {
+	if (acc_on_device (acc_device_host))
+	  x = i * 2;
+	arr[i] += x;
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == 3 + i * 2);
+}
+
+
+/* Test of worker-private variables declared in a local scope, broadcasting
+   to vector-partitioned mode.  Back-to-back worker loops.  */
+
+void local_w_1()
+{
+  int i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    int x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+
+	#pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    int x = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared in a local scope, broadcasting
+   to vector-partitioned mode.  Successive vector loops.  */
+
+void local_w_2()
+{
+  int i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    int x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	    
+	    x = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared in a local scope, broadcasting
+   to vector-partitioned mode.  Aggregate worker variable.  */
+
+void local_w_3()
+{
+  int i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    vec2 pt;
+	    
+	    pt.x = i ^ j * 3;
+	    pt.y = i | j * 5;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt.x * k;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt.y * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared in a local scope, broadcasting
+   to vector-partitioned mode.  Addressable worker variable.  */
+
+void local_w_4()
+{
+  int i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    vec2 pt, *ptp;
+	    
+	    ptp = &pt;
+	    
+	    pt.x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += ptp->x * k;
+
+	    ptp->y = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt.y * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared in a local scope, broadcasting
+   to vector-partitioned mode.  Array worker variable.  */
+
+void local_w_5()
+{
+  int i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    int pt[2];
+	    
+	    pt[0] = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt[0] * k;
+
+	    pt[1] = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt[1] * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of gang-private variables declared on loop directive.  */
+
+void loop_g_1()
+{
+  int x = 5, i, arr[32];
+
+  for (i = 0; i < 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 299 } */
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 299 } */
+  {
+    #pragma acc loop gang private(x)
+    for (i = 0; i < 32; i++)
+      {
+	x = i * 2;
+	arr[i] += x;
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == i * 3);
+}
+
+
+/* Test of gang-private variables declared on loop directive, with broadcasting
+   to partitioned workers.  */
+
+void loop_g_2()
+{
+  int x = 5, i, arr[32 * 32];
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 326 } */
+  {
+    #pragma acc loop gang private(x)
+    for (i = 0; i < 32; i++)
+      {
+	x = i * 2;
+
+	#pragma acc loop worker
+	for (int j = 0; j < 32; j++)
+	  arr[i * 32 + j] += x;
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + (i / 32) * 2);
+}
+
+
+/* Test of gang-private variables declared on loop directive, with broadcasting
+   to partitioned vectors.  */
+
+void loop_g_3()
+{
+  int x = 5, i, arr[32 * 32];
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 355 } */
+  {
+    #pragma acc loop gang private(x)
+    for (i = 0; i < 32; i++)
+      {
+	x = i * 2;
+
+	#pragma acc loop vector
+	for (int j = 0; j < 32; j++)
+	  arr[i * 32 + j] += x;
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + (i / 32) * 2);
+}
+
+
+/* Test of gang-private addressable variable declared on loop directive, with
+   broadcasting to partitioned workers.  */
+
+void loop_g_4()
+{
+  int x = 5, i, arr[32 * 32];
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 384 } */
+  {
+    #pragma acc loop gang private(x)
+    for (i = 0; i < 32; i++)
+      {
+        int *p = &x;
+
+	x = i * 2;
+
+	#pragma acc loop worker
+	for (int j = 0; j < 32; j++)
+	  arr[i * 32 + j] += x;
+
+	(*p)--;
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + (i / 32) * 2);
+}
+
+
+/* Test of gang-private array variable declared on loop directive, with
+   broadcasting to partitioned workers.  */
+
+void loop_g_5()
+{
+  int x[8], i, arr[32 * 32];
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 417 } */
+  {
+    #pragma acc loop gang private(x)
+    for (i = 0; i < 32; i++)
+      {
+        for (int j = 0; j < 8; j++)
+	  x[j] = j * 2;
+
+	#pragma acc loop worker
+	for (int j = 0; j < 32; j++)
+	  arr[i * 32 + j] += x[j % 8];
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + (i % 8) * 2);
+}
+
+
+/* Test of gang-private aggregate variable declared on loop directive, with
+   broadcasting to partitioned workers.  */
+
+void loop_g_6()
+{
+  int i, arr[32 * 32];
+  vec3_attr pt;
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 448 } */
+  {
+    #pragma acc loop gang private(pt)
+    for (i = 0; i < 32; i++)
+      {
+        pt.x = i;
+	pt.y = i * 2;
+	pt.z = i * 4;
+	pt.attr[5] = i * 6;
+
+	#pragma acc loop worker
+	for (int j = 0; j < 32; j++)
+	  arr[i * 32 + j] += pt.x + pt.y + pt.z + pt.attr[5];
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + (i / 32) * 13);
+}
+
+
+/* Test of vector-private variables declared on loop directive.  */
+
+void loop_v_1()
+{
+  int x, i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+
+	    #pragma acc loop vector private(x)
+	    for (k = 0; k < 32; k++)
+	      {
+		x = i ^ j * 3;
+		arr[i * 1024 + j * 32 + k] += x * k;
+	      }
+
+	    #pragma acc loop vector private(x)
+	    for (k = 0; k < 32; k++)
+	      {
+		x = i | j * 5;
+		arr[i * 1024 + j * 32 + k] += x * k;
+	      }
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of vector-private variables declared on loop directive.  Array type.  */
+
+void loop_v_2()
+{
+  int pt[2], i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+
+	    #pragma acc loop vector private(pt)
+	    for (k = 0; k < 32; k++)
+	      {
+	        pt[0] = i ^ j * 3;
+		pt[1] = i | j * 5;
+		arr[i * 1024 + j * 32 + k] += pt[0] * k;
+		arr[i * 1024 + j * 32 + k] += pt[1] * k;
+	      }
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared on a loop directive.  */
+
+void loop_w_1()
+{
+  int x = 5, i, arr[32 * 32];
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 570 } */
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker private(x)
+	for (j = 0; j < 32; j++)
+	  {
+	    x = i ^ j * 3;
+	    /* Try to ensure accesses to 'x' don't get optimized into a
+	       temporary.  */
+	    __asm__ __volatile__ ("");
+	    arr[i * 32 + j] += x;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + ((i / 32) ^ (i % 32) * 3));
+}
+
+
+/* Test of worker-private variables declared on a loop directive, broadcasting
+   to vector-partitioned mode.  */
+
+void loop_w_2()
+{
+  int x = 5, i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker private(x)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared on a loop directive, broadcasting
+   to vector-partitioned mode.  Back-to-back worker loops.  */
+
+void loop_w_3()
+{
+  int x = 5, i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker private(x)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+
+	#pragma acc loop worker private(x)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    x = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared on a loop directive, broadcasting
+   to vector-partitioned mode.  Successive vector loops.  */
+
+void loop_w_4()
+{
+  int x = 5, i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker private(x)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	    
+	    x = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared on a loop directive, broadcasting
+   to vector-partitioned mode.  Addressable worker variable.  */
+
+void loop_w_5()
+{
+  int x = 5, i, arr[32 * 32 * 32];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker private(x)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    int *p = &x;
+	    
+	    x = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	    
+	    *p = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += x * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared on a loop directive, broadcasting
+   to vector-partitioned mode.  Aggregate worker variable.  */
+
+void loop_w_6()
+{
+  int i, arr[32 * 32 * 32];
+  vec2 pt;
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        #pragma acc loop worker private(pt)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    
+	    pt.x = i ^ j * 3;
+	    pt.y = i | j * 5;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt.x * k;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt.y * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of worker-private variables declared on loop directive, broadcasting
+   to vector-partitioned mode.  Array worker variable.  */
+
+void loop_w_7()
+{
+  int i, arr[32 * 32 * 32];
+  int pt[2];
+
+  for (i = 0; i < 32 * 32 * 32; i++)
+    arr[i] = i;
+
+  /* "pt" is treated as "present_or_copy" on the parallel directive because it
+     is an array variable.  */
+  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
+  {
+    int j;
+
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        /* But here, it is made private per-worker.  */
+        #pragma acc loop worker private(pt)
+	for (j = 0; j < 32; j++)
+	  {
+	    int k;
+	    
+	    pt[0] = i ^ j * 3;
+
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt[0] * k;
+
+	    pt[1] = i | j * 5;
+	    
+	    #pragma acc loop vector
+	    for (k = 0; k < 32; k++)
+	      arr[i * 1024 + j * 32 + k] += pt[1] * k;
+	  }
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    for (int j = 0; j < 32; j++)
+      for (int k = 0; k < 32; k++)
+        {
+	  int idx = i * 1024 + j * 32 + k;
+          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
+	}
+}
+
+
+/* Test of gang-private variables declared on the parallel directive.  */
+
+void parallel_g_1()
+{
+  int x = 5, i, arr[32];
+
+  for (i = 0; i < 32; i++)
+    arr[i] = 3;
+
+  #pragma acc parallel private(x) copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 887 } */
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 887 } */
+  {
+    #pragma acc loop gang(static:1)
+    for (i = 0; i < 32; i++)
+      x = i * 2;
+
+    #pragma acc loop gang(static:1)
+    for (i = 0; i < 32; i++)
+      {
+	if (acc_on_device (acc_device_host))
+	  x = i * 2;
+	arr[i] += x;
+      }
+  }
+
+  for (i = 0; i < 32; i++)
+    assert (arr[i] == 3 + i * 2);
+}
+
+
+/* Test of gang-private array variable declared on the parallel directive.  */
+
+void parallel_g_2()
+{
+  int x[32], i, arr[32 * 32];
+
+  for (i = 0; i < 32 * 32; i++)
+    arr[i] = i;
+
+  #pragma acc parallel private(x) copy(arr) num_gangs(32) num_workers(2) vector_length(32)
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 918 } */
+  {
+    #pragma acc loop gang
+    for (i = 0; i < 32; i++)
+      {
+        int j;
+	for (j = 0; j < 32; j++)
+	  x[j] = j * 2;
+	
+	#pragma acc loop worker
+	for (j = 0; j < 32; j++)
+	  arr[i * 32 + j] += x[31 - j];
+      }
+  }
+
+  for (i = 0; i < 32 * 32; i++)
+    assert (arr[i] == i + (31 - (i % 32)) * 2);
+}
+
+
+int main ()
+{
+  local_g_1();
+  local_w_1();
+  local_w_2();
+  local_w_3();
+  local_w_4();
+  local_w_5();
+  loop_g_1();
+  loop_g_2();
+  loop_g_3();
+  loop_g_4();
+  loop_g_5();
+  loop_g_6();
+  loop_v_1();
+  loop_v_2();
+  loop_w_1();
+  loop_w_2();
+  loop_w_3();
+  loop_w_4();
+  loop_w_5();
+  loop_w_6();
+  loop_w_7();
+  parallel_g_1();
+  parallel_g_2();
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-gang-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-gang-1.c
deleted file mode 100644
index 3c9ee83..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-gang-1.c
+++ /dev/null
@@ -1,41 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-#include <openacc.h>
-
-/* Test of gang-private variables declared in local scope with parallel
-   directive.  */
-
-#define ACTUAL_GANGS 32
-
-int
-main (int argc, char* argv[])
-{
-  int x = 5, i, arr[ACTUAL_GANGS];
-
-  for (i = 0; i < ACTUAL_GANGS; i++)
-    arr[i] = 3;
-
-  #pragma acc parallel copy(arr) num_gangs(ACTUAL_GANGS) num_workers(8) \
-		       vector_length(32)
-  {
-    int x;
-
-    #pragma acc loop gang(static:1)
-    for (i = 0; i < ACTUAL_GANGS; i++)
-      x = i * 2;
-
-    #pragma acc loop gang(static:1)
-    for (i = 0; i < ACTUAL_GANGS; i++)
-      {
-	if (acc_on_device (acc_device_host))
-	  x = i * 2;
-	arr[i] += x;
-      }
-  }
-
-  for (i = 0; i < ACTUAL_GANGS; i++)
-    assert (arr[i] == 3 + i * 2);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-1.c
deleted file mode 100644
index 67a1518..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-1.c
+++ /dev/null
@@ -1,54 +0,0 @@
-#include <assert.h>
-
-/* Test of worker-private variables declared in a local scope, broadcasting
-   to vector-partitioned mode.  Back-to-back worker loops.  */
-
-int
-main (int argc, char* argv[])
-{
-  int i, arr[32 * 32 * 32];
-
-  for (i = 0; i < 32 * 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
-  {
-    int j;
-
-    #pragma acc loop gang
-    for (i = 0; i < 32; i++)
-      {
-        #pragma acc loop worker
-	for (j = 0; j < 32; j++)
-	  {
-	    int k;
-	    int x = i ^ j * 3;
-
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += x * k;
-	  }
-
-	#pragma acc loop worker
-	for (j = 0; j < 32; j++)
-	  {
-	    int k;
-	    int x = i | j * 5;
-	    
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += x * k;
-	  }
-      }
-  }
-
-  for (i = 0; i < 32; i++)
-    for (int j = 0; j < 32; j++)
-      for (int k = 0; k < 32; k++)
-        {
-	  int idx = i * 1024 + j * 32 + k;
-          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
-	}
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-2.c
deleted file mode 100644
index 0ee87be..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-2.c
+++ /dev/null
@@ -1,49 +0,0 @@
-#include <assert.h>
-
-/* Test of worker-private variables declared in a local scope, broadcasting
-   to vector-partitioned mode.  Successive vector loops.  */
-
-int
-main (int argc, char* argv[])
-{
-  int x = 5, i, arr[32 * 32 * 32];
-
-  for (i = 0; i < 32 * 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
-  {
-    int j;
-
-    #pragma acc loop gang
-    for (i = 0; i < 32; i++)
-      {
-        #pragma acc loop worker
-	for (j = 0; j < 32; j++)
-	  {
-	    int k;
-	    int x = i ^ j * 3;
-
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += x * k;
-	    
-	    x = i | j * 5;
-	    
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += x * k;
-	  }
-      }
-  }
-
-  for (i = 0; i < 32; i++)
-    for (int j = 0; j < 32; j++)
-      for (int k = 0; k < 32; k++)
-        {
-	  int idx = i * 1024 + j * 32 + k;
-          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
-	}
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-3.c
deleted file mode 100644
index 1e67322..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-3.c
+++ /dev/null
@@ -1,55 +0,0 @@
-#include <assert.h>
-
-/* Test of worker-private variables declared in a local scope, broadcasting
-   to vector-partitioned mode.  Aggregate worker variable.  */
-
-typedef struct
-{
-  int x, y;
-} vec2;
-
-int
-main (int argc, char* argv[])
-{
-  int i, arr[32 * 32 * 32];
-
-  for (i = 0; i < 32 * 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
-  {
-    int j;
-
-    #pragma acc loop gang
-    for (i = 0; i < 32; i++)
-      {
-        #pragma acc loop worker
-	for (j = 0; j < 32; j++)
-	  {
-	    int k;
-	    vec2 pt;
-	    
-	    pt.x = i ^ j * 3;
-	    pt.y = i | j * 5;
-
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += pt.x * k;
-	    
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += pt.y * k;
-	  }
-      }
-  }
-
-  for (i = 0; i < 32; i++)
-    for (int j = 0; j < 32; j++)
-      for (int k = 0; k < 32; k++)
-        {
-	  int idx = i * 1024 + j * 32 + k;
-          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
-	}
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-4.c
deleted file mode 100644
index 120001b..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-4.c
+++ /dev/null
@@ -1,58 +0,0 @@
-#include <assert.h>
-
-/* Test of worker-private variables declared in a local scope, broadcasting
-   to vector-partitioned mode.  Addressable worker variable.  */
-
-typedef struct
-{
-  int x, y;
-} vec2;
-
-int
-main (int argc, char* argv[])
-{
-  int i, arr[32 * 32 * 32];
-
-  for (i = 0; i < 32 * 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
-  {
-    int j;
-
-    #pragma acc loop gang
-    for (i = 0; i < 32; i++)
-      {
-        #pragma acc loop worker
-	for (j = 0; j < 32; j++)
-	  {
-	    int k;
-	    vec2 pt, *ptp;
-	    
-	    ptp = &pt;
-	    
-	    pt.x = i ^ j * 3;
-
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += ptp->x * k;
-
-	    ptp->y = i | j * 5;
-	    
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += pt.y * k;
-	  }
-      }
-  }
-
-  for (i = 0; i < 32; i++)
-    for (int j = 0; j < 32; j++)
-      for (int k = 0; k < 32; k++)
-        {
-	  int idx = i * 1024 + j * 32 + k;
-          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
-	}
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-5.c
deleted file mode 100644
index f849f0c..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-local-worker-5.c
+++ /dev/null
@@ -1,51 +0,0 @@
-#include <assert.h>
-
-/* Test of worker-private variables declared in a local scope, broadcasting
-   to vector-partitioned mode.  Array worker variable.  */
-
-int
-main (int argc, char* argv[])
-{
-  int i, arr[32 * 32 * 32];
-
-  for (i = 0; i < 32 * 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
-  {
-    int j;
-
-    #pragma acc loop gang
-    for (i = 0; i < 32; i++)
-      {
-        #pragma acc loop worker
-	for (j = 0; j < 32; j++)
-	  {
-	    int k;
-	    int pt[2];
-	    
-	    pt[0] = i ^ j * 3;
-
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += pt[0] * k;
-
-	    pt[1] = i | j * 5;
-	    
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += pt[1] * k;
-	  }
-      }
-  }
-
-  for (i = 0; i < 32; i++)
-    for (int j = 0; j < 32; j++)
-      for (int k = 0; k < 32; k++)
-        {
-	  int idx = i * 1024 + j * 32 + k;
-          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
-	}
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-1.c
deleted file mode 100644
index f80b1e2..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-1.c
+++ /dev/null
@@ -1,29 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test of gang-private variables declared on loop directive.  */
-
-int
-main (int argc, char* argv[])
-{
-  int x = 5, i, arr[32];
-
-  for (i = 0; i < 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
-  {
-    #pragma acc loop gang private(x)
-    for (i = 0; i < 32; i++)
-      {
-	x = i * 2;
-	arr[i] += x;
-      }
-  }
-
-  for (i = 0; i < 32; i++)
-    assert (arr[i] == i * 3);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-2.c
deleted file mode 100644
index a127166..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-2.c
+++ /dev/null
@@ -1,33 +0,0 @@
-/* { dg-additional-options "-w" }   */
-
-#include <assert.h>
-
-/* Test of gang-private variables declared on loop directive, with broadcasting
-   to partitioned workers.  */
-
-int
-main (int argc, char* argv[])
-{
-  int x = 5, i, arr[32 * 32];
-
-  for (i = 0; i < 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
-  {
-    #pragma acc loop gang private(x)
-    for (i = 0; i < 32; i++)
-      {
-	x = i * 2;
-
-	#pragma acc loop worker
-	for (int j = 0; j < 32; j++)
-	  arr[i * 32 + j] += x;
-      }
-  }
-
-  for (i = 0; i < 32 * 32; i++)
-    assert (arr[i] == i + (i / 32) * 2);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-3.c
deleted file mode 100644
index ed06a51..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-3.c
+++ /dev/null
@@ -1,33 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test of gang-private variables declared on loop directive, with broadcasting
-   to partitioned vectors.  */
-
-int
-main (int argc, char* argv[])
-{
-  int x = 5, i, arr[32 * 32];
-
-  for (i = 0; i < 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
-  {
-    #pragma acc loop gang private(x)
-    for (i = 0; i < 32; i++)
-      {
-	x = i * 2;
-
-	#pragma acc loop vector
-	for (int j = 0; j < 32; j++)
-	  arr[i * 32 + j] += x;
-      }
-  }
-
-  for (i = 0; i < 32 * 32; i++)
-    assert (arr[i] == i + (i / 32) * 2);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-4.c
deleted file mode 100644
index dec9290..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-4.c
+++ /dev/null
@@ -1,37 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test of gang-private addressable variable declared on loop directive, with
-   broadcasting to partitioned workers.  */
-
-int
-main (int argc, char* argv[])
-{
-  int x = 5, i, arr[32 * 32];
-
-  for (i = 0; i < 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
-  {
-    #pragma acc loop gang private(x)
-    for (i = 0; i < 32; i++)
-      {
-        int *p = &x;
-
-	x = i * 2;
-
-	#pragma acc loop worker
-	for (int j = 0; j < 32; j++)
-	  arr[i * 32 + j] += x;
-
-	(*p)--;
-      }
-  }
-
-  for (i = 0; i < 32 * 32; i++)
-    assert (arr[i] == i + (i / 32) * 2);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-5.c
deleted file mode 100644
index 6a952b7..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-5.c
+++ /dev/null
@@ -1,34 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test of gang-private array variable declared on loop directive, with
-   broadcasting to partitioned workers.  */
-
-int
-main (int argc, char* argv[])
-{
-  int x[8], i, arr[32 * 32];
-
-  for (i = 0; i < 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
-  {
-    #pragma acc loop gang private(x)
-    for (i = 0; i < 32; i++)
-      {
-        for (int j = 0; j < 8; j++)
-	  x[j] = j * 2;
-
-	#pragma acc loop worker
-	for (int j = 0; j < 32; j++)
-	  arr[i * 32 + j] += x[j % 8];
-      }
-  }
-
-  for (i = 0; i < 32 * 32; i++)
-    assert (arr[i] == i + (i % 8) * 2);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-6.c
deleted file mode 100644
index 48db3b3..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-gang-6.c
+++ /dev/null
@@ -1,42 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test of gang-private aggregate variable declared on loop directive, with
-   broadcasting to partitioned workers.  */
-
-typedef struct {
-  int x, y, z;
-  int attr[13];
-} vec3;
-
-int
-main (int argc, char* argv[])
-{
-  int i, arr[32 * 32];
-  vec3 pt;
-
-  for (i = 0; i < 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
-  {
-    #pragma acc loop gang private(pt)
-    for (i = 0; i < 32; i++)
-      {
-        pt.x = i;
-	pt.y = i * 2;
-	pt.z = i * 4;
-	pt.attr[5] = i * 6;
-
-	#pragma acc loop worker
-	for (int j = 0; j < 32; j++)
-	  arr[i * 32 + j] += pt.x + pt.y + pt.z + pt.attr[5];
-      }
-  }
-
-  for (i = 0; i < 32 * 32; i++)
-    assert (arr[i] == i + (i / 32) * 13);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-vector-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-vector-1.c
deleted file mode 100644
index b3c6ad3..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-vector-1.c
+++ /dev/null
@@ -1,51 +0,0 @@
-#include <assert.h>
-
-/* Test of vector-private variables declared on loop directive.  */
-
-int
-main (int argc, char* argv[])
-{
-  int x, i, arr[32 * 32 * 32];
-
-  for (i = 0; i < 32 * 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
-  {
-    int j;
-
-    #pragma acc loop gang
-    for (i = 0; i < 32; i++)
-      {
-        #pragma acc loop worker
-	for (j = 0; j < 32; j++)
-	  {
-	    int k;
-
-	    #pragma acc loop vector private(x)
-	    for (k = 0; k < 32; k++)
-	      {
-		x = i ^ j * 3;
-		arr[i * 1024 + j * 32 + k] += x * k;
-	      }
-
-	    #pragma acc loop vector private(x)
-	    for (k = 0; k < 32; k++)
-	      {
-		x = i | j * 5;
-		arr[i * 1024 + j * 32 + k] += x * k;
-	      }
-	  }
-      }
-  }
-
-  for (i = 0; i < 32; i++)
-    for (int j = 0; j < 32; j++)
-      for (int k = 0; k < 32; k++)
-        {
-	  int idx = i * 1024 + j * 32 + k;
-          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
-	}
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-vector-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-vector-2.c
deleted file mode 100644
index d4609e9..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-vector-2.c
+++ /dev/null
@@ -1,46 +0,0 @@
-#include <assert.h>
-
-/* Test of vector-private variables declared on loop directive. Array type.  */
-
-int
-main (int argc, char* argv[])
-{
-  int pt[2], i, arr[32 * 32 * 32];
-
-  for (i = 0; i < 32 * 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
-  {
-    int j;
-
-    #pragma acc loop gang
-    for (i = 0; i < 32; i++)
-      {
-        #pragma acc loop worker
-	for (j = 0; j < 32; j++)
-	  {
-	    int k;
-
-	    #pragma acc loop vector private(pt)
-	    for (k = 0; k < 32; k++)
-	      {
-	        pt[0] = i ^ j * 3;
-		pt[1] = i | j * 5;
-		arr[i * 1024 + j * 32 + k] += pt[0] * k;
-		arr[i * 1024 + j * 32 + k] += pt[1] * k;
-	      }
-	  }
-      }
-  }
-
-  for (i = 0; i < 32; i++)
-    for (int j = 0; j < 32; j++)
-      for (int k = 0; k < 32; k++)
-        {
-	  int idx = i * 1024 + j * 32 + k;
-          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
-	}
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-1.c
deleted file mode 100644
index 3d22611..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-1.c
+++ /dev/null
@@ -1,38 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test of worker-private variables declared on a loop directive.  */
-
-int
-main (int argc, char* argv[])
-{
-  int x = 5, i, arr[32 * 32];
-
-  for (i = 0; i < 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
-  {
-    int j;
-
-    #pragma acc loop gang
-    for (i = 0; i < 32; i++)
-      {
-        #pragma acc loop worker private(x)
-	for (j = 0; j < 32; j++)
-	  {
-	    x = i ^ j * 3;
-	    /* Try to ensure 'x' accesses doesn't get optimized into a
-	       temporary.  */
-	    __asm__ __volatile__ ("");
-	    arr[i * 32 + j] += x;
-	  }
-      }
-  }
-
-  for (i = 0; i < 32 * 32; i++)
-    assert (arr[i] == i + ((i / 32) ^ (i % 32) * 3));
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-2.c
deleted file mode 100644
index 3227700..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-2.c
+++ /dev/null
@@ -1,43 +0,0 @@
-#include <assert.h>
-
-/* Test of worker-private variables declared on a loop directive, broadcasting
-   to vector-partitioned mode.  */
-
-int
-main (int argc, char* argv[])
-{
-  int x = 5, i, arr[32 * 32 * 32];
-
-  for (i = 0; i < 32 * 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
-  {
-    int j;
-
-    #pragma acc loop gang
-    for (i = 0; i < 32; i++)
-      {
-        #pragma acc loop worker private(x)
-	for (j = 0; j < 32; j++)
-	  {
-	    int k;
-	    x = i ^ j * 3;
-
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += x * k;
-	  }
-      }
-  }
-
-  for (i = 0; i < 32; i++)
-    for (int j = 0; j < 32; j++)
-      for (int k = 0; k < 32; k++)
-        {
-	  int idx = i * 1024 + j * 32 + k;
-          assert (arr[idx] == idx + (i ^ j * 3) * k);
-	}
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-3.c
deleted file mode 100644
index 65d5d4f..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-3.c
+++ /dev/null
@@ -1,54 +0,0 @@
-#include <assert.h>
-
-/* Test of worker-private variables declared on a loop directive, broadcasting
-   to vector-partitioned mode.  Back-to-back worker loops.  */
-
-int
-main (int argc, char* argv[])
-{
-  int x = 5, i, arr[32 * 32 * 32];
-
-  for (i = 0; i < 32 * 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
-  {
-    int j;
-
-    #pragma acc loop gang
-    for (i = 0; i < 32; i++)
-      {
-        #pragma acc loop worker private(x)
-	for (j = 0; j < 32; j++)
-	  {
-	    int k;
-	    x = i ^ j * 3;
-
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += x * k;
-	  }
-
-	#pragma acc loop worker private(x)
-	for (j = 0; j < 32; j++)
-	  {
-	    int k;
-	    x = i | j * 5;
-	    
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += x * k;
-	  }
-      }
-  }
-
-  for (i = 0; i < 32; i++)
-    for (int j = 0; j < 32; j++)
-      for (int k = 0; k < 32; k++)
-        {
-	  int idx = i * 1024 + j * 32 + k;
-          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
-	}
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-4.c
deleted file mode 100644
index 42a12b2..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-4.c
+++ /dev/null
@@ -1,49 +0,0 @@
-#include <assert.h>
-
-/* Test of worker-private variables declared on a loop directive, broadcasting
-   to vector-partitioned mode.  Successive vector loops.  */
-
-int
-main (int argc, char* argv[])
-{
-  int x = 5, i, arr[32 * 32 * 32];
-
-  for (i = 0; i < 32 * 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
-  {
-    int j;
-
-    #pragma acc loop gang
-    for (i = 0; i < 32; i++)
-      {
-        #pragma acc loop worker private(x)
-	for (j = 0; j < 32; j++)
-	  {
-	    int k;
-	    x = i ^ j * 3;
-
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += x * k;
-	    
-	    x = i | j * 5;
-	    
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += x * k;
-	  }
-      }
-  }
-
-  for (i = 0; i < 32; i++)
-    for (int j = 0; j < 32; j++)
-      for (int k = 0; k < 32; k++)
-        {
-	  int idx = i * 1024 + j * 32 + k;
-          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
-	}
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-5.c
deleted file mode 100644
index a28105c..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-5.c
+++ /dev/null
@@ -1,51 +0,0 @@
-#include <assert.h>
-
-/* Test of worker-private variables declared on a loop directive, broadcasting
-   to vector-partitioned mode.  Addressable worker variable.  */
-
-int
-main (int argc, char* argv[])
-{
-  int x = 5, i, arr[32 * 32 * 32];
-
-  for (i = 0; i < 32 * 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
-  {
-    int j;
-
-    #pragma acc loop gang
-    for (i = 0; i < 32; i++)
-      {
-        #pragma acc loop worker private(x)
-	for (j = 0; j < 32; j++)
-	  {
-	    int k;
-	    int *p = &x;
-	    
-	    x = i ^ j * 3;
-
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += x * k;
-	    
-	    *p = i | j * 5;
-	    
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += x * k;
-	  }
-      }
-  }
-
-  for (i = 0; i < 32; i++)
-    for (int j = 0; j < 32; j++)
-      for (int k = 0; k < 32; k++)
-        {
-	  int idx = i * 1024 + j * 32 + k;
-          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
-	}
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-6.c
deleted file mode 100644
index 5dde621..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-6.c
+++ /dev/null
@@ -1,55 +0,0 @@
-#include <assert.h>
-
-/* Test of worker-private variables declared on a loop directive, broadcasting
-   to vector-partitioned mode.  Aggregate worker variable.  */
-
-typedef struct
-{
-  int x, y;
-} vec2;
-
-int
-main (int argc, char* argv[])
-{
-  int i, arr[32 * 32 * 32];
-  vec2 pt;
-
-  for (i = 0; i < 32 * 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
-  {
-    int j;
-
-    #pragma acc loop gang
-    for (i = 0; i < 32; i++)
-      {
-        #pragma acc loop worker private(pt)
-	for (j = 0; j < 32; j++)
-	  {
-	    int k;
-	    
-	    pt.x = i ^ j * 3;
-	    pt.y = i | j * 5;
-
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += pt.x * k;
-	    
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += pt.y * k;
-	  }
-      }
-  }
-
-  for (i = 0; i < 32; i++)
-    for (int j = 0; j < 32; j++)
-      for (int k = 0; k < 32; k++)
-        {
-	  int idx = i * 1024 + j * 32 + k;
-          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
-	}
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-7.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-7.c
deleted file mode 100644
index e4d4ccf..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-loop-worker-7.c
+++ /dev/null
@@ -1,54 +0,0 @@
-#include <assert.h>
-
-/* Test of worker-private variables declared on loop directive, broadcasting
-   to vector-partitioned mode.  Array worker variable.  */
-
-int
-main (int argc, char* argv[])
-{
-  int i, arr[32 * 32 * 32];
-  int pt[2];
-
-  for (i = 0; i < 32 * 32 * 32; i++)
-    arr[i] = i;
-
-  /* "pt" is treated as "present_or_copy" on the parallel directive because it
-     is an array variable.  */
-  #pragma acc parallel copy(arr) num_gangs(32) num_workers(32) vector_length(32)
-  {
-    int j;
-
-    #pragma acc loop gang
-    for (i = 0; i < 32; i++)
-      {
-        /* But here, it is made private per-worker.  */
-        #pragma acc loop worker private(pt)
-	for (j = 0; j < 32; j++)
-	  {
-	    int k;
-	    
-	    pt[0] = i ^ j * 3;
-
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += pt[0] * k;
-
-	    pt[1] = i | j * 5;
-	    
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[i * 1024 + j * 32 + k] += pt[1] * k;
-	  }
-      }
-  }
-
-  for (i = 0; i < 32; i++)
-    for (int j = 0; j < 32; j++)
-      for (int k = 0; k < 32; k++)
-        {
-	  int idx = i * 1024 + j * 32 + k;
-          assert (arr[idx] == idx + (i ^ j * 3) * k + (i | j * 5) * k);
-	}
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-par-gang-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-par-gang-1.c
deleted file mode 100644
index 43d4765..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-par-gang-1.c
+++ /dev/null
@@ -1,27 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Basic test of firstprivate variable.  */
-
-int
-main (int argc, char* argv[])
-{
-  int x = 5, i, arr[32];
-
-  for (i = 0; i < 32; i++)
-    arr[i] = 3;
-
-  #pragma acc parallel firstprivate(x) copy(arr) num_gangs(32) num_workers(8) \
-		       vector_length(32)
-  {
-    #pragma acc loop gang
-    for (i = 0; i < 32; i++)
-      arr[i] += x;
-  }
-
-  for (i = 0; i < 32; i++)
-    assert (arr[i] == 8);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-par-gang-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-par-gang-2.c
deleted file mode 100644
index 7b74e02..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-par-gang-2.c
+++ /dev/null
@@ -1,38 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-#include <openacc.h>
-
-/* Test of gang-private variables declared on the parallel directive.  */
-
-#define ACTUAL_GANGS 32
-
-int
-main (int argc, char* argv[])
-{
-  int x = 5, i, arr[ACTUAL_GANGS];
-
-  for (i = 0; i < ACTUAL_GANGS; i++)
-    arr[i] = 3;
-
-  #pragma acc parallel private(x) copy(arr) num_gangs(ACTUAL_GANGS) \
-		       num_workers(8) vector_length(32)
-  {
-    #pragma acc loop gang(static:1)
-    for (i = 0; i < ACTUAL_GANGS; i++)
-      x = i * 2;
-
-    #pragma acc loop gang(static:1)
-    for (i = 0; i < ACTUAL_GANGS; i++)
-      {
-	if (acc_on_device (acc_device_host))
-	  x = i * 2;
-	arr[i] += x;
-      }
-  }
-
-  for (i = 0; i < ACTUAL_GANGS; i++)
-    assert (arr[i] == 3 + i * 2);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-par-gang-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-par-gang-3.c
deleted file mode 100644
index 965221a..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/private-vars-par-gang-3.c
+++ /dev/null
@@ -1,35 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test of gang-private array variable declared on the parallel directive.  */
-
-int
-main (int argc, char* argv[])
-{
-  int x[32], i, arr[32 * 32];
-
-  for (i = 0; i < 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel private(x) copy(arr) num_gangs(32) num_workers(2) \
-		       vector_length(32)
-  {
-    #pragma acc loop gang
-    for (i = 0; i < 32; i++)
-      {
-        int j;
-	for (j = 0; j < 32; j++)
-	  x[j] = j * 2;
-	
-	#pragma acc loop worker
-	for (j = 0; j < 32; j++)
-	  arr[i * 32 + j] += x[31 - j];
-      }
-  }
-
-  for (i = 0; i < 32 * 32; i++)
-    assert (arr[i] == i + (31 - (i % 32)) * 2);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c
new file mode 100644
index 0000000..702b02b
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c
@@ -0,0 +1,488 @@
+/* Tests of reduction on loop directive.  */
+
+#include <assert.h>
+
+
+/* Test of reduction on loop directive (gangs, non-private reduction
+   variable).  */
+
+void g_np_1()
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 16 } */
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 16 } */
+  {
+    #pragma acc loop gang reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+
+  res = hres = 1;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 33 } */
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 33 } */
+  {
+    #pragma acc loop gang reduction(*:res)
+    for (i = 0; i < 12; i++)
+      res *= arr[i];
+  }
+
+  for (i = 0; i < 12; i++)
+    hres *= arr[i];
+
+  assert (res == hres);
+}
+
+
+/* Test of reduction on loop directive (gangs and vectors, non-private
+   reduction variable).  */
+
+void gv_np_1()
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 60 } */
+  {
+    #pragma acc loop gang vector reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+}
+
+
+/* Test of reduction on loop directive (gangs and workers, non-private
+   reduction variable).  */
+
+void gw_np_1()
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 86 } */
+  {
+    #pragma acc loop gang worker reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+}
+
+
+/* Test of reduction on loop directive (gangs, workers and vectors, non-private
+   reduction variable).  */
+
+void gwv_np_1()
+{
+  int i, arr[1024], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang worker vector reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+}
+
+
+/* Test of reduction on loop directive (gangs, workers and vectors, non-private
+   reduction variable: separate gang and worker/vector loops).  */
+
+void gwv_np_2()
+{
+  int i, j, arr[32768], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang reduction(+:res)
+    for (j = 0; j < 32; j++)
+      {
+        #pragma acc loop worker vector reduction(+:res)
+        for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+      }
+    /* "res" is non-private, and is not available until after the parallel
+       region.  */
+  }
+
+  for (i = 0; i < 32768; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+}
+
+
+/* Test of reduction on loop directive (gangs, workers and vectors, non-private
+   reduction variable: separate gang and worker/vector loops, double type).  */
+
+void gwv_np_3()
+{
+  int i, j;
+  double arr[32768], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copyin(arr) copy(res)
+  {
+    #pragma acc loop gang reduction(+:res)
+    for (j = 0; j < 32; j++)
+      {
+        #pragma acc loop worker vector reduction(+:res)
+        for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+      }
+  }
+
+  for (i = 0; i < 32768; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+}
+
+
+/* Test of reduction on loop directive (gangs, workers and vectors, multiple
+   non-private reduction variables, float type).  */
+
+void gwv_np_4()
+{
+  int i, j;
+  float arr[32768];
+  float res = 0, mres = 0, hres = 0, hmres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res, mres)
+  {
+    #pragma acc loop gang reduction(+:res) reduction(max:mres)
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
+	for (i = 0; i < 1024; i++)
+	  {
+	    res += arr[j * 1024 + i];
+	    if (arr[j * 1024 + i] > mres)
+	      mres = arr[j * 1024 + i];
+	  }
+
+	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
+	for (i = 0; i < 1024; i++)
+	  {
+	    res += arr[j * 1024 + (1023 - i)];
+	    if (arr[j * 1024 + (1023 - i)] > mres)
+	      mres = arr[j * 1024 + (1023 - i)];
+	  }
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 1024; i++)
+      {
+        hres += arr[j * 1024 + i];
+	hres += arr[j * 1024 + (1023 - i)];
+	if (arr[j * 1024 + i] > hmres)
+	  hmres = arr[j * 1024 + i];
+	if (arr[j * 1024 + (1023 - i)] > hmres)
+	  hmres = arr[j * 1024 + (1023 - i)];
+      }
+
+  assert (res == hres);
+  assert (mres == hmres);
+}
+
+
+/* Test of reduction on loop directive (vectors, private reduction
+   variable).  */
+
+void v_p_1()
+{
+  int i, j, arr[1024], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyout(out)
+  /* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 250 } */
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+        res = 0;
+
+	#pragma acc loop vector reduction(+:res)
+	for (i = 0; i < 32; i++)
+	  res += arr[j * 32 + i];
+	
+	out[j] = res;
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    {
+      hres = 0;
+      
+      for (i = 0; i < 32; i++)
+	hres += arr[j * 32 + i];
+
+      assert (out[j] == hres);
+    }
+}
+
+
+/* Test of reduction on loop directive (vector reduction in
+   gang-partitioned/worker-partitioned mode, private reduction variable).  */
+
+void v_p_2()
+{
+  int i, j, k;
+  double ina[1024], inb[1024], out[1024], acc;
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 32; i++)
+      {
+        ina[j * 32 + i] = (i == j) ? 2.0 : 0.0;
+	inb[j * 32 + i] = (double) (i + j);
+      }
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(acc) copyin(ina, inb) copyout(out)
+  {
+    #pragma acc loop gang worker
+    for (k = 0; k < 32; k++)
+      for (j = 0; j < 32; j++)
+        {
+	  acc = 0;
+
+	  #pragma acc loop vector reduction(+:acc)
+	  for (i = 0; i < 32; i++)
+	    acc += ina[k * 32 + i] * inb[i * 32 + j];
+
+	  out[k * 32 + j] = acc;
+	}
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 32; i++)
+      assert (out[j * 32 + i] == (i + j) * 2);
+}
+
+
+/* Test of reduction on loop directive (workers, private reduction
+   variable).  */
+
+void w_p_1()
+{
+  int i, j, arr[1024], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyout(out)
+  /* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 327 } */
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+        res = 0;
+
+	#pragma acc loop worker reduction(+:res)
+	for (i = 0; i < 32; i++)
+	  res += arr[j * 32 + i];
+	
+	out[j] = res;
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    {
+      hres = 0;
+      
+      for (i = 0; i < 32; i++)
+	hres += arr[j * 32 + i];
+
+      assert (out[j] == hres);
+    }
+}
+
+
+/* Test of reduction on loop directive (workers and vectors, private reduction
+   variable).  */
+
+void wv_p_1()
+{
+  int i, j, arr[1024], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyout(out)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+        res = 0;
+
+	#pragma acc loop worker vector reduction(+:res)
+	for (i = 0; i < 32; i++)
+	  res += arr[j * 32 + i];
+	
+	out[j] = res;
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    {
+      hres = 0;
+      
+      for (i = 0; i < 32; i++)
+	hres += arr[j * 32 + i];
+
+      assert (out[j] == hres);
+    }
+}
+
+
+/* Test of reduction on loop directive (workers and vectors, private reduction
+   variable: separate worker and vector loops).  */
+
+void wv_p_2()
+{
+  int i, j, arr[32768], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyout(out)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+        res = j;
+
+	#pragma acc loop worker reduction(+:res)
+	for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+
+	#pragma acc loop vector reduction(+:res)
+	for (i = 1023; i >= 0; i--)
+	  res += arr[j * 1024 + i];
+
+	out[j] = res;
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    {
+      hres = j;
+      
+      for (i = 0; i < 1024; i++)
+	hres += arr[j * 1024 + i] * 2;
+
+      assert (out[j] == hres);
+    }
+}
+
+
+/* Test of reduction on loop directive (workers and vectors, private reduction
+   variable: gang-redundant mode).  */
+
+void wv_p_3()
+{
+  int i, arr[1024], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i ^ 33;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyin(arr) copyout(out)
+  {
+    /* Private variables aren't initialized by default in OpenACC.  */
+    res = 0;
+
+    /* "res" should be available at the end of the following loop (and should
+       have the same value redundantly in each gang).  */
+    #pragma acc loop worker vector reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+    
+    #pragma acc loop gang (static: 1)
+    for (i = 0; i < 32; i++)
+      out[i] = res;
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  for (i = 0; i < 32; i++)
+    assert (out[i] == hres);
+}
+
+
+int main()
+{
+  g_np_1();
+  gv_np_1();
+  gw_np_1();
+  gwv_np_1();
+  gwv_np_2();
+  gwv_np_3();
+  gwv_np_4();
+  v_p_1();
+  v_p_2();
+  w_p_1();
+  wv_p_1();
+  wv_p_2();
+  wv_p_3();
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/routine-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/routine-1.c
index f8b58f8..f2ebaf1 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/routine-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-1.c
@@ -1,40 +1,86 @@
-/* FIXME: remove -fno-var-tracking from dg-aditional-options.  */
-
-/* { dg-do run { target openacc_nvidia_accel_selected }  } */
-/* { dg-additional-options "-fno-inline -fno-var-tracking" } */
-
 #include <stdio.h>
 #include <stdlib.h>
 
 #pragma acc routine
-int
-fact (int n)
+int fact(int n)
 {
   if (n == 0 || n == 1)
     return 1;
-
-  return n * fact (n - 1);
+  else
+    return n * fact (n - 1);
 }
 
-int
-main()
+int main()
 {
-  int *a, i, n = 10;
+  int *s, *g, *w, *v, *gw, *gv, *wv, *gwv, i, n = 10;
 
-  a = (int *)malloc (sizeof (int) * n);
+  s = (int *) malloc (sizeof (int) * n);
+  g = (int *) malloc (sizeof (int) * n);
+  w = (int *) malloc (sizeof (int) * n);
+  v = (int *) malloc (sizeof (int) * n);
+  gw = (int *) malloc (sizeof (int) * n);
+  gv = (int *) malloc (sizeof (int) * n);
+  wv = (int *) malloc (sizeof (int) * n);
+  gwv = (int *) malloc (sizeof (int) * n);
 
-#pragma acc parallel copy (a[0:n]) vector_length (32)
-  {
-#pragma acc loop vector
-    for (i = 0; i < n; i++)
-      a[i] = fact (i);
-  }
+#pragma acc parallel loop async copyout(s[0:n]) seq
+  for (i = 0; i < n; i++)
+    s[i] = fact (i);
 
+#pragma acc parallel loop async copyout(g[0:n]) gang
   for (i = 0; i < n; i++)
-    if (a[i] != fact (i))
-      abort ();
+    g[i] = fact (i);
+
+#pragma acc parallel loop async copyout(w[0:n]) worker
+  for (i = 0; i < n; i++)
+    w[i] = fact (i);
+
+#pragma acc parallel loop async copyout(v[0:n]) vector
+  for (i = 0; i < n; i++)
+    v[i] = fact (i);
+
+#pragma acc parallel loop async copyout(gw[0:n]) gang worker
+  for (i = 0; i < n; i++)
+    gw[i] = fact (i);
+
+#pragma acc parallel loop async copyout(gv[0:n]) gang vector
+  for (i = 0; i < n; i++)
+    gv[i] = fact (i);
+
+#pragma acc parallel loop async copyout(wv[0:n]) worker vector
+  for (i = 0; i < n; i++)
+    wv[i] = fact (i);
+
+#pragma acc parallel loop async copyout(gwv[0:n]) gang worker vector
+  for (i = 0; i < n; i++)
+    gwv[i] = fact (i);
 
-  free (a);
+#pragma acc wait
+
+  for (i = 0; i < n; i++)
+    if (s[i] != fact (i))
+      abort ();
+  for (i = 0; i < n; i++)
+    if (g[i] != s[i])
+      abort ();
+  for (i = 0; i < n; i++)
+    if (w[i] != s[i])
+      abort ();
+  for (i = 0; i < n; i++)
+    if (v[i] != s[i])
+      abort ();
+  for (i = 0; i < n; i++)
+    if (gw[i] != s[i])
+      abort ();
+  for (i = 0; i < n; i++)
+    if (gv[i] != s[i])
+      abort ();
+  for (i = 0; i < n; i++)
+    if (wv[i] != s[i])
+      abort ();
+  for (i = 0; i < n; i++)
+    if (gwv[i] != s[i])
+      abort ();
 
   return 0;
 }
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/routine-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/routine-2.c
deleted file mode 100644
index 2aa101a..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/routine-2.c
+++ /dev/null
@@ -1,42 +0,0 @@
-/* FIXME: remove -fno-var-tracking from dg-additional-options.  */
-
-/* { dg-do run { target openacc_nvidia_accel_selected }  } */
-/* { dg-additional-options "-fno-inline -fno-var-tracking" } */
-
-#include <stdio.h>
-#include <stdlib.h>
-
-int fact (int);
-
-#pragma acc routine (fact)
-
-int fact (int n)
-{
-  if (n == 0 || n == 1)
-    return 1;
-
-  return n * fact (n - 1);
-}
-
-int
-main()
-{
-  int *a, i, n = 10;
-
-  a = (int *)malloc (sizeof (int) * n);
-
-#pragma acc parallel copy (a[0:n]) vector_length (32)
-  {
-#pragma acc loop vector
-    for (i = 0; i < n; i++)
-      a[i] = fact (i);
-  }
-
-  for (i = 0; i < n; i++)
-    if (a[i] != fact (i))
-      abort ();
-
-  free (a);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c
index bcff464..d6ff44d 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c
@@ -1,7 +1,5 @@
-/* { dg-do run } */
-/* { dg-additional-options "-w" } */
-
 #include <stdlib.h>
+#include <stdio.h>
 
 #define M 8
 #define N 32
@@ -38,7 +36,7 @@ gang (int *a)
 {
   int i;
 
-#pragma acc loop gang
+#pragma acc loop gang worker vector
   for (i = 0; i < N; i++)
     a[i] -= i; 
 }
@@ -53,8 +51,6 @@ seq (int *a)
     a[i] += 1;
 }
 
-#include <stdio.h>
-
 int
 main(int argc, char **argv)
 {
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/routine-g-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/routine-g-1.c
index 33c8a62..2ef5a55 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/routine-g-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-g-1.c
@@ -1,4 +1,3 @@
-/* { dg-additional-options "-w" } */
 /* This code uses nvptx inline assembly guarded with acc_on_device, which is
    not optimized away at -O0, and then confuses the target assembler.
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
@@ -9,6 +8,8 @@
 
 #pragma acc routine gang
 void __attribute__ ((noinline)) gang (int ary[N])
+/* { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 10 } */
+/* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 10 } */
 {
 #pragma acc loop gang
     for (unsigned ix = 0; ix < N; ix++)
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/routine-vec-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/routine-vec-1.c
deleted file mode 100644
index fa1f96d..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/routine-vec-1.c
+++ /dev/null
@@ -1,47 +0,0 @@
-/* This code uses nvptx inline assembly guarded with acc_on_device, which is
-   not optimized away at -O0, and then confuses the target assembler.
-   { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
-
-#include <stdio.h>
-#include <openacc.h>
-
-#define VEC_ID(I, N)						\
-  (acc_on_device (acc_device_nvidia)				\
-   ? ({unsigned __r;						\
-       __asm__ volatile ("mov.u32 %0,%%tid.x;" : "=r" (__r));	\
-       __r; }) : (I % N))
-
-#pragma acc routine vector
-void Vec (int *ptr, int lim, int N)
-{
-#pragma acc loop vector
-  for (int i = 0; i < lim; i++)
-    ptr[i] = VEC_ID(i, N);
-}
-
-#define LEN 32
-
-int main ()
-{
-  int ary[LEN];
-  int err = 0;
-
-  for (int ix = 0; ix != LEN; ix++)
-    ary[ix] = 0xdeadbeef;
-  
-#pragma acc parallel vector_length(32) copy (ary)
-  {
-    Vec (ary, LEN, 32);
-  }
-
-  for (int ix = 0; ix != LEN; ix++)
-    {
-      if (ary[ix] != ix % 32)
-	{
-	  printf ("ary[%d] = %d expected %d\n", ix, ary[ix], ix % 32);
-	  err = 1;
-	}
-    }
-
-  return err;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/routine-w-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/routine-w-1.c
index c295e66..0b03a01 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/routine-w-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-w-1.c
@@ -1,4 +1,3 @@
-/* { dg-additional-options "-w" } */
 /* This code uses nvptx inline assembly guarded with acc_on_device, which is
    not optimized away at -O0, and then confuses the target assembler.
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
@@ -9,6 +8,7 @@
 
 #pragma acc routine worker
 void __attribute__ ((noinline)) worker (int ary[N])
+/* { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 10 } */
 {
 #pragma acc loop worker
   for (unsigned ix = 0; ix < N; ix++)
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/routine-work-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/routine-work-1.c
deleted file mode 100644
index daf9bea..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/routine-work-1.c
+++ /dev/null
@@ -1,55 +0,0 @@
-/* { dg-additional-options "-w" } */
-/* This code uses nvptx inline assembly guarded with acc_on_device, which is
-   not optimized away at -O0, and then confuses the target assembler.
-   { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
-
-#include <stdio.h>
-#include <openacc.h>
-
-#define WORK_ID(I,N)						\
-  (acc_on_device (acc_device_nvidia)				\
-   ? ({unsigned __r;						\
-       __asm__ volatile ("mov.u32 %0,%%tid.y;" : "=r" (__r));	\
-       __r; }) : (I % N))
-
-#pragma acc routine worker
-void Work (int *ptr, int lim, int N)
-{
-#pragma acc loop worker
-  for (int i = 0; i < lim; i++)
-    ptr[i] = WORK_ID(i, N);
-}
-
-#define LEN 32
-
-int DoWork (int err, int N)
-{
-  int ary[LEN];
-
-  for (int ix = 0; ix != LEN; ix++)
-    ary[ix] = 0xdeadbeef;
-  
-#pragma acc parallel num_workers(N) copy (ary)
-  {
-    Work (ary, LEN, N);
-  }
-
-  for (int ix = 0; ix != LEN; ix++)
-    if (ary[ix] != ix % N)
-      {
-	printf ("ary[%d] = %d expected %d\n", ix, ary[ix], ix % N);
-	err = 1;
-      }
-  return err;
-}
-
-
-int main ()
-{
-  int err = 0;
-
-  for (int W = 1; W <= LEN; W <<= 1)
-    err = DoWork (err, W);
-
-  return err;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/update-1-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/update-1-2.c
deleted file mode 100644
index 82c3192..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/update-1-2.c
+++ /dev/null
@@ -1,361 +0,0 @@
-/* Copy of update-1.c with self exchanged with host for #pragma acc update.  */
-
-/* { dg-do run } */
-/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } } */
-
-#include <openacc.h>
-#include <string.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <stdbool.h>
-
-int
-main (int argc, char **argv)
-{
-    int N = 8;
-    int NDIV2 = N / 2;
-    float *a, *b, *c;
-    float *d_a, *d_b, *d_c;
-    int i;
-
-    a = (float *) malloc (N * sizeof (float));
-    b = (float *) malloc (N * sizeof (float));
-    c = (float *) malloc (N * sizeof (float));
-
-    d_a = (float *) acc_malloc (N * sizeof (float));
-    d_b = (float *) acc_malloc (N * sizeof (float));
-    d_c = (float *) acc_malloc (N * sizeof (float));
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 3.0;
-        b[i] = 0.0;
-    }
-
-    acc_map_data (a, d_a, N * sizeof (float));
-    acc_map_data (b, d_b, N * sizeof (float));
-    acc_map_data (c, d_c, N * sizeof (float));
-
-#pragma acc update device (a[0:N], b[0:N])
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update self (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        if (a[i] != 3.0)
-            abort ();
-
-        if (b[i] != 3.0)
-            abort ();
-    }
-
-    if (!acc_is_present (&a[0], (N * sizeof (float))))
-      abort ();
-
-    if (!acc_is_present (&b[0], (N * sizeof (float))))
-      abort ();
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 5.0;
-        b[i] = 1.0;
-    }
-
-#pragma acc update device (a[0:N], b[0:N])
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update self (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        if (a[i] != 5.0)
-            abort ();
-
-        if (b[i] != 5.0)
-            abort ();
-    }
-
-    if (!acc_is_present (&a[0], (N * sizeof (float))))
-      abort ();
-
-    if (!acc_is_present (&b[0], (N * sizeof (float))))
-      abort ();
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 5.0;
-        b[i] = 1.0;
-    }
-
-#pragma acc update device (a[0:N], b[0:N])
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update host (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        if (a[i] != 5.0)
-            abort ();
-
-        if (b[i] != 5.0)
-            abort ();
-    }
-
-    if (!acc_is_present (&a[0], (N * sizeof (float))))
-      abort ();
-
-    if (!acc_is_present (&b[0], (N * sizeof (float))))
-      abort ();
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 6.0;
-        b[i] = 0.0;
-    }
-
-#pragma acc update device (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 9.0;
-    }
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update self (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        if (a[i] != 6.0)
-            abort ();
-
-        if (b[i] != 6.0)
-            abort ();
-    }
-
-    if (!acc_is_present (&a[0], (N * sizeof (float))))
-      abort ();
-
-    if (!acc_is_present (&b[0], (N * sizeof (float))))
-      abort ();
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 7.0;
-        b[i] = 2.0;
-    }
-
-#pragma acc update device (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 9.0;
-    }
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update self (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        if (a[i] != 7.0)
-            abort ();
-
-        if (b[i] != 7.0)
-            abort ();
-    }
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 9.0;
-    }
-
-#pragma acc update device (a[0:N])
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update self (a[0:N], b[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        if (a[i] != 9.0)
-            abort ();
-
-        if (b[i] != 9.0)
-            abort ();
-    }
-
-    if (!acc_is_present (&a[0], (N * sizeof (float))))
-      abort ();
-
-    if (!acc_is_present (&b[0], (N * sizeof (float))))
-      abort ();
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 5.0;
-    }
-
-#pragma acc update device (a[0:N])
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 6.0;
-    }
-
-#pragma acc update device (a[0:NDIV2])
-
-#pragma acc parallel present (a[0:N], b[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            b[ii] = a[ii];
-    }
-
-#pragma acc update self (a[0:N], b[0:N])
-
-    for (i = 0; i < NDIV2; i++)
-    {
-        if (a[i] != 6.0)
-            abort ();
-
-        if (b[i] != 6.0)
-            abort ();
-    }
-
-    for (i = NDIV2; i < N; i++)
-    {
-        if (a[i] != 5.0)
-            abort ();
-
-        if (b[i] != 5.0)
-            abort ();
-    }
-
-    if (!acc_is_present (&a[0], (N * sizeof (float))))
-      abort ();
-
-    if (!acc_is_present (&b[0], (N * sizeof (float))))
-      abort ();
-
-    for (i = 0; i < N; i++)
-    {
-        a[i] = 0.0;
-    }
-
-#pragma acc update device (a[0:4])
-
-#pragma acc parallel present (a[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            a[ii] = a[ii] + 1.0;
-    }
-
-#pragma acc update self (a[4:4])
-
-    for (i = 0; i < NDIV2; i++)
-    {
-        if (a[i] != 0.0)
-            abort ();
-    }
-
-    for (i = NDIV2; i < N; i++)
-    {
-        if (a[i] != 6.0)
-            abort ();
-    }
-
-#pragma acc update self (a[0:4])
-
-    for (i = 0; i < NDIV2; i++)
-    {
-        if (a[i] != 1.0)
-            abort ();
-    }
-
-    for (i = NDIV2; i < N; i++)
-    {
-        if (a[i] != 6.0)
-            abort ();
-    }
-
-    a[2] = 9;
-    a[3] = 9;
-    a[4] = 9;
-    a[5] = 9;
-
-#pragma acc update device (a[2:4])
-
-#pragma acc parallel present (a[0:N])
-    {
-        int ii;
-
-        for (ii = 0; ii < N; ii++)
-            a[ii] = a[ii] + 1.0;
-    }
-
-#pragma acc update self (a[2:4])
-
-    for (i = 0; i < 2; i++)
-    {
-      if (a[i] != 1.0)
-	abort ();
-    }
-
-    for (i = 2; i < 6; i++)
-    {
-      if (a[i] != 10.0)
-	abort ();
-    }
-
-    for (i = 6; i < N; i++)
-    {
-        if (a[i] != 6.0)
-            abort ();
-    }
-
-    return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-1.c
deleted file mode 100644
index b21e588..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-1.c
+++ /dev/null
@@ -1,30 +0,0 @@
-#include <assert.h>
-
-/* Test basic vector-partitioned mode transitions.  */
-
-int
-main (int argc, char *argv[])
-{
-  int n = 0, arr[32], i;
-
-  for (i = 0; i < 32; i++)
-    arr[i] = 0;
-
-  #pragma acc parallel copy(n, arr) num_gangs(1) num_workers(1) \
-		       vector_length(32)
-  {
-    int j;
-    n++;
-    #pragma acc loop vector
-    for (j = 0; j < 32; j++)
-      arr[j]++;
-    n++;
-  }
-
-  assert (n == 2);
-
-  for (i = 0; i < 32; i++)
-    assert (arr[i] == 1);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-2.c
deleted file mode 100644
index 1ff222d..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-2.c
+++ /dev/null
@@ -1,43 +0,0 @@
-#include <assert.h>
-
-/* Test vector-partitioned, gang-partitioned mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int n[32], arr[1024], i;
-  
-  for (i = 0; i < 1024; i++)
-    arr[i] = 0;
-
-  for (i = 0; i < 32; i++)
-    n[i] = 0;
-
-  #pragma acc parallel copy(n, arr) num_gangs(32) num_workers(1) \
-		       vector_length(32)
-  {
-    int j, k;
-
-    #pragma acc loop gang(static:*)
-    for (j = 0; j < 32; j++)
-      n[j]++;
-
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      #pragma acc loop vector
-      for (k = 0; k < 32; k++)
-	arr[j * 32 + k]++;
-
-    #pragma acc loop gang(static:*)
-    for (j = 0; j < 32; j++)
-      n[j]++;
-  }
-
-  for (i = 0; i < 32; i++)
-    assert (n[i] == 2);
-
-  for (i = 0; i < 1024; i++)
-    assert (arr[i] == 1);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-3.c
deleted file mode 100644
index 8dd628e2..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-3.c
+++ /dev/null
@@ -1,54 +0,0 @@
-#include <assert.h>
-
-/* Test conditional vector-partitioned loops.  */
-
-int
-main (int argc, char *argv[])
-{
-  int n[32], arr[1024], i;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = 0;
-
-  for (i = 0; i < 32; i++)
-    n[i] = 0;
-
-  #pragma acc parallel copy(n, arr) num_gangs(32) num_workers(1) \
-		       vector_length(32)
-  {
-    int j, k;
-
-    #pragma acc loop gang(static:*)
-    for (j = 0; j < 32; j++)
-      n[j]++;
-
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      {
-	if ((j % 2) == 0)
-	  {
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[j * 32 + k]++;
-	  }
-	else
-	  {
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      arr[j * 32 + k]--;
-	  }
-      }
-
-    #pragma acc loop gang(static:*)
-    for (j = 0; j < 32; j++)
-      n[j]++;
-  }
-
-  for (i = 0; i < 32; i++)
-    assert (n[i] == 2);
-
-  for (i = 0; i < 1024; i++)
-    assert (arr[i] == ((i % 64) < 32) ? 1 : -1);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-4.c
deleted file mode 100644
index 4ea3bf2..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-4.c
+++ /dev/null
@@ -1,46 +0,0 @@
-#include <assert.h>
-
-/* Test conditions inside vector-partitioned loops.  */
-
-int
-main (int argc, char *argv[])
-{
-  int n[32], arr[1024], i;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = i;
-
-  for (i = 0; i < 32; i++)
-    n[i] = 0;
-
-  #pragma acc parallel copy(n, arr) num_gangs(32) num_workers(1) \
-		       vector_length(32)
-  {
-    int j, k;
-
-    #pragma acc loop gang(static:*)
-    for (j = 0; j < 32; j++)
-      n[j]++;
-
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      {
-	#pragma acc loop vector
-	for (k = 0; k < 32; k++)
-	  if ((arr[j * 32 + k] % 2) != 0)
-	    arr[j * 32 + k] *= 2;
-      }
-
-    #pragma acc loop gang(static:*)
-    for (j = 0; j < 32; j++)
-      n[j]++;
-  }
-
-  for (i = 0; i < 32; i++)
-    assert (n[i] == 2);
-
-  for (i = 0; i < 1024; i++)
-    assert (arr[i] == ((i % 2) == 0 ? i : i * 2));
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-5.c
deleted file mode 100644
index 86b742a..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-5.c
+++ /dev/null
@@ -1,42 +0,0 @@
-#include <assert.h>
-
-/* Test conditions inside gang-partitioned/vector-partitioned loops.  */
-
-int
-main (int argc, char *argv[])
-{
-  int n[32], arr[1024], i;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = i;
-
-  for (i = 0; i < 32; i++)
-    n[i] = 0;
-
-  #pragma acc parallel copy(n, arr) num_gangs(32) num_workers(1) \
-		       vector_length(32)
-  {
-    int j, k;
-
-    #pragma acc loop gang(static:*)
-    for (j = 0; j < 32; j++)
-      n[j]++;
-
-    #pragma acc loop gang vector
-    for (j = 0; j < 1024; j++)
-      if ((arr[j] % 2) != 0)
-	arr[j] *= 2;
-
-    #pragma acc loop gang(static:*)
-    for (j = 0; j < 32; j++)
-      n[j]++;
-  }
-
-  for (i = 0; i < 32; i++)
-    assert (n[i] == 2);
-
-  for (i = 0; i < 1024; i++)
-    assert (arr[i] == ((i % 2) == 0 ? i : i * 2));
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-6.c
deleted file mode 100644
index 606b787..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/vec-partn-6.c
+++ /dev/null
@@ -1,77 +0,0 @@
-#include <assert.h>
-#include <stdlib.h>
-
-/* Test switch containing vector-partitioned loops inside gang-partitioned
-   loops.  */
-
-int
-main (int argc, char *argv[])
-{
-  int n[32], arr[1024], i;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = 0;
-
-  for (i = 0; i < 32; i++)
-    n[i] = i % 5;
-
-  #pragma acc parallel copy(n, arr) num_gangs(32) num_workers(1) \
-		       vector_length(32)
-  {
-    int j, k;
-
-    #pragma acc loop gang(static:*)
-    for (j = 0; j < 32; j++)
-      n[j]++;
-
-    #pragma acc loop gang(static:*)
-    for (j = 0; j < 32; j++)
-      switch (n[j])
-	{
-	case 1:
-	  #pragma acc loop vector
-	  for (k = 0; k < 32; k++)
-	    arr[j * 32 + k] += 1;
-	  break;
-
-	case 2:
-	  #pragma acc loop vector
-	  for (k = 0; k < 32; k++)
-	    arr[j * 32 + k] += 2;
-	  break;
-
-	case 3:
-	  #pragma acc loop vector
-	  for (k = 0; k < 32; k++)
-	    arr[j * 32 + k] += 3;
-	  break;
-
-	case 4:
-	  #pragma acc loop vector
-	  for (k = 0; k < 32; k++)
-	    arr[j * 32 + k] += 4;
-	  break;
-
-	case 5:
-	  #pragma acc loop vector
-	  for (k = 0; k < 32; k++)
-	    arr[j * 32 + k] += 5;
-	  break;
-
-	default:
-	  abort ();
-	}
-
-    #pragma acc loop gang(static:*)
-    for (j = 0; j < 32; j++)
-      n[j]++;
-  }
-
-  for (i = 0; i < 32; i++)
-    assert (n[i] == (i % 5) + 2);
-
-  for (i = 0; i < 1024; i++)
-    assert (arr[i] == ((i / 32) % 5) + 1);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/vec-single-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/vec-single-1.c
deleted file mode 100644
index 248ddb1..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/vec-single-1.c
+++ /dev/null
@@ -1,17 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test trivial operation of vector-single mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int n = 0;
-  #pragma acc parallel copy(n) num_gangs(1) num_workers(1) vector_length(32)
-  {
-    n++;
-  }
-  assert (n == 1);
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/vec-single-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/vec-single-2.c
deleted file mode 100644
index ff1f5f2..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/vec-single-2.c
+++ /dev/null
@@ -1,34 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test vector-single, gang-partitioned mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int arr[1024];
-  int gangs;
-
-  for (gangs = 1; gangs <= 1024; gangs <<= 1)
-    {
-      int i;
-
-      for (i = 0; i < 1024; i++)
-	arr[i] = 0;
-
-      #pragma acc parallel copy(arr) num_gangs(gangs) num_workers(1) \
-			   vector_length(32)
-      {
-	int j;
-	#pragma acc loop gang
-	for (j = 0; j < 1024; j++)
-	  arr[j]++;
-      }
-
-      for (i = 0; i < 1024; i++)
-	assert (arr[i] == 1);
-    }
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/vec-single-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/vec-single-3.c
deleted file mode 100644
index c34e95a..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/vec-single-3.c
+++ /dev/null
@@ -1,37 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test conditions in vector-single mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int arr[1024];
-  int gangs;
-
-  for (gangs = 1; gangs <= 1024; gangs <<= 1)
-    {
-      int i;
-
-      for (i = 0; i < 1024; i++)
-	arr[i] = 0;
-
-      #pragma acc parallel copy(arr) num_gangs(gangs) num_workers(1) \
-			   vector_length(32)
-      {
-	int j;
-	#pragma acc loop gang
-	for (j = 0; j < 1024; j++)
-	  if ((j % 3) == 0)
-	    arr[j]++;
-	  else
-	    arr[j] += 2;
-      }
-
-      for (i = 0; i < 1024; i++)
-	assert (arr[i] == ((i % 3) == 0) ? 1 : 2);
-    }
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/vec-single-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/vec-single-4.c
deleted file mode 100644
index 1227c13..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/vec-single-4.c
+++ /dev/null
@@ -1,42 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test switch in vector-single mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int arr[1024];
-  int gangs;
-
-  for (gangs = 1; gangs <= 1024; gangs <<= 1)
-    {
-      int i;
-
-      for (i = 0; i < 1024; i++)
-	arr[i] = 0;
-
-      #pragma acc parallel copy(arr) num_gangs(gangs) num_workers(1) \
-			   vector_length(32)
-      {
-	int j;
-	#pragma acc loop gang
-	for (j = 0; j < 1024; j++)
-	  switch (j % 5)
-	    {
-	    case 0: arr[j] += 1; break;
-	    case 1: arr[j] += 2; break;
-	    case 2: arr[j] += 3; break;
-	    case 3: arr[j] += 4; break;
-	    case 4: arr[j] += 5; break;
-	    default: arr[j] += 99;
-	    }
-      }
-
-      for (i = 0; i < 1024; i++)
-	assert (arr[i] == (i % 5) + 1);
-    }
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/vec-single-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/vec-single-5.c
deleted file mode 100644
index 76839ab..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/vec-single-5.c
+++ /dev/null
@@ -1,45 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test switch in vector-single mode, initialise array on device.  */
-
-int
-main (int argc, char *argv[])
-{
-  int arr[1024];
-  int i;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = 99;
-
-  #pragma acc parallel copy(arr) num_gangs(1024) num_workers(1) \
-		       vector_length(32)
-  {
-    int j;
-
-    /* This loop and the one following must be distributed to available gangs
-       in the same way to ensure data dependencies are not violated (hence the
-       "static" clauses).  */
-    #pragma acc loop gang(static:*)
-    for (j = 0; j < 1024; j++)
-      arr[j] = 0;
-    
-    #pragma acc loop gang(static:*)
-    for (j = 0; j < 1024; j++)
-      switch (j % 5)
-	{
-	case 0: arr[j] += 1; break;
-	case 1: arr[j] += 2; break;
-	case 2: arr[j] += 3; break;
-	case 3: arr[j] += 4; break;
-	case 4: arr[j] += 5; break;
-	default: arr[j] += 99;
-	}
-  }
-
-  for (i = 0; i < 1024; i++)
-    assert (arr[i] == (i % 5) + 1);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/vec-single-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/vec-single-6.c
deleted file mode 100644
index 3bd6845..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/vec-single-6.c
+++ /dev/null
@@ -1,51 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-#include <stdbool.h>
-
-#define NUM_GANGS 4096
-
-/* Test multiple conditions in vector-single mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  bool fizz[NUM_GANGS], buzz[NUM_GANGS], fizzbuzz[NUM_GANGS];
-  int i;
-
-  #pragma acc parallel copyout(fizz, buzz, fizzbuzz) \
-		       num_gangs(NUM_GANGS) num_workers(1) vector_length(32)
-  {
-    int j;
-    
-    /* This loop and the one following must be distributed to available gangs
-       in the same way to ensure data dependencies are not violated (hence the
-       "static" clauses).  */
-    #pragma acc loop gang(static:*)
-    for (j = 0; j < NUM_GANGS; j++)
-      fizz[j] = buzz[j] = fizzbuzz[j] = 0;
-    
-    #pragma acc loop gang(static:*)
-    for (j = 0; j < NUM_GANGS; j++)
-      {
-	if ((j % 3) == 0 && (j % 5) == 0)
-	  fizzbuzz[j] = 1;
-	else
-	  {
-	    if ((j % 3) == 0)
-	      fizz[j] = 1;
-	    else if ((j % 5) == 0)
-	      buzz[j] = 1;
-	  }
-      }
-  }
-
-  for (i = 0; i < NUM_GANGS; i++)
-    {
-      assert (fizzbuzz[i] == ((i % 3) == 0 && (i % 5) == 0));
-      assert (fizz[i] == ((i % 3) == 0 && (i % 5) != 0));
-      assert (buzz[i] == ((i % 3) != 0 && (i % 5) == 0));
-    }
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/vector-broadcast.c libgomp/testsuite/libgomp.oacc-c-c++-common/vector-broadcast.c
deleted file mode 100644
index 2e1893b..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/vector-broadcast.c
+++ /dev/null
@@ -1,38 +0,0 @@
-/* Check if worker-single variables get broadcastd to vectors.  */
-
-/* { dg-do run } */
-
-#include <assert.h>
-#include <math.h>
-
-#define N 32
-
-#pragma acc routine
-float
-some_val ()
-{
-  return 2.71;
-}
-
-int
-main ()
-{
-  float threads[N], v1 = 3.14;
-
-  for (int i = 0; i < N; i++)
-    threads[i] = -1;
-
-#pragma acc parallel num_gangs (1) vector_length (32) copy (v1)
-  {
-    float val = some_val ();
-
-#pragma acc loop vector
-    for (int i = 0; i < N; i++)
-      threads[i] = val + v1*i;
-  }
-
-  for (int i = 0; i < N; i++)
-    assert (fabs (threads[i] - (some_val () + v1*i)) < 0.0001);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-1.c
deleted file mode 100644
index d72cd55..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-1.c
+++ /dev/null
@@ -1,32 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test worker-partitioned/vector-single mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int arr[32 * 8], i;
-
-  for (i = 0; i < 32 * 8; i++)
-    arr[i] = 0;
-
-  #pragma acc parallel copy(arr) num_gangs(8) num_workers(8) vector_length(32)
-  {
-    int j;
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      {
-	int k;
-	#pragma acc loop worker
-	for (k = 0; k < 8; k++)
-          arr[j * 8 + k] += j * 8 + k;
-      }
-  }
-
-  for (i = 0; i < 32 * 8; i++)
-    assert (arr[i] == i);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-2.c
deleted file mode 100644
index 1023e22..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-2.c
+++ /dev/null
@@ -1,44 +0,0 @@
-#include <assert.h>
-
-/* Test condition in worker-partitioned mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int arr[32 * 32 * 8], i;
-
-  for (i = 0; i < 32 * 32 * 8; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(8) num_workers(8) vector_length(32)
-  {
-    int j;
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      {
-	int k;
-	#pragma acc loop worker
-	for (k = 0; k < 8; k++)
-	  {
-	    int m;
-	    if ((k % 2) == 0)
-	      {
-		#pragma acc loop vector
-		for (m = 0; m < 32; m++)
-		  arr[j * 32 * 8 + k * 32 + m]++;
-	      }
-	    else
-	      {
-		#pragma acc loop vector
-		for (m = 0; m < 32; m++)
-		  arr[j * 32 * 8 + k * 32 + m] += 2;
-	      }
-	  }
-      }
-  }
-
-  for (i = 0; i < 32 * 32 * 8; i++)
-    assert (arr[i] == i + ((i / 32) % 2) + 1);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-3.c
deleted file mode 100644
index a13a571..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-3.c
+++ /dev/null
@@ -1,54 +0,0 @@
-#include <assert.h>
-
-/* Test switch in worker-partitioned mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int arr[32 * 32 * 8], i;
-
-  for (i = 0; i < 32 * 32 * 8; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(8) num_workers(8) vector_length(32)
-  {
-    int j;
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      {
-	int k;
-	#pragma acc loop worker
-	for (k = 0; k < 8; k++)
-	  {
-	    int m;
-	    switch ((j * 32 + k) % 3)
-	    {
-	    case 0:
-	      #pragma acc loop vector
-	      for (m = 0; m < 32; m++)
-		arr[j * 32 * 8 + k * 32 + m]++;
-	      break;
-
-	    case 1:
-	      #pragma acc loop vector
-	      for (m = 0; m < 32; m++)
-		arr[j * 32 * 8 + k * 32 + m] += 2;
-	      break;
-
-	    case 2:
-	      #pragma acc loop vector
-	      for (m = 0; m < 32; m++)
-		arr[j * 32 * 8 + k * 32 + m] += 3;
-	      break;
-
-	    default: ;
-	    }
-	  }
-      }
-  }
-
-  for (i = 0; i < 32 * 32 * 8; i++)
-    assert (arr[i] == i + ((i / 32) % 3) + 1);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-4.c
deleted file mode 100644
index 45d3cce..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-4.c
+++ /dev/null
@@ -1,56 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test worker-single/worker-partitioned transitions.  */
-
-int
-main (int argc, char *argv[])
-{
-  int n[32], arr[32 * 32], i;
-
-  for (i = 0; i < 32 * 32; i++)
-    arr[i] = 0;
-
-  for (i = 0; i < 32; i++)
-    n[i] = 0;
-
-  #pragma acc parallel copy(n, arr) num_gangs(8) num_workers(16) \
-	  vector_length(32)
-  {
-    int j;
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      {
-	int k;
-
-	n[j]++;
-
-	#pragma acc loop worker
-	for (k = 0; k < 32; k++)
-          arr[j * 32 + k]++;
-
-	n[j]++;
-
-	#pragma acc loop worker
-	for (k = 0; k < 32; k++)
-          arr[j * 32 + k]++;
-
-	n[j]++;
-
-	#pragma acc loop worker
-	for (k = 0; k < 32; k++)
-          arr[j * 32 + k]++;
-
-	n[j]++;
-      }
-  }
-
-  for (i = 0; i < 32; i++)
-    assert (n[i] == 4);
-
-  for (i = 0; i < 32 * 32; i++)
-    assert (arr[i] == 3);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-5.c
deleted file mode 100644
index 7c9a51c..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-5.c
+++ /dev/null
@@ -1,47 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test correct synchronisation between worker-partitioned loops.  */
-
-int
-main (int argc, char *argv[])
-{
-  int arr_a[32 * 32], arr_b[32 * 32], i;
-  int num_workers, num_gangs;
-
-  for (num_workers = 1; num_workers <= 32; num_workers <<= 1)
-    for (num_gangs = 1; num_gangs <= 32; num_gangs <<= 1)
-      {
-	for (i = 0; i < 32 * 32; i++)
-	  arr_a[i] = i;
-
-	#pragma acc parallel copyin(arr_a) copyout(arr_b) num_gangs(num_gangs) \
-		num_workers(num_workers) vector_length(32)
-	{
-	  int j;
-	  #pragma acc loop gang
-	  for (j = 0; j < 32; j++)
-	    {
-	      int k;
-
-	      #pragma acc loop worker
-	      for (k = 0; k < 32; k++)
-        	arr_b[j * 32 + (31 - k)] = arr_a[j * 32 + k] * 2;
-
-	      #pragma acc loop worker
-	      for (k = 0; k < 32; k++)
-        	arr_a[j * 32 + (31 - k)] = arr_b[j * 32 + k] * 2;
-
-	      #pragma acc loop worker
-	      for (k = 0; k < 32; k++)
-        	arr_b[j * 32 + (31 - k)] = arr_a[j * 32 + k] * 2;
-	    }
-	}
-
-	for (i = 0; i < 32 * 32; i++)
-	  assert (arr_b[i] == (i ^ 31) * 8);
-      }
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-6.c
deleted file mode 100644
index cfbcd17..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-6.c
+++ /dev/null
@@ -1,45 +0,0 @@
-#include <assert.h>
-
-/* Test correct synchronisation between worker+vector-partitioned loops.  */
-
-int
-main (int argc, char *argv[])
-{
-  int arr_a[32 * 32 * 32], arr_b[32 * 32 * 32], i;
-  int num_workers, num_gangs;
-
-  for (num_workers = 1; num_workers <= 32; num_workers <<= 1)
-    for (num_gangs = 1; num_gangs <= 32; num_gangs <<= 1)
-      {
-	for (i = 0; i < 32 * 32 * 32; i++)
-	  arr_a[i] = i;
-
-	#pragma acc parallel copyin(arr_a) copyout(arr_b) num_gangs(num_gangs) \
-		num_workers(num_workers) vector_length(32)
-	{
-	  int j;
-	  #pragma acc loop gang
-	  for (j = 0; j < 32; j++)
-	    {
-	      int k;
-
-	      #pragma acc loop worker vector
-	      for (k = 0; k < 32 * 32; k++)
-        	arr_b[j * 32 * 32 + (1023 - k)] = arr_a[j * 32 * 32 + k] * 2;
-
-	      #pragma acc loop worker vector
-	      for (k = 0; k < 32 * 32; k++)
-        	arr_a[j * 32 * 32 + (1023 - k)] = arr_b[j * 32 * 32 + k] * 2;
-
-	      #pragma acc loop worker vector
-	      for (k = 0; k < 32 * 32; k++)
-        	arr_b[j * 32 * 32 + (1023 - k)] = arr_a[j * 32 * 32 + k] * 2;
-	    }
-	}
-
-	for (i = 0; i < 32 * 32 * 32; i++)
-	  assert (arr_b[i] == (i ^ 1023) * 8);
-      }
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-7.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-7.c
deleted file mode 100644
index fe0c59c..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-7.c
+++ /dev/null
@@ -1,90 +0,0 @@
-#include <assert.h>
-
-/* Test correct synchronisation between vector-partitioned loops in
-   worker-partitioned mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int n[32 * 32], arr_a[32 * 32 * 32], arr_b[32 * 32 * 32], i;
-  int num_workers, num_gangs;
-
-  for (num_workers = 1; num_workers <= 32; num_workers <<= 1)
-    for (num_gangs = 1; num_gangs <= 32; num_gangs <<= 1)
-      {
-	for (i = 0; i < 32 * 32 * 32; i++)
-	  arr_a[i] = i;
-
-	for (i = 0; i < 32 * 32; i++)
-          n[i] = 0;
-
-	#pragma acc parallel copy (n) copyin(arr_a) copyout(arr_b) \
-		num_gangs(num_gangs) num_workers(num_workers) vector_length(32)
-	{
-	  int j;
-	  #pragma acc loop gang
-	  for (j = 0; j < 32; j++)
-	    {
-	      int k;
-
-	      #pragma acc loop worker
-	      for (k = 0; k < 32; k++)
-		{
-		  int m;
-
-		  n[j * 32 + k]++;
-
-		  #pragma acc loop vector
-		  for (m = 0; m < 32; m++)
-		    {
-	              if (((j * 1024 + k * 32 + m) % 2) == 0)
-			arr_b[j * 1024 + k * 32 + (31 - m)]
-			  = arr_a[j * 1024 + k * 32 + m] * 2;
-		      else
-			arr_b[j * 1024 + k * 32 + (31 - m)]
-			  = arr_a[j * 1024 + k * 32 + m] * 3;
-		    }
-
-		  /* Test returning to vector-single mode...  */
-		  n[j * 32 + k]++;
-
-		  #pragma acc loop vector
-		  for (m = 0; m < 32; m++)
-		    {
-	              if (((j * 1024 + k * 32 + m) % 3) == 0)
-			arr_a[j * 1024 + k * 32 + (31 - m)]
-			  = arr_b[j * 1024 + k * 32 + m] * 5;
-		      else
-			arr_a[j * 1024 + k * 32 + (31 - m)]
-			  = arr_b[j * 1024 + k * 32 + m] * 7;
-		    }
-
-		  /* ...and back-to-back vector loops.  */
-
-		  #pragma acc loop vector
-		  for (m = 0; m < 32; m++)
-		    {
-	              if (((j * 1024 + k * 32 + m) % 2) == 0)
-			arr_b[j * 1024 + k * 32 + (31 - m)]
-			  = arr_a[j * 1024 + k * 32 + m] * 3;
-		      else
-			arr_b[j * 1024 + k * 32 + (31 - m)]
-			  = arr_a[j * 1024 + k * 32 + m] * 2;
-		    }
-		}
-	    }
-	}
-
-	for (i = 0; i < 32 * 32; i++)
-          assert (n[i] == 2);
-
-	for (i = 0; i < 32 * 32 * 32; i++)
-          {
-	    int m = 6 * ((i % 3) == 0 ? 5 : 7);
-	    assert (arr_b[i] == (i ^ 31) * m);
-	  }
-      }
-
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-8.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-8.c
deleted file mode 100644
index 6ed736a..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-partn-8.c
+++ /dev/null
@@ -1,51 +0,0 @@
-/* With -O0, variables are on the stack, not in registers.  Check that worker
-   state propagation handles the stack frame.  */
-
-int
-main (int argc, char *argv[])
-{
-  int w0 = 0;
-  int w1 = 0;
-  int w2 = 0;
-  int w3 = 0;
-  int w4 = 0;
-  int w5 = 0;
-  int w6 = 0;
-  int w7 = 0;
-
-  int i;
-
-#pragma acc parallel num_gangs (1) num_workers (8) copy (w0, w1, w2, w3, w4, w5, w6, w7)
-  {
-    int internal = 100;
-
-#pragma acc loop worker
-    for (i = 0; i < 8; i++)
-      {
-	switch (i)
-	  {
-	  case 0: w0 = internal; break;
-	  case 1: w1 = internal; break;
-	  case 2: w2 = internal; break;
-	  case 3: w3 = internal; break;
-	  case 4: w4 = internal; break;
-	  case 5: w5 = internal; break;
-	  case 6: w6 = internal; break;
-	  case 7: w7 = internal; break;
-	  default: break;
-	  }
-      }
-  }
-
-  if (w0 != 100
-      || w1 != 100
-      || w2 != 100
-      || w3 != 100
-      || w4 != 100
-      || w5 != 100
-      || w6 != 100
-      || w7 != 100)
-    __builtin_abort ();
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-1.c
deleted file mode 100644
index 5a2fb65..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-1.c
+++ /dev/null
@@ -1,27 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test worker-single/vector-single mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int arr[32], i;
-
-  for (i = 0; i < 32; i++)
-    arr[i] = 0;
-
-  #pragma acc parallel copy(arr) num_gangs(8) num_workers(8) vector_length(32)
-  {
-    int j;
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      arr[j]++;
-  }
-
-  for (i = 0; i < 32; i++)
-    assert (arr[i] == 1);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-1a.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-1a.c
deleted file mode 100644
index 9a21d05..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-1a.c
+++ /dev/null
@@ -1,30 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test worker-single/vector-single mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int arr[32], i;
-
-  for (i = 0; i < 32; i++)
-    arr[i] = 0;
-
-  #pragma acc parallel copy(arr) num_gangs(8) num_workers(8) vector_length(32)
-  {
-    int j;
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      {
-	#pragma acc atomic
-	arr[j]++;
-      }
-  }
-
-  for (i = 0; i < 32; i++)
-    assert (arr[i] == 1);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-2.c
deleted file mode 100644
index 5bdbe85..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-2.c
+++ /dev/null
@@ -1,30 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test condition in worker-single/vector-single mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int arr[32], i;
-
-  for (i = 0; i < 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(8) num_workers(8) vector_length(32)
-  {
-    int j;
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      if ((arr[j] % 2) != 0)
-	arr[j]++;
-      else
-	arr[j] += 2;
-  }
-
-  for (i = 0; i < 32; i++)
-    assert (arr[i] == ((i % 2) != 0) ? i + 1 : i + 2);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-3.c
deleted file mode 100644
index 1563019..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-3.c
+++ /dev/null
@@ -1,35 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test switch in worker-single/vector-single mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int arr[32], i;
-
-  for (i = 0; i < 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(8) num_workers(8) vector_length(32)
-  {
-    int j;
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      switch (arr[j] % 5)
-	{
-	case 0: arr[j] += 1; break;
-	case 1: arr[j] += 2; break;
-	case 2: arr[j] += 3; break;
-	case 3: arr[j] += 4; break;
-	case 4: arr[j] += 5; break;
-	default: arr[j] += 99;
-	}
-  }
-
-  for (i = 0; i < 32; i++)
-    assert (arr[i] == i + (i % 5) + 1);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-4.c
deleted file mode 100644
index 2428514..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-4.c
+++ /dev/null
@@ -1,35 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test worker-single/vector-partitioned mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int arr[32 * 32], i;
-
-  for (i = 0; i < 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(8) num_workers(8) vector_length(32)
-  {
-    int j;
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      {
-	int k;
-	#pragma acc loop vector
-	for (k = 0; k < 32; k++)
-	  {
-	    #pragma acc atomic
-	    arr[j * 32 + k]++;
-	  }
-      }
-  }
-
-  for (i = 0; i < 32 * 32; i++)
-    assert (arr[i] == i + 1);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-5.c
deleted file mode 100644
index 419634f..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-5.c
+++ /dev/null
@@ -1,51 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test multiple conditional vector-partitioned loops in worker-single
-   mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int arr[32 * 32], i;
-
-  for (i = 0; i < 32 * 32; i++)
-    arr[i] = i;
-
-  #pragma acc parallel copy(arr) num_gangs(8) num_workers(8) vector_length(32)
-  {
-    int j;
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      {
-	int k;
-	if ((j % 3) == 0)
-	  {
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      {
-		#pragma acc atomic
-		arr[j * 32 + k] += 3;
-	      }
-	  }
-	else if ((j % 3) == 1)
-	  {
-	    #pragma acc loop vector
-	    for (k = 0; k < 32; k++)
-	      {
-		#pragma acc atomic
-		arr[j * 32 + k] += 7;
-	      }
-	  }
-      }
-  }
-
-  for (i = 0; i < 32 * 32; i++)
-    {
-      int j = (i / 32) % 3;
-      assert (arr[i] == i + ((j == 0) ? 3 : (j == 1) ? 7 : 0));
-    }
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-6.c libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-6.c
deleted file mode 100644
index c04aa05..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/worker-single-6.c
+++ /dev/null
@@ -1,50 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-#include <openacc.h>
-
-#define ACTUAL_GANGS 8
-
-/* Test worker-single, vector-partitioned, gang-redundant mode.  */
-
-int
-main (int argc, char *argv[])
-{
-  int n, arr[32], i;
-  int ondev;
-
-  for (i = 0; i < 32; i++)
-    arr[i] = 0;
-
-  n = 0;
-
-  #pragma acc parallel copy(n, arr) num_gangs(ACTUAL_GANGS) num_workers(8) \
-	  vector_length(32) copyout(ondev)
-  {
-    int j;
-
-    ondev = acc_on_device (acc_device_not_host);
-
-    #pragma acc atomic
-    n++;
-
-    #pragma acc loop vector
-    for (j = 0; j < 32; j++)
-      {
-	#pragma acc atomic
-	arr[j] += 1;
-      }
-
-    #pragma acc atomic
-    n++;
-  }
-
-  int m = ondev ? ACTUAL_GANGS : 1;
-  
-  assert (n == m * 2);
-
-  for (i = 0; i < 32; i++)
-    assert (arr[i] == m);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-fortran/collapse-5.f90 libgomp/testsuite/libgomp.oacc-fortran/collapse-5.f90
index c6d0b4d..8c20f04 100644
--- libgomp/testsuite/libgomp.oacc-fortran/collapse-5.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/collapse-5.f90
@@ -15,7 +15,7 @@ program collapse5
   v4 = 4
   v5 = 13
   v6 = 18
-  !$acc parallel copy (l)
+  !$acc parallel
   !$acc loop collapse (3) reduction (.or.:l)
     do i = v1, v2
       do j = v3, v4
diff --git libgomp/testsuite/libgomp.oacc-fortran/collapse-6.f90 libgomp/testsuite/libgomp.oacc-fortran/collapse-6.f90
index 4de724d..7404b91 100644
--- libgomp/testsuite/libgomp.oacc-fortran/collapse-6.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/collapse-6.f90
@@ -18,7 +18,7 @@ program collapse6
   v7 = 1
   v8 = 1
   v9 = 1
-  !$acc parallel copy (l)
+  !$acc parallel
   !$acc loop collapse (3) reduction (.or.:l)
     do i = v1, v2, v7
       do j = v3, v4, v8
diff --git libgomp/testsuite/libgomp.oacc-fortran/collapse-7.f90 libgomp/testsuite/libgomp.oacc-fortran/collapse-7.f90
index 8a0d6d1..12efd8c 100644
--- libgomp/testsuite/libgomp.oacc-fortran/collapse-7.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/collapse-7.f90
@@ -8,7 +8,7 @@ program collapse7
   r = .false.
   a(:, :, :) = 0
   b(:, :, :) = 0
-  !$acc parallel copy (l)
+  !$acc parallel
   !$acc loop collapse (3) reduction (.or.:l)
     do i = 2, 6
       do j = -2, 4
diff --git libgomp/testsuite/libgomp.oacc-fortran/collapse-8.f90 libgomp/testsuite/libgomp.oacc-fortran/collapse-8.f90
index ca3ef0a..04fbcfe 100644
--- libgomp/testsuite/libgomp.oacc-fortran/collapse-8.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/collapse-8.f90
@@ -15,7 +15,7 @@ program collapse8
   v4 = 4
   v5 = 13
   v6 = 18
-  !$acc parallel copy (l)
+  !$acc parallel
   !$acc loop collapse (3) reduction (.or.:l)
     do i = v1, v2
       do j = v3, v4
diff --git libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90 libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90
new file mode 100644
index 0000000..2535eb8
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/private-variables.f90
@@ -0,0 +1,552 @@
+! Miscellaneous tests for private variables.
+
+! { dg-do run }
+
+
+! Test of gang-private variables declared on loop directive.
+
+subroutine t1()
+  integer :: x, i, arr(32)
+
+  do i = 1, 32
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  ! { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 15 }
+  ! { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 15 }
+  !$acc loop gang private(x)
+  do i = 1, 32
+     x = i * 2;
+     arr(i) = arr(i) + x
+  end do
+  !$acc end parallel
+
+  do i = 1, 32
+     if (arr(i) .ne. i * 3) call abort
+  end do
+end subroutine t1
+
+
+! Test of gang-private variables declared on loop directive, with broadcasting
+! to partitioned workers.
+
+subroutine t2()
+  integer :: x, i, j, arr(0:32*32)
+
+  do i = 0, 32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  ! { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 41 }
+  !$acc loop gang private(x)
+  do i = 0, 31
+     x = i * 2;
+
+     !$acc loop worker
+     do j = 0, 31
+        arr(i * 32 + j) = arr(i * 32 + j) + x
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 * 32 - 1
+     if (arr(i) .ne. i + (i / 32) * 2) call abort
+  end do
+end subroutine t2
+
+
+! Test of gang-private variables declared on loop directive, with broadcasting
+! to partitioned vectors.
+
+subroutine t3()
+  integer :: x, i, j, arr(0:32*32)
+
+  do i = 0, 32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  ! { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 70 }
+  !$acc loop gang private(x)
+  do i = 0, 31
+     x = i * 2;
+
+     !$acc loop vector
+     do j = 0, 31
+        arr(i * 32 + j) = arr(i * 32 + j) + x
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 * 32 - 1
+     if (arr(i) .ne. i + (i / 32) * 2) call abort
+  end do
+end subroutine t3
+
+
+! Test of gang-private addressable variable declared on loop directive, with
+! broadcasting to partitioned vectors.
+
+subroutine t4()
+  type vec3
+     integer x, y, z, attr(13)
+  end type vec3
+
+  integer i, j, arr(0:32*32)
+  type(vec3) pt
+  
+  do i = 0, 32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  ! { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 104 }
+  !$acc loop gang private(pt)
+  do i = 0, 31
+     pt%x = i
+     pt%y = i * 2
+     pt%z = i * 4
+     pt%attr(5) = i * 6
+
+     !$acc loop vector
+     do j = 0, 31
+        arr(i * 32 + j) = arr(i * 32 + j) + pt%x + pt%y + pt%z + pt%attr(5);
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 * 32 - 1
+     if (arr(i) .ne. i + (i / 32) * 13) call abort
+  end do
+end subroutine t4
+
+
+! Test of vector-private variables declared on loop directive.
+
+subroutine t5()
+  integer :: x, i, j, k, idx, arr(0:32*32*32)
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker
+     do j = 0, 31
+        !$acc loop vector private(x)
+        do k = 0, 31
+           x = ieor(i, j * 3)
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+        !$acc loop vector private(x)
+        do k = 0, 31
+           x = ior(i, j * 5)
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t5
+
+
+! Test of vector-private variables declared on loop directive. Array type.
+
+subroutine t6()
+  integer :: i, j, k, idx, arr(0:32*32*32), pt(2)
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker
+     do j = 0, 31
+        !$acc loop vector private(x, pt)
+        do k = 0, 31
+           pt(1) = ieor(i, j * 3)
+           pt(2) = ior(i, j * 5)
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt(1) * k
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt(2) * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t6
+
+
+! Test of worker-private variables declared on a loop directive.
+
+subroutine t7()
+  integer :: x, i, j, arr(0:32*32)
+  common x
+
+  do i = 0, 32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  ! { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 215 }
+  !$acc loop gang private(x)
+  do i = 0, 31
+     !$acc loop worker private(x)
+     do j = 0, 31
+        x = ieor(i, j * 3)
+        arr(i * 32 + j) = arr(i * 32 + j) + x
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 * 32 - 1
+     if (arr(i) .ne. i + ieor(i / 32, mod(i, 32) * 3)) call abort
+  end do
+end subroutine t7
+
+
+! Test of worker-private variables declared on a loop directive, broadcasting
+! to vector-partitioned mode.
+
+subroutine t8()
+  integer :: x, i, j, k, idx, arr(0:32*32*32)
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker private(x)
+     do j = 0, 31
+        x = ieor(i, j * 3)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k) call abort
+        end do
+     end do
+  end do
+end subroutine t8
+
+
+! Test of worker-private variables declared on a loop directive, broadcasting
+! to vector-partitioned mode.  Back-to-back worker loops.
+
+subroutine t9()
+  integer :: x, i, j, k, idx, arr(0:32*32*32)
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker private(x)
+     do j = 0, 31
+        x = ieor(i, j * 3)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+     end do
+
+     !$acc loop worker private(x)
+     do j = 0, 31
+        x = ior(i, j * 5)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t9
+
+
+! Test of worker-private variables declared on a loop directive, broadcasting
+! to vector-partitioned mode.  Successive vector loops.
+
+subroutine t10()
+  integer :: x, i, j, k, idx, arr(0:32*32*32)
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker private(x)
+     do j = 0, 31
+        x = ieor(i, j * 3)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+
+        x = ior(i, j * 5)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t10
+
+
+! Test of worker-private variables declared on a loop directive, broadcasting
+! to vector-partitioned mode.  Addressable worker variable.
+
+subroutine t11()
+  integer :: i, j, k, idx, arr(0:32*32*32)
+  integer, target :: x
+  integer, pointer :: p
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker private(x, p)
+     do j = 0, 31
+        p => x
+        x = ieor(i, j * 3)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+
+        p = ior(i, j * 5)
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t11
+
+
+! Test of worker-private variables declared on a loop directive, broadcasting
+! to vector-partitioned mode.  Aggregate worker variable.
+
+subroutine t12()
+  type vec2
+     integer x, y
+  end type vec2
+  
+  integer :: i, j, k, idx, arr(0:32*32*32)
+  type(vec2) :: pt
+  
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker private(pt)
+     do j = 0, 31
+        pt%x = ieor(i, j * 3)
+        pt%y = ior(i, j * 5)
+        
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt%x * k
+        end do
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt%y * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t12
+
+
+! Test of worker-private variables declared on loop directive, broadcasting
+! to vector-partitioned mode.  Array worker variable.
+
+subroutine t13()
+  integer :: i, j, k, idx, arr(0:32*32*32), pt(2)
+
+  do i = 0, 32*32*32-1
+     arr(i) = i
+  end do
+
+  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
+  !$acc loop gang
+  do i = 0, 31
+     !$acc loop worker private(pt)
+     do j = 0, 31
+        pt(1) = ieor(i, j * 3)
+        pt(2) = ior(i, j * 5)
+        
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt(1) * k
+        end do
+
+        !$acc loop vector
+        do k = 0, 31
+           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt(2) * k
+        end do
+     end do
+  end do
+  !$acc end parallel
+
+  do i = 0, 32 - 1
+     do j = 0, 32 -1
+        do k = 0, 32 - 1
+           idx = i * 1024 + j * 32 + k
+           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
+              call abort
+           end if
+        end do
+     end do
+  end do
+end subroutine t13
+
+
+! Test of gang-private variables declared on the parallel directive.
+
+subroutine t14()
+  use openacc
+  integer :: x = 5
+  integer, parameter :: n = 32
+  integer :: arr(n)
+
+  do i = 1, n
+    arr(i) = 3
+  end do
+
+  !$acc parallel private(x) copy(arr) num_gangs(n) num_workers(8) vector_length(32)
+  ! { dg-warning "region is worker partitioned but does not contain worker partitioned code" "worker" { target *-*-* } 515 }
+  ! { dg-warning "region is vector partitioned but does not contain vector partitioned code" "vector" { target *-*-* } 515 }
+    !$acc loop gang(static:1)
+    do i = 1, n
+      x = i * 2;
+    end do
+
+   !$acc loop gang(static:1)
+    do i = 1, n
+      if (acc_on_device (acc_device_host) .eqv. .TRUE.) x = i * 2
+      arr(i) = arr(i) + x
+    end do
+  !$acc end parallel
+
+  do i = 1, n
+    if (arr(i) .ne. (3 + i * 2)) call abort
+  end do
+
+end subroutine t14
+
+
+program main
+  call t1()
+  call t2()
+  call t3()
+  call t4()
+  call t5()
+  call t6()
+  call t7()
+  call t8()
+  call t9()
+  call t10()
+  call t11()
+  call t12()
+  call t13()
+  call t14()
+end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-gang-1.f90 libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-gang-1.f90
deleted file mode 100644
index 4adeff0..0000000
--- libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-gang-1.f90
+++ /dev/null
@@ -1,23 +0,0 @@
-! { dg-additional-options "-w" }
-
-! Test of gang-private variables declared on loop directive.
-
-program main
-  integer :: x, i, arr(32)
-
-  do i = 1, 32
-     arr(i) = i
-  end do
-
-  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
-  !$acc loop gang private(x)
-  do i = 1, 32
-     x = i * 2;
-     arr(i) = arr(i) + x
-  end do
-  !$acc end parallel
-
-  do i = 1, 32
-     if (arr(i) .ne. i * 3) call abort
-  end do
-end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-gang-2.f90 libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-gang-2.f90
deleted file mode 100644
index 3339d29..0000000
--- libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-gang-2.f90
+++ /dev/null
@@ -1,28 +0,0 @@
-! { dg-additional-options "-cpp -w" }
-
-! Test of gang-private variables declared on loop directive, with broadcasting
-! to partitioned workers.
-
-program main
-  integer :: x, i, j, arr(0:32*32)
-
-  do i = 0, 32*32-1
-     arr(i) = i
-  end do
-
-  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
-  !$acc loop gang private(x)
-  do i = 0, 31
-     x = i * 2;
-
-     !$acc loop worker
-     do j = 0, 31
-        arr(i * 32 + j) = arr(i * 32 + j) + x
-     end do
-  end do
-  !$acc end parallel
-
-  do i = 0, 32 * 32 - 1
-     if (arr(i) .ne. i + (i / 32) * 2) call abort
-  end do
-end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-gang-3.f90 libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-gang-3.f90
deleted file mode 100644
index c828efb..0000000
--- libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-gang-3.f90
+++ /dev/null
@@ -1,28 +0,0 @@
-! { dg-additional-options "-w" }
-
-! Test of gang-private variables declared on loop directive, with broadcasting
-! to partitioned vectors.
-
-program main
-  integer :: x, i, j, arr(0:32*32)
-
-  do i = 0, 32*32-1
-     arr(i) = i
-  end do
-
-  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
-  !$acc loop gang private(x)
-  do i = 0, 31
-     x = i * 2;
-
-     !$acc loop vector
-     do j = 0, 31
-        arr(i * 32 + j) = arr(i * 32 + j) + x
-     end do
-  end do
-  !$acc end parallel
-
-  do i = 0, 32 * 32 - 1
-     if (arr(i) .ne. i + (i / 32) * 2) call abort
-  end do
-end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-gang-6.f90 libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-gang-6.f90
deleted file mode 100644
index 9abd586..0000000
--- libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-gang-6.f90
+++ /dev/null
@@ -1,36 +0,0 @@
-! { dg-additional-options "-w" }
-
-! Test of gang-private addressable variable declared on loop directive, with
-! broadcasting to partitioned workers.
-
-program main
-  type vec3
-     integer x, y, z, attr(13)
-  end type vec3
-
-  integer x, i, j, arr(0:32*32)
-  type(vec3) pt
-  
-  do i = 0, 32*32-1
-     arr(i) = i
-  end do
-
-  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
-  !$acc loop gang private(pt)
-  do i = 0, 31
-     pt%x = i
-     pt%y = i * 2
-     pt%z = i * 4
-     pt%attr(5) = i * 6
-
-     !$acc loop vector
-     do j = 0, 31
-        arr(i * 32 + j) = arr(i * 32 + j) + pt%x + pt%y + pt%z + pt%attr(5);
-     end do
-  end do
-  !$acc end parallel
-
-  do i = 0, 32 * 32 - 1
-     if (arr(i) .ne. i + (i / 32) * 13) call abort
-  end do
-end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-vector-1.f90 libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-vector-1.f90
deleted file mode 100644
index 7fa900a..0000000
--- libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-vector-1.f90
+++ /dev/null
@@ -1,39 +0,0 @@
-! Test of vector-private variables declared on loop directive.
-
-program main
-  integer :: x, i, j, k, idx, arr(0:32*32*32)
-
-  do i = 0, 32*32*32-1
-     arr(i) = i
-  end do
-
-  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
-  !$acc loop gang
-  do i = 0, 31
-     !$acc loop worker
-     do j = 0, 31
-        !$acc loop vector private(x)
-        do k = 0, 31
-           x = ieor(i, j * 3)
-           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
-        end do
-        !$acc loop vector private(x)
-        do k = 0, 31
-           x = ior(i, j * 5)
-           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
-        end do
-     end do
-  end do
-  !$acc end parallel
-
-  do i = 0, 32 - 1
-     do j = 0, 32 -1
-        do k = 0, 32 - 1
-           idx = i * 1024 + j * 32 + k
-           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
-              call abort
-           end if
-        end do
-     end do
-  end do
-end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-vector-2.f90 libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-vector-2.f90
deleted file mode 100644
index 5456c38..0000000
--- libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-vector-2.f90
+++ /dev/null
@@ -1,36 +0,0 @@
-! Test of vector-private variables declared on loop directive. Array type.
-
-program main
-  integer :: i, j, k, idx, arr(0:32*32*32), pt(2)
-
-  do i = 0, 32*32*32-1
-     arr(i) = i
-  end do
-
-  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
-  !$acc loop gang
-  do i = 0, 31
-     !$acc loop worker
-     do j = 0, 31
-        !$acc loop vector private(x, pt)
-        do k = 0, 31
-           pt(1) = ieor(i, j * 3)
-           pt(2) = ior(i, j * 5)
-           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt(1) * k
-           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt(2) * k
-        end do
-     end do
-  end do
-  !$acc end parallel
-
-  do i = 0, 32 - 1
-     do j = 0, 32 -1
-        do k = 0, 32 - 1
-           idx = i * 1024 + j * 32 + k
-           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
-              call abort
-           end if
-        end do
-     end do
-  end do
-end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-1.f90 libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-1.f90
deleted file mode 100644
index 9fec621..0000000
--- libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-1.f90
+++ /dev/null
@@ -1,27 +0,0 @@
-! { dg-additional-options "-cpp -w" }
-
-! Test of worker-private variables declared on a loop directive.
-
-program main
-  integer :: x, i, j, arr(0:32*32)
-  common x
-
-  do i = 0, 32*32-1
-     arr(i) = i
-  end do
-
-  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
-  !$acc loop gang private(x)
-  do i = 0, 31
-     !$acc loop worker private(x)
-     do j = 0, 31
-        x = ieor(i, j * 3)
-        arr(i * 32 + j) = arr(i * 32 + j) + x
-     end do
-  end do
-  !$acc end parallel
-
-  do i = 0, 32 * 32 - 1
-     if (arr(i) .ne. i + ieor(i / 32, mod(i, 32) * 3)) call abort
-  end do
-end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-2.f90 libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-2.f90
deleted file mode 100644
index 725f175..0000000
--- libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-2.f90
+++ /dev/null
@@ -1,34 +0,0 @@
-! Test of worker-private variables declared on a loop directive, broadcasting
-! to vector-partitioned mode.
-
-program main
-  integer :: x, i, j, k, idx, arr(0:32*32*32)
-
-  do i = 0, 32*32*32-1
-     arr(i) = i
-  end do
-
-  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
-  !$acc loop gang
-  do i = 0, 31
-     !$acc loop worker private(x)
-     do j = 0, 31
-        x = ieor(i, j * 3)
-
-        !$acc loop vector
-        do k = 0, 31
-           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
-        end do
-     end do
-  end do
-  !$acc end parallel
-
-  do i = 0, 32 - 1
-     do j = 0, 32 -1
-        do k = 0, 32 - 1
-           idx = i * 1024 + j * 32 + k
-           if (arr(idx) .ne. idx + ieor(i, j * 3) * k) call abort
-        end do
-     end do
-  end do
-end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-3.f90 libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-3.f90
deleted file mode 100644
index 29239ec..0000000
--- libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-3.f90
+++ /dev/null
@@ -1,46 +0,0 @@
-! Test of worker-private variables declared on a loop directive, broadcasting
-! to vector-partitioned mode.  Back-to-back worker loops.
-
-program main
-  integer :: x, i, j, k, idx, arr(0:32*32*32)
-
-  do i = 0, 32*32*32-1
-     arr(i) = i
-  end do
-
-  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
-  !$acc loop gang
-  do i = 0, 31
-     !$acc loop worker private(x)
-     do j = 0, 31
-        x = ieor(i, j * 3)
-
-        !$acc loop vector
-        do k = 0, 31
-           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
-        end do
-     end do
-
-     !$acc loop worker private(x)
-     do j = 0, 31
-        x = ior(i, j * 5)
-
-        !$acc loop vector
-        do k = 0, 31
-           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
-        end do
-     end do
-  end do
-  !$acc end parallel
-
-  do i = 0, 32 - 1
-     do j = 0, 32 -1
-        do k = 0, 32 - 1
-           idx = i * 1024 + j * 32 + k
-           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
-              call abort
-           end if
-        end do
-     end do
-  end do
-end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-4.f90 libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-4.f90
deleted file mode 100644
index 9f621ef..0000000
--- libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-4.f90
+++ /dev/null
@@ -1,43 +0,0 @@
-! Test of worker-private variables declared on a loop directive, broadcasting
-! to vector-partitioned mode.  Successive vector loops.  */
-
-program main
-  integer :: x, i, j, k, idx, arr(0:32*32*32)
-
-  do i = 0, 32*32*32-1
-     arr(i) = i
-  end do
-
-  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
-  !$acc loop gang
-  do i = 0, 31
-     !$acc loop worker private(x)
-     do j = 0, 31
-        x = ieor(i, j * 3)
-
-        !$acc loop vector
-        do k = 0, 31
-           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
-        end do
-
-        x = ior(i, j * 5)
-
-        !$acc loop vector
-        do k = 0, 31
-           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
-        end do
-     end do
-  end do
-  !$acc end parallel
-
-  do i = 0, 32 - 1
-     do j = 0, 32 -1
-        do k = 0, 32 - 1
-           idx = i * 1024 + j * 32 + k
-           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
-              call abort
-           end if
-        end do
-     end do
-  end do
-end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-5.f90 libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-5.f90
deleted file mode 100644
index fa65f5e..0000000
--- libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-5.f90
+++ /dev/null
@@ -1,46 +0,0 @@
-! Test of worker-private variables declared on a loop directive, broadcasting
-! to vector-partitioned mode.  Addressable worker variable.
-
-program main
-  integer :: i, j, k, idx, arr(0:32*32*32)
-  integer, target :: x
-  integer, pointer :: p
-
-  do i = 0, 32*32*32-1
-     arr(i) = i
-  end do
-
-  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
-  !$acc loop gang
-  do i = 0, 31
-     !$acc loop worker private(x, p)
-     do j = 0, 31
-        p => x
-        x = ieor(i, j * 3)
-
-        !$acc loop vector
-        do k = 0, 31
-           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
-        end do
-
-        p = ior(i, j * 5)
-
-        !$acc loop vector
-        do k = 0, 31
-           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + x * k
-        end do
-     end do
-  end do
-  !$acc end parallel
-
-  do i = 0, 32 - 1
-     do j = 0, 32 -1
-        do k = 0, 32 - 1
-           idx = i * 1024 + j * 32 + k
-           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
-              call abort
-           end if
-        end do
-     end do
-  end do
-end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-6.f90 libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-6.f90
deleted file mode 100644
index 45bc414..0000000
--- libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-6.f90
+++ /dev/null
@@ -1,47 +0,0 @@
-! Test of worker-private variables declared on a loop directive, broadcasting
-! to vector-partitioned mode.  Aggregate worker variable.
-
-program main
-  type vec2
-     integer x, y
-  end type vec2
-  
-  integer :: i, j, k, idx, arr(0:32*32*32)
-  type(vec2) :: pt
-  
-  do i = 0, 32*32*32-1
-     arr(i) = i
-  end do
-
-  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
-  !$acc loop gang
-  do i = 0, 31
-     !$acc loop worker private(pt)
-     do j = 0, 31
-        pt%x = ieor(i, j * 3)
-        pt%y = ior(i, j * 5)
-        
-        !$acc loop vector
-        do k = 0, 31
-           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt%x * k
-        end do
-
-        !$acc loop vector
-        do k = 0, 31
-           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt%y * k
-        end do
-     end do
-  end do
-  !$acc end parallel
-
-  do i = 0, 32 - 1
-     do j = 0, 32 -1
-        do k = 0, 32 - 1
-           idx = i * 1024 + j * 32 + k
-           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
-              call abort
-           end if
-        end do
-     end do
-  end do
-end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-7.f90 libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-7.f90
deleted file mode 100644
index a046e77..0000000
--- libgomp/testsuite/libgomp.oacc-fortran/private-vars-loop-worker-7.f90
+++ /dev/null
@@ -1,42 +0,0 @@
-! Test of worker-private variables declared on loop directive, broadcasting
-! to vector-partitioned mode.  Array worker variable.
-
-program main
-  integer :: i, j, k, idx, arr(0:32*32*32), pt(2)
-
-  do i = 0, 32*32*32-1
-     arr(i) = i
-  end do
-
-  !$acc parallel copy(arr) num_gangs(32) num_workers(8) vector_length(32)
-  !$acc loop gang
-  do i = 0, 31
-     !$acc loop worker private(pt)
-     do j = 0, 31
-        pt(1) = ieor(i, j * 3)
-        pt(2) = ior(i, j * 5)
-        
-        !$acc loop vector
-        do k = 0, 31
-           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt(1) * k
-        end do
-
-        !$acc loop vector
-        do k = 0, 31
-           arr(i * 1024 + j * 32 + k) = arr(i * 1024 + j * 32 + k) + pt(2) * k
-        end do
-     end do
-  end do
-  !$acc end parallel
-
-  do i = 0, 32 - 1
-     do j = 0, 32 -1
-        do k = 0, 32 - 1
-           idx = i * 1024 + j * 32 + k
-           if (arr(idx) .ne. idx + ieor(i, j * 3) * k + ior(i, j * 5) * k) then
-              call abort
-           end if
-        end do
-     end do
-  end do
-end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/private-vars-par-gang-2.f90 libgomp/testsuite/libgomp.oacc-fortran/private-vars-par-gang-2.f90
deleted file mode 100644
index cad7c5f..0000000
--- libgomp/testsuite/libgomp.oacc-fortran/private-vars-par-gang-2.f90
+++ /dev/null
@@ -1,32 +0,0 @@
-! { dg-additional-options "-w" }
-
-! Test of gang-private variables declared on the parallel directive.
-
-program main
-  use openacc
-  integer :: x = 5
-  integer, parameter :: n = 32
-  integer :: arr(n)
-
-  do i = 1, n
-    arr(i) = 3
-  end do
-
-  !$acc parallel private(x) copy(arr) num_gangs(n) num_workers(8) vector_length(32)
-    !$acc loop gang(static:1)
-    do i = 1, n
-      x = i * 2;
-    end do
-
-   !$acc loop gang(static:1)
-    do i = 1, n
-      if (acc_on_device (acc_device_host) .eqv. .TRUE.) x = i * 2
-      arr(i) = arr(i) + x
-    end do
-  !$acc end parallel
-
-  do i = 1, n
-    if (arr(i) .ne. (3 + i * 2)) call abort
-  end do
-
-end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/routine-7.f90 libgomp/testsuite/libgomp.oacc-fortran/routine-7.f90
index 1a60ca8..6301b03 100644
--- libgomp/testsuite/libgomp.oacc-fortran/routine-7.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/routine-7.f90
@@ -1,5 +1,5 @@
 ! { dg-do run }
-! { dg-additional-options "-cpp -w" }
+! { dg-additional-options "-cpp" }
 
 #define M 8
 #define N 32
@@ -99,7 +99,7 @@ subroutine gang (a)
   integer, intent (inout) :: a(N)
   integer :: i
 
-  !$acc loop gang
+  !$acc loop gang worker vector
   do i = 1, N
     a(i) = a(i) - i 
   end do
diff --git libgomp/testsuite/libgomp.oacc-fortran/update-1-2.f90 libgomp/testsuite/libgomp.oacc-fortran/update-1-2.f90
deleted file mode 100644
index 3f47c3c..0000000
--- libgomp/testsuite/libgomp.oacc-fortran/update-1-2.f90
+++ /dev/null
@@ -1,239 +0,0 @@
-! Copy of update-1.f90 with self exchanged with host for !$acc update
-
-! { dg-do run }
-! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } }
-
-program update
-  use openacc
-  implicit none 
-  integer, parameter :: N = 8
-  real :: a(N), b(N)
-  integer i
-
-  do i = 1, N
-    a(i) = 3.0
-    b(i) = 0.0
-  end do
-
-  !$acc enter data copyin (a, b)
-
-  !$acc parallel present (a, b)
-    do i = 1, N
-      b(i) = a(i)
-    end do
-  !$acc end parallel
-
-  !$acc update self (a, b)
-
-  do i = 1, N
-    if (a(i) .ne. 3.0) call abort
-    if (b(i) .ne. 3.0) call abort
-  end do
-
-  if (acc_is_present (a) .neqv. .TRUE.) call abort
-  if (acc_is_present (b) .neqv. .TRUE.) call abort
-
-  do i = 1, N
-    a(i) = 5.0
-    b(i) = 1.0
-  end do
-
-  !$acc update device (a, b)
-
-  !$acc parallel present (a, b)
-    do i = 1, N
-      b(i) = a(i)
-    end do 
-  !$acc end parallel
-
-  !$acc update self (a, b)
-
-  do i = 1, N
-    if (a(i) .ne. 5.0) call abort
-    if (b(i) .ne. 5.0) call abort
- end do
-
-  if (acc_is_present (a) .neqv. .TRUE.) call abort
-  if (acc_is_present (b) .neqv. .TRUE.) call abort
-
-  !$acc parallel present (a, b)
-  do i = 1, N
-    b(i) = a(i)
-  end do
-  !$acc end parallel
-
-  !$acc update self (a, b)
-
-  do i = 1, N
-    if (a(i) .ne. 5.0) call abort
-    if (b(i) .ne. 5.0) call abort
-  end do
-
-  do i = 1, N
-    a(i) = 6.0
-    b(i) = 0.0
-  end do
-
-  !$acc update device (a, b)
-
-  do i = 1, N
-    a(i) = 9.0
-  end do
-
-  !$acc parallel present (a, b)
-    do i = 1, N
-      b(i) = a(i)
-    end do
-  !$acc end parallel
-
-  !$acc update self (a, b)
-
-  do i = 1, N
-    if (a(i) .ne. 6.0) call abort
-    if (b(i) .ne. 6.0) call abort
-  end do
-
-  if (acc_is_present (a) .neqv. .TRUE.) call abort
-  if (acc_is_present (b) .neqv. .TRUE.) call abort
-
-  do i = 1, N
-    a(i) = 7.0
-    b(i) = 2.0
-  end do
-
-  !$acc update device (a, b)
-
-  do i = 1, N
-    a(i) = 9.0
-  end do
-
-  !$acc parallel present (a, b)
-    do i = 1, N
-      b(i) = a(i)
-    end do
-  !$acc end parallel
-
-  !$acc update self (a, b)
-
-  do i = 1, N
-    if (a(i) .ne. 7.0) call abort
-    if (b(i) .ne. 7.0) call abort
-  end do
-
-  do i = 1, N
-    a(i) = 9.0
-  end do
-
-  !$acc update device (a)
-
-  !$acc parallel present (a, b)
-    do i = 1, N
-      b(i) = a(i)
-    end do
-  !$acc end parallel
-
-  !$acc update self (a, b)
-
-  do i = 1, N
-    if (a(i) .ne. 9.0) call abort
-    if (b(i) .ne. 9.0) call abort
-  end do
-
-  if (acc_is_present (a) .neqv. .TRUE.) call abort
-  if (acc_is_present (b) .neqv. .TRUE.) call abort
-
-  do i = 1, N
-    a(i) = 5.0
-  end do
-
-  !$acc update device (a)
-
-  do i = 1, N
-    a(i) = 6.0
-  end do
-
-  !$acc update device (a(1:rshift (N, 1)))
-
-  !$acc parallel present (a, b)
-    do i = 1, N
-      b(i) = a(i)
-    end do
-  !$acc end parallel
-
-  !$acc update self (a, b)
-
-  do i = 1, rshift (N, 1)
-    if (a(i) .ne. 6.0) call abort
-    if (b(i) .ne. 6.0) call abort
-  end do
-
-  do i = rshift (N, 1) + 1, N
-    if (a(i) .ne. 5.0) call abort
-    if (b(i) .ne. 5.0) call abort
-  end do
-
-  if (acc_is_present (a) .neqv. .TRUE.) call abort
-  if (acc_is_present (b) .neqv. .TRUE.) call abort
-
-  do i = 1, N
-    a(i) = 0.0
-  end do
-
-  !$acc update device (a(1:4))
-
-  !$acc parallel present (a)
-    do i = 1, N
-      a(i) = a(i) + 1.0
-    end do
-  !$acc end parallel
-
-  !$acc update self (a(5:N))
-
-  do i = 1, rshift (N, 1)
-    if (a(i) .ne. 0.0) call abort
-  end do
-
-  do i = rshift (N, 1) + 1, N
-    if (a(i) .ne. 6.0) call abort
-  end do
-
-  !$acc update self (a(1:4))
-
-  do i = 1, rshift (N, 1)
-    if (a(i) .ne. 1.0) call abort
-  end do
-
-  do i = rshift (N, 1) + 1, N
-    if (a(i) .ne. 6.0) call abort
-  end do
-
-  a(3) = 9
-  a(4) = 9
-  a(5) = 9
-  a(6) = 9
-
-  !$acc update device (a(3:6))
-
-  !$acc parallel present (a(1:N))
-    do i = 1, N
-      a(i) = a(i) + 1.0
-    end do
-  !$acc end parallel
-
-  !$acc update self (a(3:6))
-
-  do i = 1, 2
-    if (a(i) .ne. 1.0) call abort
-  end do
-
-  do i = 3, 6
-    if (a(i) .ne. 10.0) call abort
-  end do
-
-  do i = 7, N
-    if (a(i) .ne. 6.0) call abort
-  end do
-  !$acc exit data delete (a, b)
-
-end program
-


Regards
 Thomas


* Merge libgomp.oacc-c-c++-common/loop-reduction-*.c into libgomp.oacc-c-c++-common/reduction-7.c (was: [gomp4] Update OpenACC test cases)
  2016-04-04 10:40     ` [gomp4] " Thomas Schwinge
@ 2016-04-12 11:08       ` Thomas Schwinge
  0 siblings, 0 replies; 5+ messages in thread
From: Thomas Schwinge @ 2016-04-12 11:08 UTC (permalink / raw)
  To: gcc-patches; +Cc: Cesar Philippidis

Hi!

On Mon, 04 Apr 2016 12:39:37 +0200, I wrote:
> [...] gomp-4_0-branch [...] additional (cleanup) changes [...]

>     	libgomp/
>     	[...]
>     	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c:
>     	Merge this file, and...
>     	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c:
>     	... this file, and...
>     	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c:
>     	... this file, and...
>     	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c:
>     	... this file, and...
>     	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c:
>     	... this file, and...
>     	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c:
>     	... this file, and...
>     	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c:
>     	... this file, and...
>     	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c:
>     	... this file, and...
>     	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c:
>     	... this file, and...
>     	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c:
>     	... this file, and...
>     	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c:
>     	... this file, and...
>     	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c:
>     	... this file, and...
>     	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c:
>     	... this file into...
>     	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: ... this new
>     	file.  Use dg-warning directives instead of specifying the -w
>     	compiler option.
>     	[...]
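
As an illustration of the last quoted ChangeLog item (a sketch only, not
taken from reduction-7.c): the removed loop-reduction-*.c files silenced all
diagnostics for the whole file via dg-additional-options "-w", whereas a
dg-warning directive annotates just the line that is expected to warn.  The
function name gang_sum and the dg-warning regexp below are placeholders
chosen for this example, not names or messages from the actual test.

```c
#include <assert.h>

/* Old style: the file started with a dg-additional-options "-w" directive,
   which suppressed every warning.  New style: no -w, and a dg-warning
   directive sits on the one line expected to produce a diagnostic.  */

/* Sum 0..1023 with a gang-partitioned reduction, modelled on the removed
   loop-reduction-gang-np-1.c.  When compiled without -fopenacc the pragmas
   are ignored and the loop simply runs sequentially.  */
int
gang_sum (void)
{
  int i, arr[1024], res = 0;

  for (i = 0; i < 1024; i++)
    arr[i] = i;

  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
		       copy(res)
  {
    /* The regexp is a placeholder (assumption), not a real GCC message.  */
    #pragma acc loop gang reduction(+:res) /* { dg-warning "placeholder regexp" } */
    for (i = 0; i < 1024; i++)
      res += arr[i];
  }

  return res;
}
```

A caller would compare gang_sum () against a host-side sum of the same
array, as the removed tests did with their hres variable.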

Cesar didn't pick up these changes in his recent trunk commit, so I have now
applied them to trunk in r234899:

commit 40495bd0847a05aa76cc37e05292cf937449f9dd
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Tue Apr 12 11:02:32 2016 +0000

    Merge libgomp.oacc-c-c++-common/loop-reduction-*.c into libgomp.oacc-c-c++-common/reduction-7.c
    
    	libgomp/
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c:
    	Merge this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c:
    	... this file, and...
    	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c:
    	... this file into...
    	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: ... this
    	file.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@234899 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog                                  |  29 ++
 .../loop-reduction-gang-np-1.c                     |  45 ---
 .../loop-reduction-gv-np-1.c                       |  30 --
 .../loop-reduction-gw-np-1.c                       |  30 --
 .../loop-reduction-gwv-np-1.c                      |  28 --
 .../loop-reduction-gwv-np-2.c                      |  34 --
 .../loop-reduction-gwv-np-3.c                      |  33 --
 .../loop-reduction-gwv-np-4.c                      |  55 ----
 .../loop-reduction-vector-p-1.c                    |  43 ---
 .../loop-reduction-vector-p-2.c                    |  41 ---
 .../loop-reduction-worker-p-1.c                    |  43 ---
 .../loop-reduction-wv-p-1.c                        |  41 ---
 .../loop-reduction-wv-p-2.c                        |  45 ---
 .../loop-reduction-wv-p-3.c                        |  38 ---
 .../libgomp.oacc-c-c++-common/reduction-7.c        | 351 +++++++++++++++++++++
 15 files changed, 380 insertions(+), 506 deletions(-)

diff --git libgomp/ChangeLog libgomp/ChangeLog
index 6071b23..1716ba0 100644
--- libgomp/ChangeLog
+++ libgomp/ChangeLog
@@ -1,5 +1,34 @@
 2016-04-12  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c:
+	Merge this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c:
+	... this file, and...
+	* testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c:
+	... this file into...
+	* testsuite/libgomp.oacc-c-c++-common/reduction-7.c: ... this
+	file.
+
 	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-2.c:
 	Make failure observable.
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c
deleted file mode 100644
index 55ab3c9..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gang-np-1.c
+++ /dev/null
@@ -1,45 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test of reduction on loop directive (gangs, non-private reduction
-   variable).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, arr[1024], res = 0, hres = 0;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res)
-  {
-    #pragma acc loop gang reduction(+:res)
-    for (i = 0; i < 1024; i++)
-      res += arr[i];
-  }
-
-  for (i = 0; i < 1024; i++)
-    hres += arr[i];
-
-  assert (res == hres);
-
-  res = hres = 1;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res)
-  {
-    #pragma acc loop gang reduction(*:res)
-    for (i = 0; i < 12; i++)
-      res *= arr[i];
-  }
-
-  for (i = 0; i < 12; i++)
-    hres *= arr[i];
-
-  assert (res == hres);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c
deleted file mode 100644
index d4341e9..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gv-np-1.c
+++ /dev/null
@@ -1,30 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test of reduction on loop directive (gangs and vectors, non-private
-   reduction variable).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, arr[1024], res = 0, hres = 0;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res)
-  {
-    #pragma acc loop gang vector reduction(+:res)
-    for (i = 0; i < 1024; i++)
-      res += arr[i];
-  }
-
-  for (i = 0; i < 1024; i++)
-    hres += arr[i];
-
-  assert (res == hres);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c
deleted file mode 100644
index 2e5668b..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gw-np-1.c
+++ /dev/null
@@ -1,30 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test of reduction on loop directive (gangs and workers, non-private
-   reduction variable).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, arr[1024], res = 0, hres = 0;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res)
-  {
-    #pragma acc loop gang worker reduction(+:res)
-    for (i = 0; i < 1024; i++)
-      res += arr[i];
-  }
-
-  for (i = 0; i < 1024; i++)
-    hres += arr[i];
-
-  assert (res == hres);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c
deleted file mode 100644
index d610373..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-1.c
+++ /dev/null
@@ -1,28 +0,0 @@
-#include <assert.h>
-
-/* Test of reduction on loop directive (gangs, workers and vectors, non-private
-   reduction variable).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, arr[1024], res = 0, hres = 0;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res)
-  {
-    #pragma acc loop gang worker vector reduction(+:res)
-    for (i = 0; i < 1024; i++)
-      res += arr[i];
-  }
-
-  for (i = 0; i < 1024; i++)
-    hres += arr[i];
-
-  assert (res == hres);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c
deleted file mode 100644
index ea5c151..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-2.c
+++ /dev/null
@@ -1,34 +0,0 @@
-#include <assert.h>
-
-/* Test of reduction on loop directive (gangs, workers and vectors, non-private
-   reduction variable: separate gang and worker/vector loops).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, j, arr[32768], res = 0, hres = 0;
-
-  for (i = 0; i < 32768; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res)
-  {
-    #pragma acc loop gang reduction(+:res)
-    for (j = 0; j < 32; j++)
-      {
-        #pragma acc loop worker vector reduction(+:res)
-        for (i = 0; i < 1024; i++)
-	  res += arr[j * 1024 + i];
-      }
-    /* "res" is non-private, and is not available until after the parallel
-       region.  */
-  }
-
-  for (i = 0; i < 32768; i++)
-    hres += arr[i];
-
-  assert (res == hres);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c
deleted file mode 100644
index 0056f3c..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-3.c
+++ /dev/null
@@ -1,33 +0,0 @@
-#include <assert.h>
-
-/* Test of reduction on loop directive (gangs, workers and vectors, non-private
-   reduction variable: separate gang and worker/vector loops).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, j;
-  double arr[32768], res = 0, hres = 0;
-
-  for (i = 0; i < 32768; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copyin(arr) copy(res)
-  {
-    #pragma acc loop gang reduction(+:res)
-    for (j = 0; j < 32; j++)
-      {
-        #pragma acc loop worker vector reduction(+:res)
-        for (i = 0; i < 1024; i++)
-	  res += arr[j * 1024 + i];
-      }
-  }
-
-  for (i = 0; i < 32768; i++)
-    hres += arr[i];
-
-  assert (res == hres);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c
deleted file mode 100644
index e69d0ec..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-gwv-np-4.c
+++ /dev/null
@@ -1,55 +0,0 @@
-#include <assert.h>
-
-/* Test of reduction on loop directive (gangs, workers and vectors, multiple
-   non-private reduction variables, float type).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, j;
-  float arr[32768];
-  float res = 0, mres = 0, hres = 0, hmres = 0;
-
-  for (i = 0; i < 32768; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       copy(res, mres)
-  {
-    #pragma acc loop gang reduction(+:res) reduction(max:mres)
-    for (j = 0; j < 32; j++)
-      {
-	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
-	for (i = 0; i < 1024; i++)
-	  {
-	    res += arr[j * 1024 + i];
-	    if (arr[j * 1024 + i] > mres)
-	      mres = arr[j * 1024 + i];
-	  }
-
-	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
-	for (i = 0; i < 1024; i++)
-	  {
-	    res += arr[j * 1024 + (1023 - i)];
-	    if (arr[j * 1024 + (1023 - i)] > mres)
-	      mres = arr[j * 1024 + (1023 - i)];
-	  }
-      }
-  }
-
-  for (j = 0; j < 32; j++)
-    for (i = 0; i < 1024; i++)
-      {
-        hres += arr[j * 1024 + i];
-	hres += arr[j * 1024 + (1023 - i)];
-	if (arr[j * 1024 + i] > hmres)
-	  hmres = arr[j * 1024 + i];
-	if (arr[j * 1024 + (1023 - i)] > hmres)
-	  hmres = arr[j * 1024 + (1023 - i)];
-      }
-
-  assert (res == hres);
-  assert (mres == hmres);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c
deleted file mode 100644
index 31e4366..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-1.c
+++ /dev/null
@@ -1,43 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test of reduction on loop directive (vectors, private reduction
-   variable).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, j, arr[1024], out[32], res = 0, hres = 0;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       private(res) copyout(out)
-  {
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      {
-        res = 0;
-
-	#pragma acc loop vector reduction(+:res)
-	for (i = 0; i < 32; i++)
-	  res += arr[j * 32 + i];
-
-	out[j] = res;
-      }
-  }
-
-  for (j = 0; j < 32; j++)
-    {
-      hres = 0;
-
-      for (i = 0; i < 32; i++)
-	hres += arr[j * 32 + i];
-
-      assert (out[j] == hres);
-    }
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c
deleted file mode 100644
index 15f0053..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-vector-p-2.c
+++ /dev/null
@@ -1,41 +0,0 @@
-#include <assert.h>
-
-/* Test of reduction on loop directive (vector reduction in
-   gang-partitioned/worker-partitioned mode, private reduction variable).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, j, k;
-  double ina[1024], inb[1024], out[1024], acc;
-
-  for (j = 0; j < 32; j++)
-    for (i = 0; i < 32; i++)
-      {
-        ina[j * 32 + i] = (i == j) ? 2.0 : 0.0;
-	inb[j * 32 + i] = (double) (i + j);
-      }
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       private(acc) copyin(ina, inb) copyout(out)
-  {
-    #pragma acc loop gang worker
-    for (k = 0; k < 32; k++)
-      for (j = 0; j < 32; j++)
-        {
-	  acc = 0;
-
-	  #pragma acc loop vector reduction(+:acc)
-	  for (i = 0; i < 32; i++)
-	    acc += ina[k * 32 + i] * inb[i * 32 + j];
-
-	  out[k * 32 + j] = acc;
-	}
-  }
-
-  for (j = 0; j < 32; j++)
-    for (i = 0; i < 32; i++)
-      assert (out[j * 32 + i] == (i + j) * 2);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c
deleted file mode 100644
index 4a92503..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-worker-p-1.c
+++ /dev/null
@@ -1,43 +0,0 @@
-/* { dg-additional-options "-w" } */
-
-#include <assert.h>
-
-/* Test of reduction on loop directive (workers, private reduction
-   variable).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, j, arr[1024], out[32], res = 0, hres = 0;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       private(res) copyout(out)
-  {
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      {
-        res = 0;
-
-	#pragma acc loop worker reduction(+:res)
-	for (i = 0; i < 32; i++)
-	  res += arr[j * 32 + i];
-
-	out[j] = res;
-      }
-  }
-
-  for (j = 0; j < 32; j++)
-    {
-      hres = 0;
-
-      for (i = 0; i < 32; i++)
-	hres += arr[j * 32 + i];
-
-      assert (out[j] == hres);
-    }
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c
deleted file mode 100644
index 1bfb284..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-1.c
+++ /dev/null
@@ -1,41 +0,0 @@
-#include <assert.h>
-
-/* Test of reduction on loop directive (workers and vectors, private reduction
-   variable).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, j, arr[1024], out[32], res = 0, hres = 0;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       private(res) copyout(out)
-  {
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      {
-        res = 0;
-
-	#pragma acc loop worker vector reduction(+:res)
-	for (i = 0; i < 32; i++)
-	  res += arr[j * 32 + i];
-
-	out[j] = res;
-      }
-  }
-
-  for (j = 0; j < 32; j++)
-    {
-      hres = 0;
-
-      for (i = 0; i < 32; i++)
-	hres += arr[j * 32 + i];
-
-      assert (out[j] == hres);
-    }
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c
deleted file mode 100644
index 93ab78f..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-2.c
+++ /dev/null
@@ -1,45 +0,0 @@
-#include <assert.h>
-
-/* Test of reduction on loop directive (workers and vectors, private reduction
-   variable).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, j, arr[32768], out[32], res = 0, hres = 0;
-
-  for (i = 0; i < 32768; i++)
-    arr[i] = i;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       private(res) copyout(out)
-  {
-    #pragma acc loop gang
-    for (j = 0; j < 32; j++)
-      {
-        res = j;
-
-	#pragma acc loop worker reduction(+:res)
-	for (i = 0; i < 1024; i++)
-	  res += arr[j * 1024 + i];
-
-	#pragma acc loop vector reduction(+:res)
-	for (i = 1023; i >= 0; i--)
-	  res += arr[j * 1024 + i];
-
-	out[j] = res;
-      }
-  }
-
-  for (j = 0; j < 32; j++)
-    {
-      hres = j;
-
-      for (i = 0; i < 1024; i++)
-	hres += arr[j * 1024 + i] * 2;
-
-      assert (out[j] == hres);
-    }
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c
deleted file mode 100644
index 298e25c..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/loop-reduction-wv-p-3.c
+++ /dev/null
@@ -1,38 +0,0 @@
-#include <assert.h>
-
-/* Test of reduction on loop directive (workers and vectors, private reduction
-   variable: gang-redundant mode).  */
-
-int
-main (int argc, char *argv[])
-{
-  int i, arr[1024], out[32], res = 0, hres = 0;
-
-  for (i = 0; i < 1024; i++)
-    arr[i] = i ^ 33;
-
-  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		       private(res) copyin(arr) copyout(out)
-  {
-    /* Private variables aren't initialized by default in openacc.  */
-    res = 0;
-
-    /* "res" should be available at the end of the following loop (and should
-       have the same value redundantly in each gang).  */
-    #pragma acc loop worker vector reduction(+:res)
-    for (i = 0; i < 1024; i++)
-      res += arr[i];
-
-    #pragma acc loop gang (static: 1)
-    for (i = 0; i < 32; i++)
-      out[i] = res;
-  }
-
-  for (i = 0; i < 1024; i++)
-    hres += arr[i];
-
-  for (i = 0; i < 32; i++)
-    assert (out[i] == hres);
-
-  return 0;
-}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c
index b23c758..76c33e4 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-7.c
@@ -118,12 +118,363 @@ void gwv_np_1()
 }
 
 
+/* Test of reduction on loop directive (gangs, workers and vectors, non-private
+   reduction variable: separate gang and worker/vector loops).  */
+
+void gwv_np_2()
+{
+  int i, j, arr[32768], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res)
+  {
+    #pragma acc loop gang reduction(+:res)
+    for (j = 0; j < 32; j++)
+      {
+        #pragma acc loop worker vector reduction(+:res)
+        for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+      }
+    /* "res" is non-private, and is not available until after the parallel
+       region.  */
+  }
+
+  for (i = 0; i < 32768; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+}
+
+
+/* Test of reduction on loop directive (gangs, workers and vectors, non-private
+   reduction variable: separate gang and worker/vector loops).  */
+
+void gwv_np_3()
+{
+  int i, j;
+  double arr[32768], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copyin(arr) copy(res)
+  {
+    #pragma acc loop gang reduction(+:res)
+    for (j = 0; j < 32; j++)
+      {
+        #pragma acc loop worker vector reduction(+:res)
+        for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+      }
+  }
+
+  for (i = 0; i < 32768; i++)
+    hres += arr[i];
+
+  assert (res == hres);
+}
+
+
+/* Test of reduction on loop directive (gangs, workers and vectors, multiple
+   non-private reduction variables, float type).  */
+
+void gwv_np_4()
+{
+  int i, j;
+  float arr[32768];
+  float res = 0, mres = 0, hres = 0, hmres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       copy(res, mres)
+  {
+    #pragma acc loop gang reduction(+:res) reduction(max:mres)
+    for (j = 0; j < 32; j++)
+      {
+	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
+	for (i = 0; i < 1024; i++)
+	  {
+	    res += arr[j * 1024 + i];
+	    if (arr[j * 1024 + i] > mres)
+	      mres = arr[j * 1024 + i];
+	  }
+
+	#pragma acc loop worker vector reduction(+:res) reduction(max:mres)
+	for (i = 0; i < 1024; i++)
+	  {
+	    res += arr[j * 1024 + (1023 - i)];
+	    if (arr[j * 1024 + (1023 - i)] > mres)
+	      mres = arr[j * 1024 + (1023 - i)];
+	  }
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 1024; i++)
+      {
+        hres += arr[j * 1024 + i];
+	hres += arr[j * 1024 + (1023 - i)];
+	if (arr[j * 1024 + i] > hmres)
+	  hmres = arr[j * 1024 + i];
+	if (arr[j * 1024 + (1023 - i)] > hmres)
+	  hmres = arr[j * 1024 + (1023 - i)];
+      }
+
+  assert (res == hres);
+  assert (mres == hmres);
+}
+
+
+/* Test of reduction on loop directive (vectors, private reduction
+   variable).  */
+
+void v_p_1()
+{
+  int i, j, arr[1024], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyout(out)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+        res = 0;
+
+	#pragma acc loop vector reduction(+:res)
+	for (i = 0; i < 32; i++)
+	  res += arr[j * 32 + i];
+
+	out[j] = res;
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    {
+      hres = 0;
+
+      for (i = 0; i < 32; i++)
+	hres += arr[j * 32 + i];
+
+      assert (out[j] == hres);
+    }
+}
+
+
+/* Test of reduction on loop directive (vector reduction in
+   gang-partitioned/worker-partitioned mode, private reduction variable).  */
+
+void v_p_2()
+{
+  int i, j, k;
+  double ina[1024], inb[1024], out[1024], acc;
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 32; i++)
+      {
+        ina[j * 32 + i] = (i == j) ? 2.0 : 0.0;
+	inb[j * 32 + i] = (double) (i + j);
+      }
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(acc) copyin(ina, inb) copyout(out)
+  {
+    #pragma acc loop gang worker
+    for (k = 0; k < 32; k++)
+      for (j = 0; j < 32; j++)
+        {
+	  acc = 0;
+
+	  #pragma acc loop vector reduction(+:acc)
+	  for (i = 0; i < 32; i++)
+	    acc += ina[k * 32 + i] * inb[i * 32 + j];
+
+	  out[k * 32 + j] = acc;
+	}
+  }
+
+  for (j = 0; j < 32; j++)
+    for (i = 0; i < 32; i++)
+      assert (out[j * 32 + i] == (i + j) * 2);
+}
+
+
+/* Test of reduction on loop directive (workers, private reduction
+   variable).  */
+
+void w_p_1()
+{
+  int i, j, arr[1024], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyout(out)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+        res = 0;
+
+	#pragma acc loop worker reduction(+:res)
+	for (i = 0; i < 32; i++)
+	  res += arr[j * 32 + i];
+
+	out[j] = res;
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    {
+      hres = 0;
+
+      for (i = 0; i < 32; i++)
+	hres += arr[j * 32 + i];
+
+      assert (out[j] == hres);
+    }
+}
+
+
+/* Test of reduction on loop directive (workers and vectors, private reduction
+   variable).  */
+
+void wv_p_1()
+{
+  int i, j, arr[1024], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyout(out)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+        res = 0;
+
+	#pragma acc loop worker vector reduction(+:res)
+	for (i = 0; i < 32; i++)
+	  res += arr[j * 32 + i];
+
+	out[j] = res;
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    {
+      hres = 0;
+
+      for (i = 0; i < 32; i++)
+	hres += arr[j * 32 + i];
+
+      assert (out[j] == hres);
+    }
+}
+
+
+/* Test of reduction on loop directive (workers and vectors, private reduction
+   variable).  */
+
+void wv_p_2()
+{
+  int i, j, arr[32768], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 32768; i++)
+    arr[i] = i;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyout(out)
+  {
+    #pragma acc loop gang
+    for (j = 0; j < 32; j++)
+      {
+        res = j;
+
+	#pragma acc loop worker reduction(+:res)
+	for (i = 0; i < 1024; i++)
+	  res += arr[j * 1024 + i];
+
+	#pragma acc loop vector reduction(+:res)
+	for (i = 1023; i >= 0; i--)
+	  res += arr[j * 1024 + i];
+
+	out[j] = res;
+      }
+  }
+
+  for (j = 0; j < 32; j++)
+    {
+      hres = j;
+
+      for (i = 0; i < 1024; i++)
+	hres += arr[j * 1024 + i] * 2;
+
+      assert (out[j] == hres);
+    }
+}
+
+
+/* Test of reduction on loop directive (workers and vectors, private reduction
+   variable: gang-redundant mode).  */
+
+void wv_p_3()
+{
+  int i, arr[1024], out[32], res = 0, hres = 0;
+
+  for (i = 0; i < 1024; i++)
+    arr[i] = i ^ 33;
+
+  #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
+		       private(res) copyin(arr) copyout(out)
+  {
+    /* Private variables aren't initialized by default in openacc.  */
+    res = 0;
+
+    /* "res" should be available at the end of the following loop (and should
+       have the same value redundantly in each gang).  */
+    #pragma acc loop worker vector reduction(+:res)
+    for (i = 0; i < 1024; i++)
+      res += arr[i];
+
+    #pragma acc loop gang (static: 1)
+    for (i = 0; i < 32; i++)
+      out[i] = res;
+  }
+
+  for (i = 0; i < 1024; i++)
+    hres += arr[i];
+
+  for (i = 0; i < 32; i++)
+    assert (out[i] == hres);
+}
+
+
 int main()
 {
   g_np_1();
   gv_np_1();
   gw_np_1();
   gwv_np_1();
+  gwv_np_2();
+  gwv_np_3();
+  gwv_np_4();
+  v_p_1();
+  v_p_2();
+  w_p_1();
+  wv_p_1();
+  wv_p_2();
+  wv_p_3();
 
   return 0;
 }


Regards
 Thomas


end of thread, other threads:[~2016-04-12 11:08 UTC | newest]

Thread overview: 5+ messages
2016-03-30 14:22 Update OpenACC test cases Thomas Schwinge
2016-03-30 14:38 ` Jakub Jelinek
2016-03-30 15:55   ` Thomas Schwinge
2016-04-04 10:40     ` [gomp4] " Thomas Schwinge
2016-04-12 11:08       ` Merge libgomp.oacc-c-c++-common/loop-reduction-*.c into libgomp.oacc-c-c++-common/reduction-7.c (was: [gomp4] Update OpenACC test cases) Thomas Schwinge
