public inbox for gcc-patches@gcc.gnu.org
* Next set of OpenACC changes
@ 2015-05-05  8:54 Thomas Schwinge
  2015-05-05  8:56 ` Next set of OpenACC changes: middle end, libgomp Thomas Schwinge
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Thomas Schwinge @ 2015-05-05  8:54 UTC
  To: gcc-patches, Jakub Jelinek
  Cc: Bernd Schmidt, Cesar Philippidis, Chung-Lin Tang, James Norris,
	Joseph Myers, Julian Brown, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 8550 bytes --]

Hi!

In follow-up messages, I'll be posting the separated parts (for easier
review) of a next set of OpenACC changes that we'd like to commit.
ChangeLog updates not yet written; will do that before commit, obviously.

Overall diffstat:

 gcc/c-family/c-common.c                            |    3 +-
 gcc/c-family/c-common.h                            |    2 +
 gcc/c-family/c-omp.c                               |  105 ++
 gcc/c-family/c-pragma.c                            |    4 +
 gcc/c-family/c-pragma.h                            |   14 +-
 gcc/c/c-parser.c                                   | 1353 ++++++++++++----
 gcc/c/c-tree.h                                     |    3 +-
 gcc/c/c-typeck.c                                   |  112 +-
 gcc/cp/cp-gimplify.c                               |    3 +-
 gcc/cp/cp-tree.h                                   |    3 +-
 gcc/cp/parser.c                                    | 1382 +++++++++++++----
 gcc/cp/parser.h                                    |    4 +
 gcc/cp/pt.c                                        |   43 +-
 gcc/cp/semantics.c                                 |  151 +-
 gcc/fortran/dump-parse-tree.c                      |   12 +-
 gcc/fortran/gfortran.h                             |   50 +-
 gcc/fortran/match.h                                |    1 +
 gcc/fortran/openmp.c                               |  581 +++++--
 gcc/fortran/parse.c                                |   65 +-
 gcc/fortran/parse.h                                |    2 +-
 gcc/fortran/resolve.c                              |    5 +
 gcc/fortran/st.c                                   |    7 +
 gcc/fortran/trans-decl.c                           |   62 +-
 gcc/fortran/trans-openmp.c                         |   66 +-
 gcc/fortran/trans-stmt.c                           |    7 +-
 gcc/fortran/trans-stmt.h                           |    2 +-
 gcc/fortran/trans.c                                |    2 +
 gcc/gimplify.c                                     |   16 +-
 gcc/omp-low.c                                      |   11 +-
 gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c  |   46 +
 .../c-c++-common/goacc-gomp/nesting-fail-1.c       |   25 -
 gcc/testsuite/c-c++-common/goacc/asyncwait-1.c     |    4 +-
 gcc/testsuite/c-c++-common/goacc/data-2.c          |   12 +-
 gcc/testsuite/c-c++-common/goacc/declare-1.c       |   84 +
 gcc/testsuite/c-c++-common/goacc/declare-2.c       |   67 +
 gcc/testsuite/c-c++-common/goacc/dtype-1.c         |  113 ++
 gcc/testsuite/c-c++-common/goacc/dtype-2.c         |   31 +
 gcc/testsuite/c-c++-common/goacc/host_data-1.c     |   14 +
 gcc/testsuite/c-c++-common/goacc/host_data-2.c     |   14 +
 gcc/testsuite/c-c++-common/goacc/host_data-3.c     |   16 +
 gcc/testsuite/c-c++-common/goacc/host_data-4.c     |   15 +
 gcc/testsuite/c-c++-common/goacc/kernels-1.c       |    6 -
 gcc/testsuite/c-c++-common/goacc/kernels-empty.c   |    6 +
 gcc/testsuite/c-c++-common/goacc/kernels-eternal.c |   11 +
 .../c-c++-common/goacc/kernels-noreturn.c          |   12 +
 gcc/testsuite/c-c++-common/goacc/loop-1.c          |    2 -
 gcc/testsuite/c-c++-common/goacc/parallel-1.c      |    6 -
 gcc/testsuite/c-c++-common/goacc/parallel-empty.c  |    6 +
 .../c-c++-common/goacc/parallel-eternal.c          |   11 +
 .../c-c++-common/goacc/parallel-noreturn.c         |   12 +
 gcc/testsuite/c-c++-common/goacc/reduction-1.c     |   25 +-
 gcc/testsuite/c-c++-common/goacc/reduction-2.c     |   22 +-
 gcc/testsuite/c-c++-common/goacc/reduction-3.c     |   22 +-
 gcc/testsuite/c-c++-common/goacc/reduction-4.c     |   40 +-
 gcc/testsuite/c-c++-common/goacc/routine-1.c       |   35 +
 gcc/testsuite/c-c++-common/goacc/routine-2.c       |   36 +
 gcc/testsuite/c-c++-common/goacc/routine-3.c       |   52 +
 gcc/testsuite/c-c++-common/goacc/routine-4.c       |   87 ++
 gcc/testsuite/c-c++-common/goacc/tile.c            |   26 +
 gcc/testsuite/g++.dg/goacc/template-reduction.C    |  100 ++
 gcc/testsuite/g++.dg/goacc/template.C              |  131 ++
 gcc/testsuite/gfortran.dg/goacc/cache-1.f95        |    1 -
 gcc/testsuite/gfortran.dg/goacc/coarray.f95        |    2 +-
 gcc/testsuite/gfortran.dg/goacc/coarray_2.f90      |    1 +
 gcc/testsuite/gfortran.dg/goacc/combined_loop.f90  |    2 +-
 gcc/testsuite/gfortran.dg/goacc/cray.f95           |    1 -
 gcc/testsuite/gfortran.dg/goacc/declare-1.f95      |    3 +-
 gcc/testsuite/gfortran.dg/goacc/declare-2.f95      |   44 +
 gcc/testsuite/gfortran.dg/goacc/default.f95        |   17 +
 gcc/testsuite/gfortran.dg/goacc/dtype-1.f95        |  161 ++
 gcc/testsuite/gfortran.dg/goacc/dtype-2.f95        |   39 +
 gcc/testsuite/gfortran.dg/goacc/host_data-tree.f95 |    2 +-
 gcc/testsuite/gfortran.dg/goacc/loop-1.f95         |    1 -
 gcc/testsuite/gfortran.dg/goacc/loop-2.f95         |   26 +-
 gcc/testsuite/gfortran.dg/goacc/modules.f95        |   55 +
 gcc/testsuite/gfortran.dg/goacc/parameter.f95      |    1 -
 gcc/testsuite/gfortran.dg/goacc/update.f95         |    5 +
 gcc/tree-core.h                                    |   14 +-
 gcc/tree-pretty-print.c                            |    6 +
 gcc/tree.c                                         |   13 +-
 gcc/tree.h                                         |   21 +-
 include/gomp-constants.h                           |    4 +
 libgomp/oacc-mem.c                                 |    3 +
 libgomp/oacc-ptx.h                                 |   28 +
 libgomp/plugin/plugin-nvptx.c                      |   10 +
 .../libgomp.oacc-c++/template-reduction.C          |  102 ++
 .../libgomp.oacc-c-c++-common/atomic_capture-1.c   |  866 +++++++++++
 .../libgomp.oacc-c-c++-common/atomic_capture-2.c   | 1626 ++++++++++++++++++++
 .../libgomp.oacc-c-c++-common/atomic_update-1.c    |  760 +++++++++
 .../libgomp.oacc-c-c++-common/clauses-1.c          |   26 +
 .../testsuite/libgomp.oacc-c-c++-common/data-2.c   |   44 +-
 .../testsuite/libgomp.oacc-c-c++-common/data-3.c   |   18 +-
 .../libgomp.oacc-c-c++-common/data-clauses.h       |  202 +++
 .../libgomp.oacc-c-c++-common/kernels-1.c          |  182 +--
 .../testsuite/libgomp.oacc-c-c++-common/lib-69.c   |   70 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-70.c   |   79 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-71.c   |   55 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-72.c   |   60 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-73.c   |   64 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-74.c   |   91 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-75.c   |   89 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-76.c   |   88 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-77.c   |   91 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-78.c   |   91 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-79.c   |   91 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-80.c   |   95 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-81.c   |  106 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-82.c   |   43 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-83.c   |   22 +-
 .../libgomp.oacc-c-c++-common/parallel-1.c         |  204 +--
 .../libgomp.oacc-c-c++-common/routine-1.c          |   40 +
 .../libgomp.oacc-c-c++-common/routine-2.c          |   41 +
 libgomp/testsuite/libgomp.oacc-c-c++-common/subr.h |   44 +-
 .../testsuite/libgomp.oacc-c-c++-common/subr.ptx   |  222 +--
 .../testsuite/libgomp.oacc-c-c++-common/timer.h    |  103 --
 .../libgomp.oacc-fortran/atomic_capture-1.f90      |  784 ++++++++++
 .../libgomp.oacc-fortran/atomic_update-1.f90       |  338 ++++
 libgomp/testsuite/libgomp.oacc-fortran/cache-1.f90 |   26 +
 .../testsuite/libgomp.oacc-fortran/clauses-1.f90   |  290 ++++
 libgomp/testsuite/libgomp.oacc-fortran/data-1.f90  |  231 ++-
 libgomp/testsuite/libgomp.oacc-fortran/data-2.f90  |   50 +
 libgomp/testsuite/libgomp.oacc-fortran/data-3.f90  |   34 +-
 .../testsuite/libgomp.oacc-fortran/data-4-2.f90    |   19 +-
 libgomp/testsuite/libgomp.oacc-fortran/data-4.f90  |   19 +-
 .../testsuite/libgomp.oacc-fortran/declare-1.f90   |  229 +++
 libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90  |   24 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90  |   28 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-14.f90  |   79 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-15.f90  |   52 +
 .../testsuite/libgomp.oacc-fortran/routine-5.f90   |   27 +
 130 files changed, 10970 insertions(+), 2495 deletions(-)


Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]


* Next set of OpenACC changes: middle end, libgomp
  2015-05-05  8:54 Next set of OpenACC changes Thomas Schwinge
@ 2015-05-05  8:56 ` Thomas Schwinge
  2015-05-05  8:58 ` Next set of OpenACC changes: C family Thomas Schwinge
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Thomas Schwinge @ 2015-05-05  8:56 UTC
  To: gcc-patches, Jakub Jelinek
  Cc: Bernd Schmidt, Cesar Philippidis, Chung-Lin Tang, James Norris,
	Joseph Myers, Julian Brown, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 10747 bytes --]

Hi!

On Tue, 05 May 2015 10:54:02 +0200, I wrote:
> In follow-up messages, I'll be posting the separated parts (for easier
> review) of a next set of OpenACC changes that we'd like to commit.
> ChangeLog updates not yet written; will do that before commit, obviously.

 gcc/gimplify.c                                     |   16 +-
 gcc/omp-low.c                                      |   11 +-
 gcc/tree-core.h                                    |   14 +-
 gcc/tree-pretty-print.c                            |    6 +
 gcc/tree.c                                         |   13 +-
 gcc/tree.h                                         |   21 +-
 include/gomp-constants.h                           |    4 +
 libgomp/oacc-mem.c                                 |    3 +
 libgomp/oacc-ptx.h                                 |   28 +
 libgomp/plugin/plugin-nvptx.c                      |   10 +

diff --git gcc/gimplify.c gcc/gimplify.c
index bda62ce..12efdc8 100644
--- gcc/gimplify.c
+++ gcc/gimplify.c
@@ -6385,6 +6385,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 	case OMP_CLAUSE_MERGEABLE:
 	case OMP_CLAUSE_PROC_BIND:
 	case OMP_CLAUSE_SAFELEN:
+	case OMP_CLAUSE_TILE:
 	  break;
 
 	case OMP_CLAUSE_ALIGNED:
@@ -6770,6 +6771,7 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, tree *list_p)
 	case OMP_CLAUSE_VECTOR:
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
+	case OMP_CLAUSE_TILE:
 	  break;
 
 	default:
@@ -8410,21 +8412,7 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p,
 	  break;
 
 	case OACC_KERNELS:
-	  if (OACC_KERNELS_COMBINED (*expr_p))
-	    sorry ("directive not yet implemented");
-	  else
-	    gimplify_omp_workshare (expr_p, pre_p);
-	  ret = GS_ALL_DONE;
-	  break;
-
 	case OACC_PARALLEL:
-	  if (OACC_PARALLEL_COMBINED (*expr_p))
-	    sorry ("directive not yet implemented");
-	  else
-	    gimplify_omp_workshare (expr_p, pre_p);
-	  ret = GS_ALL_DONE;
-	  break;
-
 	case OACC_DATA:
 	case OMP_SECTIONS:
 	case OMP_SINGLE:
diff --git gcc/omp-low.c gcc/omp-low.c
index 34e2e5c..6ec5145 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -1928,6 +1928,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	case OMP_CLAUSE_INDEPENDENT:
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
+	case OMP_CLAUSE_BIND:
+	case OMP_CLAUSE_NOHOST:
+	case OMP_CLAUSE_TILE:
 	  sorry ("Clause not supported yet");
 	  break;
 
@@ -2055,6 +2058,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	case OMP_CLAUSE_INDEPENDENT:
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
+	case OMP_CLAUSE_BIND:
+	case OMP_CLAUSE_NOHOST:
+	case OMP_CLAUSE_TILE:
 	  sorry ("Clause not supported yet");
 	  break;
 
@@ -2742,7 +2748,10 @@ check_omp_nesting_restrictions (gimple stmt, omp_context *ctx)
     {
       for (omp_context *ctx_ = ctx; ctx_ != NULL; ctx_ = ctx_->outer)
 	if (is_gimple_omp (ctx_->stmt)
-	    && is_gimple_omp_oacc (ctx_->stmt))
+	    && is_gimple_omp_oacc (ctx_->stmt)
+	    /* Except for atomic codes that we share with OpenMP.  */
+	    && ! (gimple_code (stmt) == GIMPLE_OMP_ATOMIC_LOAD
+		  || gimple_code (stmt) == GIMPLE_OMP_ATOMIC_STORE))
 	  {
 	    error_at (gimple_location (stmt),
 		      "non-OpenACC construct inside of OpenACC region");
diff --git gcc/tree-core.h gcc/tree-core.h
index ad1bb23..ffbccda 100644
--- gcc/tree-core.h
+++ gcc/tree-core.h
@@ -390,7 +390,19 @@ enum omp_clause_code {
   OMP_CLAUSE_NUM_WORKERS,
 
   /* OpenACC clause: vector_length (integer-expression).  */
-  OMP_CLAUSE_VECTOR_LENGTH
+  OMP_CLAUSE_VECTOR_LENGTH,
+
+  /* OpenACC clause: bind ( identifier | string ).  */
+  OMP_CLAUSE_BIND,
+
+  /* OpenACC clause: nohost.  */
+  OMP_CLAUSE_NOHOST,
+
+  /* OpenACC clause: tile ( size-expr-list ).  */
+  OMP_CLAUSE_TILE,
+
+  /* OpenACC clause: device_type ( device-type-list).  */
+  OMP_CLAUSE_DEVICE_TYPE
 };
 
 #undef DEFTREESTRUCT
diff --git gcc/tree-pretty-print.c gcc/tree-pretty-print.c
index d7c049f..5eb4daf 100644
--- gcc/tree-pretty-print.c
+++ gcc/tree-pretty-print.c
@@ -799,6 +799,12 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, int flags)
     case OMP_CLAUSE_INDEPENDENT:
       pp_string (pp, "independent");
       break;
+    case OMP_CLAUSE_TILE:
+      pp_string (pp, "tile(");
+      dump_generic_node (pp, OMP_CLAUSE_TILE_LIST (clause),
+			 spc, flags, false);
+      pp_right_paren (pp);
+      break;
 
     default:
       /* Should never happen.  */
diff --git gcc/tree.c gcc/tree.c
index daf0292..43f80b7 100644
--- gcc/tree.c
+++ gcc/tree.c
@@ -369,6 +369,10 @@ unsigned const char omp_clause_num_ops[] =
   1, /* OMP_CLAUSE_NUM_GANGS  */
   1, /* OMP_CLAUSE_NUM_WORKERS  */
   1, /* OMP_CLAUSE_VECTOR_LENGTH  */
+  1, /* OMP_CLAUSE_BIND  */
+  0, /* OMP_CLAUSE_NOHOST  */
+  1, /* OMP_CLAUSE_TILE */
+  2  /* OMP_CLAUSE_DEVICE_TYPE */
 };
 
 const char * const omp_clause_code_name[] =
@@ -427,7 +431,11 @@ const char * const omp_clause_code_name[] =
   "vector",
   "num_gangs",
   "num_workers",
-  "vector_length"
+  "vector_length",
+  "bind",
+  "nohost",
+  "tile",
+  "device_type"
 };
 
 
@@ -11237,6 +11245,7 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
 	case OMP_CLAUSE__LOOPTEMP_:
 	case OMP_CLAUSE__SIMDUID_:
 	case OMP_CLAUSE__CILK_FOR_COUNT_:
+	case OMP_CLAUSE_BIND:
 	  WALK_SUBTREE (OMP_CLAUSE_OPERAND (*tp, 0));
 	  /* FALLTHRU */
 
@@ -11255,6 +11264,8 @@ walk_tree_1 (tree *tp, walk_tree_fn func, void *data,
 	case OMP_CLAUSE_TASKGROUP:
 	case OMP_CLAUSE_AUTO:
 	case OMP_CLAUSE_SEQ:
+	case OMP_CLAUSE_NOHOST:
+	case OMP_CLAUSE_TILE:
 	  WALK_SUBTREE_TAIL (OMP_CLAUSE_CHAIN (*tp));
 
 	case OMP_CLAUSE_LASTPRIVATE:
diff --git gcc/tree.h gcc/tree.h
index e17bd9b..55c5a6d 100644
--- gcc/tree.h
+++ gcc/tree.h
@@ -1312,15 +1312,6 @@ extern void protected_set_expr_location (tree, location_t);
 #define OMP_SECTION_LAST(NODE) \
   (OMP_SECTION_CHECK (NODE)->base.private_flag)
 
-/* True on an OACC_KERNELS statement if is represents combined kernels loop
-   directive.  */
-#define OACC_KERNELS_COMBINED(NODE) \
-  (OACC_KERNELS_CHECK (NODE)->base.private_flag)
-
-/* Like OACC_KERNELS_COMBINED, but for parallel loop directive.  */
-#define OACC_PARALLEL_COMBINED(NODE) \
-  (OACC_PARALLEL_CHECK (NODE)->base.private_flag)
-
 /* True on an OMP_PARALLEL statement if it represents an explicit
    combined parallel work-sharing constructs.  */
 #define OMP_PARALLEL_COMBINED(NODE) \
@@ -1391,6 +1382,9 @@ extern void protected_set_expr_location (tree, location_t);
 #define OMP_CLAUSE_VECTOR_LENGTH_EXPR(NODE) \
   OMP_CLAUSE_OPERAND ( \
     OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_VECTOR_LENGTH), 0)
+#define OMP_CLAUSE_BIND_NAME(NODE) \
+  OMP_CLAUSE_OPERAND ( \
+    OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_BIND), 0)
 
 #define OMP_CLAUSE_DEPEND_KIND(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_DEPEND)->omp_clause.subcode.depend_kind)
@@ -1495,6 +1489,15 @@ extern void protected_set_expr_location (tree, location_t);
 #define OMP_CLAUSE_DEFAULT_KIND(NODE) \
   (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_DEFAULT)->omp_clause.subcode.default_kind)
 
+#define OMP_CLAUSE_TILE_LIST(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_TILE), 0)
+
+#define OMP_CLAUSE_DEVICE_TYPE_DEVICES(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_DEVICE_TYPE), 0)
+
+#define OMP_CLAUSE_DEVICE_TYPE_CLAUSES(NODE) \
+  OMP_CLAUSE_OPERAND (OMP_CLAUSE_SUBCODE_CHECK (NODE, OMP_CLAUSE_DEVICE_TYPE), 1)
+
 /* SSA_NAME accessors.  */
 
 /* Returns the IDENTIFIER_NODE giving the SSA name a name or NULL_TREE
diff --git include/gomp-constants.h include/gomp-constants.h
index e3d2820..45370b8 100644
--- include/gomp-constants.h
+++ include/gomp-constants.h
@@ -70,6 +70,10 @@ enum gomp_map_kind
     /* Is a device pointer.  OMP_CLAUSE_SIZE for these is unused; is implicitly
        POINTER_SIZE_UNITS.  */
     GOMP_MAP_FORCE_DEVICEPTR =		(GOMP_MAP_FLAG_SPECIAL_1 | 0),
+    /* OpenACC device_resident.  */
+    GOMP_MAP_DEVICE_RESIDENT =		(GOMP_MAP_FLAG_SPECIAL_1 | 1),
+    /* OpenACC link.  */
+    GOMP_MAP_LINK =			(GOMP_MAP_FLAG_SPECIAL_1 | 2),
     /* Allocate.  */
     GOMP_MAP_FORCE_ALLOC =		(GOMP_MAP_FLAG_FORCE | GOMP_MAP_ALLOC),
     /* ..., and copy to device.  */
diff --git libgomp/oacc-mem.c libgomp/oacc-mem.c
index 89ef5fc..0164b3d 100644
--- libgomp/oacc-mem.c
+++ libgomp/oacc-mem.c
@@ -479,6 +479,9 @@ update_dev_host (int is_dev, void *h, size_t s)
 {
   splay_tree_key n;
   void *d;
+
+  goacc_lazy_initialize ();
+
   struct goacc_thread *thr = goacc_thread ();
   struct gomp_device_descr *acc_dev = thr->dev;
 
diff --git libgomp/oacc-ptx.h libgomp/oacc-ptx.h
index 2419a46..104f297 100644
--- libgomp/oacc-ptx.h
+++ libgomp/oacc-ptx.h
@@ -424,3 +424,31 @@
   "st.param.u32 [%out_retval],%retval;\n"				\
   "ret;\n"								\
   "}\n"
+
+ #define GOMP_ATOMIC_PTX \
+  ".version 3.1\n" \
+  ".target sm_30\n" \
+  ".address_size 64\n" \
+  ".global .align 4 .u32 libgomp_ptx_lock;\n" \
+  ".visible .func GOMP_atomic_start;\n" \
+  ".visible .func GOMP_atomic_start\n" \
+  "{\n" \
+  "  .reg .pred    %p<2>;\n" \
+  "  .reg .s32     %r<2>;\n" \
+  "  .reg .s64     %rd<2>;\n" \
+  "BB5_1:\n" \
+  "  mov.u64       %rd1, libgomp_ptx_lock;\n" \
+  "  atom.global.cas.b32   %r1, [%rd1], 0, 1;\n" \
+  "  setp.ne.s32   %p1, %r1, 0;\n" \
+  "  @%p1 bra      BB5_1;\n" \
+  "  ret;\n" \
+  "}\n" \
+  ".visible .func GOMP_atomic_end;\n" \
+  ".visible .func GOMP_atomic_end\n" \
+  "{\n" \
+  "  .reg .s32     %r<2>;\n" \
+  "  .reg .s64     %rd<2>;\n" \
+  "  mov.u64       %rd1, libgomp_ptx_lock;\n" \
+  "  atom.global.exch.b32  %r1, [%rd1], 0;\n" \
+  "  ret;\n" \
+  "}\n"
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index 583ec87..ad1163d 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -863,6 +863,16 @@ link_ptx (CUmodule *module, char *ptx_code)
 			 cuda_error (r));
     }
 
+  char *gomp_atomic_ptx = GOMP_ATOMIC_PTX;
+  r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, gomp_atomic_ptx,
+		     strlen (gomp_atomic_ptx) + 1, 0, 0, 0, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      GOMP_PLUGIN_error ("Link error log %s\n", &elog[0]);
+      GOMP_PLUGIN_fatal ("cuLinkAddData (gomp_atomic_ptx) error: %s",
+			 cuda_error (r));
+    }
+
   r = cuLinkAddData (linkstate, CU_JIT_INPUT_PTX, ptx_code,
               strlen (ptx_code) + 1, 0, 0, 0, 0);
   if (r != CUDA_SUCCESS)

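For readers who do not read PTX: the GOMP_ATOMIC_PTX string above defines a
single global spin lock.  GOMP_atomic_start loops on a compare-and-swap
(0 -> 1) until it observes the old value 0, and GOMP_atomic_end swaps 0 back
in.  Below is a minimal host-side sketch of the same idea using C11 atomics.
It is an illustration only, not the code libgomp runs on the device: the
names sketch_lock, sketch_cas_once, sketch_atomic_start and sketch_atomic_end
are made up here, and the assert in sketch_atomic_end is an addition that the
PTX does not have.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for libgomp_ptx_lock: 0 = free, 1 = held.  */
static atomic_uint sketch_lock;

/* One compare-and-swap attempt (0 -> 1).  Returns the value observed
   before the attempt, like the destination register of atom.cas.  */
static unsigned
sketch_cas_once (void)
{
  unsigned expected = 0;
  /* On failure, EXPECTED is overwritten with the current lock value;
     on success it keeps the value 0.  */
  atomic_compare_exchange_strong (&sketch_lock, &expected, 1);
  return expected;
}

/* Spin until the lock was free at the moment of our swap.  */
void
sketch_atomic_start (void)
{
  while (sketch_cas_once () != 0)
    ;
}

/* Release the lock by swapping 0 back in.  */
void
sketch_atomic_end (void)
{
  unsigned prev = atomic_exchange (&sketch_lock, 0);
  /* Sanity check added for this sketch only.  */
  assert (prev == 1);
  (void) prev;
}
```

A caller would bracket the protected update with sketch_atomic_start () and
sketch_atomic_end ().  Since there is only one lock word, every such region
is serialized against every other one.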

Grüße,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]


* Next set of OpenACC changes: C family
  2015-05-05  8:54 Next set of OpenACC changes Thomas Schwinge
  2015-05-05  8:56 ` Next set of OpenACC changes: middle end, libgomp Thomas Schwinge
@ 2015-05-05  8:58 ` Thomas Schwinge
  2015-05-05 14:19   ` Jakub Jelinek
  2015-05-05  8:59 ` Next set of OpenACC changes: Fortran Thomas Schwinge
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Thomas Schwinge @ 2015-05-05  8:58 UTC
  To: gcc-patches, Jakub Jelinek
  Cc: Bernd Schmidt, Cesar Philippidis, Chung-Lin Tang, James Norris,
	Joseph Myers, Julian Brown, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 168409 bytes --]

Hi!

On Tue, 05 May 2015 10:54:02 +0200, I wrote:
> In follow-up messages, I'll be posting the separated parts (for easier
> review) of a next set of OpenACC changes that we'd like to commit.
> ChangeLog updates not yet written; will do that before commit, obviously.

 gcc/c-family/c-common.c                            |    3 +-
 gcc/c-family/c-common.h                            |    2 +
 gcc/c-family/c-omp.c                               |  105 ++
 gcc/c-family/c-pragma.c                            |    4 +
 gcc/c-family/c-pragma.h                            |   14 +-
 gcc/c/c-parser.c                                   | 1353 ++++++++++++----
 gcc/c/c-tree.h                                     |    3 +-
 gcc/c/c-typeck.c                                   |  112 +-
 gcc/cp/cp-gimplify.c                               |    3 +-
 gcc/cp/cp-tree.h                                   |    3 +-
 gcc/cp/parser.c                                    | 1382 +++++++++++++----
 gcc/cp/parser.h                                    |    4 +
 gcc/cp/pt.c                                        |   43 +-
 gcc/cp/semantics.c                                 |  151 +-

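As a reading aid for the c-omp.c hunk in this message: oacc_extract_device_id
maps a device_type name to a numeric device code, comparing names without
regard to case and falling back to "none" for anything it does not know.
Below is a standalone sketch of that lookup, written as a table so that more
names could be added later.  Everything prefixed with sketch_ or SKETCH_ is
hypothetical; in particular the enum values are placeholders and are not the
values from gomp-constants.h.  strcasecmp is the POSIX function from
<strings.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>	/* strcasecmp (POSIX).  */

/* Placeholder device codes for this sketch only.  */
enum sketch_device
{
  SKETCH_DEVICE_NONE = 0,
  SKETCH_DEVICE_NVIDIA_PTX = 1
};

/* Known device_type keywords; only "nvidia" is recognized here.  */
static const struct
{
  const char *name;
  enum sketch_device id;
} sketch_device_table[] = {
  { "nvidia", SKETCH_DEVICE_NVIDIA_PTX },
};

/* Return the code for DEVICE, ignoring case, or SKETCH_DEVICE_NONE if
   the name is not in the table.  */
enum sketch_device
sketch_extract_device_id (const char *device)
{
  assert (device != NULL);
  size_t n = sizeof sketch_device_table / sizeof sketch_device_table[0];
  for (size_t i = 0; i < n; i++)
    if (strcasecmp (device, sketch_device_table[i].name) == 0)
      return sketch_device_table[i].id;
  return SKETCH_DEVICE_NONE;
}
```

With this sketch, "nvidia", "NVIDIA" and "Nvidia" all map to the same code,
and an unrecognized name such as "radeon" maps to SKETCH_DEVICE_NONE.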
diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index 9797e17..d89b348 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -809,7 +809,7 @@ const struct attribute_spec c_common_attribute_table[] =
 			      handle_omp_declare_simd_attribute, false },
   { "cilk simd function",     0, -1, true,  false, false,
 			      handle_omp_declare_simd_attribute, false },
-  { "omp declare target",     0, 0, true, false, false,
+  { "omp declare target",     0, -1, true, false, false,
 			      handle_omp_declare_target_attribute, false },
   { "alloc_align",	      1, 1, false, true, true,
 			      handle_alloc_align_attribute, false },
@@ -823,6 +823,7 @@ const struct attribute_spec c_common_attribute_table[] =
 			      handle_bnd_legacy, false },
   { "bnd_instrument",         0, 0, true, false, false,
 			      handle_bnd_instrument, false },
+  { "oacc declare",           0, -1, true,  false, false, NULL, false },
   { NULL,                     0, 0, false, false, false, NULL, false }
 };
 
diff --git gcc/c-family/c-common.h gcc/c-family/c-common.h
index 603d3f0..fcaebca 100644
--- gcc/c-family/c-common.h
+++ gcc/c-family/c-common.h
@@ -1249,6 +1249,8 @@ extern void c_omp_split_clauses (location_t, enum tree_code, omp_clause_mask,
 extern tree c_omp_declare_simd_clauses_to_numbers (tree, tree);
 extern void c_omp_declare_simd_clauses_to_decls (tree, tree);
 extern enum omp_clause_default_kind c_omp_predetermined_sharing (tree);
+extern int oacc_extract_device_id (const char *);
+extern tree oacc_filter_device_types (tree);
 
 /* Return next tree in the chain for chain_next walking of tree nodes.  */
 static inline tree
diff --git gcc/c-family/c-omp.c gcc/c-family/c-omp.c
index 86a9f54..1c82bf5 100644
--- gcc/c-family/c-omp.c
+++ gcc/c-family/c-omp.c
@@ -1087,3 +1087,108 @@ c_omp_predetermined_sharing (tree decl)
 
   return OMP_CLAUSE_DEFAULT_UNSPECIFIED;
 }
+
+/* Return a numerical code representing the device_type.  Currently,
+   only device_type(nvidia) is supported.  All device_type parameters
+   are treated as case-insensitive keywords.  */
+
+int
+oacc_extract_device_id (const char *device)
+{
+  if (!strcasecmp (device, "nvidia"))
+    return GOMP_DEVICE_NVIDIA_PTX;
+  return GOMP_DEVICE_NONE;
+}
+
+/* Filter out the list of unsupported OpenACC device_types.  */
+
+tree
+oacc_filter_device_types (tree clauses)
+{
+  tree c, prev;
+  tree dtype = NULL_TREE;
+  tree seen_nvidia = NULL_TREE;
+  tree seen_default = NULL_TREE;
+
+  /* First scan for all device_type clauses.  */
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE)
+	{
+	  int code = TREE_INT_CST_LOW (OMP_CLAUSE_DEVICE_TYPE_DEVICES (c));
+
+	  if (code == GOMP_DEVICE_DEFAULT)
+	    {
+	      if (seen_default)
+		{
+		  seen_default = NULL_TREE;
+		  error_at (OMP_CLAUSE_LOCATION (c),
+			    "duplicate device_type (*)");
+		  goto filter_error;
+		}
+	      else
+		seen_default = OMP_CLAUSE_DEVICE_TYPE_CLAUSES (c);
+	    }
+	  if (code & (1 << GOMP_DEVICE_NVIDIA_PTX))
+	    {
+	      if (seen_nvidia)
+		{
+		  seen_nvidia = NULL_TREE;
+		  error_at (OMP_CLAUSE_LOCATION (c),
+			    "duplicate device_type (nvidia)");
+		  goto filter_error;
+		}
+	      else
+		seen_nvidia = OMP_CLAUSE_DEVICE_TYPE_CLAUSES (c);
+	    }
+	}
+    }
+
+  /* Don't do anything if there aren't any device_type clauses.  */
+  if (seen_nvidia == NULL_TREE && seen_default == NULL_TREE)
+    return clauses;
+
+  dtype = seen_nvidia ? seen_nvidia : seen_default;
+
+  /* Now filter out clauses if necessary.  */
+  for (c = clauses; c && OMP_CLAUSE_CODE (c) != OMP_CLAUSE_DEVICE_TYPE;
+       c = OMP_CLAUSE_CHAIN (c))
+    {
+      tree t;
+
+      prev = NULL_TREE;
+
+      for (t = dtype; t; t = OMP_CLAUSE_CHAIN (t))
+	{
+	  if (OMP_CLAUSE_CODE (t) == OMP_CLAUSE_CODE (c))
+	    {
+	      /* Remove c from clauses.  */
+	      tree next = OMP_CLAUSE_CHAIN (c);
+
+	      if (prev)
+	        OMP_CLAUSE_CHAIN (prev) = next;
+
+	      break;
+	    }
+	}
+
+      prev = c;
+    }
+  
+ filter_error:
+  /* Remove all device_type clauses.  Those clauses are located at the
+     beginning of the clause list.  */
+  for (c = clauses; c && OMP_CLAUSE_CODE (c) == OMP_CLAUSE_DEVICE_TYPE;
+       c = OMP_CLAUSE_CHAIN (c))
+    ;
+
+  if (c == NULL_TREE)
+    return dtype;
+
+  clauses = c;
+  for (prev = c, c = OMP_CLAUSE_CHAIN (c); c; c = OMP_CLAUSE_CHAIN (c))
+    prev = c;
+
+  OMP_CLAUSE_CHAIN (prev) = dtype;
+  return clauses;
+}
diff --git gcc/c-family/c-pragma.c gcc/c-family/c-pragma.c
index 6894f0e..a1e8da3 100644
--- gcc/c-family/c-pragma.c
+++ gcc/c-family/c-pragma.c
@@ -1194,13 +1194,17 @@ static vec<pragma_ns_name> registered_pp_pragmas;
 
 struct omp_pragma_def { const char *name; unsigned int id; };
 static const struct omp_pragma_def oacc_pragmas[] = {
+  { "atomic", PRAGMA_OACC_ATOMIC },
   { "cache", PRAGMA_OACC_CACHE },
   { "data", PRAGMA_OACC_DATA },
+  { "declare", PRAGMA_OACC_DECLARE },
   { "enter", PRAGMA_OACC_ENTER_DATA },
   { "exit", PRAGMA_OACC_EXIT_DATA },
+  { "host_data", PRAGMA_OACC_HOST_DATA },
   { "kernels", PRAGMA_OACC_KERNELS },
   { "loop", PRAGMA_OACC_LOOP },
   { "parallel", PRAGMA_OACC_PARALLEL },
+  { "routine", PRAGMA_OACC_ROUTINE },
   { "update", PRAGMA_OACC_UPDATE },
   { "wait", PRAGMA_OACC_WAIT }
 };
diff --git gcc/c-family/c-pragma.h gcc/c-family/c-pragma.h
index eff94c1..fe4c168 100644
--- gcc/c-family/c-pragma.h
+++ gcc/c-family/c-pragma.h
@@ -27,13 +27,17 @@ along with GCC; see the file COPYING3.  If not see
 typedef enum pragma_kind {
   PRAGMA_NONE = 0,
 
+  PRAGMA_OACC_ATOMIC,
   PRAGMA_OACC_CACHE,
   PRAGMA_OACC_DATA,
+  PRAGMA_OACC_DECLARE,
   PRAGMA_OACC_ENTER_DATA,
   PRAGMA_OACC_EXIT_DATA,
+  PRAGMA_OACC_HOST_DATA,
   PRAGMA_OACC_KERNELS,
   PRAGMA_OACC_LOOP,
   PRAGMA_OACC_PARALLEL,
+  PRAGMA_OACC_ROUTINE,
   PRAGMA_OACC_UPDATE,
   PRAGMA_OACC_WAIT,
   PRAGMA_OMP_ATOMIC,
@@ -132,13 +136,19 @@ typedef enum pragma_omp_clause {
   /* Clauses for OpenACC.  */
   PRAGMA_OACC_CLAUSE_ASYNC = PRAGMA_CILK_CLAUSE_VECTORLENGTH + 1,
   PRAGMA_OACC_CLAUSE_AUTO,
+  PRAGMA_OACC_CLAUSE_BIND,
   PRAGMA_OACC_CLAUSE_COPY,
   PRAGMA_OACC_CLAUSE_COPYOUT,
   PRAGMA_OACC_CLAUSE_CREATE,
   PRAGMA_OACC_CLAUSE_DELETE,
   PRAGMA_OACC_CLAUSE_DEVICEPTR,
+  PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT,
+  PRAGMA_OACC_CLAUSE_DEVICE_TYPE,
   PRAGMA_OACC_CLAUSE_GANG,
   PRAGMA_OACC_CLAUSE_HOST,
+  PRAGMA_OACC_CLAUSE_INDEPENDENT,
+  PRAGMA_OACC_CLAUSE_LINK,
+  PRAGMA_OACC_CLAUSE_NOHOST,
   PRAGMA_OACC_CLAUSE_NUM_GANGS,
   PRAGMA_OACC_CLAUSE_NUM_WORKERS,
   PRAGMA_OACC_CLAUSE_PRESENT,
@@ -146,8 +156,9 @@ typedef enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_PRESENT_OR_COPYIN,
   PRAGMA_OACC_CLAUSE_PRESENT_OR_COPYOUT,
   PRAGMA_OACC_CLAUSE_PRESENT_OR_CREATE,
-  PRAGMA_OACC_CLAUSE_SELF,
   PRAGMA_OACC_CLAUSE_SEQ,
+  PRAGMA_OACC_CLAUSE_TILE,
+  PRAGMA_OACC_CLAUSE_USE_DEVICE,
   PRAGMA_OACC_CLAUSE_VECTOR,
   PRAGMA_OACC_CLAUSE_VECTOR_LENGTH,
   PRAGMA_OACC_CLAUSE_WAIT,
@@ -155,6 +166,7 @@ typedef enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_COLLAPSE = PRAGMA_OMP_CLAUSE_COLLAPSE,
   PRAGMA_OACC_CLAUSE_COPYIN = PRAGMA_OMP_CLAUSE_COPYIN,
   PRAGMA_OACC_CLAUSE_DEVICE = PRAGMA_OMP_CLAUSE_DEVICE,
+  PRAGMA_OACC_CLAUSE_DEFAULT = PRAGMA_OMP_CLAUSE_DEFAULT,
   PRAGMA_OACC_CLAUSE_FIRSTPRIVATE = PRAGMA_OMP_CLAUSE_FIRSTPRIVATE,
   PRAGMA_OACC_CLAUSE_IF = PRAGMA_OMP_CLAUSE_IF,
   PRAGMA_OACC_CLAUSE_PRIVATE = PRAGMA_OMP_CLAUSE_PRIVATE,
diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index 015de7f..a1543a7 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -240,6 +240,10 @@ typedef struct GTY(()) c_parser {
   /* Buffer to hold all the tokens from parsing the vector attribute for the
      SIMD-enabled functions (formerly known as elemental functions).  */
   vec <c_token, va_gc> *cilk_simd_fn_tokens;
+
+  /* OpenACC specific parser information.  */
+
+  vec <tree, va_gc> *oacc_routines;
 } c_parser;
 
 
@@ -1181,7 +1185,8 @@ enum c_parser_prec {
 static void c_parser_external_declaration (c_parser *);
 static void c_parser_asm_definition (c_parser *);
 static void c_parser_declaration_or_fndef (c_parser *, bool, bool, bool,
-					   bool, bool, tree *, vec<c_token>);
+					   bool, bool, tree *, vec<c_token>,
+					   tree, bool);
 static void c_parser_static_assert_declaration_no_semi (c_parser *);
 static void c_parser_static_assert_declaration (c_parser *);
 static void c_parser_declspecs (c_parser *, struct c_declspecs *, bool, bool,
@@ -1252,7 +1257,8 @@ static vec<tree, va_gc> *c_parser_expr_list (c_parser *, bool, bool,
 					     unsigned int * = NULL);
 static void c_parser_oacc_enter_exit_data (c_parser *, bool);
 static void c_parser_oacc_update (c_parser *);
-static tree c_parser_oacc_loop (location_t, c_parser *, char *);
+static tree c_parser_oacc_loop (location_t, c_parser *, char *,
+				omp_clause_mask, tree *);
 static void c_parser_omp_construct (c_parser *);
 static void c_parser_omp_threadprivate (c_parser *);
 static void c_parser_omp_barrier (c_parser *);
@@ -1270,6 +1276,9 @@ static bool c_parser_pragma (c_parser *, enum pragma_context);
 static bool c_parser_omp_target (c_parser *, enum pragma_context);
 static void c_parser_omp_end_declare_target (c_parser *);
 static void c_parser_omp_declare (c_parser *, enum pragma_context);
+static void c_parser_oacc_routine (c_parser *parser, enum pragma_context
+				      context);
+static void c_parser_oacc_declare (c_parser *parser);
 
 /* These Objective-C parser functions are only ever called when
    compiling Objective-C.  */
@@ -1306,6 +1315,11 @@ static tree c_parser_array_notation (location_t, c_parser *, tree, tree);
 static tree c_parser_cilk_clause_vectorlength (c_parser *, tree, bool);
 static void c_parser_cilk_grainsize (c_parser *);
 
+/* OpenACC support.  */
+static tree c_parser_oacc_all_clauses (c_parser *, omp_clause_mask,
+				       const char *, omp_clause_mask,
+				       bool, bool);
+
 /* Parse a translation unit (C90 6.7, C99 6.9).
 
    translation-unit:
@@ -1449,12 +1463,13 @@ c_parser_external_declaration (c_parser *parser)
 	 only tell which after parsing the declaration specifiers, if
 	 any, and the first declarator.  */
       c_parser_declaration_or_fndef (parser, true, true, true, false, true,
-				     NULL, vNULL);
+				     NULL, vNULL, NULL_TREE, false);
       break;
     }
 }
 
 static void c_finish_omp_declare_simd (c_parser *, tree, tree, vec<c_token>);
+static void c_finish_oacc_routine (c_parser *, tree, tree, bool);
 
 /* Parse a declaration or function definition (C90 6.5, 6.7.1, C99
    6.7, 6.9.1).  If FNDEF_OK is true, a function definition is
@@ -1532,7 +1547,8 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
 			       bool static_assert_ok, bool empty_ok,
 			       bool nested, bool start_attr_ok,
 			       tree *objc_foreach_object_declaration,
-			       vec<c_token> omp_declare_simd_clauses)
+			       vec<c_token> omp_declare_simd_clauses,
+			       tree oacc_routine_clauses, bool oacc_routine_named)
 {
   struct c_declspecs *specs;
   tree prefix_attrs;
@@ -1710,6 +1726,9 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
 	      || !vec_safe_is_empty (parser->cilk_simd_fn_tokens))
 	    c_finish_omp_declare_simd (parser, NULL_TREE, NULL_TREE,
 				       omp_declare_simd_clauses);
+	  else
+	    c_finish_oacc_routine (parser, NULL_TREE,
+				      oacc_routine_clauses, oacc_routine_named);
 	  c_parser_skip_to_end_of_block_or_statement (parser);
 	  return;
 	}
@@ -1806,6 +1825,9 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
 		      || !vec_safe_is_empty (parser->cilk_simd_fn_tokens))
 		    c_finish_omp_declare_simd (parser, d, NULL_TREE,
 					       omp_declare_simd_clauses);
+		  else
+		    c_finish_oacc_routine (parser, d, oacc_routine_clauses,
+					      oacc_routine_named);
 		}
 	      else
 		{
@@ -1819,6 +1841,10 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
 		      || !vec_safe_is_empty (parser->cilk_simd_fn_tokens))
 		    c_finish_omp_declare_simd (parser, d, NULL_TREE,
 					       omp_declare_simd_clauses);
+		  else
+		    c_finish_oacc_routine (parser, d, oacc_routine_clauses,
+					      oacc_routine_named);
+
 		  start_init (d, asm_name, global_bindings_p ());
 		  init_loc = c_parser_peek_token (parser)->location;
 		  init = c_parser_initializer (parser);
@@ -1864,6 +1890,9 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
 		    temp_store_parm_decls (d, parms);
 		  c_finish_omp_declare_simd (parser, d, parms,
 					     omp_declare_simd_clauses);
+		  c_finish_oacc_routine (parser, d, oacc_routine_clauses,
+					    oacc_routine_named);
+
 		  if (parms)
 		    temp_pop_parm_decls ();
 		}
@@ -1970,13 +1999,17 @@ c_parser_declaration_or_fndef (c_parser *parser, bool fndef_ok,
 	 function definitions either.  */
       while (c_parser_next_token_is_not (parser, CPP_EOF)
 	     && c_parser_next_token_is_not (parser, CPP_OPEN_BRACE))
-	c_parser_declaration_or_fndef (parser, false, false, false,
-				       true, false, NULL, vNULL);
+	c_parser_declaration_or_fndef (parser, false, false, false, true,
+				       false, NULL, vNULL, NULL_TREE, false);
       store_parm_decls ();
       if (omp_declare_simd_clauses.exists ()
 	  || !vec_safe_is_empty (parser->cilk_simd_fn_tokens))
 	c_finish_omp_declare_simd (parser, current_function_decl, NULL_TREE,
 				   omp_declare_simd_clauses);
+      else
+	c_finish_oacc_routine (parser, current_function_decl,
+				  oacc_routine_clauses, oacc_routine_named);
+
       DECL_STRUCT_FUNCTION (current_function_decl)->function_start_locus
 	= c_parser_peek_token (parser)->location;
       fnbody = c_parser_compound_statement (parser);
@@ -4624,7 +4657,7 @@ c_parser_compound_statement_nostart (c_parser *parser)
 	  last_label = false;
 	  mark_valid_location_for_stdc_pragma (false);
 	  c_parser_declaration_or_fndef (parser, true, true, true, true,
-					 true, NULL, vNULL);
+					 true, NULL, vNULL, NULL_TREE, false);
 	  if (last_stmt)
 	    pedwarn_c90 (loc, OPT_Wdeclaration_after_statement,
 			 "ISO C90 forbids mixed declarations and code");
@@ -4649,7 +4682,8 @@ c_parser_compound_statement_nostart (c_parser *parser)
 	      last_label = false;
 	      mark_valid_location_for_stdc_pragma (false);
 	      c_parser_declaration_or_fndef (parser, true, true, true, true,
-					     true, NULL, vNULL);
+					     true, NULL, vNULL, NULL_TREE,
+					     false);
 	      /* Following the old parser, __extension__ does not
 		 disable this diagnostic.  */
 	      restore_extension_diagnostics (ext);
@@ -4798,7 +4832,7 @@ c_parser_label (c_parser *parser)
 					 /*static_assert_ok*/ true,
 					 /*empty_ok*/ true, /*nested*/ true,
 					 /*start_attr_ok*/ true, NULL,
-					 vNULL);
+					 vNULL, NULL_TREE, false);
 	}
     }
 }
@@ -5501,7 +5535,8 @@ c_parser_for_statement (c_parser *parser, bool ivdep)
       else if (c_parser_next_tokens_start_declaration (parser))
 	{
 	  c_parser_declaration_or_fndef (parser, true, true, true, true, true, 
-					 &object_expression, vNULL);
+					 &object_expression, vNULL, NULL_TREE,
+					 false);
 	  parser->objc_could_be_foreach_context = false;
 	  
 	  if (c_parser_next_token_is_keyword (parser, RID_IN))
@@ -5530,7 +5565,8 @@ c_parser_for_statement (c_parser *parser, bool ivdep)
 	      ext = disable_extension_diagnostics ();
 	      c_parser_consume_token (parser);
 	      c_parser_declaration_or_fndef (parser, true, true, true, true,
-					     true, &object_expression, vNULL);
+					     true, &object_expression, vNULL,
+					     NULL_TREE, false);
 	      parser->objc_could_be_foreach_context = false;
 	      
 	      restore_extension_diagnostics (ext);
@@ -8658,8 +8694,9 @@ c_parser_objc_methodprotolist (c_parser *parser)
 	      c_parser_consume_token (parser);
 	    }
 	  else
-	    c_parser_declaration_or_fndef (parser, false, false, true,
-					   false, true, NULL, vNULL);
+	    c_parser_declaration_or_fndef (parser, false, false, true, false,
+					   true, NULL, vNULL, NULL_TREE,
+					   false);
 	  break;
 	}
     }
@@ -9608,14 +9645,36 @@ c_parser_pragma (c_parser *parser, enum pragma_context context)
 
   switch (id)
     {
+    case PRAGMA_OACC_DECLARE:
+      c_parser_oacc_declare (parser);
+      return false;
+
     case PRAGMA_OACC_ENTER_DATA:
+      if (context != pragma_compound)
+	{
+	  if (context == pragma_stmt)
+	    c_parser_error (parser, "%<#pragma acc enter data%> may only be "
+			    "used in compound statements");
+	  goto bad_stmt;
+	}
       c_parser_oacc_enter_exit_data (parser, true);
       return false;
 
     case PRAGMA_OACC_EXIT_DATA:
+      if (context != pragma_compound)
+	{
+	  if (context == pragma_stmt)
+	    c_parser_error (parser, "%<#pragma acc exit data%> may only be "
+			    "used in compound statements");
+	  goto bad_stmt;
+	}
       c_parser_oacc_enter_exit_data (parser, false);
       return false;
 
+    case PRAGMA_OACC_ROUTINE:
+      c_parser_oacc_routine (parser, context);
+      return false;
+
     case PRAGMA_OACC_UPDATE:
       if (context != pragma_compound)
 	{
@@ -9761,6 +9820,16 @@ c_parser_pragma (c_parser *parser, enum pragma_context context)
       c_parser_cilk_grainsize (parser);
       return false;
 
+    case PRAGMA_OACC_WAIT:
+      if (context != pragma_compound)
+	{
+	  if (context == pragma_stmt)
+	    c_parser_error (parser, "%<#pragma acc wait%> may only be "
+			    "used in compound statements");
+	  goto bad_stmt;
+	}
+      /* FALL THROUGH.  */
+
     default:
       if (id < PRAGMA_FIRST_EXTERNAL)
 	{
@@ -9837,7 +9906,7 @@ c_parser_pragma_pch_preprocess (c_parser *parser)
    returned and the token is consumed.  */
 
 static pragma_omp_clause
-c_parser_omp_clause_name (c_parser *parser)
+c_parser_omp_clause_name (c_parser *parser, bool consume_token = true)
 {
   pragma_omp_clause result = PRAGMA_OMP_CLAUSE_NONE;
 
@@ -9861,6 +9930,10 @@ c_parser_omp_clause_name (c_parser *parser)
 	  else if (!strcmp ("async", p))
 	    result = PRAGMA_OACC_CLAUSE_ASYNC;
 	  break;
+	case 'b':
+	  if (!strcmp ("bind", p))
+	    result = PRAGMA_OACC_CLAUSE_BIND;
+	  break;
 	case 'c':
 	  if (!strcmp ("collapse", p))
 	    result = PRAGMA_OMP_CLAUSE_COLLAPSE;
@@ -9882,6 +9955,11 @@ c_parser_omp_clause_name (c_parser *parser)
 	    result = PRAGMA_OMP_CLAUSE_DEPEND;
 	  else if (!strcmp ("device", p))
 	    result = PRAGMA_OMP_CLAUSE_DEVICE;
+	  else if (!strcmp ("device_resident", p))
+	    result = PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT;
+	  else if (!strcmp ("device_type", p)
+		   || !strcmp ("dtype", p))
+	    result = PRAGMA_OACC_CLAUSE_DEVICE_TYPE;
 	  else if (!strcmp ("deviceptr", p))
 	    result = PRAGMA_OACC_CLAUSE_DEVICEPTR;
 	  else if (!strcmp ("dist_schedule", p))
@@ -9906,12 +9984,16 @@ c_parser_omp_clause_name (c_parser *parser)
 	case 'i':
 	  if (!strcmp ("inbranch", p))
 	    result = PRAGMA_OMP_CLAUSE_INBRANCH;
+	  else if (!strcmp ("independent", p))
+	    result = PRAGMA_OACC_CLAUSE_INDEPENDENT;
 	  break;
 	case 'l':
 	  if (!strcmp ("lastprivate", p))
 	    result = PRAGMA_OMP_CLAUSE_LASTPRIVATE;
 	  else if (!strcmp ("linear", p))
 	    result = PRAGMA_OMP_CLAUSE_LINEAR;
+	  else if (!strcmp ("link", p))
+	    result = PRAGMA_OACC_CLAUSE_LINK;
 	  break;
 	case 'm':
 	  if (!strcmp ("map", p))
@@ -9926,6 +10008,8 @@ c_parser_omp_clause_name (c_parser *parser)
 	    result = PRAGMA_OMP_CLAUSE_NOTINBRANCH;
 	  else if (!strcmp ("nowait", p))
 	    result = PRAGMA_OMP_CLAUSE_NOWAIT;
+	  else if (!strcmp ("nohost", p))
+	    result = PRAGMA_OACC_CLAUSE_NOHOST;
 	  else if (!strcmp ("num_gangs", p))
 	    result = PRAGMA_OACC_CLAUSE_NUM_GANGS;
 	  else if (!strcmp ("num_teams", p))
@@ -9974,20 +10058,22 @@ c_parser_omp_clause_name (c_parser *parser)
 	    result = PRAGMA_OMP_CLAUSE_SCHEDULE;
 	  else if (!strcmp ("sections", p))
 	    result = PRAGMA_OMP_CLAUSE_SECTIONS;
+	  else if (!strcmp ("self", p)) /* "self" is a synonym for "host".  */
+	    result = PRAGMA_OACC_CLAUSE_HOST;
 	  else if (!strcmp ("seq", p))
 	    result = PRAGMA_OACC_CLAUSE_SEQ;
 	  else if (!strcmp ("shared", p))
 	    result = PRAGMA_OMP_CLAUSE_SHARED;
 	  else if (!strcmp ("simdlen", p))
 	    result = PRAGMA_OMP_CLAUSE_SIMDLEN;
-	  else if (!strcmp ("self", p))
-	    result = PRAGMA_OACC_CLAUSE_SELF;
 	  break;
 	case 't':
 	  if (!strcmp ("taskgroup", p))
 	    result = PRAGMA_OMP_CLAUSE_TASKGROUP;
 	  else if (!strcmp ("thread_limit", p))
 	    result = PRAGMA_OMP_CLAUSE_THREAD_LIMIT;
+	  else if (!strcmp ("tile", p))
+	    result = PRAGMA_OACC_CLAUSE_TILE;
 	  else if (!strcmp ("to", p))
 	    result = PRAGMA_OMP_CLAUSE_TO;
 	  break;
@@ -9996,6 +10082,8 @@ c_parser_omp_clause_name (c_parser *parser)
 	    result = PRAGMA_OMP_CLAUSE_UNIFORM;
 	  else if (!strcmp ("untied", p))
 	    result = PRAGMA_OMP_CLAUSE_UNTIED;
+	  else if (!strcmp ("use_device", p))
+	    result = PRAGMA_OACC_CLAUSE_USE_DEVICE;
 	  break;
 	case 'v':
 	  if (!strcmp ("vector", p))
@@ -10014,7 +10102,7 @@ c_parser_omp_clause_name (c_parser *parser)
 	}
     }
 
-  if (result != PRAGMA_OMP_CLAUSE_NONE)
+  if (consume_token && result != PRAGMA_OMP_CLAUSE_NONE)
     c_parser_consume_token (parser);
 
   return result;
@@ -10053,7 +10141,8 @@ c_parser_oacc_wait_list (c_parser *parser, location_t clause_loc, tree list)
 
   if (args->length () == 0)
     {
-      c_parser_error (parser, "expected integer expression before ')'");
+      c_parser_error (parser,
+		      "expected integer expression list before %<)%>");
       release_tree_vector (args);
       return list;
     }
@@ -10245,6 +10334,8 @@ c_parser_omp_var_list_parens (c_parser *parser, enum omp_clause_code kind,
    copyout ( variable-list )
    create ( variable-list )
    delete ( variable-list )
+   device_resident ( variable-list )
+   link ( variable-list )
    present ( variable-list )
    present_or_copy ( variable-list )
      pcopy ( variable-list )
@@ -10280,10 +10371,15 @@ c_parser_oacc_data_clause (c_parser *parser, pragma_omp_clause c_kind,
     case PRAGMA_OACC_CLAUSE_DEVICE:
       kind = GOMP_MAP_FORCE_TO;
       break;
+    case PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT:
+      kind = GOMP_MAP_DEVICE_RESIDENT;
+      break;
     case PRAGMA_OACC_CLAUSE_HOST:
-    case PRAGMA_OACC_CLAUSE_SELF:
       kind = GOMP_MAP_FORCE_FROM;
       break;
+    case PRAGMA_OACC_CLAUSE_LINK:
+      kind = GOMP_MAP_LINK;
+      break;
     case PRAGMA_OACC_CLAUSE_PRESENT:
       kind = GOMP_MAP_FORCE_PRESENT;
       break;
@@ -10410,7 +10506,8 @@ c_parser_omp_clause_copyprivate (c_parser *parser, tree list)
    default ( shared | none ) */
 
 static tree
-c_parser_omp_clause_default (c_parser *parser, tree list)
+c_parser_omp_clause_default (c_parser *parser, tree list,
+			     bool only_none = false)
 {
   enum omp_clause_default_kind kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED;
   location_t loc = c_parser_peek_token (parser)->location;
@@ -10431,7 +10528,7 @@ c_parser_omp_clause_default (c_parser *parser, tree list)
 	  break;
 
 	case 's':
-	  if (strcmp ("shared", p) != 0)
+	  if (strcmp ("shared", p) != 0 || only_none)
 	    goto invalid_kind;
 	  kind = OMP_CLAUSE_DEFAULT_SHARED;
 	  break;
@@ -10445,7 +10542,10 @@ c_parser_omp_clause_default (c_parser *parser, tree list)
   else
     {
     invalid_kind:
-      c_parser_error (parser, "expected %<none%> or %<shared%>");
+      if (only_none)
+	c_parser_error (parser, "expected %<none%>");
+      else
+	c_parser_error (parser, "expected %<none%> or %<shared%>");
     }
   c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, "expected %<)%>");
 
@@ -10562,139 +10662,195 @@ c_parser_omp_clause_nowait (c_parser *parser ATTRIBUTE_UNUSED, tree list)
   return c;
 }
 
-/* OpenACC:
-   num_gangs ( expression ) */
 
+/* Attempt to statically determine whether the number T is non-positive.
+   If so, warn (naming the clause STR) and return integer one as the
+   replacement expression; otherwise return T unchanged.  */
 static tree
-c_parser_omp_clause_num_gangs (c_parser *parser, tree list)
+require_positive_expr (tree t, location_t loc, const char *str)
 {
-  location_t num_gangs_loc = c_parser_peek_token (parser)->location;
-  if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
+  tree c = fold_build2_loc (loc, LE_EXPR, boolean_type_node, t,
+			    build_int_cst (TREE_TYPE (t), 0));
+  if (c == boolean_true_node)
     {
-      location_t expr_loc = c_parser_peek_token (parser)->location;
-      tree c, t = c_parser_expression (parser).value;
-      mark_exp_read (t);
-      t = c_fully_fold (t, false, NULL);
-
-      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, "expected %<)%>");
-
-      if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
-	{
-	  c_parser_error (parser, "expected integer expression");
-	  return list;
-	}
-
-      /* Attempt to statically determine when the number isn't positive.  */
-      c = fold_build2_loc (expr_loc, LE_EXPR, boolean_type_node, t,
-		       build_int_cst (TREE_TYPE (t), 0));
-      if (CAN_HAVE_LOCATION_P (c))
-	SET_EXPR_LOCATION (c, expr_loc);
-      if (c == boolean_true_node)
-	{
-	  warning_at (expr_loc, 0,
-		      "%<num_gangs%> value must be positive");
-	  t = integer_one_node;
-	}
-
-      check_no_duplicate_clause (list, OMP_CLAUSE_NUM_GANGS, "num_gangs");
-
-      c = build_omp_clause (num_gangs_loc, OMP_CLAUSE_NUM_GANGS);
-      OMP_CLAUSE_NUM_GANGS_EXPR (c) = t;
-      OMP_CLAUSE_CHAIN (c) = list;
-      list = c;
+      warning_at (loc, 0,
+		  "%<%s%> value must be positive", str);
+      t = integer_one_node;
     }
-
-  return list;
+  return t;
 }
 
-/* OpenMP 2.5:
+/* OpenACC:
+   num_gangs ( expression )
+   num_workers ( expression )
+   vector_length ( expression )
+
+   OpenMP 2.5:
    num_threads ( expression ) */
 
 static tree
-c_parser_omp_clause_num_threads (c_parser *parser, tree list)
+c_parser_omp_positive_int_clause (c_parser *parser, pragma_omp_clause c_kind,
+				  const char *str, tree list)
 {
-  location_t num_threads_loc = c_parser_peek_token (parser)->location;
-  if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
+  omp_clause_code kind;
+  switch (c_kind)
     {
-      location_t expr_loc = c_parser_peek_token (parser)->location;
-      tree c, t = c_parser_expression (parser).value;
-      mark_exp_read (t);
-      t = c_fully_fold (t, false, NULL);
-
-      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, "expected %<)%>");
+    default:
+      gcc_unreachable ();
+    case PRAGMA_OACC_CLAUSE_NUM_GANGS:
+      kind = OMP_CLAUSE_NUM_GANGS;
+      break;
+    case PRAGMA_OMP_CLAUSE_NUM_THREADS:
+      kind = OMP_CLAUSE_NUM_THREADS;
+      break;
+    case PRAGMA_OACC_CLAUSE_NUM_WORKERS:
+      kind = OMP_CLAUSE_NUM_WORKERS;
+      break;
+    case PRAGMA_OACC_CLAUSE_VECTOR_LENGTH:
+      kind = OMP_CLAUSE_VECTOR_LENGTH;
+      break;
+    }
 
-      if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
-	{
-	  c_parser_error (parser, "expected integer expression");
-	  return list;
-	}
+  location_t loc = c_parser_peek_token (parser)->location;
+  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
+    return list;
 
-      /* Attempt to statically determine when the number isn't positive.  */
-      c = fold_build2_loc (expr_loc, LE_EXPR, boolean_type_node, t,
-		       build_int_cst (TREE_TYPE (t), 0));
-      if (CAN_HAVE_LOCATION_P (c))
-	SET_EXPR_LOCATION (c, expr_loc);
-      if (c == boolean_true_node)
-	{
-	  warning_at (expr_loc, 0,
-		      "%<num_threads%> value must be positive");
-	  t = integer_one_node;
-	}
+  location_t expr_loc = c_parser_peek_token (parser)->location;
+  tree c, t = c_parser_expression (parser).value;
+  mark_exp_read (t);
+  t = c_fully_fold (t, false, NULL);
 
-      check_no_duplicate_clause (list, OMP_CLAUSE_NUM_THREADS, "num_threads");
+  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, "expected %<)%>");
 
-      c = build_omp_clause (num_threads_loc, OMP_CLAUSE_NUM_THREADS);
-      OMP_CLAUSE_NUM_THREADS_EXPR (c) = t;
-      OMP_CLAUSE_CHAIN (c) = list;
-      list = c;
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
+    {
+      c_parser_error (parser, "expected integer expression");
+      return list;
     }
 
-  return list;
+  t = require_positive_expr (t, expr_loc, str);
+
+  check_no_duplicate_clause (list, kind, str);
+
+  c = build_omp_clause (loc, kind);
+  OMP_CLAUSE_OPERAND (c, 0) = t;
+  OMP_CLAUSE_CHAIN (c) = list;
+  return c;
 }
 
 /* OpenACC:
-   num_workers ( expression ) */
+   gang [( gang_expr_list )]
+   worker [( expression )]
+   vector [( expression )] */
 
 static tree
-c_parser_omp_clause_num_workers (c_parser *parser, tree list)
+c_parser_oacc_shape_clause (c_parser *parser, pragma_omp_clause c_kind,
+			    const char *str, tree list)
 {
-  location_t num_workers_loc = c_parser_peek_token (parser)->location;
-  if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
+  omp_clause_code kind;
+  const char *id = "num";
+
+  switch (c_kind)
     {
-      location_t expr_loc = c_parser_peek_token (parser)->location;
-      tree c, t = c_parser_expression (parser).value;
-      mark_exp_read (t);
-      t = c_fully_fold (t, false, NULL);
+    default:
+      gcc_unreachable ();
+    case PRAGMA_OACC_CLAUSE_GANG:
+      kind = OMP_CLAUSE_GANG;
+      break;
+    case PRAGMA_OACC_CLAUSE_VECTOR:
+      kind = OMP_CLAUSE_VECTOR;
+      id = "length";
+      break;
+    case PRAGMA_OACC_CLAUSE_WORKER:
+      kind = OMP_CLAUSE_WORKER;
+      break;
+    }
 
-      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, "expected %<)%>");
+  tree op0 = NULL_TREE, op1 = NULL_TREE;
+  location_t loc = c_parser_peek_token (parser)->location;
 
-      if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
-	{
-	  c_parser_error (parser, "expected integer expression");
-	  return list;
-	}
+  if (c_parser_next_token_is (parser, CPP_OPEN_PAREN))
+    {
+      tree *op_to_parse = &op0;
+      c_parser_consume_token (parser);
 
-      /* Attempt to statically determine when the number isn't positive.  */
-      c = fold_build2_loc (expr_loc, LE_EXPR, boolean_type_node, t,
-		       build_int_cst (TREE_TYPE (t), 0));
-      if (CAN_HAVE_LOCATION_P (c))
-	SET_EXPR_LOCATION (c, expr_loc);
-      if (c == boolean_true_node)
+      do
 	{
-	  warning_at (expr_loc, 0,
-		      "%<num_workers%> value must be positive");
-	  t = integer_one_node;
-	}
+	  if (c_parser_next_token_is (parser, CPP_NAME)
+	      || c_parser_next_token_is (parser, CPP_KEYWORD))
+	    {
+	      tree name_kind = c_parser_peek_token (parser)->value;
+	      const char *p = IDENTIFIER_POINTER (name_kind);
+	      if (kind == OMP_CLAUSE_GANG && strcmp ("static", p) == 0)
+		{
+		  c_parser_consume_token (parser);
+		  if (!c_parser_require (parser, CPP_COLON, "expected %<:%>"))
+		    {
+		      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, 0);
+		      return list;
+		    }
+		  op_to_parse = &op1;
+		  if (c_parser_next_token_is (parser, CPP_MULT))
+		    {
+		      c_parser_consume_token (parser);
+		      *op_to_parse = integer_minus_one_node;
+		      continue;
+		    }
+		}
+	      else if (strcmp (id, p) == 0)
+		{
+		  c_parser_consume_token (parser);
+		  if (!c_parser_require (parser, CPP_COLON, "expected %<:%>"))
+		    {
+		      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, 0);
+		      return list;
+		    }
+		}
+	      else
+		{
+		  if (kind == OMP_CLAUSE_GANG)
+		    c_parser_error (parser, "expected %<num%> or %<static%>");
+		  else if (kind == OMP_CLAUSE_VECTOR)
+		    c_parser_error (parser, "expected %<length%>");
+		  else
+		    c_parser_error (parser, "expected %<num%>");
+		  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, 0);
+		  return list;
+		}
+	    }
+
+	  if (*op_to_parse != NULL_TREE)
+	    {
+	      c_parser_error (parser, "duplicate operand to clause");
+	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, 0);
+	      return list;
+	    }
 
-      check_no_duplicate_clause (list, OMP_CLAUSE_NUM_WORKERS, "num_workers");
+	  location_t expr_loc = c_parser_peek_token (parser)->location;
+	  tree expr = c_parser_expression (parser).value;
+	  if (expr == error_mark_node)
+	    {
+	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, 0);
+	      return list;
+	    }
 
-      c = build_omp_clause (num_workers_loc, OMP_CLAUSE_NUM_WORKERS);
-      OMP_CLAUSE_NUM_WORKERS_EXPR (c) = t;
-      OMP_CLAUSE_CHAIN (c) = list;
-      list = c;
+	  mark_exp_read (expr);
+	  expr = require_positive_expr (expr, expr_loc, str);
+	  *op_to_parse = expr;
+	}
+      while (!c_parser_next_token_is (parser, CPP_CLOSE_PAREN));
+      c_parser_consume_token (parser);
     }
 
-  return list;
+  check_no_duplicate_clause (list, kind, str);
+
+  tree c = build_omp_clause (loc, kind);
+  if (op0)
+    OMP_CLAUSE_OPERAND (c, 0) = op0;
+  if (op1)
+    OMP_CLAUSE_OPERAND (c, 1) = op1;
+  OMP_CLAUSE_CHAIN (c) = list;
+  return c;
 }
 
 /* OpenACC:
@@ -10732,6 +10888,195 @@ c_parser_oacc_clause_async (c_parser *parser, tree list)
   return list;
 }
 
+/* OpenACC 2.0:
+   bind ( identifier )
+   bind ( string-literal ) */
+
+static tree
+c_parser_oacc_clause_bind (c_parser *parser, tree list)
+{
+  location_t loc = c_parser_peek_token (parser)->location;
+
+  parser->lex_untranslated_string = true;
+  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
+    {
+      parser->lex_untranslated_string = false;
+      return list;
+    }
+  if (c_parser_next_token_is (parser, CPP_NAME)
+      || c_parser_next_token_is (parser, CPP_STRING))
+    {
+      tree t = c_parser_peek_token (parser)->value;
+      c_parser_consume_token (parser);
+      tree c = build_omp_clause (loc, OMP_CLAUSE_BIND);
+      OMP_CLAUSE_BIND_NAME (c) = t;
+      OMP_CLAUSE_CHAIN (c) = list;
+      list = c;
+    }
+  else
+    c_parser_error (parser, "expected identifier or character string literal");
+  parser->lex_untranslated_string = false;
+  c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>");
+  return list;
+}
+
+/* OpenACC 2.0:
+   device_type ( device-type-list ) clauses */
+
+static tree
+c_parser_oacc_clause_device_type (c_parser *parser, omp_clause_mask mask,
+				  tree list)
+{
+  tree c, clauses;
+  location_t loc;
+  int dev_id = GOMP_DEVICE_NONE;
+
+  loc = c_parser_peek_token (parser)->location;
+  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
+    return list;
+
+  if (c_parser_next_token_is (parser, CPP_MULT))
+    {
+      c_parser_consume_token (parser);
+      dev_id = GOMP_DEVICE_DEFAULT;
+      if (!c_parser_require (parser, CPP_CLOSE_PAREN, "expected %<)%>"))
+	return list;
+    }
+  else
+    {
+      do
+	{
+	  tree keyword = error_mark_node;
+	  int dev = 0;
+
+	  if (c_parser_next_token_is (parser, CPP_NAME))
+	    {
+	      keyword = c_parser_peek_token (parser)->value;
+	      c_parser_consume_token (parser);
+	    }
+
+	  if (keyword == error_mark_node)
+	    {
+	      error_at (loc, "expected keyword or %<)%>");
+	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
+					 "expected %<)%>");
+	      return list;
+	    }
+
+	  dev = oacc_extract_device_id (IDENTIFIER_POINTER (keyword));
+	  if (dev)
+	    dev_id |= 1 << dev;
+
+	  if (c_parser_next_token_is (parser, CPP_COMMA))
+	    c_parser_consume_token (parser);
+	}
+      while (c_parser_next_token_is_not (parser, CPP_CLOSE_PAREN));
+
+      /* Consume the trailing ')'.  */
+      c_parser_consume_token (parser);
+    }
+
+  c = build_omp_clause (loc, OMP_CLAUSE_DEVICE_TYPE);
+  clauses = c_parser_oacc_all_clauses (parser, mask, "device_type", 0, false,
+				       false);
+  OMP_CLAUSE_DEVICE_TYPE_CLAUSES (c) = clauses;
+  OMP_CLAUSE_DEVICE_TYPE_DEVICES (c) = build_int_cst (integer_type_node,
+						      dev_id);
+  OMP_CLAUSE_CHAIN (c) = list;
+  return c;
+}
+
+/* OpenACC 2.0:
+   tile ( size-expr-list ) */
+
+static tree
+c_parser_oacc_clause_tile (c_parser *parser, tree list)
+{
+  tree c, num = error_mark_node;
+  HOST_WIDE_INT n;
+  location_t loc;
+  tree tile = NULL_TREE;
+  vec<tree, va_gc> *tvec = make_tree_vector ();
+
+  check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile");
+
+  loc = c_parser_peek_token (parser)->location;
+  if (!c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
+    {
+      release_tree_vector (tvec);
+      return list;
+    }
+
+  do
+    {
+      if (c_parser_next_token_is (parser, CPP_MULT))
+	{
+	  c_parser_consume_token (parser);
+	  num = integer_minus_one_node;
+	}
+      else
+	{
+	  num = c_parser_expr_no_commas (parser, NULL).value;
+
+	  if (num == error_mark_node)
+	    {
+	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
+					 "expected %<)%>");
+	      release_tree_vector (tvec);
+	      return list;
+	    }
+
+	  mark_exp_read (num);
+	  num = c_fully_fold (num, false, NULL);
+
+	  if (!INTEGRAL_TYPE_P (TREE_TYPE (num))
+	      || !tree_fits_shwi_p (num)
+	      || (n = tree_to_shwi (num)) <= 0
+	      || (int) n != n)
+	    {
+	      error_at (loc,
+			"tile argument needs positive constant integer "
+			"expression");
+	      release_tree_vector (tvec);
+	      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN,
+					 "expected %<)%>");
+	      return list;
+	    }
+	}
+
+      if (num == error_mark_node)
+	{
+	  error_at (loc, "expected positive integer or %<)%>");
+	  release_tree_vector (tvec);
+	  return list;
+	}
+
+      vec_safe_push (tvec, num);
+      if (c_parser_next_token_is (parser, CPP_COMMA))
+	c_parser_consume_token (parser);
+    }
+  while (c_parser_next_token_is_not (parser, CPP_CLOSE_PAREN));
+
+  /* Consume the trailing ')'.  */
+  c_parser_consume_token (parser);
+
+  c = build_omp_clause (loc, OMP_CLAUSE_TILE);
+  tile = build_tree_list_vec (tvec);
+  OMP_CLAUSE_TILE_LIST (c) = tile;
+  OMP_CLAUSE_CHAIN (c) = list;
+  release_tree_vector (tvec);
+  return c;
+}
+
+/* OpenACC 2.0:
+   use_device ( variable-list ) */
+
+static tree
+c_parser_oacc_clause_use_device (c_parser *parser, tree list)
+{
+  return c_parser_omp_var_list_parens (parser, OMP_CLAUSE_USE_DEVICE, list);
+}
+
 /* OpenACC:
    wait ( int-expr-list ) */
 
@@ -10980,74 +11325,20 @@ c_parser_omp_clause_shared (c_parser *parser, tree list)
 }
 
 /* OpenMP 3.0:
-   untied */
+   untied (FIXME: should we allow duplicates?)
 
-static tree
-c_parser_omp_clause_untied (c_parser *parser ATTRIBUTE_UNUSED, tree list)
-{
-  tree c;
-
-  /* FIXME: Should we allow duplicates?  */
-  check_no_duplicate_clause (list, OMP_CLAUSE_UNTIED, "untied");
-
-  c = build_omp_clause (c_parser_peek_token (parser)->location,
-			OMP_CLAUSE_UNTIED);
-  OMP_CLAUSE_CHAIN (c) = list;
-
-  return c;
-}
-
-/* OpenACC:
-   vector_length ( expression ) */
-
-static tree
-c_parser_omp_clause_vector_length (c_parser *parser, tree list)
-{
-  location_t vector_length_loc = c_parser_peek_token (parser)->location;
-  if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
-    {
-      location_t expr_loc = c_parser_peek_token (parser)->location;
-      tree c, t = c_parser_expression (parser).value;
-      mark_exp_read (t);
-      t = c_fully_fold (t, false, NULL);
-
-      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, "expected %<)%>");
-
-      if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
-	{
-	  c_parser_error (parser, "expected integer expression");
-	  return list;
-	}
-
-      /* Attempt to statically determine when the number isn't positive.  */
-      c = fold_build2_loc (expr_loc, LE_EXPR, boolean_type_node, t,
-		       build_int_cst (TREE_TYPE (t), 0));
-      if (CAN_HAVE_LOCATION_P (c))
-	SET_EXPR_LOCATION (c, expr_loc);
-      if (c == boolean_true_node)
-	{
-	  warning_at (expr_loc, 0,
-		      "%<vector_length%> value must be positive");
-	  t = integer_one_node;
-	}
-
-      check_no_duplicate_clause (list, OMP_CLAUSE_VECTOR_LENGTH, "vector_length");
-
-      c = build_omp_clause (vector_length_loc, OMP_CLAUSE_VECTOR_LENGTH);
-      OMP_CLAUSE_VECTOR_LENGTH_EXPR (c) = t;
-      OMP_CLAUSE_CHAIN (c) = list;
-      list = c;
-    }
-
-  return list;
-}
-
-/* OpenMP 4.0:
+   OpenMP 4.0:
    inbranch
-   notinbranch */
+   notinbranch
+
+   OpenACC 2.0:
+   auto
+   independent
+   nohost
+   seq */
 
 static tree
-c_parser_omp_clause_branch (c_parser *parser ATTRIBUTE_UNUSED,
+c_parser_omp_simple_clause (c_parser *parser ATTRIBUTE_UNUSED,
 			    enum omp_clause_code code, tree list)
 {
   check_no_duplicate_clause (list, code, omp_clause_code_name[code]);
@@ -11579,14 +11870,17 @@ c_parser_omp_clause_uniform (c_parser *parser, tree list)
 }
 
 /* Parse all OpenACC clauses.  The set clauses allowed by the directive
-   is a bitmask in MASK.  Return the list of clauses found.  */
+   is a bitmask in MASK.  DTYPE_MASK denotes which clauses may follow a
+   device_type clause.  Return the list of clauses found.  */
 
-static tree
+tree
 c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
-			   const char *where, bool finish_p = true)
+			   const char *where, omp_clause_mask dtype_mask = 0,
+			   bool finish_p = true, bool scan_dtype = true)
 {
   tree clauses = NULL;
   bool first = true;
+  bool seen_dtype = false;
 
   while (c_parser_next_token_is_not (parser, CPP_PRAGMA_EOL))
     {
@@ -11598,15 +11892,35 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
       if (!first && c_parser_next_token_is (parser, CPP_COMMA))
 	c_parser_consume_token (parser);
 
+      if (!scan_dtype && c_parser_omp_clause_name (parser, false)
+	  == PRAGMA_OACC_CLAUSE_DEVICE_TYPE)
+	return clauses;
+
       here = c_parser_peek_token (parser)->location;
       c_kind = c_parser_omp_clause_name (parser);
 
+      if (seen_dtype && c_kind != PRAGMA_OMP_CLAUSE_NONE
+	  && c_kind != PRAGMA_OACC_CLAUSE_DEVICE_TYPE)
+	{
+	  error_at (here, "invalid clauses following %<device_type%>");
+	  goto saw_error;
+	}
+
       switch (c_kind)
 	{
 	case PRAGMA_OACC_CLAUSE_ASYNC:
 	  clauses = c_parser_oacc_clause_async (parser, clauses);
 	  c_name = "async";
 	  break;
+	case PRAGMA_OACC_CLAUSE_AUTO:
+	  clauses = c_parser_omp_simple_clause (parser, OMP_CLAUSE_AUTO,
+						clauses);
+	  c_name = "auto";
+	  break;
+	case PRAGMA_OACC_CLAUSE_BIND:
+	  clauses = c_parser_oacc_clause_bind (parser, clauses);
+	  c_name = "bind";
+	  break;
 	case PRAGMA_OACC_CLAUSE_COLLAPSE:
 	  clauses = c_parser_omp_clause_collapse (parser, clauses);
 	  c_name = "collapse";
@@ -11631,10 +11945,24 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  clauses = c_parser_oacc_data_clause (parser, c_kind, clauses);
 	  c_name = "delete";
 	  break;
+	case PRAGMA_OMP_CLAUSE_DEFAULT:
+	  clauses = c_parser_omp_clause_default (parser, clauses, true);
+	  c_name = "default";
+	  break;
 	case PRAGMA_OACC_CLAUSE_DEVICE:
 	  clauses = c_parser_oacc_data_clause (parser, c_kind, clauses);
 	  c_name = "device";
 	  break;
+	case PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT:
+	  clauses = c_parser_oacc_data_clause (parser, c_kind, clauses);
+	  c_name = "device_resident";
+	  break;
+	case PRAGMA_OACC_CLAUSE_DEVICE_TYPE:
+	  clauses = c_parser_oacc_clause_device_type (parser, dtype_mask,
+						      clauses);
+	  c_name = "device_type";
+	  seen_dtype = true;
+	  break;
 	case PRAGMA_OACC_CLAUSE_DEVICEPTR:
 	  clauses = c_parser_oacc_data_clause_deviceptr (parser, clauses);
 	  c_name = "deviceptr";
@@ -11643,6 +11971,11 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  clauses = c_parser_omp_clause_firstprivate (parser, clauses);
 	  c_name = "firstprivate";
 	  break;
+	case PRAGMA_OACC_CLAUSE_GANG:
+	  c_name = "gang";
+	  clauses = c_parser_oacc_shape_clause (parser, c_kind, c_name,
+						clauses);
+	  break;
 	case PRAGMA_OACC_CLAUSE_HOST:
 	  clauses = c_parser_oacc_data_clause (parser, c_kind, clauses);
 	  c_name = "host";
@@ -11651,13 +11984,29 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  clauses = c_parser_omp_clause_if (parser, clauses);
 	  c_name = "if";
 	  break;
+	case PRAGMA_OACC_CLAUSE_INDEPENDENT:
+	  clauses = c_parser_omp_simple_clause (parser, OMP_CLAUSE_INDEPENDENT,
+						clauses);
+	  c_name = "independent";
+	  break;
+	case PRAGMA_OACC_CLAUSE_LINK:
+	  clauses = c_parser_oacc_data_clause (parser, c_kind, clauses);
+	  c_name = "link";
+	  break;
+	case PRAGMA_OACC_CLAUSE_NOHOST:
+	  clauses = c_parser_omp_simple_clause (parser, OMP_CLAUSE_NOHOST,
+						clauses);
+	  c_name = "nohost";
+	  break;
 	case PRAGMA_OACC_CLAUSE_NUM_GANGS:
-	  clauses = c_parser_omp_clause_num_gangs (parser, clauses);
 	  c_name = "num_gangs";
+	  clauses = c_parser_omp_positive_int_clause (parser, c_kind, c_name,
+						      clauses);
 	  break;
 	case PRAGMA_OACC_CLAUSE_NUM_WORKERS:
-	  clauses = c_parser_omp_clause_num_workers (parser, clauses);
 	  c_name = "num_workers";
+	  clauses = c_parser_omp_positive_int_clause (parser, c_kind, c_name,
+						      clauses);
 	  break;
 	case PRAGMA_OACC_CLAUSE_PRESENT:
 	  clauses = c_parser_oacc_data_clause (parser, c_kind, clauses);
@@ -11687,18 +12036,38 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  clauses = c_parser_omp_clause_reduction (parser, clauses);
 	  c_name = "reduction";
 	  break;
-	case PRAGMA_OACC_CLAUSE_SELF:
-	  clauses = c_parser_oacc_data_clause (parser, c_kind, clauses);
-	  c_name = "self";
+	case PRAGMA_OACC_CLAUSE_SEQ:
+	  clauses = c_parser_omp_simple_clause (parser, OMP_CLAUSE_SEQ,
+						clauses);
+	  c_name = "seq";
+	  break;
+	case PRAGMA_OACC_CLAUSE_TILE:
+	  clauses = c_parser_oacc_clause_tile (parser, clauses);
+	  c_name = "tile";
+	  break;
+	case PRAGMA_OACC_CLAUSE_USE_DEVICE:
+	  clauses = c_parser_oacc_clause_use_device (parser, clauses);
+	  c_name = "use_device";
+	  break;
+	case PRAGMA_OACC_CLAUSE_VECTOR:
+	  c_name = "vector";
+	  clauses = c_parser_oacc_shape_clause (parser, c_kind, c_name,
+						clauses);
 	  break;
 	case PRAGMA_OACC_CLAUSE_VECTOR_LENGTH:
-	  clauses = c_parser_omp_clause_vector_length (parser, clauses);
 	  c_name = "vector_length";
+	  clauses = c_parser_omp_positive_int_clause (parser, c_kind, c_name,
+						      clauses);
 	  break;
 	case PRAGMA_OACC_CLAUSE_WAIT:
 	  clauses = c_parser_oacc_clause_wait (parser, clauses);
 	  c_name = "wait";
 	  break;
+	case PRAGMA_OACC_CLAUSE_WORKER:
+	  c_name = "worker";
+	  clauses = c_parser_oacc_shape_clause (parser, c_kind, c_name,
+						clauses);
+	  break;
 	default:
 	  c_parser_error (parser, "expected %<#pragma acc%> clause");
 	  goto saw_error;
@@ -11715,11 +12084,17 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
 	}
     }
 
+  if (!scan_dtype)
+    return clauses;
+
  saw_error:
   c_parser_skip_to_pragma_eol (parser);
 
   if (finish_p)
-    return c_finish_omp_clauses (clauses);
+    {
+      clauses = oacc_filter_device_types (clauses);
+      return c_finish_omp_clauses (clauses, true);
+    }
 
   return clauses;
 }
@@ -11790,8 +12165,9 @@ c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  c_name = "nowait";
 	  break;
 	case PRAGMA_OMP_CLAUSE_NUM_THREADS:
-	  clauses = c_parser_omp_clause_num_threads (parser, clauses);
 	  c_name = "num_threads";
+	  clauses = c_parser_omp_positive_int_clause (parser, c_kind, c_name,
+						      clauses);
 	  break;
 	case PRAGMA_OMP_CLAUSE_ORDERED:
 	  clauses = c_parser_omp_clause_ordered (parser, clauses);
@@ -11814,18 +12190,19 @@ c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  c_name = "shared";
 	  break;
 	case PRAGMA_OMP_CLAUSE_UNTIED:
-	  clauses = c_parser_omp_clause_untied (parser, clauses);
+	  clauses = c_parser_omp_simple_clause (parser, OMP_CLAUSE_UNTIED,
+						clauses);
 	  c_name = "untied";
 	  break;
 	case PRAGMA_OMP_CLAUSE_INBRANCH:
 	case PRAGMA_CILK_CLAUSE_MASK:
-	  clauses = c_parser_omp_clause_branch (parser, OMP_CLAUSE_INBRANCH,
+	  clauses = c_parser_omp_simple_clause (parser, OMP_CLAUSE_INBRANCH,
 						clauses);
 	  c_name = "inbranch";
 	  break;
 	case PRAGMA_OMP_CLAUSE_NOTINBRANCH:
 	case PRAGMA_CILK_CLAUSE_NOMASK:
-	  clauses = c_parser_omp_clause_branch (parser, OMP_CLAUSE_NOTINBRANCH,
+	  clauses = c_parser_omp_simple_clause (parser, OMP_CLAUSE_NOTINBRANCH,
 						clauses);
 	  c_name = "notinbranch";
 	  break;
@@ -11948,7 +12325,7 @@ c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask,
   c_parser_skip_to_pragma_eol (parser);
 
   if (finish_p)
-    return c_finish_omp_clauses (clauses);
+    return c_finish_omp_clauses (clauses, false);
 
   return clauses;
 }
@@ -11971,8 +12348,6 @@ c_parser_omp_structured_block (c_parser *parser)
 
 /* OpenACC 2.0:
    # pragma acc cache (variable-list) new-line
-
-   LOC is the location of the #pragma token.
 */
 
 static tree
@@ -11981,7 +12356,7 @@ c_parser_oacc_cache (location_t loc, c_parser *parser)
   tree stmt, clauses;
 
   clauses = c_parser_omp_var_list_parens (parser, OMP_CLAUSE__CACHE_, NULL);
-  clauses = c_finish_omp_clauses (clauses);
+  clauses = c_finish_omp_clauses (clauses, true);
 
   c_parser_skip_to_pragma_eol (parser);
 
@@ -11997,8 +12372,6 @@ c_parser_oacc_cache (location_t loc, c_parser *parser)
 /* OpenACC 2.0:
    # pragma acc data oacc-data-clause[optseq] new-line
      structured-block
-
-   LOC is the location of the #pragma token.
 */
 
 #define OACC_DATA_CLAUSE_MASK						\
@@ -12020,7 +12393,8 @@ c_parser_oacc_data (location_t loc, c_parser *parser)
   tree stmt, clauses, block;
 
   clauses = c_parser_oacc_all_clauses (parser, OACC_DATA_CLAUSE_MASK,
-				       "#pragma acc data");
+				       "#pragma acc data",
+				       OACC_DATA_CLAUSE_MASK);
 
   block = c_begin_omp_parallel ();
   add_stmt (c_parser_omp_structured_block (parser));
@@ -12031,57 +12405,190 @@ c_parser_oacc_data (location_t loc, c_parser *parser)
 }
 
 /* OpenACC 2.0:
-   # pragma acc kernels oacc-kernels-clause[optseq] new-line
-     structured-block
-
-   LOC is the location of the #pragma token.
+   # pragma acc declare oacc-data-clause[optseq] new-line
 */
 
-#define OACC_KERNELS_CLAUSE_MASK					\
-	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_ASYNC)		\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPY)		\
+#define OACC_DECLARE_CLAUSE_MASK					\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPY)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR)		\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF)			\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_LINK)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPY)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPYIN)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPYOUT)	\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_CREATE)	\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT) )
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_CREATE) )
+
+static void
+c_parser_oacc_declare (c_parser *parser)
+{
+  location_t pragma_loc = c_parser_peek_token (parser)->location;
+  tree clauses;
+
+  c_parser_consume_pragma (parser);
+
+  clauses = c_parser_oacc_all_clauses (parser, OACC_DECLARE_CLAUSE_MASK,
+				       "#pragma acc declare");
+  if (!clauses)
+    {
+      error_at (pragma_loc,
+		"no valid clauses specified in %<#pragma acc declare%>");
+      return;
+    }
+  for (tree t = clauses; t; t = OMP_CLAUSE_CHAIN (t))
+    {
+      location_t loc = OMP_CLAUSE_LOCATION (t);
+      tree decl = OMP_CLAUSE_DECL (t);
+      if (!DECL_P (decl))
+	{
+	  error_at (loc, "subarray in %<#pragma acc declare%>");
+	  continue;
+	}
+      gcc_assert (OMP_CLAUSE_CODE (t) == OMP_CLAUSE_MAP);
+      switch (OMP_CLAUSE_MAP_KIND (t))
+	{
+	case GOMP_MAP_FORCE_ALLOC:
+	case GOMP_MAP_FORCE_TO:
+	case GOMP_MAP_FORCE_DEVICEPTR:
+	case GOMP_MAP_DEVICE_RESIDENT:
+	  break;
+
+	case GOMP_MAP_POINTER:
+	  /* Generated by c_finish_omp_clauses from array sections;
+	     avoid spurious diagnostics.  */
+	  break;
+
+	case GOMP_MAP_LINK:
+	  if (!global_bindings_p () && !DECL_EXTERNAL (decl))
+	    {
+	      error_at (loc,
+			"invalid variable %qD in %<#pragma acc declare link%>",
+			decl);
+	      continue;
+	    }
+	  break;
+
+	default:
+	  if (global_bindings_p ())
+	    {
+	      error_at (loc, "invalid OpenACC clause at file scope");
+	      continue;
+	    }
+	  if (DECL_EXTERNAL (decl))
+	    {
+	      error_at (loc,
+			"invalid use of %<extern%> variable %qD "
+			"in %<#pragma acc declare%>", decl);
+	      continue;
+	    }
+	  break;
+	}
+
+      /* Store the clause in an attribute on the variable, at file
+	 scope, or the function, at block scope.  */
+      tree decl_for_attr;
+      if (global_bindings_p ())
+	{
+	  decl_for_attr = decl;
+	  tree prev_attr = lookup_attribute ("oacc declare",
+					     DECL_ATTRIBUTES (decl));
+	  if (prev_attr)
+	    {
+	      tree p = TREE_VALUE (prev_attr);
+	      error_at (loc,
+			"variable %qD used more than once with "
+			"%<#pragma acc declare%>", decl);
+	      inform (OMP_CLAUSE_LOCATION (TREE_VALUE (p)),
+		      "previous directive was here");
+	      continue;
+	    }
+	}
+      else
+	{
+	  bool ok = true;
+	  decl_for_attr = current_function_decl;
+	  tree prev_attr = lookup_attribute ("oacc declare",
+					     DECL_ATTRIBUTES (decl_for_attr));
+	  for (;
+	       prev_attr;
+	       prev_attr = lookup_attribute ("oacc declare",
+					     TREE_CHAIN (prev_attr)))
+	    {
+	      tree p = TREE_VALUE (prev_attr);
+	      tree cl = TREE_VALUE (p);
+	      if (OMP_CLAUSE_DECL (cl) == decl)
+		{
+		  error_at (loc,
+			    "variable %qD used more than once with "
+			    "%<#pragma acc declare%>", decl);
+		  inform (OMP_CLAUSE_LOCATION (cl),
+			  "previous directive was here");
+		  ok = false;
+		  break;
+		}
+	    }
+	  if (!ok)
+	    continue;
+	}
+      tree attr = tree_cons (NULL_TREE, t, NULL_TREE);
+      tree attrs = tree_cons (get_identifier ("oacc declare"),
+			      attr, NULL_TREE);
+      decl_attributes (&decl_for_attr, attrs, 0);
+    }
+}
+
+/* Split the 'clauses' into a set of 'loop' clauses and a set of
+   'not-loop' clauses.  */
 
 static tree
-c_parser_oacc_kernels (location_t loc, c_parser *parser, char *p_name)
+oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses)
 {
-  tree stmt, clauses = NULL_TREE, block;
+  tree loop_clauses, next, c;
 
-  strcat (p_name, " kernels");
+  loop_clauses = *not_loop_clauses = NULL_TREE;
 
-  if (c_parser_next_token_is (parser, CPP_NAME))
+  for (; clauses ; clauses = next)
     {
-      const char *p = IDENTIFIER_POINTER (c_parser_peek_token (parser)->value);
-      if (strcmp (p, "loop") == 0)
-	{
-	  c_parser_consume_token (parser);
-	  block = c_begin_omp_parallel ();
-	  c_parser_oacc_loop (loc, parser, p_name);
-	  stmt = c_finish_oacc_kernels (loc, clauses, block);
-	  OACC_KERNELS_COMBINED (stmt) = 1;
-	  return stmt;
+      next = OMP_CLAUSE_CHAIN (clauses);
+
+      switch (OMP_CLAUSE_CODE (clauses))
+	{
+	case OMP_CLAUSE_COLLAPSE:
+	case OMP_CLAUSE_REDUCTION:
+	case OMP_CLAUSE_GANG:
+	case OMP_CLAUSE_VECTOR:
+	case OMP_CLAUSE_WORKER:
+	case OMP_CLAUSE_AUTO:
+	case OMP_CLAUSE_SEQ:
+	  OMP_CLAUSE_CHAIN (clauses) = loop_clauses;
+	  loop_clauses = clauses;
+	  break;
+
+	case OMP_CLAUSE_PRIVATE:
+	  c = build_omp_clause (OMP_CLAUSE_LOCATION (clauses),
+				OMP_CLAUSE_CODE (clauses));
+	  OMP_CLAUSE_DECL (c) = OMP_CLAUSE_DECL (clauses);
+	  OMP_CLAUSE_CHAIN (c) = loop_clauses;
+	  loop_clauses = c;
+	  /* FALL THROUGH  */
+
+	default:
+	  OMP_CLAUSE_CHAIN (clauses) = *not_loop_clauses;
+	  *not_loop_clauses = clauses;
+	  break;
 	}
     }
 
-  clauses =  c_parser_oacc_all_clauses (parser, OACC_KERNELS_CLAUSE_MASK,
-					p_name);
+  if (*not_loop_clauses)
+    c_finish_omp_clauses (*not_loop_clauses, true);
 
-  block = c_begin_omp_parallel ();
-  add_stmt (c_parser_omp_structured_block (parser));
+  if (loop_clauses)
+    c_finish_omp_clauses (loop_clauses, true);
 
-  stmt = c_finish_oacc_kernels (loc, clauses, block);
-
-  return stmt;
+  return loop_clauses;
 }
 
 /* OpenACC 2.0:
@@ -12090,9 +12597,6 @@ c_parser_oacc_kernels (location_t loc, c_parser *parser, char *p_name)
    or
 
    # pragma acc exit data oacc-exit-data-clause[optseq] new-line
-
-
-   LOC is the location of the #pragma token.
 */
 
 #define OACC_ENTER_DATA_CLAUSE_MASK					\
@@ -12116,28 +12620,26 @@ c_parser_oacc_enter_exit_data (c_parser *parser, bool enter)
 {
   location_t loc = c_parser_peek_token (parser)->location;
   tree clauses, stmt;
+  const char *p = "";
 
   c_parser_consume_pragma (parser);
 
-  if (!c_parser_next_token_is (parser, CPP_NAME))
+  if (c_parser_next_token_is (parser, CPP_NAME))
     {
-      c_parser_error (parser, enter
-		      ? "expected %<data%> in %<#pragma acc enter data%>"
-		      : "expected %<data%> in %<#pragma acc exit data%>");
-      c_parser_skip_to_pragma_eol (parser);
-      return;
+      p = IDENTIFIER_POINTER (c_parser_peek_token (parser)->value);
+      c_parser_consume_token (parser);
     }
 
-  const char *p = IDENTIFIER_POINTER (c_parser_peek_token (parser)->value);
   if (strcmp (p, "data") != 0)
     {
-      c_parser_error (parser, "invalid pragma");
+      error_at (loc, enter
+		? "expected %<data%> after %<#pragma acc enter%>"
+		: "expected %<data%> after %<#pragma acc exit%>");
+      parser->error = true;
       c_parser_skip_to_pragma_eol (parser);
       return;
     }
 
-  c_parser_consume_token (parser);
-
   if (enter)
     clauses = c_parser_oacc_all_clauses (parser, OACC_ENTER_DATA_CLAUSE_MASK,
 					 "#pragma acc enter data");
@@ -12160,27 +12662,72 @@ c_parser_oacc_enter_exit_data (c_parser *parser, bool enter)
   add_stmt (stmt);
 }
 
+/* OpenACC 2.0:
+   # pragma acc host_data oacc-data-clause[optseq] new-line
+     structured-block
+*/
+
+#define OACC_HOST_DATA_CLAUSE_MASK					\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_USE_DEVICE) )
+
+static tree
+c_parser_oacc_host_data (location_t loc, c_parser *parser)
+{
+  tree stmt, clauses, block;
+
+  clauses = c_parser_oacc_all_clauses (parser, OACC_HOST_DATA_CLAUSE_MASK,
+				       "#pragma acc host_data");
+
+  block = c_begin_omp_parallel ();
+  add_stmt (c_parser_omp_structured_block (parser));
+  stmt = c_finish_oacc_host_data (loc, clauses, block);
+  return stmt;
+}
+
 
 /* OpenACC 2.0:
 
    # pragma acc loop oacc-loop-clause[optseq] new-line
      structured-block
-
-   LOC is the location of the #pragma token.
 */
 
 #define OACC_LOOP_CLAUSE_MASK						\
 	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COLLAPSE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICE_TYPE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_GANG)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WORKER)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_VECTOR)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_AUTO)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_INDEPENDENT) 	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_SEQ)			\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_TILE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRIVATE)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_REDUCTION) )
 
+#define OACC_LOOP_CLAUSE_DEVICE_TYPE_MASK				\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COLLAPSE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_GANG)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WORKER)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_VECTOR)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_AUTO)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_INDEPENDENT) 	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_SEQ)			\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_TILE) )
+
 static tree
-c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name)
+c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name,
+		    omp_clause_mask mask, tree *cclauses)
 {
   tree stmt, clauses, block;
 
   strcat (p_name, " loop");
+  mask |= OACC_LOOP_CLAUSE_MASK;
 
-  clauses = c_parser_oacc_all_clauses (parser, OACC_LOOP_CLAUSE_MASK, p_name);
+  clauses = c_parser_oacc_all_clauses (parser, mask, p_name,
+				       OACC_LOOP_CLAUSE_DEVICE_TYPE_MASK,
+				       cclauses == NULL);
+  if (cclauses)
+    clauses = oacc_split_loop_clauses (clauses, cclauses);
 
   block = c_begin_compound_stmt (true);
   stmt = c_parser_omp_for_loop (loc, parser, OACC_LOOP, clauses, NULL);
@@ -12191,10 +12738,68 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name)
 }
 
 /* OpenACC 2.0:
+   # pragma acc kernels oacc-kernels-clause[optseq] new-line
+     structured-block
+*/
+
+#define OACC_KERNELS_CLAUSE_MASK					\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_ASYNC)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPY)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEFAULT)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICE_TYPE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF)			\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPY)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPYIN)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPYOUT)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_CREATE)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT) )
+
+#define OACC_KERNELS_CLAUSE_DEVICE_TYPE_MASK				\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_ASYNC)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT) )
+
+static tree
+c_parser_oacc_kernels (location_t loc, c_parser *parser, char *p_name)
+{
+  tree stmt, clauses, block;
+  omp_clause_mask mask;
+
+  strcat (p_name, " kernels");
+
+  mask = OACC_KERNELS_CLAUSE_MASK;
+  if (c_parser_next_token_is (parser, CPP_NAME))
+    {
+      stmt = c_parser_peek_token (parser)->value;
+      if (!strcmp ("loop", IDENTIFIER_POINTER (stmt)))
+	{
+	  tree kernel_clauses;
+
+	  c_parser_consume_token (parser);
+	  mask |= OACC_LOOP_CLAUSE_MASK;
+	  block = c_begin_omp_parallel ();
+	  c_parser_oacc_loop (loc, parser, p_name, mask, &kernel_clauses);
+	  stmt = c_finish_oacc_kernels (loc, kernel_clauses, block);
+	  return stmt;
+	}
+    }
+
+  clauses = c_parser_oacc_all_clauses (parser, mask, p_name,
+				       OACC_KERNELS_CLAUSE_DEVICE_TYPE_MASK);
+
+  block = c_begin_omp_parallel ();
+  add_stmt (c_parser_omp_structured_block (parser));
+  stmt = c_finish_oacc_kernels (loc, clauses, block);
+  return stmt;
+}
+
+/* OpenACC 2.0:
    # pragma acc parallel oacc-parallel-clause[optseq] new-line
      structured-block
-
-   LOC is the location of the #pragma token.
 */
 
 #define OACC_PARALLEL_CLAUSE_MASK					\
@@ -12203,8 +12808,11 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name)
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEFAULT)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICE_TYPE)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF)			\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRIVATE)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_NUM_GANGS)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_NUM_WORKERS)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT)		\
@@ -12216,48 +12824,227 @@ c_parser_oacc_loop (location_t loc, c_parser *parser, char *p_name)
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_VECTOR_LENGTH)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT) )
 
+#define OACC_PARALLEL_CLAUSE_DEVICE_TYPE_MASK				\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_ASYNC)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_NUM_GANGS)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_NUM_WORKERS)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_VECTOR_LENGTH)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT) )
+
 static tree
 c_parser_oacc_parallel (location_t loc, c_parser *parser, char *p_name)
 {
-  tree stmt, clauses = NULL_TREE, block;
+  tree stmt, clauses, block;
+  omp_clause_mask mask, dmask;
 
   strcat (p_name, " parallel");
 
+  mask = OACC_PARALLEL_CLAUSE_MASK;
+  dmask = OACC_PARALLEL_CLAUSE_DEVICE_TYPE_MASK;
   if (c_parser_next_token_is (parser, CPP_NAME))
     {
-      const char *p = IDENTIFIER_POINTER (c_parser_peek_token (parser)->value);
-      if (strcmp (p, "loop") == 0)
+      stmt = c_parser_peek_token (parser)->value;
+      if (!strcmp ("loop", IDENTIFIER_POINTER (stmt)))
 	{
+	  tree parallel_clauses;
+
 	  c_parser_consume_token (parser);
+	  mask |= OACC_LOOP_CLAUSE_MASK;
 	  block = c_begin_omp_parallel ();
-	  c_parser_oacc_loop (loc, parser, p_name);
-	  stmt = c_finish_oacc_parallel (loc, clauses, block);
-	  OACC_PARALLEL_COMBINED (stmt) = 1;
+	  c_parser_oacc_loop (loc, parser, p_name, mask, &parallel_clauses);
+	  stmt = c_finish_oacc_parallel (loc, parallel_clauses, block);
 	  return stmt;
 	}
     }
 
-  clauses =  c_parser_oacc_all_clauses (parser, OACC_PARALLEL_CLAUSE_MASK,
-					p_name);
+  clauses = c_parser_oacc_all_clauses (parser, mask, p_name, dmask);
 
   block = c_begin_omp_parallel ();
   add_stmt (c_parser_omp_structured_block (parser));
-
   stmt = c_finish_oacc_parallel (loc, clauses, block);
-
   return stmt;
 }
 
 /* OpenACC 2.0:
+   # pragma acc routine oacc-routine-clause[optseq] new-line
+     function-definition
+
+   # pragma acc routine ( name ) oacc-routine-clause[optseq] new-line
+*/
+
+#define OACC_ROUTINE_CLAUSE_MASK					\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICE_TYPE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_GANG)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WORKER)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_VECTOR)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_SEQ)			\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_BIND)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_NOHOST))
+
+#define OACC_ROUTINE_CLAUSE_DEVICE_TYPE_MASK				\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_GANG)	       	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WORKER)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_VECTOR)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_SEQ)			\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_BIND))
+
+static void
+c_parser_oacc_routine (c_parser *parser, enum pragma_context context)
+{
+  tree name = NULL_TREE;
+  location_t here = c_parser_peek_token (parser)->location;
+
+  c_parser_consume_pragma (parser);
+
+  /* Scan for optional '( name )'.  */
+  if (c_parser_peek_token (parser)->type == CPP_OPEN_PAREN)
+    {
+      c_parser_consume_token (parser);
+
+      if (c_parser_next_token_is_not (parser, CPP_NAME)
+	  || c_parser_peek_token (parser)->id_kind != C_ID_ID)
+	c_parser_error (parser, "expected identifier");
+
+      /* NAME should be an IDENTIFIER_NODE.  */
+      name = c_parser_peek_token (parser)->value;
+
+      if (name == NULL_TREE)
+	{
+	  undeclared_variable (c_parser_peek_token (parser)->location,
+			       c_parser_peek_token (parser)->value);
+	  name = error_mark_node;
+	}
+
+      c_parser_consume_token (parser);
+
+      if (name == error_mark_node)
+	return;
+
+      c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, 0);
+    }
+
+  /* Build a chain of clauses.  */
+  parser->in_pragma = true;
+  tree clauses = NULL_TREE;
+  clauses = c_parser_oacc_all_clauses (parser, OACC_ROUTINE_CLAUSE_MASK,
+				       "#pragma acc routine",
+				       OACC_ROUTINE_CLAUSE_DEVICE_TYPE_MASK);
+
+  /* Check for the presence of gang, worker, vector and seq clauses, and
+     throw an error if more than one of those clauses is specified.  */
+  int parallelism = 0;
+  tree c;
+
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    switch (OMP_CLAUSE_CODE (c))
+      {
+      case OMP_CLAUSE_GANG:
+      case OMP_CLAUSE_WORKER:
+      case OMP_CLAUSE_VECTOR:
+      case OMP_CLAUSE_SEQ:
+	++parallelism;
+	break;
+      default:
+	break;
+      }
+
+  if (parallelism > 1)
+    {
+      error_at (here, "invalid combination of gang, worker, vector or seq for "
+		"%<#pragma acc routine%>");
+    }
+
+  if (name)
+    {
+      TREE_CHAIN (name) = clauses;
+      vec_safe_push (parser->oacc_routines, name);
+    }
+  else
+    {
+      if (context != pragma_external)
+	{
+	  c_parser_error (parser, "%<#pragma acc routine%> must be "
+			  "followed by function declaration or definition");
+	  return;
+	}
+
+      if (c_parser_next_token_is (parser, CPP_KEYWORD)
+	  && c_parser_peek_token (parser)->keyword == RID_EXTENSION)
+	{
+	  int ext = disable_extension_diagnostics ();
+	  do
+	    c_parser_consume_token (parser);
+	  while (c_parser_next_token_is (parser, CPP_KEYWORD)
+		 && c_parser_peek_token (parser)->keyword
+		 == RID_EXTENSION);
+	  c_parser_declaration_or_fndef (parser, true, true, true, false,
+					 true, NULL, vNULL, clauses, true);
+	  restore_extension_diagnostics (ext);
+	}
+      else
+	c_parser_declaration_or_fndef (parser, true, true, true, false,
+				       true, NULL, vNULL, clauses, true);
+    }
+}
+
+static void
+c_finish_oacc_routine (c_parser *parser, tree fndecl, tree clauses,
+			  bool named)
+{
+  if (fndecl == NULL_TREE || TREE_CODE (fndecl) != FUNCTION_DECL)
+    {
+      if (!named)
+	return;
+
+      error ("%<#pragma acc routine%> not immediately followed by "
+	     "a function declaration or definition");
+      gcc_unreachable ();
+      return;
+    }
+
+  if (!named)
+    {
+      bool found = false;
+      int i;
+      tree t;
+
+      for (i = 0; vec_safe_iterate (parser->oacc_routines, i, &t); i++)
+	{
+	  if (!strcmp (IDENTIFIER_POINTER (DECL_NAME (fndecl)),
+		       IDENTIFIER_POINTER (t)))
+	    {
+	      found = true;
+	      clauses = TREE_CHAIN (t);
+	      break;
+	    }
+	}
+
+      if (!found)
+	return;
+    }
+
+  if (clauses != NULL_TREE)
+    clauses = tree_cons (NULL_TREE, clauses, NULL_TREE);
+  clauses = build_tree_list (get_identifier ("omp declare target"),
+			     clauses);
+  TREE_CHAIN (clauses) = DECL_ATTRIBUTES (fndecl);
+  DECL_ATTRIBUTES (fndecl) = clauses;
+}
+
+/* OpenACC 2.0:
    # pragma acc update oacc-update-clause[optseq] new-line
 */
 
 #define OACC_UPDATE_CLAUSE_MASK						\
 	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_ASYNC)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICE_TYPE)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_HOST)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF)			\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_SELF)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT) )
+
+#define OACC_UPDATE_CLAUSE_DEVICE_TYPE_MASK				\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_ASYNC)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT) )
 
 static void
@@ -12268,7 +13055,8 @@ c_parser_oacc_update (c_parser *parser)
   c_parser_consume_pragma (parser);
 
   tree clauses = c_parser_oacc_all_clauses (parser, OACC_UPDATE_CLAUSE_MASK,
-					    "#pragma acc update");
+					    "#pragma acc update",
+			       	        OACC_UPDATE_CLAUSE_DEVICE_TYPE_MASK);
   if (find_omp_clause (clauses, OMP_CLAUSE_MAP) == NULL_TREE)
     {
       error_at (loc,
@@ -12289,8 +13077,6 @@ c_parser_oacc_update (c_parser *parser)
 
 /* OpenACC 2.0:
    # pragma acc wait [(intseq)] oacc-wait-clause[optseq] new-line
-
-   LOC is the location of the #pragma token.
 */
 
 #define OACC_WAIT_CLAUSE_MASK						\
@@ -12844,7 +13630,7 @@ c_parser_omp_for_loop (location_t loc, c_parser *parser, enum tree_code code,
 	  if (i > 0)
 	    vec_safe_push (for_block, c_begin_compound_stmt (true));
 	  c_parser_declaration_or_fndef (parser, true, true, true, true, true,
-					 NULL, vNULL);
+					 NULL, vNULL, NULL_TREE, false);
 	  decl = check_for_loop_decls (for_loc, flag_isoc99);
 	  if (decl == NULL)
 	    goto error_init;
@@ -13115,7 +13901,7 @@ omp_split_clauses (location_t loc, enum tree_code code,
   c_omp_split_clauses (loc, code, mask, clauses, cclauses);
   for (i = 0; i < C_OMP_CLAUSE_SPLIT_COUNT; i++)
     if (cclauses[i])
-      cclauses[i] = c_finish_omp_clauses (cclauses[i]);
+      cclauses[i] = c_finish_omp_clauses (cclauses[i], false);
 }
 
 /* OpenMP 4.0:
@@ -14032,12 +14818,12 @@ c_parser_omp_declare_simd (c_parser *parser, enum pragma_context context)
 	  while (c_parser_next_token_is (parser, CPP_KEYWORD)
 		 && c_parser_peek_token (parser)->keyword == RID_EXTENSION);
 	  c_parser_declaration_or_fndef (parser, true, true, true, false, true,
-					 NULL, clauses);
+					 NULL, clauses, NULL_TREE, false);
 	  restore_extension_diagnostics (ext);
 	}
       else
 	c_parser_declaration_or_fndef (parser, true, true, true, false, true,
-				       NULL, clauses);
+				       NULL, clauses, NULL_TREE, false);
       break;
     case pragma_struct:
     case pragma_param:
@@ -14057,7 +14843,8 @@ c_parser_omp_declare_simd (c_parser *parser, enum pragma_context context)
 	  if (c_parser_next_tokens_start_declaration (parser))
 	    {
 	      c_parser_declaration_or_fndef (parser, true, true, true, true,
-					     true, NULL, clauses);
+					     true, NULL, clauses, NULL_TREE,
+					     false);
 	      restore_extension_diagnostics (ext);
 	      break;
 	    }
@@ -14066,7 +14853,7 @@ c_parser_omp_declare_simd (c_parser *parser, enum pragma_context context)
       else if (c_parser_next_tokens_start_declaration (parser))
 	{
 	  c_parser_declaration_or_fndef (parser, true, true, true, true, true,
-					 NULL, clauses);
+					 NULL, clauses, NULL_TREE, false);
 	  break;
 	}
       c_parser_error (parser, "%<#pragma omp declare simd%> must be followed by "
@@ -14634,6 +15421,9 @@ c_parser_omp_construct (c_parser *parser)
 
   switch (p_kind)
     {
+    case PRAGMA_OACC_ATOMIC:
+      c_parser_omp_atomic (loc, parser);
+      return;
     case PRAGMA_OACC_CACHE:
       strcpy (p_name, "#pragma acc");
       stmt = c_parser_oacc_cache (loc, parser);
@@ -14641,13 +15431,16 @@ c_parser_omp_construct (c_parser *parser)
     case PRAGMA_OACC_DATA:
       stmt = c_parser_oacc_data (loc, parser);
       break;
+    case PRAGMA_OACC_HOST_DATA:
+      stmt = c_parser_oacc_host_data (loc, parser);
+      break;
     case PRAGMA_OACC_KERNELS:
       strcpy (p_name, "#pragma acc");
       stmt = c_parser_oacc_kernels (loc, parser, p_name);
       break;
     case PRAGMA_OACC_LOOP:
       strcpy (p_name, "#pragma acc");
-      stmt = c_parser_oacc_loop (loc, parser, p_name);
+      stmt = c_parser_oacc_loop (loc, parser, p_name, mask, NULL);
       break;
     case PRAGMA_OACC_PARALLEL:
       strcpy (p_name, "#pragma acc");
@@ -15100,7 +15893,7 @@ c_parser_cilk_for (c_parser *parser, tree grain)
   tree clauses = build_omp_clause (EXPR_LOCATION (grain), OMP_CLAUSE_SCHEDULE);
   OMP_CLAUSE_SCHEDULE_KIND (clauses) = OMP_CLAUSE_SCHEDULE_CILKFOR;
   OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (clauses) = grain;
-  clauses = c_finish_omp_clauses (clauses);
+  clauses = c_finish_omp_clauses (clauses, false);
 
   tree block = c_begin_compound_stmt (true);
   tree sb = push_stmt_list ();
@@ -15165,7 +15958,7 @@ c_parser_cilk_for (c_parser *parser, tree grain)
       OMP_CLAUSE_OPERAND (c, 0)
 	= cilk_for_number_of_iterations (omp_for);
       OMP_CLAUSE_CHAIN (c) = clauses;
-      OMP_PARALLEL_CLAUSES (omp_par) = c_finish_omp_clauses (c);
+      OMP_PARALLEL_CLAUSES (omp_par) = c_finish_omp_clauses (c, false);
       add_stmt (omp_par);
     }
 
@@ -15408,6 +16201,8 @@ c_parse_file (void)
   if (tparser.tokens == &tparser.tokens_buf[0])
     the_parser->tokens = &the_parser->tokens_buf[0];
 
+  the_parser->oacc_routines = NULL;
+
   /* Initialize EH, if we've been told to do so.  */
   if (flag_exceptions)
     using_eh_for_cleanups ();
diff --git gcc/c/c-tree.h gcc/c/c-tree.h
index 7a72665..8750bd7 100644
--- gcc/c/c-tree.h
+++ gcc/c/c-tree.h
@@ -643,13 +643,14 @@ extern tree c_expr_to_decl (tree, bool *, bool *);
 extern tree c_finish_oacc_parallel (location_t, tree, tree);
 extern tree c_finish_oacc_kernels (location_t, tree, tree);
 extern tree c_finish_oacc_data (location_t, tree, tree);
+extern tree c_finish_oacc_host_data (location_t, tree, tree);
 extern tree c_begin_omp_parallel (void);
 extern tree c_finish_omp_parallel (location_t, tree, tree);
 extern tree c_begin_omp_task (void);
 extern tree c_finish_omp_task (location_t, tree, tree);
 extern void c_finish_omp_cancel (location_t, tree);
 extern void c_finish_omp_cancellation_point (location_t, tree);
-extern tree c_finish_omp_clauses (tree);
+extern tree c_finish_omp_clauses (tree, bool);
 extern tree c_build_va_arg (location_t, tree, tree);
 extern tree c_finish_transaction (location_t, tree, int);
 extern bool c_tree_equal (tree, tree);
diff --git gcc/c/c-typeck.c gcc/c/c-typeck.c
index 91735b5..e27a1c7 100644
--- gcc/c/c-typeck.c
+++ gcc/c/c-typeck.c
@@ -11449,6 +11449,25 @@ c_finish_oacc_data (location_t loc, tree clauses, tree block)
   return add_stmt (stmt);
 }
 
+/* Generate OACC_HOST_DATA, with CLAUSES and BLOCK as its compound
+   statement.  LOC is the location of the OACC_HOST_DATA.  */
+
+tree
+c_finish_oacc_host_data (location_t loc, tree clauses, tree block)
+{
+  tree stmt;
+
+  block = c_end_compound_stmt (loc, block, true);
+
+  stmt = make_node (OACC_HOST_DATA);
+  TREE_TYPE (stmt) = void_type_node;
+  OACC_HOST_DATA_CLAUSES (stmt) = clauses;
+  OACC_HOST_DATA_BODY (stmt) = block;
+  SET_EXPR_LOCATION (stmt, loc);
+
+  return add_stmt (stmt);
+}
+
 /* Like c_begin_compound_stmt, except force the retention of the BLOCK.  */
 
 tree
@@ -12048,13 +12067,14 @@ c_find_omp_placeholder_r (tree *tp, int *, void *data)
    Remove any elements from the list that are invalid.  */
 
 tree
-c_finish_omp_clauses (tree clauses)
+c_finish_omp_clauses (tree clauses, bool oacc)
 {
   bitmap_head generic_head, firstprivate_head, lastprivate_head;
-  bitmap_head aligned_head;
+  bitmap_head aligned_head, oacc_data_head;
   tree c, t, *pc;
   bool branch_seen = false;
   bool copyprivate_seen = false;
+  bool oacc_data = false;
   tree *nowait_clause = NULL;
 
   bitmap_obstack_initialize (NULL);
@@ -12062,6 +12082,7 @@ c_finish_omp_clauses (tree clauses)
   bitmap_initialize (&firstprivate_head, &bitmap_default_obstack);
   bitmap_initialize (&lastprivate_head, &bitmap_default_obstack);
   bitmap_initialize (&aligned_head, &bitmap_default_obstack);
+  bitmap_initialize (&oacc_data_head, &bitmap_default_obstack);
 
   for (pc = &clauses, c = clauses; c ; c = *pc)
     {
@@ -12077,11 +12098,16 @@ c_finish_omp_clauses (tree clauses)
 
 	case OMP_CLAUSE_PRIVATE:
 	  need_complete = true;
+	  oacc_data = true;
 	  need_implicitly_determined = true;
-	  goto check_dup_generic;
+	  if (oacc)
+	    goto check_dup_oacc;
+	  else
+	    goto check_dup_generic;
 
 	case OMP_CLAUSE_REDUCTION:
 	  need_implicitly_determined = true;
+	  oacc_data = false;
 	  t = OMP_CLAUSE_DECL (c);
 	  if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c) == NULL_TREE
 	      && (FLOAT_TYPE_P (TREE_TYPE (t))
@@ -12201,7 +12227,10 @@ c_finish_omp_clauses (tree clauses)
 			       OMP_CLAUSE_REDUCTION_INIT (c), NULL_TREE);
 	      TREE_SIDE_EFFECTS (OMP_CLAUSE_REDUCTION_INIT (c)) = 1;
 	    }
-	  goto check_dup_generic;
+	  if (oacc)
+	    goto check_dup_oacc;
+	  else
+	    goto check_dup_generic;
 
 	case OMP_CLAUSE_COPYPRIVATE:
 	  copyprivate_seen = true;
@@ -12262,9 +12291,9 @@ c_finish_omp_clauses (tree clauses)
 			omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
 	      remove = true;
 	    }
-	  else if (bitmap_bit_p (&generic_head, DECL_UID (t))
-		   || bitmap_bit_p (&firstprivate_head, DECL_UID (t))
-		   || bitmap_bit_p (&lastprivate_head, DECL_UID (t)))
+	  if (bitmap_bit_p (&generic_head, DECL_UID (t))
+	      || bitmap_bit_p (&firstprivate_head, DECL_UID (t))
+	      || bitmap_bit_p (&lastprivate_head, DECL_UID (t)))
 	    {
 	      error_at (OMP_CLAUSE_LOCATION (c),
 			"%qE appears more than once in data clauses", t);
@@ -12274,6 +12303,39 @@ c_finish_omp_clauses (tree clauses)
 	    bitmap_set_bit (&generic_head, DECL_UID (t));
 	  break;
 
+	check_dup_oacc:
+	  t = OMP_CLAUSE_DECL (c);
+	  if (TREE_CODE (t) != VAR_DECL && TREE_CODE (t) != PARM_DECL)
+	    {
+	      error_at (OMP_CLAUSE_LOCATION (c),
+			"%qE is not a variable in clause %qs", t,
+			omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
+	      remove = true;
+	    }
+	  if (oacc_data)
+	    {
+	      if (bitmap_bit_p (&oacc_data_head, DECL_UID (t)))
+		{
+		  error_at (OMP_CLAUSE_LOCATION (c),
+			    "%qE appears more than once in data clauses", t);
+		  remove = true;
+		}
+	      else
+		bitmap_set_bit (&oacc_data_head, DECL_UID (t));
+	    }
+	  else
+	    {
+	      if (bitmap_bit_p (&generic_head, DECL_UID (t)))
+		{
+		  error_at (OMP_CLAUSE_LOCATION (c),
+			    "%qE appears more than once in data clauses", t);
+		  remove = true;
+		}
+	      else
+		bitmap_set_bit (&generic_head, DECL_UID (t));
+	    }
+	  break;
+
 	case OMP_CLAUSE_FIRSTPRIVATE:
 	  t = OMP_CLAUSE_DECL (c);
 	  need_complete = true;
@@ -12284,15 +12346,29 @@ c_finish_omp_clauses (tree clauses)
 			"%qE is not a variable in clause %<firstprivate%>", t);
 	      remove = true;
 	    }
-	  else if (bitmap_bit_p (&generic_head, DECL_UID (t))
-		   || bitmap_bit_p (&firstprivate_head, DECL_UID (t)))
+	  else if (oacc)
 	    {
-	      error_at (OMP_CLAUSE_LOCATION (c),
-			"%qE appears more than once in data clauses", t);
-	      remove = true;
+	      if (bitmap_bit_p (&oacc_data_head, DECL_UID (t)))
+		{
+		  error_at (OMP_CLAUSE_LOCATION (c),
+			    "%qE appears more than once in data clauses", t);
+		  remove = true;
+		}
+	      else
+		bitmap_set_bit (&oacc_data_head, DECL_UID (t));
 	    }
 	  else
-	    bitmap_set_bit (&firstprivate_head, DECL_UID (t));
+	    {
+	      if (bitmap_bit_p (&generic_head, DECL_UID (t))
+		  || bitmap_bit_p (&firstprivate_head, DECL_UID (t)))
+		{
+		  error_at (OMP_CLAUSE_LOCATION (c),
+			    "%qE appears more than once in data clauses", t);
+		  remove = true;
+		}
+	      else
+		bitmap_set_bit (&firstprivate_head, DECL_UID (t));
+	    }
 	  break;
 
 	case OMP_CLAUSE_LASTPRIVATE:
@@ -12415,7 +12491,8 @@ c_finish_omp_clauses (tree clauses)
 			omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
 	      remove = true;
 	    }
-	  else if (bitmap_bit_p (&generic_head, DECL_UID (t)))
+	  if ((oacc && bitmap_bit_p (&oacc_data_head, DECL_UID (t)))
+	      || bitmap_bit_p (&generic_head, DECL_UID (t)))
 	    {
 	      if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_MAP)
 		error ("%qD appears more than once in motion clauses", t);
@@ -12423,6 +12500,8 @@ c_finish_omp_clauses (tree clauses)
 		error ("%qD appears more than once in map clauses", t);
 	      remove = true;
 	    }
+	  else if (oacc)
+	    bitmap_set_bit (&oacc_data_head, DECL_UID (t));
 	  else
 	    bitmap_set_bit (&generic_head, DECL_UID (t));
 	  break;
@@ -12482,10 +12561,15 @@ c_finish_omp_clauses (tree clauses)
 	case OMP_CLAUSE_ASYNC:
 	case OMP_CLAUSE_WAIT:
 	case OMP_CLAUSE_AUTO:
+	case OMP_CLAUSE_INDEPENDENT:
 	case OMP_CLAUSE_SEQ:
 	case OMP_CLAUSE_GANG:
 	case OMP_CLAUSE_WORKER:
 	case OMP_CLAUSE_VECTOR:
+	case OMP_CLAUSE_USE_DEVICE:
+	case OMP_CLAUSE_BIND:
+	case OMP_CLAUSE_NOHOST:
+	case OMP_CLAUSE_TILE:
 	  pc = &OMP_CLAUSE_CHAIN (c);
 	  continue;
 
diff --git gcc/cp/cp-gimplify.c gcc/cp/cp-gimplify.c
index 70645b5..569733c 100644
--- gcc/cp/cp-gimplify.c
+++ gcc/cp/cp-gimplify.c
@@ -1533,7 +1533,8 @@ cxx_omp_clause_default_ctor (tree clause, tree decl, tree /*outer*/)
 tree
 cxx_omp_clause_copy_ctor (tree clause, tree dst, tree src)
 {
-  tree info = CP_OMP_CLAUSE_INFO (clause);
+  tree info = OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_MAP ? NULL
+    : CP_OMP_CLAUSE_INFO (clause);
   tree ret = NULL;
 
   if (info)
diff --git gcc/cp/cp-tree.h gcc/cp/cp-tree.h
index 2a904a5..251ed38 100644
--- gcc/cp/cp-tree.h
+++ gcc/cp/cp-tree.h
@@ -5986,11 +5986,12 @@ extern void note_decl_for_pch			(tree);
 extern tree omp_reduction_id			(enum tree_code, tree, tree);
 extern tree cp_remove_omp_priv_cleanup_stmt	(tree *, int *, void *);
 extern void cp_check_omp_declare_reduction	(tree);
-extern tree finish_omp_clauses			(tree);
+extern tree finish_omp_clauses			(tree, bool);
 extern void finish_omp_threadprivate		(tree);
 extern tree begin_omp_structured_block		(void);
 extern tree finish_omp_structured_block		(tree);
 extern tree finish_oacc_data			(tree, tree);
+extern tree finish_oacc_host_data		(tree, tree);
 extern tree finish_oacc_kernels			(tree, tree);
 extern tree finish_oacc_parallel		(tree, tree);
 extern tree begin_omp_parallel			(void);
diff --git gcc/cp/parser.c gcc/cp/parser.c
index cfb512b..6e177f6 100644
--- gcc/cp/parser.c
+++ gcc/cp/parser.c
@@ -1303,7 +1303,8 @@ cp_token_cache_new (cp_token *first, cp_token *last)
 }
 
 /* Diagnose if #pragma omp declare simd isn't followed immediately
-   by function declaration or definition.  */
+   by function declaration or definition.  Likewise for
+   #pragma acc routine.  */
 
 static inline void
 cp_ensure_no_omp_declare_simd (cp_parser *parser)
@@ -1314,6 +1315,13 @@ cp_ensure_no_omp_declare_simd (cp_parser *parser)
 	     "function declaration or definition");
       parser->omp_declare_simd = NULL;
     }
+
+  if (parser->oacc_routine && !parser->oacc_routine->error_seen)
+    {
+      error ("%<#pragma acc routine%> not immediately followed by "
+	     "function declaration or definition");
+      parser->oacc_routine = NULL;
+    }
 }
 
 /* Finalize #pragma omp declare simd clauses after FNDECL has been parsed,
@@ -1336,6 +1344,58 @@ cp_finalize_omp_declare_simd (cp_parser *parser, tree fndecl)
 	}
     }
 }
+
+/* Finalize #pragma acc routine clauses after FNDECL has been parsed,
+   and put that into "acc routine" attribute.  */
+
+static inline void
+cp_finalize_oacc_routine (cp_parser *parser, tree fndecl)
+{
+  if (__builtin_expect (parser->oacc_routine != NULL, 0))
+    {
+      if (fndecl == error_mark_node)
+	{
+	  parser->oacc_routine = NULL;
+	  return;
+	}
+      if (TREE_CODE (fndecl) != FUNCTION_DECL)
+	{
+	  cp_ensure_no_omp_declare_simd (parser);
+	  return;
+	}
+    }
+  else // Is this fndecl associated with a named routine?
+    {
+      if (fndecl == NULL_TREE || fndecl == error_mark_node
+	  || TREE_CODE (fndecl) != FUNCTION_DECL)
+	return;
+
+      bool found = false;
+      int i;
+      tree t, clauses = NULL_TREE;
+
+      for (i = 0; vec_safe_iterate (parser->named_oacc_routines, i, &t); i++)
+	{
+	  if (!strcmp (IDENTIFIER_POINTER (DECL_NAME (fndecl)),
+		       IDENTIFIER_POINTER (t)))
+	    {
+	      found = true;
+	      clauses = TREE_CHAIN (t);
+	      break;
+	    }
+	}
+
+      if (!found)
+	return;
+
+      if (clauses != NULL_TREE)
+	clauses = tree_cons (NULL_TREE, clauses, NULL_TREE);
+      clauses = build_tree_list (get_identifier ("omp declare target"),
+				 clauses);
+      TREE_CHAIN (clauses) = DECL_ATTRIBUTES (fndecl);
+      DECL_ATTRIBUTES (fndecl) = clauses;
+    }
+}
 \f
 /* Decl-specifiers.  */
 
@@ -2185,6 +2245,9 @@ static tree cp_parser_late_parsing_omp_declare_simd
 static tree cp_parser_late_parsing_cilk_simd_fn_info
   (cp_parser *, tree);
 
+static tree cp_parser_late_parsing_oacc_routine
+  (cp_parser *, tree);
+
 static tree synthesize_implicit_template_parm
   (cp_parser *);
 static tree finish_fully_implicit_template
@@ -2541,6 +2604,11 @@ static bool cp_parser_array_designator_p
 static bool cp_parser_skip_to_closing_square_bracket
   (cp_parser *);
 
+/* OpenACC routines.  */
+static tree cp_parser_oacc_all_clauses (cp_parser *, omp_clause_mask,
+					const char *, cp_token *,
+					omp_clause_mask, bool, bool);
+
 /* Returns nonzero if we are parsing tentatively.  */
 
 static inline bool
@@ -3561,6 +3629,10 @@ cp_parser_new (void)
   parser->implicit_template_parms = 0;
   parser->implicit_template_scope = 0;
 
+  /* The list of OpenACC routine pragmas is uninitialized.  */
+  parser->oacc_routine = NULL;
+  parser->named_oacc_routines = NULL;
+
   return parser;
 }
 
@@ -17150,6 +17222,7 @@ cp_parser_init_declarator (cp_parser* parser,
 			 range_for_decl_p? SD_INITIALIZED : is_initialized,
 			 attributes, prefix_attributes, &pushed_scope);
       cp_finalize_omp_declare_simd (parser, decl);
+      cp_finalize_oacc_routine (parser, decl);
       /* Adjust location of decl if declarator->id_loc is more appropriate:
 	 set, and decl wasn't merged with another decl, in which case its
 	 location would be different from input_location, and more accurate.  */
@@ -17263,6 +17336,7 @@ cp_parser_init_declarator (cp_parser* parser,
       if (decl && TREE_CODE (decl) == FUNCTION_DECL)
 	cp_parser_save_default_args (parser, decl);
       cp_finalize_omp_declare_simd (parser, decl);
+      cp_finalize_oacc_routine (parser, decl);
     }
 
   /* Finish processing the declaration.  But, skip member
@@ -18329,11 +18403,15 @@ cp_parser_late_return_type_opt (cp_parser* parser, cp_declarator *declarator,
 
   bool cilk_simd_fn_vector_p = (parser->cilk_simd_fn_info 
 				&& declarator && declarator->kind == cdk_id);
-  
+
+  bool oacc_routine_p = (parser->oacc_routine
+			&& declarator && declarator->kind == cdk_id);
+
   /* Peek at the next token.  */
   token = cp_lexer_peek_token (parser->lexer);
   /* A late-specified return type is indicated by an initial '->'. */
-  if (token->type != CPP_DEREF && !(declare_simd_p || cilk_simd_fn_vector_p))
+  if (token->type != CPP_DEREF && !(declare_simd_p || cilk_simd_fn_vector_p
+				    || oacc_routine_p))
     return NULL_TREE;
 
   tree save_ccp = current_class_ptr;
@@ -18360,6 +18438,10 @@ cp_parser_late_return_type_opt (cp_parser* parser, cp_declarator *declarator,
     declarator->std_attributes
       = cp_parser_late_parsing_omp_declare_simd (parser,
 						 declarator->std_attributes);
+  if (oacc_routine_p)
+    declarator->std_attributes
+      = cp_parser_late_parsing_oacc_routine (parser,
+					     declarator->std_attributes);
 
   if (quals >= 0)
     {
@@ -21097,6 +21179,7 @@ cp_parser_member_declaration (cp_parser* parser)
 	    }
 
 	  cp_finalize_omp_declare_simd (parser, decl);
+	  cp_finalize_oacc_routine (parser, decl);
 
 	  /* Reset PREFIX_ATTRIBUTES.  */
 	  while (attributes && TREE_CHAIN (attributes) != first_attribute)
@@ -23349,6 +23432,9 @@ cp_parser_function_definition_from_specifiers_and_declarator
     {
       cp_finalize_omp_declare_simd (parser, current_function_decl);
       parser->omp_declare_simd = NULL;
+
+      cp_finalize_oacc_routine (parser, current_function_decl);
+      parser->oacc_routine = NULL;
     }
 
   if (!success_p)
@@ -23910,6 +23996,7 @@ cp_parser_save_member_function_body (cp_parser* parser,
   /* Create the FUNCTION_DECL.  */
   fn = grokmethod (decl_specifiers, declarator, attributes);
   cp_finalize_omp_declare_simd (parser, fn);
+  cp_finalize_oacc_routine (parser, fn);
   /* If something went badly wrong, bail out now.  */
   if (fn == error_mark_node)
     {
@@ -27529,11 +27616,13 @@ cp_parser_objc_at_dynamic_declaration (cp_parser *parser)
    returned and the token is consumed.  */
 
 static pragma_omp_clause
-cp_parser_omp_clause_name (cp_parser *parser)
+cp_parser_omp_clause_name (cp_parser *parser, bool consume_token = true)
 {
   pragma_omp_clause result = PRAGMA_OMP_CLAUSE_NONE;
 
-  if (cp_lexer_next_token_is_keyword (parser->lexer, RID_IF))
+  if (cp_lexer_next_token_is_keyword (parser->lexer, RID_AUTO))
+    result = PRAGMA_OACC_CLAUSE_AUTO;
+  else if (cp_lexer_next_token_is_keyword (parser->lexer, RID_IF))
     result = PRAGMA_OMP_CLAUSE_IF;
   else if (cp_lexer_next_token_is_keyword (parser->lexer, RID_DEFAULT))
     result = PRAGMA_OMP_CLAUSE_DEFAULT;
@@ -27556,6 +27645,10 @@ cp_parser_omp_clause_name (cp_parser *parser)
 	  else if (!strcmp ("async", p))
 	    result = PRAGMA_OACC_CLAUSE_ASYNC;
 	  break;
+	case 'b':
+	  if (!strcmp ("bind", p))
+	    result = PRAGMA_OACC_CLAUSE_BIND;
+	  break;
 	case 'c':
 	  if (!strcmp ("collapse", p))
 	    result = PRAGMA_OMP_CLAUSE_COLLAPSE;
@@ -27575,6 +27668,11 @@ cp_parser_omp_clause_name (cp_parser *parser)
 	    result = PRAGMA_OMP_CLAUSE_DEPEND;
 	  else if (!strcmp ("device", p))
 	    result = PRAGMA_OMP_CLAUSE_DEVICE;
+	  else if (!strcmp ("device_resident", p))
+	    result = PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT;
+	  else if (!strcmp ("device_type", p)
+		   || !strcmp ("dtype", p))
+	    result = PRAGMA_OACC_CLAUSE_DEVICE_TYPE;
 	  else if (!strcmp ("deviceptr", p))
 	    result = PRAGMA_OACC_CLAUSE_DEVICEPTR;
 	  else if (!strcmp ("dist_schedule", p))
@@ -27592,15 +27690,23 @@ cp_parser_omp_clause_name (cp_parser *parser)
 	  if (!strcmp ("host", p))
 	    result = PRAGMA_OACC_CLAUSE_HOST;
 	  break;
+	case 'g':
+	  if (!strcmp ("gang", p))
+	    result = PRAGMA_OACC_CLAUSE_GANG;
+	  break;
 	case 'i':
 	  if (!strcmp ("inbranch", p))
 	    result = PRAGMA_OMP_CLAUSE_INBRANCH;
+	  else if (!strcmp ("independent", p))
+	    result = PRAGMA_OACC_CLAUSE_INDEPENDENT;
 	  break;
 	case 'l':
 	  if (!strcmp ("lastprivate", p))
 	    result = PRAGMA_OMP_CLAUSE_LASTPRIVATE;
 	  else if (!strcmp ("linear", p))
 	    result = PRAGMA_OMP_CLAUSE_LINEAR;
+	  else if (!strcmp ("link", p))
+	    result = PRAGMA_OACC_CLAUSE_LINK;
 	  break;
 	case 'm':
 	  if (!strcmp ("map", p))
@@ -27615,6 +27721,8 @@ cp_parser_omp_clause_name (cp_parser *parser)
 	    result = PRAGMA_OMP_CLAUSE_NOTINBRANCH;
 	  else if (!strcmp ("nowait", p))
 	    result = PRAGMA_OMP_CLAUSE_NOWAIT;
+	  else if (!strcmp ("nohost", p))
+	    result = PRAGMA_OACC_CLAUSE_NOHOST;
 	  else if (flag_cilkplus && !strcmp ("nomask", p))
 	    result = PRAGMA_CILK_CLAUSE_NOMASK;
 	  else if (!strcmp ("num_gangs", p))
@@ -27661,8 +27769,10 @@ cp_parser_omp_clause_name (cp_parser *parser)
 	    result = PRAGMA_OMP_CLAUSE_SCHEDULE;
 	  else if (!strcmp ("sections", p))
 	    result = PRAGMA_OMP_CLAUSE_SECTIONS;
-	  else if (!strcmp ("self", p))
-	    result = PRAGMA_OACC_CLAUSE_SELF;
+	  else if (!strcmp ("self", p)) /* "self" is a synonym for "host".  */
+	    result = PRAGMA_OACC_CLAUSE_HOST;
+	  else if (!strcmp ("seq", p))
+	    result = PRAGMA_OACC_CLAUSE_SEQ;
 	  else if (!strcmp ("shared", p))
 	    result = PRAGMA_OMP_CLAUSE_SHARED;
 	  else if (!strcmp ("simdlen", p))
@@ -27673,6 +27783,8 @@ cp_parser_omp_clause_name (cp_parser *parser)
 	    result = PRAGMA_OMP_CLAUSE_TASKGROUP;
 	  else if (!strcmp ("thread_limit", p))
 	    result = PRAGMA_OMP_CLAUSE_THREAD_LIMIT;
+	  else if (!strcmp ("tile", p))
+	    result = PRAGMA_OACC_CLAUSE_TILE;
 	  else if (!strcmp ("to", p))
 	    result = PRAGMA_OMP_CLAUSE_TO;
 	  break;
@@ -27681,9 +27793,13 @@ cp_parser_omp_clause_name (cp_parser *parser)
 	    result = PRAGMA_OMP_CLAUSE_UNIFORM;
 	  else if (!strcmp ("untied", p))
 	    result = PRAGMA_OMP_CLAUSE_UNTIED;
+	  else if (!strcmp ("use_device", p))
+	    result = PRAGMA_OACC_CLAUSE_USE_DEVICE;
 	  break;
 	case 'v':
-	  if (!strcmp ("vector_length", p))
+	  if (!strcmp ("vector", p))
+	    result = PRAGMA_OACC_CLAUSE_VECTOR;
+	  else if (!strcmp ("vector_length", p))
 	    result = PRAGMA_OACC_CLAUSE_VECTOR_LENGTH;
 	  else if (flag_cilkplus && !strcmp ("vectorlength", p))
 	    result = PRAGMA_CILK_CLAUSE_VECTORLENGTH;
@@ -27691,11 +27807,13 @@ cp_parser_omp_clause_name (cp_parser *parser)
 	case 'w':
 	  if (!strcmp ("wait", p))
 	    result = PRAGMA_OACC_CLAUSE_WAIT;
+	  else if (!strcmp ("worker", p))
+	    result = PRAGMA_OACC_CLAUSE_WORKER;
 	  break;
 	}
     }
 
-  if (result != PRAGMA_OMP_CLAUSE_NONE)
+  if (consume_token && result != PRAGMA_OMP_CLAUSE_NONE)
     cp_lexer_consume_token (parser->lexer);
 
   return result;
@@ -27893,6 +28011,8 @@ cp_parser_omp_var_list (cp_parser *parser, enum omp_clause_code kind, tree list)
    copyout ( variable-list )
    create ( variable-list )
    delete ( variable-list )
+   device_resident ( variable-list )
+   link ( variable-list )
    present ( variable-list )
    present_or_copy ( variable-list )
      pcopy ( variable-list )
@@ -27928,10 +28048,15 @@ cp_parser_oacc_data_clause (cp_parser *parser, pragma_omp_clause c_kind,
     case PRAGMA_OACC_CLAUSE_DEVICE:
       kind = GOMP_MAP_FORCE_TO;
       break;
+    case PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT:
+      kind = GOMP_MAP_DEVICE_RESIDENT;
+      break;
     case PRAGMA_OACC_CLAUSE_HOST:
-    case PRAGMA_OACC_CLAUSE_SELF:
       kind = GOMP_MAP_FORCE_FROM;
       break;
+    case PRAGMA_OACC_CLAUSE_LINK:
+      kind = GOMP_MAP_LINK;
+      break;
     case PRAGMA_OACC_CLAUSE_PRESENT:
       kind = GOMP_MAP_FORCE_PRESENT;
       break;
@@ -27999,43 +28124,360 @@ cp_parser_oacc_data_clause_deviceptr (cp_parser *parser, tree list)
   return list;
 }
 
+/* Attempt to statically determine whether the expression T is not
+   positive.  If so, warn at LOC, naming the clause STR, and return
+   integer one as the replacement expression; otherwise return T.  */
+static tree
+require_positive_expr (tree t, location_t loc, const char *str)
+{
+  tree c = fold_build2_loc (loc, LE_EXPR, boolean_type_node, t,
+			    build_int_cst (TREE_TYPE (t), 0));
+  if (c == boolean_true_node)
+    {
+      warning_at (loc, 0,
+		  "%<%s%> value must be positive", str);
+      t = integer_one_node;
+    }
+  return t;
+}
+
 /* OpenACC:
-   vector_length ( expression ) */
+   num_gangs ( expression )
+   num_workers ( expression )
+   vector_length ( expression )
+
+   OpenMP 2.5:
+   num_threads ( expression ) */
 
 static tree
-cp_parser_oacc_clause_vector_length (cp_parser *parser, tree list)
+cp_parser_omp_positive_int_clause (cp_parser *parser, pragma_omp_clause c_kind,
+				   const char *str, tree list)
 {
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
-  bool error = false;
+  omp_clause_code kind;
+  switch (c_kind)
+    {
+    default:
+      gcc_unreachable ();
+    case PRAGMA_OACC_CLAUSE_NUM_GANGS:
+      kind = OMP_CLAUSE_NUM_GANGS;
+      break;
+    case PRAGMA_OMP_CLAUSE_NUM_THREADS:
+      kind = OMP_CLAUSE_NUM_THREADS;
+      break;
+    case PRAGMA_OACC_CLAUSE_NUM_WORKERS:
+      kind = OMP_CLAUSE_NUM_WORKERS;
+      break;
+    case PRAGMA_OACC_CLAUSE_VECTOR_LENGTH:
+      kind = OMP_CLAUSE_VECTOR_LENGTH;
+      break;
+    }
+
+  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
 
   if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
     return list;
 
-  t = cp_parser_condition (parser);
-  if (t == error_mark_node || !INTEGRAL_TYPE_P (TREE_TYPE (t)))
-    {
-      error_at (location, "expected positive integer expression");
-      error = true;
-    }
+  tree t = cp_parser_assignment_expression (parser, NULL, false, false);
 
-  if (error || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-    {
-      cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
+  if (t == error_mark_node
+      || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
+    cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
 					   /*or_comma=*/false,
 					   /*consume_paren=*/true);
+
+  check_no_duplicate_clause (list, kind, str, loc);
+
+  tree c = build_omp_clause (loc, kind);
+  OMP_CLAUSE_OPERAND (c, 0) = t;
+  OMP_CLAUSE_CHAIN (c) = list;
+  return c;
+}
+
+/* OpenACC:
+   gang [( gang_expr_list )]
+   worker [( expression )]
+   vector [( expression )] */
+
+static tree
+cp_parser_oacc_shape_clause (cp_parser *parser, pragma_omp_clause c_kind,
+			     const char *str, tree list)
+{
+  omp_clause_code kind;
+  const char *id = "num";
+  cp_lexer *lexer = parser->lexer;
+
+  switch (c_kind)
+    {
+    default:
+      gcc_unreachable ();
+    case PRAGMA_OACC_CLAUSE_GANG:
+      kind = OMP_CLAUSE_GANG;
+      break;
+    case PRAGMA_OACC_CLAUSE_VECTOR:
+      kind = OMP_CLAUSE_VECTOR;
+      id = "length";
+      break;
+    case PRAGMA_OACC_CLAUSE_WORKER:
+      kind = OMP_CLAUSE_WORKER;
+      break;
+    }
+
+  tree op0 = NULL_TREE, op1 = NULL_TREE;
+  location_t loc = cp_lexer_peek_token (lexer)->location;
+
+  if (cp_lexer_next_token_is (lexer, CPP_OPEN_PAREN))
+    {
+      tree *op_to_parse = &op0;
+      cp_lexer_consume_token (lexer);
+
+      do
+	{
+	  if (cp_lexer_next_token_is (lexer, CPP_NAME)
+	      || cp_lexer_next_token_is (lexer, CPP_KEYWORD))
+	    {
+	      tree name_kind = cp_lexer_peek_token (lexer)->u.value;
+	      const char *p = IDENTIFIER_POINTER (name_kind);
+	      if (kind == OMP_CLAUSE_GANG && strcmp ("static", p) == 0)
+		{
+		  cp_lexer_consume_token (lexer);
+		  if (!cp_parser_require (parser, CPP_COLON, RT_COLON))
+		    {
+		      cp_parser_skip_to_closing_parenthesis (parser, false,
+							     false, true);
+		      return list;
+		    }
+		  op_to_parse = &op1;
+		  if (cp_lexer_next_token_is (lexer, CPP_MULT))
+		    {
+		      if (*op_to_parse != NULL_TREE)
+			{
+			  cp_parser_error (parser,
+					   "duplicate %<num%> argument");
+			  cp_parser_skip_to_closing_parenthesis (parser,
+								 false, false,
+								 true);
+			  return list;
+			}
+		      cp_lexer_consume_token (lexer);
+		      *op_to_parse = integer_minus_one_node;
+		      if (cp_lexer_next_token_is (lexer, CPP_COMMA))
+			cp_lexer_consume_token (lexer);
+		      continue;
+		    }
+		}
+	      else if (strcmp (id, p) == 0)
+		{
+		  op_to_parse = &op0;
+		  cp_lexer_consume_token (lexer);
+		  if (!cp_parser_require (parser, CPP_COLON, RT_COLON))
+		    {
+		      cp_parser_skip_to_closing_parenthesis (parser, false,
+							     false, true);
+		      return list;
+		    }
+		}
+	      else
+		{
+		  if (kind == OMP_CLAUSE_GANG)
+		    cp_parser_error (parser,
+				     "expected %<num%> or %<static%>");
+		  else if (kind == OMP_CLAUSE_VECTOR)
+		    cp_parser_error (parser, "expected %<length%>");
+		  else
+		    cp_parser_error (parser, "expected %<num%>");
+		  cp_parser_skip_to_closing_parenthesis (parser, false, false,
+							 true);
+		  return list;
+		}
+	    }
+
+	  if (*op_to_parse != NULL_TREE)
+	    {
+	      cp_parser_error (parser, "duplicate operand to clause");
+	      cp_parser_skip_to_closing_parenthesis (parser, false, false,
+						     true);
+	      return list;
+	    }
+
+	  location_t expr_loc = cp_lexer_peek_token (lexer)->location;
+	  tree expr = cp_parser_assignment_expression (parser, NULL, false,
+						       false);
+	  if (expr == error_mark_node)
+	    {
+	      cp_parser_skip_to_closing_parenthesis (parser, false, false,
+						     true);
+	      return list;
+	    }
+
+	  mark_exp_read (expr);
+	  expr = require_positive_expr (expr, expr_loc, str);
+	  *op_to_parse = expr;
+
+	  if (cp_lexer_next_token_is (lexer, CPP_COMMA))
+	    cp_lexer_consume_token (lexer);
+	}
+      while (!cp_lexer_next_token_is (lexer, CPP_CLOSE_PAREN));
+      cp_lexer_consume_token (lexer);
+    }
+
+  check_no_duplicate_clause (list, kind, str, loc);
+
+  tree c = build_omp_clause (loc, kind);
+  if (op0)
+    OMP_CLAUSE_OPERAND (c, 0) = op0;
+  if (op1)
+    OMP_CLAUSE_OPERAND (c, 1) = op1;
+  OMP_CLAUSE_CHAIN (c) = list;
+  return c;
+}
+
+/* OpenACC 2.0:
+   device_type ( * | device-type-list ) clauses */
+
+static tree
+cp_parser_oacc_clause_device_type (cp_parser *parser, omp_clause_mask mask,
+				   tree list, cp_token *pragma_tok)
+{
+  tree c, clauses;
+  location_t loc;
+  int dev_id = GOMP_DEVICE_NONE;
+
+  loc = cp_lexer_peek_token (parser->lexer)->location;
+  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
+    return list;
+
+  if (cp_lexer_next_token_is (parser->lexer, CPP_MULT))
+    {
+      cp_lexer_consume_token (parser->lexer);
+      dev_id = GOMP_DEVICE_DEFAULT;
+      if (!cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
+	return list;
+    }
+  else
+    {
+      do
+	{
+	  tree keyword = error_mark_node;
+	  int dev = 0;
+
+	  if (cp_lexer_next_token_is (parser->lexer, CPP_NAME))
+	    {
+	      keyword = cp_lexer_peek_token (parser->lexer)->u.value;
+	      cp_lexer_consume_token (parser->lexer);
+	    }
+
+	  if (keyword == error_mark_node)
+	    {
+	      error_at (loc, "expected keyword or %<)%>");
+	      cp_parser_skip_to_closing_parenthesis (parser, true, false,
+						     true);
+	      return list;
+	    }
+
+	  dev = oacc_extract_device_id (IDENTIFIER_POINTER (keyword));
+	  if (dev)
+	    dev_id |= 1 << dev;
+
+	  if (cp_lexer_next_token_is (parser->lexer, CPP_COMMA))
+	    cp_lexer_consume_token (parser->lexer);
+	}
+      while (cp_lexer_next_token_is_not (parser->lexer, CPP_CLOSE_PAREN));
+
+      /* Consume the trailing ')'.  */
+      cp_lexer_consume_token (parser->lexer);
+    }
+
+  c = build_omp_clause (loc, OMP_CLAUSE_DEVICE_TYPE);
+  clauses = cp_parser_oacc_all_clauses (parser, mask, "device_type",
+					pragma_tok, 0, false, false);
+  OMP_CLAUSE_DEVICE_TYPE_CLAUSES (c) = clauses;
+  OMP_CLAUSE_DEVICE_TYPE_DEVICES (c) = build_int_cst (integer_type_node,
+						      dev_id);
+  OMP_CLAUSE_CHAIN (c) = list;
+  return c;
+}
+
+/* OpenACC 2.0:
+   tile ( size-expr-list ) */
+
+static tree
+cp_parser_oacc_clause_tile (cp_parser *parser, tree list, location_t here)
+{
+  tree c, num = error_mark_node;
+  HOST_WIDE_INT n;
+  location_t loc;
+  tree tile = NULL_TREE;
+  vec<tree, va_gc> *tvec = make_tree_vector ();
+
+  check_no_duplicate_clause (list, OMP_CLAUSE_TILE, "tile", here);
+
+  loc = cp_lexer_peek_token (parser->lexer)->location;
+  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
+    {
+      release_tree_vector (tvec);
       return list;
     }
 
-  check_no_duplicate_clause (list, OMP_CLAUSE_VECTOR_LENGTH, "vector_length",
-			     location);
+  do
+    {
+      if (cp_lexer_next_token_is (parser->lexer, CPP_MULT))
+	{
+	  cp_lexer_consume_token (parser->lexer);
+	  num = integer_minus_one_node;
+	}
+      else
+	{
+	  bool non_constant = false;
+	  num = cp_parser_constant_expression (parser, true, &non_constant);
 
-  c = build_omp_clause (location, OMP_CLAUSE_VECTOR_LENGTH);
-  OMP_CLAUSE_VECTOR_LENGTH_EXPR (c) = t;
+	  if (num == error_mark_node)
+	    {
+	      cp_parser_skip_to_closing_parenthesis (parser, true, false,
+						     true);
+	      release_tree_vector (tvec);
+	      return list;
+	    }
+
+	  num = fold_non_dependent_expr (num);
+
+	  if (non_constant
+	      || !INTEGRAL_TYPE_P (TREE_TYPE (num))
+	      || !tree_fits_shwi_p (num)
+	      || (n = tree_to_shwi (num)) <= 0
+	      || (int) n != n)
+	    {
+	      error_at (loc,
+			"tile argument needs positive constant integer "
+			"expression");
+	      release_tree_vector (tvec);
+	      cp_parser_skip_to_closing_parenthesis (parser, true, false,
+						     true);
+	      return list;
+	    }
+	}
+
+      if (num == error_mark_node)
+	{
+	  error_at (loc, "expected positive integer or %<)%>");
+	  release_tree_vector (tvec);
+	  return list;
+	}
+
+      vec_safe_push (tvec, num);
+      if (cp_lexer_next_token_is (parser->lexer, CPP_COMMA))
+	cp_lexer_consume_token (parser->lexer);
+    }
+  while (cp_lexer_next_token_is_not (parser->lexer, CPP_CLOSE_PAREN));
+
+  /* Consume the trailing ')'.  */
+  cp_lexer_consume_token (parser->lexer);
+
+  c = build_omp_clause (loc, OMP_CLAUSE_TILE);
+  tile = build_tree_list_vec (tvec);
+  OMP_CLAUSE_TILE_LIST (c) = tile;
   OMP_CLAUSE_CHAIN (c) = list;
-  list = c;
-
-  return list;
+  release_tree_vector (tvec);
+  return c;
 }
 
 /* OpenACC 2.0
@@ -28054,7 +28496,8 @@ cp_parser_oacc_wait_list (cp_parser *parser, location_t clause_loc, tree list)
 
   if (args == NULL || args->length () == 0)
     {
-      cp_parser_error (parser, "expected integer expression before ')'");
+      cp_parser_error (parser,
+		       "expected integer expression list before %<)%>");
       if (args != NULL)
 	release_tree_vector (args);
       return list;
@@ -28148,7 +28591,8 @@ cp_parser_omp_clause_collapse (cp_parser *parser, tree list, location_t location
    default ( shared | none ) */
 
 static tree
-cp_parser_omp_clause_default (cp_parser *parser, tree list, location_t location)
+cp_parser_omp_clause_default (cp_parser *parser, tree list,
+			      location_t location, bool is_omp)
 {
   enum omp_clause_default_kind kind = OMP_CLAUSE_DEFAULT_UNSPECIFIED;
   tree c;
@@ -28169,7 +28613,7 @@ cp_parser_omp_clause_default (cp_parser *parser, tree list, location_t location)
 	  break;
 
 	case 's':
-	  if (strcmp ("shared", p) != 0)
+	  if (strcmp ("shared", p) != 0 || !is_omp)
 	    goto invalid_kind;
 	  kind = OMP_CLAUSE_DEFAULT_SHARED;
 	  break;
@@ -28183,7 +28627,10 @@ cp_parser_omp_clause_default (cp_parser *parser, tree list, location_t location)
   else
     {
     invalid_kind:
-      cp_parser_error (parser, "expected %<none%> or %<shared%>");
+      if (is_omp)
+	cp_parser_error (parser, "expected %<none%> or %<shared%>");
+      else
+	cp_parser_error (parser, "expected %<none%>");
     }
 
   if (!cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
@@ -28291,109 +28738,6 @@ cp_parser_omp_clause_nowait (cp_parser * /*parser*/,
   return c;
 }
 
-/* OpenACC:
-   num_gangs ( expression ) */
-
-static tree
-cp_parser_omp_clause_num_gangs (cp_parser *parser, tree list)
-{
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
-
-  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
-    return list;
-
-  t = cp_parser_condition (parser);
-
-  if (t == error_mark_node
-      || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-    cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
-					   /*or_comma=*/false,
-					   /*consume_paren=*/true);
-
-  if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
-    {
-      error_at (location, "expected positive integer expression");
-      return list;
-    }
-
-  check_no_duplicate_clause (list, OMP_CLAUSE_NUM_GANGS, "num_gangs", location);
-
-  c = build_omp_clause (location, OMP_CLAUSE_NUM_GANGS);
-  OMP_CLAUSE_NUM_GANGS_EXPR (c) = t;
-  OMP_CLAUSE_CHAIN (c) = list;
-  list = c;
-
-  return list;
-}
-
-/* OpenMP 2.5:
-   num_threads ( expression ) */
-
-static tree
-cp_parser_omp_clause_num_threads (cp_parser *parser, tree list,
-				  location_t location)
-{
-  tree t, c;
-
-  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
-    return list;
-
-  t = cp_parser_expression (parser);
-
-  if (t == error_mark_node
-      || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-    cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
-					   /*or_comma=*/false,
-					   /*consume_paren=*/true);
-
-  check_no_duplicate_clause (list, OMP_CLAUSE_NUM_THREADS,
-			     "num_threads", location);
-
-  c = build_omp_clause (location, OMP_CLAUSE_NUM_THREADS);
-  OMP_CLAUSE_NUM_THREADS_EXPR (c) = t;
-  OMP_CLAUSE_CHAIN (c) = list;
-
-  return c;
-}
-
-/* OpenACC:
-   num_workers ( expression ) */
-
-static tree
-cp_parser_omp_clause_num_workers (cp_parser *parser, tree list)
-{
-  tree t, c;
-  location_t location = cp_lexer_peek_token (parser->lexer)->location;
-
-  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
-    return list;
-
-  t = cp_parser_condition (parser);
-
-  if (t == error_mark_node
-      || !cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
-    cp_parser_skip_to_closing_parenthesis (parser, /*recovering=*/true,
-					   /*or_comma=*/false,
-					   /*consume_paren=*/true);
-
-  if (!INTEGRAL_TYPE_P (TREE_TYPE (t)))
-    {
-      error_at (location, "expected positive integer expression");
-      return list;
-    }
-
-  check_no_duplicate_clause (list, OMP_CLAUSE_NUM_WORKERS, "num_gangs",
-								location);
-
-  c = build_omp_clause (location, OMP_CLAUSE_NUM_WORKERS);
-  OMP_CLAUSE_NUM_WORKERS_EXPR (c) = t;
-  OMP_CLAUSE_CHAIN (c) = list;
-  list = c;
-
-  return list;
-}
-
 /* OpenMP 2.5:
    ordered */
 
@@ -28613,27 +28957,20 @@ cp_parser_omp_clause_schedule (cp_parser *parser, tree list, location_t location
 }
 
 /* OpenMP 3.0:
-   untied */
+   untied
 
-static tree
-cp_parser_omp_clause_untied (cp_parser * /*parser*/,
-			     tree list, location_t location)
-{
-  tree c;
-
-  check_no_duplicate_clause (list, OMP_CLAUSE_UNTIED, "untied", location);
-
-  c = build_omp_clause (location, OMP_CLAUSE_UNTIED);
-  OMP_CLAUSE_CHAIN (c) = list;
-  return c;
-}
-
-/* OpenMP 4.0:
+   OpenMP 4.0:
    inbranch
-   notinbranch */
+   notinbranch
+
+   OpenACC 2.0:
+   auto
+   independent
+   nohost
+   seq */
 
 static tree
-cp_parser_omp_clause_branch (cp_parser * /*parser*/, enum omp_clause_code code,
+cp_parser_omp_simple_clause (cp_parser * /*parser*/, enum omp_clause_code code,
 			     tree list, location_t location)
 {
   check_no_duplicate_clause (list, code, omp_clause_code_name[code], location);
@@ -29121,16 +29458,66 @@ cp_parser_oacc_clause_async (cp_parser *parser, tree list)
   return list;
 }
 
+/* OpenACC 2.0:
+   bind ( identifier )
+   bind ( string-literal ) */
+
+static tree
+cp_parser_oacc_clause_bind (cp_parser *parser, tree list)
+{
+  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
+  bool save_translate_strings_p = parser->translate_strings_p;
+
+  parser->translate_strings_p = false;
+  if (!cp_parser_require (parser, CPP_OPEN_PAREN, RT_OPEN_PAREN))
+    {
+      parser->translate_strings_p = save_translate_strings_p;
+      return list;
+    }
+  if (cp_lexer_next_token_is (parser->lexer, CPP_NAME)
+      || cp_lexer_next_token_is (parser->lexer, CPP_STRING))
+    {
+      tree t;
+
+      if (cp_lexer_peek_token (parser->lexer)->type == CPP_STRING)
+	{
+	  t = cp_lexer_peek_token (parser->lexer)->u.value;
+	  cp_lexer_consume_token (parser->lexer);
+	}
+      else
+	t = cp_parser_id_expression (parser, /*template_keyword_p=*/false,
+				     /*check_dependency_p=*/true,
+				     /*template_p=*/NULL,
+				     /*declarator_p=*/false,
+				     /*optional_p=*/false);
+      if (t == error_mark_node)
+	return t;
+
+      tree c = build_omp_clause (loc, OMP_CLAUSE_BIND);
+      OMP_CLAUSE_BIND_NAME (c) = t;
+      OMP_CLAUSE_CHAIN (c) = list;
+      list = c;
+    }
+  else
+    cp_parser_error (parser, "expected identifier or character string literal");
+  parser->translate_strings_p = save_translate_strings_p;
+  cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN);
+  return list;
+}
+
 /* Parse all OpenACC clauses.  The set clauses allowed by the directive
-   is a bitmask in MASK.  Return the list of clauses found.  */
+   is a bitmask in MASK.  DTYPE_MASK denotes clauses which may follow a
+   device_type clause.  Return the list of clauses found.  */
 
-static tree
+tree
 cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
-			   const char *where, cp_token *pragma_tok,
-			   bool finish_p = true)
+			    const char *where, cp_token *pragma_tok,
+			    omp_clause_mask dtype_mask = 0,
+			    bool finish_p = true, bool scan_dtype = true)
 {
   tree clauses = NULL;
   bool first = true;
+  bool seen_dtype = false;
 
   while (cp_lexer_next_token_is_not (parser->lexer, CPP_PRAGMA_EOL))
     {
@@ -29142,15 +29529,35 @@ cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
       if (!first && cp_lexer_next_token_is (parser->lexer, CPP_COMMA))
 	cp_lexer_consume_token (parser->lexer);
 
+      if (!scan_dtype && cp_parser_omp_clause_name (parser, false)
+	  == PRAGMA_OACC_CLAUSE_DEVICE_TYPE)
+	return clauses;
+
       here = cp_lexer_peek_token (parser->lexer)->location;
       c_kind = cp_parser_omp_clause_name (parser);
 
+      if (seen_dtype && c_kind != PRAGMA_OMP_CLAUSE_NONE
+	  && c_kind != PRAGMA_OACC_CLAUSE_DEVICE_TYPE)
+	{
+	  error_at (here, "invalid clauses following %<device_type%>");
+	  goto saw_error;
+	}
+
       switch (c_kind)
 	{
 	case PRAGMA_OACC_CLAUSE_ASYNC:
 	  clauses = cp_parser_oacc_clause_async (parser, clauses);
 	  c_name = "async";
 	  break;
+	case PRAGMA_OACC_CLAUSE_AUTO:
+	  clauses = cp_parser_omp_simple_clause (parser, OMP_CLAUSE_AUTO,
+						 clauses, here);
+	  c_name = "auto";
+	  break;
+	case PRAGMA_OACC_CLAUSE_BIND:
+	  clauses = cp_parser_oacc_clause_bind (parser, clauses);
+	  c_name = "bind";
+	  break;
 	case PRAGMA_OACC_CLAUSE_COLLAPSE:
 	  clauses = cp_parser_omp_clause_collapse (parser, clauses, here);
 	  c_name = "collapse";
@@ -29175,29 +29582,66 @@ cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	  clauses = cp_parser_oacc_data_clause (parser, c_kind, clauses);
 	  c_name = "delete";
 	  break;
+	case PRAGMA_OMP_CLAUSE_DEFAULT:
+	  clauses = cp_parser_omp_clause_default (parser, clauses, here,
+						  false);
+	  c_name = "default";
+	  break;
 	case PRAGMA_OACC_CLAUSE_DEVICE:
 	  clauses = cp_parser_oacc_data_clause (parser, c_kind, clauses);
 	  c_name = "device";
 	  break;
+	case PRAGMA_OACC_CLAUSE_DEVICE_TYPE:
+	  clauses = cp_parser_oacc_clause_device_type (parser, dtype_mask,
+						       clauses, pragma_tok);
+	  c_name = "device_type";
+	  seen_dtype = true;
+	  break;
+	case PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT:
+	  clauses = cp_parser_oacc_data_clause (parser, c_kind, clauses);
+	  c_name = "device_resident";
+	  break;
 	case PRAGMA_OACC_CLAUSE_DEVICEPTR:
 	  clauses = cp_parser_oacc_data_clause_deviceptr (parser, clauses);
 	  c_name = "deviceptr";
 	  break;
-	case PRAGMA_OACC_CLAUSE_HOST:
-	  clauses = cp_parser_oacc_data_clause (parser, c_kind, clauses);
-	  c_name = "host";
-	  break;
 	case PRAGMA_OACC_CLAUSE_IF:
 	  clauses = cp_parser_omp_clause_if (parser, clauses, here);
 	  c_name = "if";
 	  break;
+	case PRAGMA_OACC_CLAUSE_INDEPENDENT:
+	  clauses = cp_parser_omp_simple_clause (parser,
+						 OMP_CLAUSE_INDEPENDENT,
+						 clauses, here);
+	  c_name = "independent";
+	  break;
+	case PRAGMA_OACC_CLAUSE_GANG:
+	  c_name = "gang";
+	  clauses = cp_parser_oacc_shape_clause (parser, c_kind, c_name,
+						 clauses);
+	  break;
+	case PRAGMA_OACC_CLAUSE_HOST:
+	  clauses = cp_parser_oacc_data_clause (parser, c_kind, clauses);
+	  c_name = "host";
+	  break;
+	case PRAGMA_OACC_CLAUSE_LINK:
+	  clauses = cp_parser_oacc_data_clause (parser, c_kind, clauses);
+	  c_name = "link";
+	  break;
+	case PRAGMA_OACC_CLAUSE_NOHOST:
+	  clauses = cp_parser_omp_simple_clause (parser, OMP_CLAUSE_NOHOST,
+						 clauses, here);
+	  c_name = "nohost";
+	  break;
 	case PRAGMA_OACC_CLAUSE_NUM_GANGS:
-	  clauses = cp_parser_omp_clause_num_gangs (parser, clauses);
 	  c_name = "num_gangs";
+	  clauses = cp_parser_omp_positive_int_clause (parser, c_kind, c_name,
+						       clauses);
 	  break;
 	case PRAGMA_OACC_CLAUSE_NUM_WORKERS:
-	  clauses = cp_parser_omp_clause_num_workers (parser, clauses);
 	  c_name = "num_workers";
+	  clauses = cp_parser_omp_positive_int_clause (parser, c_kind, c_name,
+						       clauses);
 	  break;
 	case PRAGMA_OACC_CLAUSE_PRESENT:
 	  clauses = cp_parser_oacc_data_clause (parser, c_kind, clauses);
@@ -29219,22 +29663,48 @@ cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	  clauses = cp_parser_oacc_data_clause (parser, c_kind, clauses);
 	  c_name = "present_or_create";
 	  break;
+	case PRAGMA_OACC_CLAUSE_PRIVATE:
+	  clauses = cp_parser_omp_var_list (parser, OMP_CLAUSE_PRIVATE,
+					    clauses);
+	  c_name = "private";
+	  break;
 	case PRAGMA_OACC_CLAUSE_REDUCTION:
 	  clauses = cp_parser_omp_clause_reduction (parser, clauses);
 	  c_name = "reduction";
 	  break;
-	case PRAGMA_OACC_CLAUSE_SELF:
-	  clauses = cp_parser_oacc_data_clause (parser, c_kind, clauses);
-	  c_name = "self";
+	case PRAGMA_OACC_CLAUSE_SEQ:
+	  clauses = cp_parser_omp_simple_clause (parser, OMP_CLAUSE_SEQ,
+						 clauses, here);
+	  c_name = "seq";
+	  break;
+	case PRAGMA_OACC_CLAUSE_TILE:
+	  clauses = cp_parser_oacc_clause_tile (parser, clauses, here);
+	  c_name = "tile";
+	  break;
+	case PRAGMA_OACC_CLAUSE_USE_DEVICE:
+	  clauses = cp_parser_omp_var_list (parser, OMP_CLAUSE_USE_DEVICE,
+					    clauses);
+	  c_name = "use_device";
+	  break;
+	case PRAGMA_OACC_CLAUSE_VECTOR:
+	  c_name = "vector";
+	  clauses = cp_parser_oacc_shape_clause (parser, c_kind, c_name,
+						 clauses);
 	  break;
 	case PRAGMA_OACC_CLAUSE_VECTOR_LENGTH:
-	  clauses = cp_parser_oacc_clause_vector_length (parser, clauses);
 	  c_name = "vector_length";
+	  clauses = cp_parser_omp_positive_int_clause (parser, c_kind, c_name,
+						       clauses);
 	  break;
 	case PRAGMA_OACC_CLAUSE_WAIT:
 	  clauses = cp_parser_oacc_clause_wait (parser, clauses);
 	  c_name = "wait";
 	  break;
+	case PRAGMA_OACC_CLAUSE_WORKER:
+	  c_name = "worker";
+	  clauses = cp_parser_oacc_shape_clause (parser, c_kind, c_name,
+						 clauses);
+	  break;
 	default:
 	  cp_parser_error (parser, "expected %<#pragma acc%> clause");
 	  goto saw_error;
@@ -29251,11 +29721,17 @@ cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	}
     }
 
+  if (!scan_dtype)
+    return clauses;
+
  saw_error:
   cp_parser_skip_to_pragma_eol (parser, pragma_tok);
 
   if (finish_p)
-    return finish_omp_clauses (clauses);
+    {
+      clauses = oacc_filter_device_types (clauses);
+      return finish_omp_clauses (clauses, true);
+    }
 
   return clauses;
 }
@@ -29304,7 +29780,7 @@ cp_parser_omp_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	  break;
 	case PRAGMA_OMP_CLAUSE_DEFAULT:
 	  clauses = cp_parser_omp_clause_default (parser, clauses,
-						  token->location);
+						  token->location, true);
 	  c_name = "default";
 	  break;
 	case PRAGMA_OMP_CLAUSE_FINAL:
@@ -29335,9 +29811,9 @@ cp_parser_omp_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	  c_name = "nowait";
 	  break;
 	case PRAGMA_OMP_CLAUSE_NUM_THREADS:
-	  clauses = cp_parser_omp_clause_num_threads (parser, clauses,
-						      token->location);
 	  c_name = "num_threads";
+	  clauses = cp_parser_omp_positive_int_clause (parser, c_kind, c_name,
+						       clauses);
 	  break;
 	case PRAGMA_OMP_CLAUSE_ORDERED:
 	  clauses = cp_parser_omp_clause_ordered (parser, clauses,
@@ -29364,19 +29840,19 @@ cp_parser_omp_all_clauses (cp_parser *parser, omp_clause_mask mask,
 	  c_name = "shared";
 	  break;
 	case PRAGMA_OMP_CLAUSE_UNTIED:
-	  clauses = cp_parser_omp_clause_untied (parser, clauses,
-						 token->location);
+	  clauses = cp_parser_omp_simple_clause (parser, OMP_CLAUSE_UNTIED,
+						 clauses, token->location);
 	  c_name = "untied";
 	  break;
 	case PRAGMA_OMP_CLAUSE_INBRANCH:
 	case PRAGMA_CILK_CLAUSE_MASK:
-	  clauses = cp_parser_omp_clause_branch (parser, OMP_CLAUSE_INBRANCH,
+	  clauses = cp_parser_omp_simple_clause (parser, OMP_CLAUSE_INBRANCH,
 						 clauses, token->location);
 	  c_name = "inbranch";
 	  break;
 	case PRAGMA_OMP_CLAUSE_NOTINBRANCH:
 	case PRAGMA_CILK_CLAUSE_NOMASK:
-	  clauses = cp_parser_omp_clause_branch (parser,
+	  clauses = cp_parser_omp_simple_clause (parser,
 						 OMP_CLAUSE_NOTINBRANCH,
 						 clauses, token->location);
 	  c_name = "notinbranch";
@@ -29507,7 +29983,7 @@ cp_parser_omp_all_clauses (cp_parser *parser, omp_clause_mask mask,
   if (!(flag_cilkplus && pragma_tok == NULL))
     cp_parser_skip_to_pragma_eol (parser, pragma_tok);
   if (finish_p)
-    return finish_omp_clauses (clauses);
+    return finish_omp_clauses (clauses, false);
   return clauses;
 }
 
@@ -30501,7 +30977,7 @@ cp_parser_omp_for_loop (cp_parser *parser, enum tree_code code, tree clauses,
 	    {
 	      c = build_omp_clause (loc, OMP_CLAUSE_PRIVATE);
 	      OMP_CLAUSE_DECL (c) = decl;
-	      c = finish_omp_clauses (c);
+	      c = finish_omp_clauses (c, false);
 	      if (c)
 		{
 		  OMP_CLAUSE_CHAIN (c) = clauses;
@@ -30640,7 +31116,7 @@ cp_omp_split_clauses (location_t loc, enum tree_code code,
   c_omp_split_clauses (loc, code, mask, clauses, cclauses);
   for (i = 0; i < C_OMP_CLAUSE_SPLIT_COUNT; i++)
     if (cclauses[i])
-      cclauses[i] = finish_omp_clauses (cclauses[i]);
+      cclauses[i] = finish_omp_clauses (cclauses[i], false);
 }
 
 /* OpenMP 4.0:
@@ -31490,7 +31966,7 @@ cp_parser_oacc_cache (cp_parser *parser, cp_token *pragma_tok)
   tree stmt, clauses;
 
   clauses = cp_parser_omp_var_list (parser, OMP_CLAUSE__CACHE_, NULL_TREE);
-  clauses = finish_omp_clauses (clauses);
+  clauses = finish_omp_clauses (clauses, true);
 
   cp_parser_require_pragma_eol (parser, cp_lexer_peek_token (parser->lexer));
 
@@ -31537,14 +32013,36 @@ cp_parser_oacc_data (cp_parser *parser, cp_token *pragma_tok)
   return stmt;
 }
 
+#define OACC_HOST_DATA_CLAUSE_MASK					\
+  ( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_USE_DEVICE) )
+
+/* OpenACC 2.0:
+   # pragma acc host_data <clauses> new-line
+     structured-block  */
+
+static tree
+cp_parser_oacc_host_data (cp_parser *parser, cp_token *pragma_tok)
+{
+  tree stmt, clauses, block;
+  unsigned int save;
+
+  clauses = cp_parser_oacc_all_clauses (parser, OACC_HOST_DATA_CLAUSE_MASK,
+					"#pragma acc host_data", pragma_tok);
+
+  block = begin_omp_parallel ();
+  save = cp_parser_begin_omp_structured_block (parser);
+  cp_parser_statement (parser, NULL_TREE, false, NULL);
+  cp_parser_end_omp_structured_block (parser, save);
+  stmt = finish_oacc_host_data (clauses, block);
+  return stmt;
+}
+
 /* OpenACC 2.0:
    # pragma acc enter data oacc-enter-data-clause[optseq] new-line
 
    or
 
    # pragma acc exit data oacc-exit-data-clause[optseq] new-line
-
-   LOC is the location of the #pragma token.
 */
 
 #define OACC_ENTER_DATA_CLAUSE_MASK					\
@@ -31567,23 +32065,18 @@ static tree
 cp_parser_oacc_enter_exit_data (cp_parser *parser, cp_token *pragma_tok,
 				bool enter)
 {
+  location_t loc = pragma_tok->location;
   tree stmt, clauses;
+  const char *p = "";
 
-  if (cp_lexer_next_token_is (parser->lexer, CPP_PRAGMA_EOL)
-     || cp_lexer_next_token_is_not (parser->lexer, CPP_NAME))
-    {
-      cp_parser_error (parser, enter
-		       ? "expected %<data%> in %<#pragma acc enter data%>"
-		       : "expected %<data%> in %<#pragma acc exit data%>");
-      cp_parser_skip_to_pragma_eol (parser, pragma_tok);
-      return NULL_TREE;
-    }
+  if (cp_lexer_next_token_is (parser->lexer, CPP_NAME))
+    p = IDENTIFIER_POINTER (cp_lexer_peek_token (parser->lexer)->u.value);
 
-  const char *p =
-    IDENTIFIER_POINTER (cp_lexer_peek_token (parser->lexer)->u.value);
   if (strcmp (p, "data") != 0)
     {
-      cp_parser_error (parser, "invalid pragma");
+      error_at (loc, enter
+		? "expected %<data%> after %<#pragma acc enter%>"
+		: "expected %<data%> after %<#pragma acc exit%>");
       cp_parser_skip_to_pragma_eol (parser, pragma_tok);
       return NULL_TREE;
     }
@@ -31599,53 +32092,68 @@ cp_parser_oacc_enter_exit_data (cp_parser *parser, cp_token *pragma_tok,
 
   if (find_omp_clause (clauses, OMP_CLAUSE_MAP) == NULL_TREE)
     {
-      error_at (pragma_tok->location,
-		"%<#pragma acc enter data%> has no data movement clause");
+      error_at (loc, "%<#pragma acc %s data%> has no data movement clause",
+		enter ? "enter" : "exit");
       return NULL_TREE;
     }
 
   stmt = enter ? make_node (OACC_ENTER_DATA) : make_node (OACC_EXIT_DATA);
   TREE_TYPE (stmt) = void_type_node;
   OMP_STANDALONE_CLAUSES (stmt) = clauses;
-  SET_EXPR_LOCATION (stmt, pragma_tok->location);
+  SET_EXPR_LOCATION (stmt, loc);
   add_stmt (stmt);
   return stmt;
 }
 
-/* OpenACC 2.0:
-   # pragma acc kernels oacc-kernels-clause[optseq] new-line
-     structured-block  */
-
-#define OACC_KERNELS_CLAUSE_MASK					\
-	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_ASYNC)		\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPY)		\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN)		\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT)		\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE)		\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR)		\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF)			\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT)		\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPY)	\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPYIN)	\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPYOUT)	\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_CREATE)	\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT))
+/* Split the 'clauses' into a set of 'loop' clauses and a set of
+   'not-loop' clauses.  */
 
 static tree
-cp_parser_oacc_kernels (cp_parser *parser, cp_token *pragma_tok)
+oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses)
 {
-  tree stmt, clauses, block;
-  unsigned int save;
-
-  clauses = cp_parser_oacc_all_clauses (parser, OACC_KERNELS_CLAUSE_MASK,
-					"#pragma acc kernels", pragma_tok);
-
-  block = begin_omp_parallel ();
-  save = cp_parser_begin_omp_structured_block (parser);
-  cp_parser_statement (parser, NULL_TREE, false, NULL);
-  cp_parser_end_omp_structured_block (parser, save);
-  stmt = finish_oacc_kernels (clauses, block);
-  return stmt;
+  tree loop_clauses, next, c;
+
+  loop_clauses = *not_loop_clauses = NULL_TREE;
+
+  for (; clauses ; clauses = next)
+    {
+      next = OMP_CLAUSE_CHAIN (clauses);
+
+      switch (OMP_CLAUSE_CODE (clauses))
+	{
+	case OMP_CLAUSE_COLLAPSE:
+	case OMP_CLAUSE_REDUCTION:
+	case OMP_CLAUSE_GANG:
+	case OMP_CLAUSE_VECTOR:
+	case OMP_CLAUSE_WORKER:
+	case OMP_CLAUSE_AUTO:
+	case OMP_CLAUSE_SEQ:
+	  OMP_CLAUSE_CHAIN (clauses) = loop_clauses;
+	  loop_clauses = clauses;
+	  break;
+
+	case OMP_CLAUSE_PRIVATE:
+	  c = build_omp_clause (OMP_CLAUSE_LOCATION (clauses),
+				OMP_CLAUSE_CODE (clauses));
+	  OMP_CLAUSE_DECL (c) = OMP_CLAUSE_DECL (clauses);
+	  OMP_CLAUSE_CHAIN (c) = loop_clauses;
+	  loop_clauses = c;
+	  /* FALL THROUGH  */
+
+	default:
+	  OMP_CLAUSE_CHAIN (clauses) = *not_loop_clauses;
+	  *not_loop_clauses = clauses;
+	  break;
+	}
+    }
+
+  if (*not_loop_clauses)
+    finish_omp_clauses (*not_loop_clauses, true);
+
+  if (loop_clauses)
+    finish_omp_clauses (loop_clauses, true);
+
+  return loop_clauses;
 }
 
 /* OpenACC 2.0:
@@ -31654,16 +32162,43 @@ cp_parser_oacc_kernels (cp_parser *parser, cp_token *pragma_tok)
 
 #define OACC_LOOP_CLAUSE_MASK						\
 	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COLLAPSE)		\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_REDUCTION))
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICE_TYPE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_GANG)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRIVATE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_REDUCTION)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_VECTOR)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WORKER)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_AUTO)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_INDEPENDENT)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_SEQ)			\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_TILE))
+
+#define OACC_LOOP_CLAUSE_DEVICE_TYPE_MASK				\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COLLAPSE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_GANG)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WORKER)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_VECTOR)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_AUTO)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_INDEPENDENT)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_SEQ)			\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_TILE) )
 
 static tree
-cp_parser_oacc_loop (cp_parser *parser, cp_token *pragma_tok)
+cp_parser_oacc_loop (cp_parser *parser, cp_token *pragma_tok, char *p_name,
+		     omp_clause_mask mask, tree *cclauses)
 {
   tree stmt, clauses, block;
   int save;
 
-  clauses = cp_parser_oacc_all_clauses (parser, OACC_LOOP_CLAUSE_MASK,
-					"#pragma acc loop", pragma_tok);
+  strcat (p_name, " loop");
+  mask |= OACC_LOOP_CLAUSE_MASK;
+
+  clauses = cp_parser_oacc_all_clauses (parser, mask, p_name, pragma_tok,
+					OACC_LOOP_CLAUSE_DEVICE_TYPE_MASK,
+					cclauses == NULL);
+
+  if (cclauses)
+    clauses = oacc_split_loop_clauses (clauses, cclauses);
 
   block = begin_omp_structured_block ();
   save = cp_parser_begin_omp_structured_block (parser);
@@ -31674,6 +32209,31 @@ cp_parser_oacc_loop (cp_parser *parser, cp_token *pragma_tok)
 }
 
 /* OpenACC 2.0:
+   # pragma acc kernels oacc-kernels-clause[optseq] new-line
+     structured-block  */
+
+#define OACC_KERNELS_CLAUSE_MASK					\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_ASYNC)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPY)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEFAULT)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICE_TYPE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF)			\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPY)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPYIN)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPYOUT)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_CREATE)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT))
+
+#define OACC_KERNELS_CLAUSE_DEVICE_TYPE_MASK				\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_ASYNC)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT) )
+
+/* OpenACC 2.0:
    # pragma acc parallel oacc-parallel-clause[optseq] new-line
      structured-block  */
 
@@ -31683,7 +32243,10 @@ cp_parser_oacc_loop (cp_parser *parser, cp_token *pragma_tok)
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYIN)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_COPYOUT)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_CREATE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEFAULT)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICE_TYPE)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICEPTR)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_GANG)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF)			\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_NUM_GANGS)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_NUM_WORKERS)		\
@@ -31692,24 +32255,68 @@ cp_parser_oacc_loop (cp_parser *parser, cp_token *pragma_tok)
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPYIN)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_COPYOUT)	\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRESENT_OR_CREATE)   \
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_PRIVATE)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_REDUCTION)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_VECTOR_LENGTH)       \
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT))
 
+#define OACC_PARALLEL_CLAUSE_DEVICE_TYPE_MASK				\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_ASYNC)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_NUM_GANGS)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_NUM_WORKERS)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_VECTOR_LENGTH)	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT) )
+
 static tree
-cp_parser_oacc_parallel (cp_parser *parser, cp_token *pragma_tok)
+cp_parser_oacc_parallel_kernels (cp_parser *parser, cp_token *pragma_tok,
+				 char *p_name, bool is_parallel)
 {
   tree stmt, clauses, block;
   unsigned int save;
+  cp_lexer *lexer = parser->lexer;
+  omp_clause_mask mask, dtype_mask;
 
-  clauses = cp_parser_oacc_all_clauses (parser, OACC_PARALLEL_CLAUSE_MASK,
-					 "#pragma acc parallel", pragma_tok);
+  if (is_parallel)
+    {
+      mask = OACC_PARALLEL_CLAUSE_MASK;
+      strcat (p_name, " parallel");
+    }
+  else
+    {
+      mask = OACC_KERNELS_CLAUSE_MASK;
+      strcat (p_name, " kernels");
+    }
+
+  if (cp_lexer_next_token_is (lexer, CPP_NAME))
+    {
+      stmt = cp_lexer_peek_token (lexer)->u.value;
+      if (!strcmp ("loop", IDENTIFIER_POINTER (stmt)))
+	{
+	  tree combined_clauses = NULL_TREE;
+
+	  cp_lexer_consume_token (lexer);
+	  mask |= OACC_LOOP_CLAUSE_MASK;
+	  block = begin_omp_parallel ();
+	  cp_parser_oacc_loop (parser, pragma_tok, p_name, mask,
+			       &combined_clauses);
+	  stmt = is_parallel ? finish_oacc_parallel (combined_clauses, block)
+	    : finish_oacc_kernels (combined_clauses, block);
+	  return stmt;
+	}
+    }
+
+  dtype_mask = is_parallel ? OACC_PARALLEL_CLAUSE_DEVICE_TYPE_MASK
+    : OACC_KERNELS_CLAUSE_DEVICE_TYPE_MASK;
+
+  clauses = cp_parser_oacc_all_clauses (parser, mask, p_name, pragma_tok,
+					dtype_mask);
 
   block = begin_omp_parallel ();
   save = cp_parser_begin_omp_structured_block (parser);
   cp_parser_statement (parser, NULL_TREE, false, NULL);
   cp_parser_end_omp_structured_block (parser, save);
-  stmt = finish_oacc_parallel (clauses, block);
+  stmt = is_parallel ? finish_oacc_parallel (clauses, block)
+    : finish_oacc_kernels (clauses, block);
   return stmt;
 }
 
@@ -31720,18 +32327,23 @@ cp_parser_oacc_parallel (cp_parser *parser, cp_token *pragma_tok)
 #define OACC_UPDATE_CLAUSE_MASK						\
 	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_ASYNC)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICE_TYPE)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_HOST)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_IF)			\
-	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_SELF)		\
 	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT))
 
+#define OACC_UPDATE_CLAUSE_DEVICE_TYPE_MASK				\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_ASYNC)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WAIT) )
+
 static tree
 cp_parser_oacc_update (cp_parser *parser, cp_token *pragma_tok)
 {
   tree stmt, clauses;
 
   clauses = cp_parser_oacc_all_clauses (parser, OACC_UPDATE_CLAUSE_MASK,
-					 "#pragma acc update", pragma_tok);
+					"#pragma acc update", pragma_tok,
+					OACC_UPDATE_CLAUSE_DEVICE_TYPE_MASK);
 
   if (find_omp_clause (clauses, OMP_CLAUSE_MAP) == NULL_TREE)
     {
@@ -31751,8 +32363,6 @@ cp_parser_oacc_update (cp_parser *parser, cp_token *pragma_tok)
 
 /* OpenACC 2.0:
    # pragma acc wait [(intseq)] oacc-wait-clause[optseq] new-line
-
-   LOC is the location of the #pragma token.
 */
 
 #define OACC_WAIT_CLAUSE_MASK					\
@@ -31776,7 +32386,14 @@ cp_parser_oacc_wait (cp_parser *parser, cp_token *pragma_tok)
 }
 
 /* OpenMP 4.0:
-   # pragma omp declare simd declare-simd-clauses[optseq] new-line  */
+   # pragma omp declare simd declare-simd-clauses[optseq] new-line
+
+   OpenACC 2.0a:
+   # pragma acc routine oacc-routine-clause[optseq] new-line
+     function-definition
+
+   # pragma acc routine ( name ) oacc-routine-clause[optseq] new-line
+*/
 
 #define OMP_DECLARE_SIMD_CLAUSE_MASK				\
 	( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_SIMDLEN)	\
@@ -31788,26 +32405,42 @@ cp_parser_oacc_wait (cp_parser *parser, cp_token *pragma_tok)
 
 static void
 cp_parser_omp_declare_simd (cp_parser *parser, cp_token *pragma_tok,
-			    enum pragma_context context)
+			    enum pragma_context context, bool is_omp)
 {
-  bool first_p = parser->omp_declare_simd == NULL;
+  bool first_p = is_omp ? parser->omp_declare_simd == NULL
+    : parser->oacc_routine == NULL;
   cp_omp_declare_simd_data data;
   if (first_p)
     {
       data.error_seen = false;
       data.fndecl_seen = false;
       data.tokens = vNULL;
-      parser->omp_declare_simd = &data;
+      if (is_omp)
+	parser->omp_declare_simd = &data;
+      else
+	parser->oacc_routine = &data;
     }
   while (cp_lexer_next_token_is_not (parser->lexer, CPP_PRAGMA_EOL)
 	 && cp_lexer_next_token_is_not (parser->lexer, CPP_EOF))
     cp_lexer_consume_token (parser->lexer);
+
   if (cp_lexer_next_token_is_not (parser->lexer, CPP_PRAGMA_EOL))
-    parser->omp_declare_simd->error_seen = true;
+    {
+      if (is_omp)
+	parser->omp_declare_simd->error_seen = true;
+      else
+	parser->oacc_routine->error_seen = true;
+    }
+
   cp_parser_require_pragma_eol (parser, pragma_tok);
   struct cp_token_cache *cp
     = cp_token_cache_new (pragma_tok, cp_lexer_peek_token (parser->lexer));
-  parser->omp_declare_simd->tokens.safe_push (cp);
+
+  if (is_omp)
+    parser->omp_declare_simd->tokens.safe_push (cp);
+  else
+    parser->oacc_routine->tokens.safe_push (cp);
+
   if (first_p)
     {
       while (cp_lexer_next_token_is (parser->lexer, CPP_PRAGMA))
@@ -31827,14 +32460,23 @@ cp_parser_omp_declare_simd (cp_parser *parser, cp_token *pragma_tok,
 	  cp_parser_declaration_statement (parser);
 	  break;
 	}
-      if (parser->omp_declare_simd
+      if (is_omp && parser->omp_declare_simd
 	  && !parser->omp_declare_simd->error_seen
 	  && !parser->omp_declare_simd->fndecl_seen)
 	error_at (pragma_tok->location,
 		  "%<#pragma omp declare simd%> not immediately followed by "
 		  "function declaration or definition");
+      else if (!is_omp && parser->oacc_routine
+	  && !parser->oacc_routine->error_seen
+	  && !parser->oacc_routine->fndecl_seen)
+	error_at (pragma_tok->location,
+		  "%<#pragma acc routine%> not immediately followed by "
+		  "function declaration or definition");
       data.tokens.release ();
-      parser->omp_declare_simd = NULL;
+      if (is_omp)
+	parser->omp_declare_simd = NULL;
+      else
+	parser->oacc_routine = NULL;
     }
 }
 
@@ -32410,7 +33052,7 @@ cp_parser_omp_declare (cp_parser *parser, cp_token *pragma_tok,
 	{
 	  cp_lexer_consume_token (parser->lexer);
 	  cp_parser_omp_declare_simd (parser, pragma_tok,
-				      context);
+				      context, true);
 	  return;
 	}
       cp_ensure_no_omp_declare_simd (parser);
@@ -32438,6 +33080,167 @@ cp_parser_omp_declare (cp_parser *parser, cp_token *pragma_tok,
   cp_parser_require_pragma_eol (parser, pragma_tok);
 }
 
+static void
+cp_parser_oacc_routine_check_parallelism (tree clauses, location_t loc)
+{
+  /* Check for the presence of gang, worker, vector and seq clauses, and
+     throw an error if more than one of those clauses is specified.  */
+  int parallelism = 0;
+  tree c;
+
+  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    switch (OMP_CLAUSE_CODE (c))
+      {
+      case OMP_CLAUSE_GANG:
+      case OMP_CLAUSE_WORKER:
+      case OMP_CLAUSE_VECTOR:
+      case OMP_CLAUSE_SEQ:
+	++parallelism;
+	break;
+      default:
+	break;
+      }
+
+  if (parallelism > 1)
+    {
+      error_at (loc, "invalid combination of gang, worker, vector or seq for "
+		"%<#pragma acc routine%>");
+    }
+}
+
+/* OpenACC 2.0:
+   # pragma acc routine oacc-routine-clause[optseq] new-line
+     function-definition
+
+   # pragma acc routine ( name ) oacc-routine-clause[optseq] new-line
+*/
+
+#define OACC_ROUTINE_CLAUSE_MASK					\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_DEVICE_TYPE)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_GANG)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WORKER)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_VECTOR)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_SEQ)			\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_BIND)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_NOHOST))
+
+#define OACC_ROUTINE_CLAUSE_DEVICE_TYPE_MASK				\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_GANG)	       	\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_WORKER)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_VECTOR)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_SEQ)			\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OACC_CLAUSE_BIND))
+
+static void
+cp_parser_oacc_routine (cp_parser *parser, cp_token *pragma_tok,
+			enum pragma_context context)
+{
+  tree name = NULL_TREE;
+  location_t here = cp_lexer_peek_token (parser->lexer)->location;
+
+  //cp_lexer_consume_token (parser->lexer);
+
+  /* Scan for optional '( name )'.  */
+  if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN))
+    {
+      cp_lexer_consume_token (parser->lexer);
+      name = cp_parser_id_expression (parser, /*template_keyword_p=*/false,
+				      /*check_dependency_p=*/true,
+				      /*template_p=*/NULL,
+				      /*declarator_p=*/false,
+				      /*optional_p=*/false);
+      if (name == error_mark_node)
+	return;
+
+      if (cp_lexer_next_token_is_not (parser->lexer, CPP_CLOSE_PAREN))
+	{
+	  error_at (cp_lexer_peek_token (parser->lexer)->location,
+		    "expected %<)%>");
+	  return;
+	}
+      cp_lexer_consume_token (parser->lexer);
+    }
+
+  /* If this routine construct doesn't explicitly have an optional 'name',
+     then handle it the same way as an omp declare simd.  */
+  if (!name)
+    {
+      cp_parser_omp_declare_simd (parser, pragma_tok, context, false);
+      cp_ensure_no_omp_declare_simd (parser);
+      return;
+    }
+
+  /* Build a chain of clauses.  */
+  parser->lexer->in_pragma = true;
+  tree clauses = NULL_TREE;
+  clauses = cp_parser_oacc_all_clauses (parser, OACC_ROUTINE_CLAUSE_MASK,
+					"#pragma acc routine",
+					cp_lexer_peek_token (parser->lexer),
+					OACC_ROUTINE_CLAUSE_DEVICE_TYPE_MASK);
+
+  cp_parser_oacc_routine_check_parallelism (clauses, here);
+
+  TREE_CHAIN (name) = clauses;
+  vec_safe_push (parser->named_oacc_routines, name);
+}
+
+/* Finalize #pragma acc routine clauses after direct declarator has
+   been parsed, and put that into "omp declare target" attribute.  */
+
+static tree
+cp_parser_late_parsing_oacc_routine (cp_parser *parser, tree attrs)
+{
+  struct cp_token_cache *ce;
+  cp_omp_declare_simd_data *data = parser->oacc_routine;
+  int i;
+  location_t here = UNKNOWN_LOCATION;
+
+  if (!data->error_seen && data->fndecl_seen)
+    {
+      error ("%<#pragma acc routine%> not immediately followed by "
+	     "a single function declaration or definition");
+      data->error_seen = true;
+      return attrs;
+    }
+  if (data->error_seen)
+    return attrs;
+
+  tree c, cl = NULL_TREE;
+
+  FOR_EACH_VEC_ELT (data->tokens, i, ce)
+    {
+      cp_parser_push_lexer_for_tokens (parser, ce);
+      parser->lexer->in_pragma = true;
+      here = cp_lexer_peek_token (parser->lexer)->location;
+      gcc_assert (cp_lexer_peek_token (parser->lexer)->type == CPP_PRAGMA);
+      cp_token *pragma_tok = cp_lexer_consume_token (parser->lexer);
+      c = cp_parser_oacc_all_clauses (parser, OACC_ROUTINE_CLAUSE_MASK,
+				     "#pragma acc routine", pragma_tok);
+      cp_parser_pop_lexer (parser);
+
+      if (cl == NULL_TREE)
+	cl = c;
+      else if (c != NULL_TREE)
+	{
+	  OMP_CLAUSE_CHAIN (c) = cl;
+	  cl = c;
+	  TREE_CHAIN (c) = attrs;
+	  if (processing_template_decl)
+	    ATTR_IS_DEPENDENT (c) = 1;
+	  attrs = c;
+	}
+    }
+
+  cp_parser_oacc_routine_check_parallelism (cl, here);
+
+  if (cl != NULL_TREE)
+    cl = tree_cons (NULL_TREE, cl, NULL_TREE);
+
+  attrs = build_tree_list (get_identifier ("omp declare target"), cl);
+  data->fndecl_seen = true;
+  return attrs;
+}
+
 /* Main entry point to OpenMP statement pragmas.  */
 
 static void
@@ -32449,6 +33252,9 @@ cp_parser_omp_construct (cp_parser *parser, cp_token *pragma_tok)
 
   switch (pragma_tok->pragma_kind)
     {
+    case PRAGMA_OACC_ATOMIC:
+      cp_parser_omp_atomic (parser, pragma_tok);
+      return;
     case PRAGMA_OACC_CACHE:
       stmt = cp_parser_oacc_cache (parser, pragma_tok);
       break;
@@ -32461,14 +33267,22 @@ cp_parser_omp_construct (cp_parser *parser, cp_token *pragma_tok)
     case PRAGMA_OACC_EXIT_DATA:
       stmt = cp_parser_oacc_enter_exit_data (parser, pragma_tok, false);
       break;
+    case PRAGMA_OACC_HOST_DATA:
+      stmt = cp_parser_oacc_host_data (parser, pragma_tok);
+      break;
     case PRAGMA_OACC_KERNELS:
-      stmt = cp_parser_oacc_kernels (parser, pragma_tok);
+      strcpy (p_name, "#pragma acc");
+      stmt = cp_parser_oacc_parallel_kernels (parser, pragma_tok, p_name,
+					      false);
       break;
     case PRAGMA_OACC_LOOP:
-      stmt = cp_parser_oacc_loop (parser, pragma_tok);
+      strcpy (p_name, "#pragma acc");
+      stmt = cp_parser_oacc_loop (parser, pragma_tok, p_name, mask, NULL);
       break;
     case PRAGMA_OACC_PARALLEL:
-      stmt = cp_parser_oacc_parallel (parser, pragma_tok);
+      strcpy (p_name, "#pragma acc");
+      stmt = cp_parser_oacc_parallel_kernels (parser, pragma_tok, p_name,
+					      true);
       break;
     case PRAGMA_OACC_UPDATE:
       stmt = cp_parser_oacc_update (parser, pragma_tok);
@@ -32907,7 +33721,7 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context)
   parser->lexer->in_pragma = true;
 
   id = pragma_tok->pragma_kind;
-  if (id != PRAGMA_OMP_DECLARE_REDUCTION)
+  if (id != PRAGMA_OMP_DECLARE_REDUCTION && id != PRAGMA_OACC_ROUTINE)
     cp_ensure_no_omp_declare_simd (parser);
   switch (id)
     {
@@ -33018,15 +33832,65 @@ cp_parser_pragma (cp_parser *parser, enum pragma_context context)
       cp_parser_omp_declare (parser, pragma_tok, context);
       return false;
 
-    case PRAGMA_OACC_CACHE:
-    case PRAGMA_OACC_DATA:
     case PRAGMA_OACC_ENTER_DATA:
+      if (context == pragma_stmt)
+	{
+	  cp_parser_error (parser, "%<#pragma acc enter data%> may only be "
+			   "used in compound statements");
+	  break;
+	}
+      else if (context != pragma_compound)
+	goto bad_stmt;
+      cp_parser_omp_construct (parser, pragma_tok);
+      return true;
+
     case PRAGMA_OACC_EXIT_DATA:
+      if (context == pragma_stmt)
+	{
+	  cp_parser_error (parser, "%<#pragma acc exit data%> may only be "
+			   "used in compound statements");
+	  break;
+	}
+      else if (context != pragma_compound)
+	goto bad_stmt;
+      cp_parser_omp_construct (parser, pragma_tok);
+      return true;
+
+    case PRAGMA_OACC_ROUTINE:
+      cp_parser_oacc_routine (parser, pragma_tok, context);
+      return false;
+
+    case PRAGMA_OACC_UPDATE:
+      if (context == pragma_stmt)
+	{
+	  cp_parser_error (parser, "%<#pragma acc update%> may only be "
+			   "used in compound statements");
+	  break;
+	}
+      else if (context != pragma_compound)
+	goto bad_stmt;
+      cp_parser_omp_construct (parser, pragma_tok);
+      return true;
+
+    case PRAGMA_OACC_WAIT:
+      if (context == pragma_stmt)
+	{
+	  cp_parser_error (parser, "%<#pragma acc wait%> may only be "
+			   "used in compound statements");
+	  break;
+	}
+      else if (context != pragma_compound)
+	goto bad_stmt;
+      cp_parser_omp_construct (parser, pragma_tok);
+      return true;
+
+    case PRAGMA_OACC_ATOMIC:
+    case PRAGMA_OACC_CACHE:
+    case PRAGMA_OACC_DATA:
+    case PRAGMA_OACC_HOST_DATA:
     case PRAGMA_OACC_KERNELS:
     case PRAGMA_OACC_PARALLEL:
     case PRAGMA_OACC_LOOP:
-    case PRAGMA_OACC_UPDATE:
-    case PRAGMA_OACC_WAIT:
     case PRAGMA_OMP_ATOMIC:
     case PRAGMA_OMP_CRITICAL:
     case PRAGMA_OMP_DISTRIBUTE:
@@ -33468,7 +34332,7 @@ cp_parser_cilk_for (cp_parser *parser, tree grain)
   tree clauses = build_omp_clause (EXPR_LOCATION (grain), OMP_CLAUSE_SCHEDULE);
   OMP_CLAUSE_SCHEDULE_KIND (clauses) = OMP_CLAUSE_SCHEDULE_CILKFOR;
   OMP_CLAUSE_SCHEDULE_CHUNK_EXPR (clauses) = grain;
-  clauses = finish_omp_clauses (clauses);
+  clauses = finish_omp_clauses (clauses, false);
 
   tree ret = cp_parser_omp_for_loop (parser, CILK_FOR, clauses, NULL);
   if (ret)
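
Not part of the patch, only an illustration for reviewers: a minimal
stand-alone C sketch of the check that
cp_parser_oacc_routine_check_parallelism performs on a routine's clause
chain.  The names clause_kind, struct clause, count_parallelism_clauses
and routine_parallelism_ok are hypothetical stand-ins for the tree-based
clause chain; only the counting logic mirrors the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for the OMP_CLAUSE_* codes used in the patch.  */
enum clause_kind
{
  CLAUSE_GANG,
  CLAUSE_WORKER,
  CLAUSE_VECTOR,
  CLAUSE_SEQ,
  CLAUSE_BIND,
  CLAUSE_NOHOST
};

/* Hypothetical stand-in for a clause chain linked via OMP_CLAUSE_CHAIN.  */
struct clause
{
  enum clause_kind kind;
  struct clause *next;
};

/* Count the gang, worker, vector and seq clauses in CLAUSES.  */
int
count_parallelism_clauses (const struct clause *clauses)
{
  int parallelism = 0;

  for (const struct clause *c = clauses; c; c = c->next)
    switch (c->kind)
      {
      case CLAUSE_GANG:
      case CLAUSE_WORKER:
      case CLAUSE_VECTOR:
      case CLAUSE_SEQ:
        ++parallelism;
        break;
      default:
        break;
      }

  return parallelism;
}

/* Return nonzero if at most one of gang, worker, vector and seq is
   present, which is the condition the patch accepts without error.  */
int
routine_parallelism_ok (const struct clause *clauses)
{
  return count_parallelism_clauses (clauses) <= 1;
}
```

In this sketch a chain such as "gang seq" is rejected, while "bind seq"
or an empty chain is accepted.
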
diff --git gcc/cp/parser.h gcc/cp/parser.h
index 76e5367..8eb5484 100644
--- gcc/cp/parser.h
+++ gcc/cp/parser.h
@@ -373,6 +373,10 @@ typedef struct GTY(()) cp_parser {
      necessary.  */
   cp_omp_declare_simd_data * GTY((skip)) cilk_simd_fn_info;
 
+  /* OpenACC specific parser information.  */
+  cp_omp_declare_simd_data * GTY((skip)) oacc_routine;
+  vec <tree, va_gc> *named_oacc_routines;
+
   /* Nonzero if parsing a parameter list where 'auto' should trigger an implicit
      template parameter.  */
   bool auto_is_implicit_function_template_parm_p;
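
Not part of the patch: a small user-level sketch of the two directive
forms that the new parser fields track.  Going by the parser.c changes,
the unnamed form is cached via oacc_routine until the following function
declaration is seen, and the named form is queued on
named_oacc_routines.  The functions square and cube are made-up
examples; when compiled without -fopenacc the pragmas are simply ignored.

```c
/* Unnamed form: applies to the function definition that follows.  */
#pragma acc routine seq
int
square (int x)
{
  return x * x;
}

/* Named form: refers to a function that is already declared.  */
int cube (int x);
#pragma acc routine (cube) seq

int
cube (int x)
{
  return x * x * x;
}
```
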
diff --git gcc/cp/pt.c gcc/cp/pt.c
index 129517a..bbd54fe 100644
--- gcc/cp/pt.c
+++ gcc/cp/pt.c
@@ -8990,7 +8990,7 @@ apply_late_template_attributes (tree *decl_p, tree attributes, int attr_flags,
 		  clauses = tsubst_omp_clauses (clauses, true, args,
 						complain, in_decl);
 		  c_omp_declare_simd_clauses_to_decls (*decl_p, clauses);
-		  clauses = finish_omp_clauses (clauses);
+		  clauses = finish_omp_clauses (clauses, false);
 		  tree parms = DECL_ARGUMENTS (*decl_p);
 		  clauses
 		    = c_omp_declare_simd_clauses_to_numbers (parms, clauses);
@@ -13445,6 +13445,14 @@ tsubst_omp_clauses (tree clauses, bool declare_simd,
 	case OMP_CLAUSE_THREAD_LIMIT:
 	case OMP_CLAUSE_SAFELEN:
 	case OMP_CLAUSE_SIMDLEN:
+	case OMP_CLAUSE_NUM_GANGS:
+	case OMP_CLAUSE_NUM_WORKERS:
+	case OMP_CLAUSE_VECTOR_LENGTH:
+	case OMP_CLAUSE_GANG:
+	case OMP_CLAUSE_WORKER:
+	case OMP_CLAUSE_VECTOR:
+	case OMP_CLAUSE_ASYNC:
+	case OMP_CLAUSE_WAIT:
 	  OMP_CLAUSE_OPERAND (nc, 0)
 	    = tsubst_expr (OMP_CLAUSE_OPERAND (oc, 0), args, complain, 
 			   in_decl, /*integral_constant_expression_p=*/false);
@@ -13491,6 +13499,10 @@ tsubst_omp_clauses (tree clauses, bool declare_simd,
 	case OMP_CLAUSE_PARALLEL:
 	case OMP_CLAUSE_SECTIONS:
 	case OMP_CLAUSE_TASKGROUP:
+	case OMP_CLAUSE_INDEPENDENT:
+	case OMP_CLAUSE_AUTO:
+	case OMP_CLAUSE_SEQ:
+	case OMP_CLAUSE_TILE:
 	  break;
 	default:
 	  gcc_unreachable ();
@@ -13499,7 +13511,7 @@ tsubst_omp_clauses (tree clauses, bool declare_simd,
 
   new_clauses = nreverse (new_clauses);
   if (!declare_simd)
-    new_clauses = finish_omp_clauses (new_clauses);
+    new_clauses = finish_omp_clauses (new_clauses, false);
   return new_clauses;
 }
 
@@ -13639,7 +13651,7 @@ tsubst_omp_for_iterator (tree t, int i, tree declv, tree initv,
 	{
 	  c = build_omp_clause (input_location, OMP_CLAUSE_PRIVATE);
 	  OMP_CLAUSE_DECL (c) = decl;
-	  c = finish_omp_clauses (c);
+	  c = finish_omp_clauses (c, false);
 	  if (c)
 	    {
 	      OMP_CLAUSE_CHAIN (c) = *clauses;
@@ -14108,6 +14120,22 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
 	= OMP_PARALLEL_COMBINED (t);
       break;
 
+    case OACC_PARALLEL:
+      tmp = tsubst_omp_clauses (OACC_PARALLEL_CLAUSES (t), false,
+				args, complain, in_decl);
+      stmt = begin_omp_parallel ();
+      RECUR (OACC_PARALLEL_BODY (t));
+      finish_oacc_parallel (tmp, stmt);
+      break;
+
+    case OACC_KERNELS:
+      tmp = tsubst_omp_clauses (OACC_KERNELS_CLAUSES (t), false,
+				args, complain, in_decl);
+      stmt = begin_omp_parallel ();
+      RECUR (OACC_KERNELS_BODY (t));
+      finish_oacc_kernels (tmp, stmt);
+      break;
+
     case OMP_TASK:
       tmp = tsubst_omp_clauses (OMP_TASK_CLAUSES (t), false,
 				args, complain, in_decl);
@@ -14121,6 +14149,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
     case CILK_SIMD:
     case CILK_FOR:
     case OMP_DISTRIBUTE:
+    case OACC_LOOP:
       {
 	tree clauses, body, pre_body;
 	tree declv = NULL_TREE, initv = NULL_TREE, condv = NULL_TREE;
@@ -14186,6 +14215,7 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
       add_stmt (t);
       break;
 
+    case OACC_DATA:
     case OMP_TARGET_DATA:
     case OMP_TARGET:
       tmp = tsubst_omp_clauses (OMP_CLAUSES (t), false,
@@ -14203,10 +14233,13 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl,
       break;
 
     case OMP_TARGET_UPDATE:
-      tmp = tsubst_omp_clauses (OMP_TARGET_UPDATE_CLAUSES (t), false,
+    case OACC_ENTER_DATA:
+    case OACC_EXIT_DATA:
+    case OACC_UPDATE:
+      tmp = tsubst_omp_clauses (OMP_STANDALONE_CLAUSES (t), false,
 				args, complain, in_decl);
       t = copy_node (t);
-      OMP_TARGET_UPDATE_CLAUSES (t) = tmp;
+      OMP_STANDALONE_CLAUSES (t) = tmp;
       add_stmt (t);
       break;
 
diff --git gcc/cp/semantics.c gcc/cp/semantics.c
index 0fc08b5f..ada1203 100644
--- gcc/cp/semantics.c
+++ gcc/cp/semantics.c
@@ -5294,19 +5294,21 @@ finish_omp_reduction_clause (tree c, bool *need_default_ctor, bool *need_dtor)
    Remove any elements from the list that are invalid.  */
 
 tree
-finish_omp_clauses (tree clauses)
+finish_omp_clauses (tree clauses, bool oacc)
 {
   bitmap_head generic_head, firstprivate_head, lastprivate_head;
-  bitmap_head aligned_head;
+  bitmap_head aligned_head, oacc_data_head;
   tree c, t, *pc;
   bool branch_seen = false;
   bool copyprivate_seen = false;
+  bool oacc_data = false;
 
   bitmap_obstack_initialize (NULL);
   bitmap_initialize (&generic_head, &bitmap_default_obstack);
   bitmap_initialize (&firstprivate_head, &bitmap_default_obstack);
   bitmap_initialize (&lastprivate_head, &bitmap_default_obstack);
   bitmap_initialize (&aligned_head, &bitmap_default_obstack);
+  bitmap_initialize (&oacc_data_head, &bitmap_default_obstack);
 
   for (pc = &clauses, c = clauses; c ; c = *pc)
     {
@@ -5317,9 +5319,21 @@ finish_omp_clauses (tree clauses)
 	case OMP_CLAUSE_SHARED:
 	  goto check_dup_generic;
 	case OMP_CLAUSE_PRIVATE:
-	  goto check_dup_generic;
+	  if (oacc)
+	    {
+	      oacc_data = true;
+	      goto check_dup_oacc;
+	    }
+	  else
+	    goto check_dup_generic;
 	case OMP_CLAUSE_REDUCTION:
-	  goto check_dup_generic;
+	  if (oacc)
+	    {
+	      oacc_data = false;
+	      goto check_dup_oacc;
+	    }
+	  else
+	    goto check_dup_generic;
 	case OMP_CLAUSE_COPYPRIVATE:
 	  copyprivate_seen = true;
 	  goto check_dup_generic;
@@ -5403,6 +5417,44 @@ finish_omp_clauses (tree clauses)
 	  else
 	    bitmap_set_bit (&generic_head, DECL_UID (t));
 	  break;
+	check_dup_oacc:
+	  t = OMP_CLAUSE_DECL (c);
+	  if (!VAR_P (t) && TREE_CODE (t) != PARM_DECL)
+	    {
+	      if (processing_template_decl)
+		break;
+	      if (DECL_P (t))
+		error ("%qD is not a variable in clause %qs", t,
+		       omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
+	      else
+		error ("%qE is not a variable in clause %qs", t,
+		       omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
+	      remove = true;
+	    }
+	  else if (oacc_data)
+	    {
+	      if (bitmap_bit_p (&oacc_data_head, DECL_UID (t)))
+		{
+		  error_at (OMP_CLAUSE_LOCATION (c),
+			    "%qE appears more than once in data clauses", t);
+		  remove = true;
+		}
+	      else
+		bitmap_set_bit (&oacc_data_head, DECL_UID (t));
+	    }
+	  else
+	    {
+	      if (bitmap_bit_p (&generic_head, DECL_UID (t)))
+		{
+		  error_at (OMP_CLAUSE_LOCATION (c),
+			    "%qE appears more than once in data clauses", t);
+		  remove = true;
+		}
+	      else
+		bitmap_set_bit (&generic_head, DECL_UID (t));
+	    }
+	  break;
+
 
 	case OMP_CLAUSE_FIRSTPRIVATE:
 	  t = OMP_CLAUSE_DECL (c);
@@ -5426,6 +5478,37 @@ finish_omp_clauses (tree clauses)
 	    bitmap_set_bit (&firstprivate_head, DECL_UID (t));
 	  break;
 
+	case OMP_CLAUSE_GANG:
+	case OMP_CLAUSE_VECTOR:
+	case OMP_CLAUSE_WORKER:
+	  /* Operand 0 is the num: or length: argument.  */
+	  t = OMP_CLAUSE_OPERAND (c, 0);
+	  if (t == NULL_TREE)
+	    break;
+
+	  t = maybe_convert_cond (t);
+	  if (t == error_mark_node)
+	    remove = true;
+	  else if (!processing_template_decl)
+	    t = fold_build_cleanup_point_expr (TREE_TYPE (t), t);
+	  OMP_CLAUSE_OPERAND (c, 0) = t;
+
+	  if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_GANG)
+	    break;
+
+	  /* Operand 1 is the gang static: argument.  */
+	  t = OMP_CLAUSE_OPERAND (c, 1);
+	  if (t == NULL_TREE)
+	    break;
+
+	  t = maybe_convert_cond (t);
+	  if (t == error_mark_node)
+	    remove = true;
+	  else if (!processing_template_decl)
+	    t = fold_build_cleanup_point_expr (TREE_TYPE (t), t);
+	  OMP_CLAUSE_OPERAND (c, 1) = t;
+	  break;
+
 	case OMP_CLAUSE_LASTPRIVATE:
 	  t = OMP_CLAUSE_DECL (c);
 	  if (!VAR_P (t) && TREE_CODE (t) != PARM_DECL)
@@ -5469,13 +5552,30 @@ finish_omp_clauses (tree clauses)
 	  break;
 
 	case OMP_CLAUSE_NUM_THREADS:
-	  t = OMP_CLAUSE_NUM_THREADS_EXPR (c);
+	case OMP_CLAUSE_NUM_GANGS:
+	case OMP_CLAUSE_NUM_WORKERS:
+	case OMP_CLAUSE_VECTOR_LENGTH:
+	  t = OMP_CLAUSE_OPERAND (c, 0);
 	  if (t == error_mark_node)
 	    remove = true;
 	  else if (!type_dependent_expression_p (t)
 		   && !INTEGRAL_TYPE_P (TREE_TYPE (t)))
 	    {
-	      error ("num_threads expression must be integral");
+	      switch (OMP_CLAUSE_CODE (c))
+		{
+		case OMP_CLAUSE_NUM_THREADS:
+		  error ("num_threads expression must be integral"); break;
+		case OMP_CLAUSE_NUM_GANGS:
+		  error ("%<num_gangs%> expression must be integral"); break;
+		case OMP_CLAUSE_NUM_WORKERS:
+		  error ("%<num_workers%> expression must be integral");
+		  break;
+		case OMP_CLAUSE_VECTOR_LENGTH:
+		  error ("%<vector_length%> expression must be integral");
+		  break;
+		default:
+		  error ("invalid argument");
+		}
 	      remove = true;
 	    }
 	  else
@@ -5483,7 +5583,7 @@ finish_omp_clauses (tree clauses)
 	      t = mark_rvalue_use (t);
 	      if (!processing_template_decl)
 		t = fold_build_cleanup_point_expr (TREE_TYPE (t), t);
-	      OMP_CLAUSE_NUM_THREADS_EXPR (c) = t;
+	      OMP_CLAUSE_OPERAND (c, 0) = t;
 	    }
 	  break;
 
@@ -5591,16 +5691,6 @@ finish_omp_clauses (tree clauses)
 	    }
 	  break;
 
-	case OMP_CLAUSE_VECTOR_LENGTH:
-	  t = OMP_CLAUSE_VECTOR_LENGTH_EXPR (c);
-	  t = maybe_convert_cond (t);
-	  if (t == error_mark_node)
-	    remove = true;
-	  else if (!processing_template_decl)
-	    t = fold_build_cleanup_point_expr (TREE_TYPE (t), t);
-	  OMP_CLAUSE_VECTOR_LENGTH_EXPR (c) = t;
-	  break;
-
 	case OMP_CLAUSE_WAIT:
 	  t = OMP_CLAUSE_WAIT_EXPR (c);
 	  if (t == error_mark_node)
@@ -5861,6 +5951,13 @@ finish_omp_clauses (tree clauses)
 	case OMP_CLAUSE_TASKGROUP:
 	case OMP_CLAUSE_PROC_BIND:
 	case OMP_CLAUSE__CILK_FOR_COUNT_:
+	case OMP_CLAUSE_USE_DEVICE:
+	case OMP_CLAUSE_AUTO:
+	case OMP_CLAUSE_INDEPENDENT:
+	case OMP_CLAUSE_SEQ:
+	case OMP_CLAUSE_BIND:
+	case OMP_CLAUSE_NOHOST:
+	case OMP_CLAUSE_TILE:
 	  break;
 
 	case OMP_CLAUSE_INBRANCH:
@@ -6146,6 +6243,24 @@ finish_oacc_data (tree clauses, tree block)
   return add_stmt (stmt);
 }
 
+/* Generate OACC_HOST_DATA, with CLAUSES and BLOCK as its compound
+   statement.  */
+
+tree
+finish_oacc_host_data (tree clauses, tree block)
+{
+  tree stmt;
+
+  block = finish_omp_structured_block (block);
+
+  stmt = make_node (OACC_HOST_DATA);
+  TREE_TYPE (stmt) = void_type_node;
+  OACC_HOST_DATA_CLAUSES (stmt) = clauses;
+  OACC_HOST_DATA_BODY (stmt) = block;
+
+  return add_stmt (stmt);
+}
+
 /* Generate OACC_KERNELS, with CLAUSES and BLOCK as its compound
    statement.  LOC is the location of the OACC_KERNELS.  */
 
@@ -6805,7 +6920,7 @@ finish_omp_for (location_t locus, enum tree_code code, tree declv, tree initv,
       OMP_CLAUSE_OPERAND (c, 0)
 	= cilk_for_number_of_iterations (omp_for);
       OMP_CLAUSE_CHAIN (c) = clauses;
-      OMP_PARALLEL_CLAUSES (omp_par) = finish_omp_clauses (c);
+      OMP_PARALLEL_CLAUSES (omp_par) = finish_omp_clauses (c, false);
       add_stmt (omp_par);
       return omp_par;
     }
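
Not part of the patch: a minimal sketch of the duplicate detection done
at the check_dup_oacc label in finish_omp_clauses.  It assumes variables
are identified by small integer ids (a hypothetical simplification of
DECL_UID) and uses a 64-bit word in place of GCC's bitmap.  As in the
patch, private clauses are recorded in one set and reduction clauses in
the generic set, so a variable may appear once in each kind but not
twice in the same kind.  The names dup_sets and check_dup are
illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for oacc_data_head and generic_head.  */
struct dup_sets
{
  uint64_t data_seen;     /* ids already seen in private clauses */
  uint64_t generic_seen;  /* ids already seen in reduction clauses */
};

/* Record variable UID (must be in the range 0..63) in the set selected
   by IS_DATA.  Return true if UID was already present in that set,
   i.e. the clause is a duplicate and would be removed.  */
bool
check_dup (struct dup_sets *sets, unsigned uid, bool is_data)
{
  uint64_t *set = is_data ? &sets->data_seen : &sets->generic_seen;
  uint64_t bit = UINT64_C (1) << uid;

  if (*set & bit)
    return true;

  *set |= bit;
  return false;
}
```
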


Regards,
 Thomas

* Next set of OpenACC changes: Fortran
  2015-05-05  8:54 Next set of OpenACC changes Thomas Schwinge
  2015-05-05  8:56 ` Next set of OpenACC changes: middle end, libgomp Thomas Schwinge
  2015-05-05  8:58 ` Next set of OpenACC changes: C family Thomas Schwinge
@ 2015-05-05  8:59 ` Thomas Schwinge
  2015-05-05 10:42   ` Bernhard Reutner-Fischer
  2015-05-05  9:00 ` Next set of OpenACC changes: Testsuite Thomas Schwinge
  2015-05-11 16:35 ` [gomp4] Next set of OpenACC changes Thomas Schwinge
  4 siblings, 1 reply; 11+ messages in thread
From: Thomas Schwinge @ 2015-05-05  8:59 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, fortran
  Cc: Bernd Schmidt, Cesar Philippidis, Chung-Lin Tang, James Norris,
	Joseph Myers, Julian Brown, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 53630 bytes --]

Hi!

On Tue, 05 May 2015 10:54:02 +0200, I wrote:
> In follow-up messages, I'll be posting the separated parts (for easier
> review) of a next set of OpenACC changes that we'd like to commit.
> ChangeLog updates not yet written; will do that before commit, obviously.

 gcc/fortran/dump-parse-tree.c                      |   12 +-
 gcc/fortran/gfortran.h                             |   50 +-
 gcc/fortran/match.h                                |    1 +
 gcc/fortran/openmp.c                               |  581 +++++--
 gcc/fortran/parse.c                                |   65 +-
 gcc/fortran/parse.h                                |    2 +-
 gcc/fortran/resolve.c                              |    5 +
 gcc/fortran/st.c                                   |    7 +
 gcc/fortran/trans-decl.c                           |   62 +-
 gcc/fortran/trans-openmp.c                         |   66 +-
 gcc/fortran/trans-stmt.c                           |    7 +-
 gcc/fortran/trans-stmt.h                           |    2 +-
 gcc/fortran/trans.c                                |    2 +

diff --git gcc/fortran/dump-parse-tree.c gcc/fortran/dump-parse-tree.c
index 83ecbaa..48476af 100644
--- gcc/fortran/dump-parse-tree.c
+++ gcc/fortran/dump-parse-tree.c
@@ -2570,12 +2570,16 @@ show_namespace (gfc_namespace *ns)
   for (eq = ns->equiv; eq; eq = eq->next)
     show_equiv (eq);
 
-  if (ns->oacc_declare_clauses)
+  if (ns->oacc_declare)
     {
+      struct gfc_oacc_declare *decl;
       /* Dump !$ACC DECLARE clauses.  */
-      show_indent ();
-      fprintf (dumpfile, "!$ACC DECLARE");
-      show_omp_clauses (ns->oacc_declare_clauses);
+      for (decl = ns->oacc_declare; decl; decl = decl->next)
+	{
+	  show_indent ();
+	  fprintf (dumpfile, "!$ACC DECLARE");
+	  show_omp_clauses (decl->clauses);
+	}
     }
 
   fputc ('\n', dumpfile);
diff --git gcc/fortran/gfortran.h gcc/fortran/gfortran.h
index 832a6ce..9258786 100644
--- gcc/fortran/gfortran.h
+++ gcc/fortran/gfortran.h
@@ -222,6 +222,7 @@ typedef enum
   ST_OACC_END_LOOP, ST_OACC_DECLARE, ST_OACC_UPDATE, ST_OACC_WAIT,
   ST_OACC_CACHE, ST_OACC_KERNELS_LOOP, ST_OACC_END_KERNELS_LOOP,
   ST_OACC_ENTER_DATA, ST_OACC_EXIT_DATA, ST_OACC_ROUTINE,
+  ST_OACC_ATOMIC, ST_OACC_END_ATOMIC,
   ST_OMP_ATOMIC, ST_OMP_BARRIER, ST_OMP_CRITICAL, ST_OMP_END_ATOMIC,
   ST_OMP_END_CRITICAL, ST_OMP_END_DO, ST_OMP_END_MASTER, ST_OMP_END_ORDERED,
   ST_OMP_END_PARALLEL, ST_OMP_END_PARALLEL_DO, ST_OMP_END_PARALLEL_SECTIONS,
@@ -1242,10 +1243,14 @@ typedef struct gfc_omp_clauses
   struct gfc_expr *num_gangs_expr;
   struct gfc_expr *num_workers_expr;
   struct gfc_expr *vector_length_expr;
+  struct gfc_symbol *routine_bind;
+  int dtype;
+  struct gfc_omp_clauses *dtype_clauses;
   gfc_expr_list *wait_list;
   gfc_expr_list *tile_list;
   unsigned async:1, gang:1, worker:1, vector:1, seq:1, independent:1;
-  unsigned wait:1, par_auto:1, gang_static:1;
+  unsigned wait:1, par_auto:1, gang_static:1, nohost:1, acc_collapse:1, bind:1;
+  unsigned num_gangs:1, num_workers:1, vector_length:1, tile:1;
   locus loc;
 
 }
@@ -1253,6 +1258,17 @@ gfc_omp_clauses;
 
 #define gfc_get_omp_clauses() XCNEW (gfc_omp_clauses)
 
+/* Node in the linked list used for storing !$acc declare constructs.  */
+
+typedef struct gfc_oacc_declare
+{
+  struct gfc_oacc_declare *next;
+  locus where;
+  gfc_omp_clauses *clauses;
+}
+gfc_oacc_declare;
+#define gfc_get_oacc_declare() XCNEW (gfc_oacc_declare)
+
 
 /* Node in the linked list used for storing !$omp declare simd constructs.  */
 
@@ -1592,6 +1608,16 @@ gfc_dt_list;
   /* A list of all derived types.  */
   extern gfc_dt_list *gfc_derived_types;
 
+typedef struct gfc_oacc_routine_name
+{
+  struct gfc_symbol *sym;
+  struct gfc_omp_clauses *clauses;
+  struct gfc_oacc_routine_name *next;
+}
+gfc_oacc_routine_name;
+
+#define gfc_get_oacc_routine_name() XCNEW (gfc_oacc_routine_name)
+
 /* A namespace describes the contents of procedure, module, interface block
    or BLOCK construct.  */
 /* ??? Anything else use these?  */
@@ -1656,7 +1682,13 @@ typedef struct gfc_namespace
   struct gfc_data *data, *old_data;
 
   /* !$ACC DECLARE clauses.  */
-  gfc_omp_clauses *oacc_declare_clauses;
+  struct gfc_oacc_declare *oacc_declare;
+
+  /* !$ACC ROUTINE clauses.  */
+  gfc_omp_clauses *oacc_routine_clauses;
+
+  /* !$ACC ROUTINE names.  */
+  gfc_oacc_routine_name *oacc_routine_names;
 
   gfc_charlen *cl_list, *old_cl_list;
 
@@ -1703,6 +1735,9 @@ typedef struct gfc_namespace
 
   /* Set to 1 for !$OMP DECLARE REDUCTION namespaces.  */
   unsigned omp_udr_ns:1;
+
+  /* Set to 1 for !$ACC ROUTINE namespaces.  */
+  unsigned oacc_routine:1;
 }
 gfc_namespace;
 
@@ -2331,10 +2366,11 @@ typedef enum
   EXEC_READ, EXEC_WRITE, EXEC_IOLENGTH, EXEC_TRANSFER, EXEC_DT_END,
   EXEC_BACKSPACE, EXEC_ENDFILE, EXEC_INQUIRE, EXEC_REWIND, EXEC_FLUSH,
   EXEC_LOCK, EXEC_UNLOCK,
-  EXEC_OACC_KERNELS_LOOP, EXEC_OACC_PARALLEL_LOOP,
+  EXEC_OACC_KERNELS_LOOP, EXEC_OACC_PARALLEL_LOOP, EXEC_OACC_ROUTINE,
   EXEC_OACC_PARALLEL, EXEC_OACC_KERNELS, EXEC_OACC_DATA, EXEC_OACC_HOST_DATA,
   EXEC_OACC_LOOP, EXEC_OACC_UPDATE, EXEC_OACC_WAIT, EXEC_OACC_CACHE,
-  EXEC_OACC_ENTER_DATA, EXEC_OACC_EXIT_DATA,
+  EXEC_OACC_ENTER_DATA, EXEC_OACC_EXIT_DATA, EXEC_OACC_ATOMIC,
+  EXEC_OACC_DECLARE,
   EXEC_OMP_CRITICAL, EXEC_OMP_DO, EXEC_OMP_FLUSH, EXEC_OMP_MASTER,
   EXEC_OMP_ORDERED, EXEC_OMP_PARALLEL, EXEC_OMP_PARALLEL_DO,
   EXEC_OMP_PARALLEL_SECTIONS, EXEC_OMP_PARALLEL_WORKSHARE,
@@ -2416,6 +2452,7 @@ typedef struct gfc_code
     int stop_code;
     gfc_entry_list *entry;
     gfc_omp_clauses *omp_clauses;
+    gfc_oacc_declare *oacc_declare;
     const char *omp_name;
     gfc_omp_namelist *omp_namelist;
     bool omp_bool;
@@ -2923,6 +2960,7 @@ gfc_expr *gfc_get_parentheses (gfc_expr *);
 /* openmp.c */
 struct gfc_omp_saved_state { void *ptrs[2]; int ints[1]; };
 void gfc_free_omp_clauses (gfc_omp_clauses *);
+void gfc_free_oacc_declares (struct gfc_oacc_declare *);
 void gfc_free_omp_declare_simd (gfc_omp_declare_simd *);
 void gfc_free_omp_declare_simd_list (gfc_omp_declare_simd *);
 void gfc_free_omp_udr (gfc_omp_udr *);
@@ -3231,4 +3269,8 @@ int gfc_code_walker (gfc_code **, walk_code_fn_t, walk_expr_fn_t, void *);
 
 void gfc_convert_mpz_to_signed (mpz_t, int);
 
+/* trans-decl.c */
+
+void insert_oacc_declare (gfc_namespace *);
+
 #endif /* GCC_GFORTRAN_H  */
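
Not part of the patch: a minimal sketch of how a singly linked list like
the new gfc_oacc_declare chain can be built and released.  The names
declare_node, declare_push and declare_free_all are hypothetical, the
clauses member is reduced to an opaque heap pointer, and prepending is
an assumption made for brevity.  calloc stands in for XCNEW, which also
returns zeroed storage.  Unlike a do/while loop, the free routine here
also accepts an empty list.

```c
#include <stdlib.h>

/* Hypothetical stand-in for gfc_oacc_declare.  */
struct declare_node
{
  struct declare_node *next;
  void *clauses;
};

/* Prepend a zero-initialized node to *HEAD.  Return the new node, or
   NULL if the allocation failed (in which case *HEAD is unchanged).  */
struct declare_node *
declare_push (struct declare_node **head)
{
  struct declare_node *n = calloc (1, sizeof *n);

  if (n)
    {
      n->next = *head;
      *head = n;
    }
  return n;
}

/* Free every node in LIST together with its clauses pointer and return
   the number of nodes freed.  LIST may be NULL.  */
size_t
declare_free_all (struct declare_node *list)
{
  size_t count = 0;

  while (list)
    {
      struct declare_node *next = list->next;

      free (list->clauses);
      free (list);
      list = next;
      ++count;
    }
  return count;
}
```
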
diff --git gcc/fortran/match.h gcc/fortran/match.h
index 96d3ec1..202e175 100644
--- gcc/fortran/match.h
+++ gcc/fortran/match.h
@@ -123,6 +123,7 @@ gfc_common_head *gfc_get_common (const char *, int);
 /* openmp.c.  */
 
 /* OpenACC directive matchers.  */
+match gfc_match_oacc_atomic (void);
 match gfc_match_oacc_cache (void);
 match gfc_match_oacc_wait (void);
 match gfc_match_oacc_update (void);
diff --git gcc/fortran/openmp.c gcc/fortran/openmp.c
index 21de607..883676e 100644
--- gcc/fortran/openmp.c
+++ gcc/fortran/openmp.c
@@ -92,6 +92,25 @@ gfc_free_omp_clauses (gfc_omp_clauses *c)
   free (c);
 }
 
+/* Free oacc_declare structures.  */
+
+void
+gfc_free_oacc_declares (struct gfc_oacc_declare *oc)
+{
+  struct gfc_oacc_declare *decl = oc;
+
+  do
+    {
+      struct gfc_oacc_declare *next;
+
+      next = decl->next;
+      gfc_free_omp_clauses (decl->clauses);
+      free (decl);
+      decl = next;
+    }
+  while (decl);
+}
+
 /* Free expression list. */
 void
 gfc_free_expr_list (gfc_expr_list *list)
@@ -447,21 +466,26 @@ match_oacc_clause_gang (gfc_omp_clauses *cp)
 #define OMP_CLAUSE_INDEPENDENT		((uint64_t) 1 << 49)
 #define OMP_CLAUSE_USE_DEVICE		((uint64_t) 1 << 50)
 #define OMP_CLAUSE_DEVICE_RESIDENT	((uint64_t) 1 << 51)
-#define OMP_CLAUSE_HOST_SELF		((uint64_t) 1 << 52)
+#define OMP_CLAUSE_HOST			((uint64_t) 1 << 52)
 #define OMP_CLAUSE_OACC_DEVICE		((uint64_t) 1 << 53)
 #define OMP_CLAUSE_WAIT			((uint64_t) 1 << 54)
 #define OMP_CLAUSE_DELETE		((uint64_t) 1 << 55)
 #define OMP_CLAUSE_AUTO			((uint64_t) 1 << 56)
 #define OMP_CLAUSE_TILE			((uint64_t) 1 << 57)
+#define OMP_CLAUSE_BIND			((uint64_t) 1 << 58)
+#define OMP_CLAUSE_NOHOST		((uint64_t) 1 << 59)
+#define OMP_CLAUSE_DEVICE_TYPE		((uint64_t) 1 << 60)
 
 /* Helper function for OpenACC and OpenMP clauses involving memory
    mapping.  */
 
 static bool
-gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op)
+gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op,
+			  bool allow_sections = true)
 {
   gfc_omp_namelist **head = NULL;
-  if (gfc_match_omp_variable_list ("", list, false, NULL, &head, true)
+  if (gfc_match_omp_variable_list ("", list, false, NULL, &head,
+				   allow_sections)
       == MATCH_YES)
     {
       gfc_omp_namelist *n;
@@ -478,11 +502,14 @@ gfc_match_omp_map_clause (gfc_omp_namelist **list, gfc_omp_map_op map_op)
 
 static match
 gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
-		       bool first = true, bool needs_space = true,
-		       bool openacc = false)
+		       uint64_t dtype_mask, bool first = true,
+		       bool needs_space = true, bool openacc = false)
 {
-  gfc_omp_clauses *c = gfc_get_omp_clauses ();
+  gfc_omp_clauses *base_clauses, *c = gfc_get_omp_clauses ();
   locus old_loc;
+  bool scan_dtype = false;
+
+  base_clauses = c;
 
   *cp = NULL;
   while (1)
@@ -531,7 +558,10 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
       if ((mask & OMP_CLAUSE_VECTOR_LENGTH) && c->vector_length_expr == NULL
 	  && gfc_match ("vector_length ( %e )", &c->vector_length_expr)
 	  == MATCH_YES)
-	continue;
+	{
+	  c->vector_length = 1;
+	  continue;
+	}
       if ((mask & OMP_CLAUSE_VECTOR) && !c->vector)
 	if (gfc_match ("vector") == MATCH_YES)
 	  {
@@ -596,11 +626,17 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 	}
       if ((mask & OMP_CLAUSE_NUM_GANGS) && c->num_gangs_expr == NULL
 	  && gfc_match ("num_gangs ( %e )", &c->num_gangs_expr) == MATCH_YES)
-	continue;
+	{
+	  c->num_gangs = 1;
+	  continue;
+	}
       if ((mask & OMP_CLAUSE_NUM_WORKERS) && c->num_workers_expr == NULL
 	  && gfc_match ("num_workers ( %e )", &c->num_workers_expr)
 	  == MATCH_YES)
-	continue;
+	{
+	  c->num_workers = 1;
+	  continue;
+	}
       if ((mask & OMP_CLAUSE_COPY)
 	  && gfc_match ("copy ( ") == MATCH_YES
 	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
@@ -680,6 +716,18 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 	      continue;
 	    }
 	}
+      if ((mask & OMP_CLAUSE_BIND) && c->routine_bind == NULL
+	  && gfc_match ("bind ( %s )", &c->routine_bind) == MATCH_YES)
+	{
+	  c->bind = 1;
+	  continue;
+	}
+      if ((mask & OMP_CLAUSE_NOHOST) && !c->nohost
+	  && gfc_match ("nohost") == MATCH_YES)
+	{
+	  c->nohost = true;
+	  continue;
+	}
       if ((mask & OMP_CLAUSE_USE_DEVICE)
 	  && gfc_match_omp_variable_list ("use_device (",
 					  &c->lists[OMP_LIST_USE_DEVICE], true)
@@ -696,15 +744,20 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
 				       OMP_MAP_FORCE_TO))
 	continue;
-      if ((mask & OMP_CLAUSE_HOST_SELF)
+      if ((mask & OMP_CLAUSE_HOST)
 	  && (gfc_match ("host ( ") == MATCH_YES
-	      || gfc_match ("self ( ") == MATCH_YES)
+	      || gfc_match ("self ( ") == MATCH_YES) /* "self" is a synonym for
+							"host".  */
 	  && gfc_match_omp_map_clause (&c->lists[OMP_LIST_MAP],
 				       OMP_MAP_FORCE_FROM))
 	continue;
       if ((mask & OMP_CLAUSE_TILE)
+	  && !c->tile_list
 	  && match_oacc_expr_list ("tile (", &c->tile_list, true) == MATCH_YES)
-	continue;
+	{
+	  c->tile = 1;
+	  continue;
+	}
       if ((mask & OMP_CLAUSE_SEQ) && !c->seq
 	  && gfc_match ("seq") == MATCH_YES)
 	{
@@ -856,13 +909,14 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
       if ((mask & OMP_CLAUSE_DEFAULT)
 	  && c->default_sharing == OMP_DEFAULT_UNKNOWN)
 	{
-	  if (gfc_match ("default ( shared )") == MATCH_YES)
+	  if (!openacc && gfc_match ("default ( shared )") == MATCH_YES)
 	    c->default_sharing = OMP_DEFAULT_SHARED;
-	  else if (gfc_match ("default ( private )") == MATCH_YES)
+	  else if (!openacc && gfc_match ("default ( private )") == MATCH_YES)
 	    c->default_sharing = OMP_DEFAULT_PRIVATE;
 	  else if (gfc_match ("default ( none )") == MATCH_YES)
 	    c->default_sharing = OMP_DEFAULT_NONE;
-	  else if (gfc_match ("default ( firstprivate )") == MATCH_YES)
+	  else if (!openacc
+		   && gfc_match ("default ( firstprivate )") == MATCH_YES)
 	    c->default_sharing = OMP_DEFAULT_FIRSTPRIVATE;
 	  if (c->default_sharing != OMP_DEFAULT_UNKNOWN)
 	    continue;
@@ -938,6 +992,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 		}
 	      c->collapse = collapse;
 	      gfc_free_expr (cexpr);
+	      c->acc_collapse = 1;
 	      continue;
 	    }
 	}
@@ -1083,6 +1138,47 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
       if ((mask & OMP_CLAUSE_DEVICE) && c->device == NULL
 	  && gfc_match ("device ( %e )", &c->device) == MATCH_YES)
 	continue;
+      if (((mask & OMP_CLAUSE_DEVICE_TYPE) || scan_dtype)
+	  && (gfc_match ("device_type ( ") == MATCH_YES
+	      || gfc_match ("dtype ( ") == MATCH_YES))
+	{
+	  int device = GOMP_DEVICE_NONE;
+	  gfc_omp_clauses *t = gfc_get_omp_clauses ();
+
+	  c->dtype_clauses = t;
+	  c = t;
+
+	  if (gfc_match (" * ") == MATCH_YES)
+	    device = GOMP_DEVICE_DEFAULT;
+	  else
+	    {
+	      char n[GFC_MAX_SYMBOL_LEN + 1];
+
+	      while (gfc_match (" %n ", n) == MATCH_YES)
+		{
+		  if (!strcasecmp ("nvidia", n))
+		    device = GOMP_DEVICE_NVIDIA_PTX;
+		  else
+		    {
+		      /* The OpenACC technical committee advises compilers
+			 to silently ignore unknown devices.  */
+		    }
+		  gfc_match (" , ");
+		}
+	    }
+
+	  /* Consume the trailing ')'.  */
+	  if (gfc_match (" ) ") != MATCH_YES)
+	    {
+	      gfc_error ("expected %<)%>");
+	      continue;
+	    }
+
+	  c->dtype = device;
+	  mask = dtype_mask;
+	  scan_dtype = true;
+	  continue;
+	}
       if ((mask & OMP_CLAUSE_THREAD_LIMIT) && c->thread_limit == NULL
 	  && gfc_match ("thread_limit ( %e )", &c->thread_limit) == MATCH_YES)
 	continue;
@@ -1129,11 +1225,82 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
 
   if (gfc_match_omp_eos () != MATCH_YES)
     {
-      gfc_free_omp_clauses (c);
+      gfc_omp_clauses *t;
+      c = base_clauses->dtype_clauses;
+      while (c)
+	{
+	  t = c->dtype_clauses;
+	  gfc_free_omp_clauses (c);
+	  c = t;
+	}
+      gfc_free_omp_clauses (base_clauses);
       return MATCH_ERROR;
     }
 
-  *cp = c;
+  /* Filter out the device_type clauses.  */
+  if (base_clauses->dtype_clauses)
+    {
+      gfc_omp_clauses *t;
+      gfc_omp_clauses *seen_default = NULL;
+      gfc_omp_clauses *seen_nvidia = NULL;
+
+      /* Scan for device_type clauses.  */
+      c = base_clauses->dtype_clauses;
+      while (c)
+	{
+	  if (c->dtype == GOMP_DEVICE_DEFAULT)
+	    {
+	      if (seen_default)
+		gfc_error ("duplicate device_type (*)");
+	      else
+		seen_default = c;
+	    }
+	  else if (c->dtype == GOMP_DEVICE_NVIDIA_PTX)
+	    {
+	      if (seen_nvidia)
+		gfc_error ("duplicate device_type (nvidia)");
+	      else
+		seen_nvidia = c;
+	    }
+	  c = c->dtype_clauses;
+	}
+
+      /* Update the clauses in the original set of clauses.  */
+      c = seen_nvidia ? seen_nvidia : seen_default;
+      if (c)
+	{
+#define acc_clause0(mask) do if (c->mask) { base_clauses->mask = 1; } while (0)
+#define acc_clause1(mask, expr, type) do if (c->mask) { type t; \
+	      base_clauses->mask = 1; t = base_clauses->expr; \
+	      base_clauses->expr = c->expr; c->expr = t; } while (0)
+
+	  acc_clause1 (acc_collapse, collapse, int);
+	  acc_clause1 (gang, gang_expr, gfc_expr *);
+	  acc_clause1 (worker, worker_expr, gfc_expr *);
+	  acc_clause1 (vector, vector_expr, gfc_expr *);
+	  acc_clause0 (par_auto);
+	  acc_clause0 (independent);
+	  acc_clause0 (seq);
+	  acc_clause1 (tile, tile_list, gfc_expr_list *);
+	  acc_clause1 (async, async_expr, gfc_expr *);
+	  acc_clause1 (wait, wait_list, gfc_expr_list *);
+	  acc_clause1 (num_gangs, num_gangs_expr, gfc_expr *);
+	  acc_clause1 (num_workers, num_workers_expr, gfc_expr *);
+	  acc_clause1 (vector_length, vector_length_expr, gfc_expr *);
+	  acc_clause1 (bind, routine_bind, gfc_symbol *);
+	}
+
+      /* Remove the device_type clauses.  */
+      c = base_clauses->dtype_clauses;
+      while (c)
+	{
+	  t = c->dtype_clauses;
+	  gfc_free_omp_clauses (c);
+	  c = t;
+	}
+    }
+
+  *cp = base_clauses;
   return MATCH_YES;
 }
 
@@ -1145,13 +1312,15 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
    | OMP_CLAUSE_CREATE | OMP_CLAUSE_PRESENT | OMP_CLAUSE_PRESENT_OR_COPY      \
    | OMP_CLAUSE_PRESENT_OR_COPYIN | OMP_CLAUSE_PRESENT_OR_COPYOUT             \
    | OMP_CLAUSE_PRESENT_OR_CREATE | OMP_CLAUSE_DEVICEPTR | OMP_CLAUSE_PRIVATE \
-   | OMP_CLAUSE_FIRSTPRIVATE | OMP_CLAUSE_DEFAULT | OMP_CLAUSE_WAIT)
+   | OMP_CLAUSE_FIRSTPRIVATE | OMP_CLAUSE_DEFAULT | OMP_CLAUSE_WAIT	      \
+   | OMP_CLAUSE_DEVICE_TYPE)
 #define OACC_KERNELS_CLAUSES \
   (OMP_CLAUSE_IF | OMP_CLAUSE_ASYNC | OMP_CLAUSE_DEVICEPTR                    \
    | OMP_CLAUSE_COPY | OMP_CLAUSE_COPYIN | OMP_CLAUSE_COPYOUT                 \
    | OMP_CLAUSE_CREATE | OMP_CLAUSE_PRESENT | OMP_CLAUSE_PRESENT_OR_COPY      \
    | OMP_CLAUSE_PRESENT_OR_COPYIN | OMP_CLAUSE_PRESENT_OR_COPYOUT             \
-   | OMP_CLAUSE_PRESENT_OR_CREATE | OMP_CLAUSE_DEFAULT | OMP_CLAUSE_WAIT)
+   | OMP_CLAUSE_PRESENT_OR_CREATE | OMP_CLAUSE_DEFAULT | OMP_CLAUSE_WAIT      \
+   | OMP_CLAUSE_DEVICE_TYPE)
 #define OACC_DATA_CLAUSES \
   (OMP_CLAUSE_IF | OMP_CLAUSE_DEVICEPTR  | OMP_CLAUSE_COPY                    \
    | OMP_CLAUSE_COPYIN | OMP_CLAUSE_COPYOUT | OMP_CLAUSE_CREATE               \
@@ -1162,7 +1331,7 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
   (OMP_CLAUSE_COLLAPSE | OMP_CLAUSE_GANG | OMP_CLAUSE_WORKER     \
    | OMP_CLAUSE_VECTOR | OMP_CLAUSE_SEQ | OMP_CLAUSE_INDEPENDENT \
    | OMP_CLAUSE_PRIVATE | OMP_CLAUSE_REDUCTION | OMP_CLAUSE_AUTO \
-   | OMP_CLAUSE_TILE)
+   | OMP_CLAUSE_TILE | OMP_CLAUSE_DEVICE_TYPE)
 #define OACC_PARALLEL_LOOP_CLAUSES \
   (OACC_LOOP_CLAUSES | OACC_PARALLEL_CLAUSES)
 #define OACC_KERNELS_LOOP_CLAUSES \
@@ -1175,8 +1344,8 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
    | OMP_CLAUSE_PRESENT_OR_COPYIN | OMP_CLAUSE_PRESENT_OR_COPYOUT             \
    | OMP_CLAUSE_PRESENT_OR_CREATE)
 #define OACC_UPDATE_CLAUSES \
-  (OMP_CLAUSE_IF | OMP_CLAUSE_ASYNC | OMP_CLAUSE_HOST_SELF \
-   | OMP_CLAUSE_OACC_DEVICE | OMP_CLAUSE_WAIT)
+  (OMP_CLAUSE_IF | OMP_CLAUSE_ASYNC | OMP_CLAUSE_HOST \
+   | OMP_CLAUSE_OACC_DEVICE | OMP_CLAUSE_WAIT | OMP_CLAUSE_DEVICE_TYPE)
 #define OACC_ENTER_DATA_CLAUSES \
   (OMP_CLAUSE_IF | OMP_CLAUSE_ASYNC | OMP_CLAUSE_WAIT | OMP_CLAUSE_COPYIN    \
    | OMP_CLAUSE_CREATE | OMP_CLAUSE_PRESENT_OR_COPYIN                          \
@@ -1186,14 +1355,35 @@ gfc_match_omp_clauses (gfc_omp_clauses **cp, uint64_t mask,
    | OMP_CLAUSE_DELETE)
 #define OACC_WAIT_CLAUSES \
   (OMP_CLAUSE_ASYNC)
+#define OACC_ROUTINE_CLAUSES \
+  (OMP_CLAUSE_GANG | OMP_CLAUSE_WORKER | OMP_CLAUSE_VECTOR | OMP_CLAUSE_SEQ \
+   | OMP_CLAUSE_BIND | OMP_CLAUSE_OACC_DEVICE | OMP_CLAUSE_NOHOST           \
+   | OMP_CLAUSE_DEVICE_TYPE)
+
+#define OACC_LOOP_CLAUSE_DEVICE_TYPE_MASK \
+  (OMP_CLAUSE_COLLAPSE | OMP_CLAUSE_GANG | OMP_CLAUSE_WORKER		    \
+   | OMP_CLAUSE_VECTOR | OMP_CLAUSE_AUTO | OMP_CLAUSE_INDEPENDENT	    \
+   | OMP_CLAUSE_SEQ | OMP_CLAUSE_TILE)
+#define OACC_KERNELS_CLAUSE_DEVICE_TYPE_MASK \
+  (OMP_CLAUSE_ASYNC | OMP_CLAUSE_WAIT)
+#define OACC_PARALLEL_CLAUSE_DEVICE_TYPE_MASK				   \
+  (OMP_CLAUSE_ASYNC | OMP_CLAUSE_NUM_GANGS | OMP_CLAUSE_NUM_WORKERS	   \
+   | OMP_CLAUSE_VECTOR_LENGTH | OMP_CLAUSE_WAIT)
+#define OACC_ROUTINE_CLAUSE_DEVICE_TYPE_MASK				   \
+   (OMP_CLAUSE_GANG | OMP_CLAUSE_WORKER | OMP_CLAUSE_VECTOR		   \
+    | OMP_CLAUSE_SEQ | OMP_CLAUSE_BIND)
+#define OACC_UPDATE_CLAUSE_DEVICE_TYPE_MASK				   \
+   (OMP_CLAUSE_ASYNC | OMP_CLAUSE_WAIT)
 
 
 match
 gfc_match_oacc_parallel_loop (void)
 {
   gfc_omp_clauses *c;
-  if (gfc_match_omp_clauses (&c, OACC_PARALLEL_LOOP_CLAUSES, false, false,
-			     true) != MATCH_YES)
+  if (gfc_match_omp_clauses (&c, OACC_PARALLEL_LOOP_CLAUSES,
+			     OACC_PARALLEL_CLAUSE_DEVICE_TYPE_MASK
+			     | OACC_LOOP_CLAUSE_DEVICE_TYPE_MASK, false,
+			     false, true) != MATCH_YES)
     return MATCH_ERROR;
 
   new_st.op = EXEC_OACC_PARALLEL_LOOP;
@@ -1206,7 +1396,9 @@ match
 gfc_match_oacc_parallel (void)
 {
   gfc_omp_clauses *c;
-  if (gfc_match_omp_clauses (&c, OACC_PARALLEL_CLAUSES, false, false, true)
+  if (gfc_match_omp_clauses (&c, OACC_PARALLEL_CLAUSES,
+			     OACC_PARALLEL_CLAUSE_DEVICE_TYPE_MASK, false,
+			     false, true)
       != MATCH_YES)
     return MATCH_ERROR;
 
@@ -1220,8 +1412,10 @@ match
 gfc_match_oacc_kernels_loop (void)
 {
   gfc_omp_clauses *c;
-  if (gfc_match_omp_clauses (&c, OACC_KERNELS_LOOP_CLAUSES, false, false,
-			     true) != MATCH_YES)
+  if (gfc_match_omp_clauses (&c, OACC_KERNELS_LOOP_CLAUSES,
+			     OACC_KERNELS_CLAUSE_DEVICE_TYPE_MASK
+			     | OACC_LOOP_CLAUSE_DEVICE_TYPE_MASK, false,
+			     false, true) != MATCH_YES)
     return MATCH_ERROR;
 
   new_st.op = EXEC_OACC_KERNELS_LOOP;
@@ -1234,7 +1428,9 @@ match
 gfc_match_oacc_kernels (void)
 {
   gfc_omp_clauses *c;
-  if (gfc_match_omp_clauses (&c, OACC_KERNELS_CLAUSES, false, false, true)
+  if (gfc_match_omp_clauses (&c, OACC_KERNELS_CLAUSES,
+			     OACC_KERNELS_CLAUSE_DEVICE_TYPE_MASK, false,
+			     false, true)
       != MATCH_YES)
     return MATCH_ERROR;
 
@@ -1248,7 +1444,7 @@ match
 gfc_match_oacc_data (void)
 {
   gfc_omp_clauses *c;
-  if (gfc_match_omp_clauses (&c, OACC_DATA_CLAUSES, false, false, true)
+  if (gfc_match_omp_clauses (&c, OACC_DATA_CLAUSES, 0, false, false, true)
       != MATCH_YES)
     return MATCH_ERROR;
 
@@ -1262,7 +1458,7 @@ match
 gfc_match_oacc_host_data (void)
 {
   gfc_omp_clauses *c;
-  if (gfc_match_omp_clauses (&c, OACC_HOST_DATA_CLAUSES, false, false, true)
+  if (gfc_match_omp_clauses (&c, OACC_HOST_DATA_CLAUSES, 0, false, false, true)
       != MATCH_YES)
     return MATCH_ERROR;
 
@@ -1276,7 +1472,9 @@ match
 gfc_match_oacc_loop (void)
 {
   gfc_omp_clauses *c;
-  if (gfc_match_omp_clauses (&c, OACC_LOOP_CLAUSES, false, false, true)
+  if (gfc_match_omp_clauses (&c, OACC_LOOP_CLAUSES,
+			     OACC_LOOP_CLAUSE_DEVICE_TYPE_MASK, false, false,
+			     true)
       != MATCH_YES)
     return MATCH_ERROR;
 
@@ -1290,12 +1488,90 @@ match
 gfc_match_oacc_declare (void)
 {
   gfc_omp_clauses *c;
-  if (gfc_match_omp_clauses (&c, OACC_DECLARE_CLAUSES, false, false, true)
+  gfc_omp_namelist *n;
+  gfc_namespace *ns = gfc_current_ns;
+  gfc_oacc_declare *new_oc, *oc;
+  locus where = gfc_current_locus;
+
+  if (gfc_match_omp_clauses (&c, OACC_DECLARE_CLAUSES, 0, false, false, true)
       != MATCH_YES)
     return MATCH_ERROR;
 
-  new_st.ext.omp_clauses = c;
-  new_st.ext.omp_clauses->loc = gfc_current_locus;
+  for (n = c->lists[OMP_LIST_MAP]; n != NULL; n = n->next)
+    {
+      gfc_symbol *s = n->sym;
+
+      if (s->ns->proc_name && s->ns->proc_name->attr.proc == PROC_MODULE)
+	{
+	  if (n->u.map_op != OMP_MAP_FORCE_ALLOC
+	      && n->u.map_op != OMP_MAP_FORCE_TO)
+	    {
+	      gfc_error ("Invalid clause in module with "
+			 "!$ACC DECLARE at %C");
+	      return MATCH_ERROR;
+	    }
+	}
+
+      if (s->attr.in_common)
+	{
+	  gfc_error ("Unsupported: variable in a common block with "
+		     "!$ACC DECLARE at %C");
+	  return MATCH_ERROR;
+	}
+
+      if (s->attr.use_assoc)
+	{
+	  gfc_error ("Unsupported: variable is USE-associated with "
+		     "!$ACC DECLARE at %C");
+	  return MATCH_ERROR;
+	}
+
+      if ((s->attr.dimension || s->attr.codimension)
+	  && s->attr.dummy && s->as->type != AS_EXPLICIT)
+	{
+	  gfc_error ("Unsupported: assumed-size dummy array with "
+		     "!$ACC DECLARE at %C");
+	  return MATCH_ERROR;
+	}
+    }
+
+  new_oc = gfc_get_oacc_declare ();
+  new_oc->next = ns->oacc_declare;
+  new_oc->where = where;
+  new_oc->clauses = c;
+
+  for (oc = new_oc; oc; oc = oc->next)
+    {
+      c = oc->clauses;
+      for (n = c->lists[OMP_LIST_MAP]; n != NULL; n = n->next)
+	n->sym->mark = 0;
+    }
+
+  for (oc = new_oc; oc; oc = oc->next)
+    {
+      c = oc->clauses;
+      for (n = c->lists[OMP_LIST_MAP]; n != NULL; n = n->next)
+	{
+	  if (n->sym->mark)
+	    {
+	      gfc_error ("Symbol %qs present on multiple clauses at %C",
+			 n->sym->name);
+	      return MATCH_ERROR;
+	    }
+	  else
+	    n->sym->mark = 1;
+	}
+    }
+
+  for (oc = new_oc; oc; oc = oc->next)
+    {
+      c = oc->clauses;
+      for (n = c->lists[OMP_LIST_MAP]; n != NULL; n = n->next)
+	n->sym->mark = 1;
+    }
+
+  ns->oacc_declare = new_oc;
+
   return MATCH_YES;
 }
 
@@ -1304,10 +1580,21 @@ match
 gfc_match_oacc_update (void)
 {
   gfc_omp_clauses *c;
-  if (gfc_match_omp_clauses (&c, OACC_UPDATE_CLAUSES, false, false, true)
+  locus here = gfc_current_locus;
+
+  if (gfc_match_omp_clauses (&c, OACC_UPDATE_CLAUSES,
+			     OACC_UPDATE_CLAUSE_DEVICE_TYPE_MASK, false,
+			     false, true)
       != MATCH_YES)
     return MATCH_ERROR;
 
+  if (!c->lists[OMP_LIST_MAP])
+    {
+      gfc_error ("%<acc update%> must contain at least one "
+		 "%<device%> or %<host/self%> clause at %L", &here);
+      return MATCH_ERROR;
+    }
+
   new_st.op = EXEC_OACC_UPDATE;
   new_st.ext.omp_clauses = c;
   return MATCH_YES;
@@ -1318,7 +1605,7 @@ match
 gfc_match_oacc_enter_data (void)
 {
   gfc_omp_clauses *c;
-  if (gfc_match_omp_clauses (&c, OACC_ENTER_DATA_CLAUSES, false, false, true)
+  if (gfc_match_omp_clauses (&c, OACC_ENTER_DATA_CLAUSES, 0, false, false, true)
       != MATCH_YES)
     return MATCH_ERROR;
 
@@ -1332,7 +1619,7 @@ match
 gfc_match_oacc_exit_data (void)
 {
   gfc_omp_clauses *c;
-  if (gfc_match_omp_clauses (&c, OACC_EXIT_DATA_CLAUSES, false, false, true)
+  if (gfc_match_omp_clauses (&c, OACC_EXIT_DATA_CLAUSES, 0, false, false, true)
       != MATCH_YES)
     return MATCH_ERROR;
 
@@ -1349,7 +1636,7 @@ gfc_match_oacc_wait (void)
   gfc_expr_list *wait_list = NULL, *el;
 
   match_oacc_expr_list (" (", &wait_list, true);
-  gfc_match_omp_clauses (&c, OACC_WAIT_CLAUSES, false, false, true);
+  gfc_match_omp_clauses (&c, OACC_WAIT_CLAUSES, 0, false, false, true);
 
   if (gfc_match_omp_eos () != MATCH_YES)
     {
@@ -1389,7 +1676,8 @@ gfc_match_oacc_cache (void)
 {
   gfc_omp_clauses *c = gfc_get_omp_clauses ();
   match m = gfc_match_omp_variable_list (" (",
-					 &c->lists[OMP_LIST_CACHE], true);
+					 &c->lists[OMP_LIST_CACHE], true,
+					 NULL, NULL, true);
   if (m != MATCH_YES)
     {
       gfc_free_omp_clauses(c);
@@ -1414,8 +1702,10 @@ match
 gfc_match_oacc_routine (void)
 {
   locus old_loc;
-  gfc_symbol *sym;
+  gfc_symbol *sym = NULL;
   match m;
+  gfc_omp_clauses *c = NULL;
+  gfc_oacc_routine_name *n = NULL;
 
   old_loc = gfc_current_locus;
 
@@ -1430,52 +1720,73 @@ gfc_match_oacc_routine (void)
       goto cleanup;
     }
 
-  if (m == MATCH_NO
-      && gfc_current_ns->proc_name
-      && gfc_match_omp_eos () == MATCH_YES)
+  if (m == MATCH_YES)
+    {
+      /* Scan for a function name/string.  */
+      m = gfc_match_symbol (&sym, 0);
+
+      if (m == MATCH_NO)
+	{
+	  gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %C");
+	  gfc_current_locus = old_loc;
+	  return MATCH_ERROR;
+	}
+
+      if (!sym->attr.external && !sym->attr.function && !sym->attr.subroutine)
+	{
+	  gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %C, invalid"
+		     " function name %qs", sym->name);
+	  gfc_current_locus = old_loc;
+	  return MATCH_ERROR;
+	}
+
+      if (gfc_match_char (')') != MATCH_YES)
+	{
+	  gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %C, expecting"
+		     " ')' after NAME");
+	  gfc_current_locus = old_loc;
+	  return MATCH_ERROR;
+	}
+    }
+
+  if (sym != NULL)
+    {
+      n = gfc_get_oacc_routine_name ();
+      n->sym = sym;
+      n->clauses = NULL;
+      n->next = NULL;
+      if (gfc_current_ns->oacc_routine_names != NULL)
+	n->next = gfc_current_ns->oacc_routine_names;
+
+      gfc_current_ns->oacc_routine_names = n;
+    }
+  else if (gfc_current_ns->proc_name)
     {
       if (!gfc_add_omp_declare_target (&gfc_current_ns->proc_name->attr,
 				       gfc_current_ns->proc_name->name,
 				       &old_loc))
 	goto cleanup;
-      return MATCH_YES;
     }
+  else
+    gcc_unreachable ();
 
-  if (m != MATCH_YES)
-    return m;
+  if (gfc_match_omp_eos () == MATCH_YES)
+    return MATCH_YES;
 
-  /* Scan for a function name.  */
-  m = gfc_match_symbol (&sym, 0);
+  if (gfc_match_omp_clauses (&c, OACC_ROUTINE_CLAUSES,
+			     OACC_ROUTINE_CLAUSE_DEVICE_TYPE_MASK, false,
+			     false, true)
+      != MATCH_YES)
+    return MATCH_ERROR;
 
-  if (m != MATCH_YES)
-    {
-      gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %C");
-      gfc_current_locus = old_loc;
-      return MATCH_ERROR;
-    }
-
-  if (!sym->attr.external && !sym->attr.function && !sym->attr.subroutine)
-    {
-      gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %C, invalid"
-		 " function name %qs", sym->name);
-      gfc_current_locus = old_loc;
-      return MATCH_ERROR;
-    }
+  if (n)
+    n->clauses = c;
+  else if (gfc_current_ns->oacc_routine)
+    gfc_current_ns->oacc_routine_clauses = c;
 
-  if (gfc_match_char (')') != MATCH_YES)
-    {
-      gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %C, expecting"
-		 " ')' after NAME");
-      gfc_current_locus = old_loc;
-      return MATCH_ERROR;
-    }
-
-  if (gfc_match_omp_eos () != MATCH_YES)
-    {
-      gfc_error ("Unexpected junk after !$ACC ROUTINE at %C");
-      goto cleanup;
-    }
-  return MATCH_YES;
+  new_st.op = EXEC_OACC_ROUTINE;
+  new_st.ext.omp_clauses = c;
+  return MATCH_YES;
 
 cleanup:
   gfc_current_locus = old_loc;
@@ -1524,7 +1835,7 @@ static match
 match_omp (gfc_exec_op op, unsigned int mask)
 {
   gfc_omp_clauses *c;
-  if (gfc_match_omp_clauses (&c, mask) != MATCH_YES)
+  if (gfc_match_omp_clauses (&c, mask, 0) != MATCH_YES)
     return MATCH_ERROR;
   new_st.op = op;
   new_st.ext.omp_clauses = c;
@@ -1627,7 +1938,7 @@ gfc_match_omp_declare_simd (void)
   if (gfc_match (" ( %s ) ", &proc_name) != MATCH_YES)
     return MATCH_ERROR;
 
-  if (gfc_match_omp_clauses (&c, OMP_DECLARE_SIMD_CLAUSES, true,
+  if (gfc_match_omp_clauses (&c, OMP_DECLARE_SIMD_CLAUSES, 0, true,
 			     false) != MATCH_YES)
     return MATCH_ERROR;
 
@@ -2450,9 +2761,8 @@ gfc_match_omp_ordered (void)
   return MATCH_YES;
 }
 
-
-match
-gfc_match_omp_atomic (void)
+static match
+gfc_match_omp_oacc_atomic (bool omp_p)
 {
   gfc_omp_atomic_op op = GFC_OMP_ATOMIC_UPDATE;
   int seq_cst = 0;
@@ -2490,13 +2800,24 @@ gfc_match_omp_atomic (void)
       gfc_error ("Unexpected junk after $OMP ATOMIC statement at %C");
       return MATCH_ERROR;
     }
-  new_st.op = EXEC_OMP_ATOMIC;
+  new_st.op = (omp_p ? EXEC_OMP_ATOMIC : EXEC_OACC_ATOMIC);
   if (seq_cst)
     op = (gfc_omp_atomic_op) (op | GFC_OMP_ATOMIC_SEQ_CST);
   new_st.ext.omp_atomic = op;
   return MATCH_YES;
 }
 
+match
+gfc_match_oacc_atomic (void)
+{
+  return gfc_match_omp_oacc_atomic (false);
+}
+
+match
+gfc_match_omp_atomic (void)
+{
+  return gfc_match_omp_oacc_atomic (true);
+}
 
 match
 gfc_match_omp_barrier (void)
@@ -2549,7 +2870,7 @@ gfc_match_omp_cancel (void)
   enum gfc_omp_cancel_kind kind = gfc_match_omp_cancel_kind ();
   if (kind == OMP_CANCEL_UNKNOWN)
     return MATCH_ERROR;
-  if (gfc_match_omp_clauses (&c, OMP_CLAUSE_IF, false) != MATCH_YES)
+  if (gfc_match_omp_clauses (&c, OMP_CLAUSE_IF, 0, false) != MATCH_YES)
     return MATCH_ERROR;
   c->cancel = kind;
   new_st.op = EXEC_OMP_CANCEL;
@@ -2606,7 +2927,7 @@ gfc_match_omp_end_single (void)
       new_st.ext.omp_bool = true;
       return MATCH_YES;
     }
-  if (gfc_match_omp_clauses (&c, OMP_CLAUSE_COPYPRIVATE) != MATCH_YES)
+  if (gfc_match_omp_clauses (&c, OMP_CLAUSE_COPYPRIVATE, 0) != MATCH_YES)
     return MATCH_ERROR;
   new_st.op = EXEC_OMP_END_SINGLE;
   new_st.ext.omp_clauses = c;
@@ -2686,10 +3007,6 @@ check_array_not_assumed (gfc_symbol *sym, locus loc, const char *name)
   if (sym->as && sym->as->type == AS_ASSUMED_RANK)
     gfc_error ("Assumed rank array %qs in %s clause at %L",
 	       sym->name, name, &loc);
-  if (sym->as && sym->as->type == AS_DEFERRED && sym->attr.pointer
-      && !sym->attr.contiguous)
-    gfc_error ("Noncontiguous deferred shape array %qs in %s clause at %L",
-	       sym->name, name, &loc);
 }
 
 static void
@@ -4302,6 +4619,8 @@ oacc_code_to_statement (gfc_code *code)
 {
   switch (code->op)
     {
+    case EXEC_OACC_ATOMIC:
+      return ST_OACC_ATOMIC;
     case EXEC_OACC_PARALLEL:
       return ST_OACC_PARALLEL;
     case EXEC_OACC_KERNELS:
@@ -4514,22 +4833,8 @@ resolve_oacc_loop_blocks (gfc_code *code)
       if (code->ext.omp_clauses->vector)
 	gfc_error ("Clause AUTO conflicts with VECTOR at %L", &code->loc);
     }
-  if (!code->ext.omp_clauses->tile_list)
-    {
-      if (code->ext.omp_clauses->gang)
-	{
-	  if (code->ext.omp_clauses->worker)
-	    gfc_error ("Clause GANG conflicts with WORKER at %L", &code->loc);
-	  if (code->ext.omp_clauses->vector)
-	    gfc_error ("Clause GANG conflicts with VECTOR at %L", &code->loc);
-	}
-      if (code->ext.omp_clauses->worker)
-	if (code->ext.omp_clauses->vector)
-	  gfc_error ("Clause WORKER conflicts with VECTOR at %L", &code->loc);
-    }
-  else if (code->ext.omp_clauses->gang
-	   && code->ext.omp_clauses->worker
-	   && code->ext.omp_clauses->vector)
+  if (code->ext.omp_clauses->tile_list && code->ext.omp_clauses->gang
+      && code->ext.omp_clauses->worker && code->ext.omp_clauses->vector)
     gfc_error ("Tiled loop cannot be parallelized across gangs, workers and "
 	       "vectors at the same time at %L", &code->loc);
 
@@ -4599,48 +4904,52 @@ resolve_oacc_loop (gfc_code *code)
 }
 
 
-static void
-resolve_oacc_cache (gfc_code *code ATTRIBUTE_UNUSED)
-{
-  sorry ("Sorry, !$ACC cache unimplemented yet");
-}
-
-
 void
 gfc_resolve_oacc_declare (gfc_namespace *ns)
 {
   int list;
   gfc_omp_namelist *n;
   locus loc;
+  gfc_oacc_declare *oc;
 
-  if (ns->oacc_declare_clauses == NULL)
+  if (ns->oacc_declare == NULL)
     return;
 
-  loc = ns->oacc_declare_clauses->loc;
+  for (oc = ns->oacc_declare; oc; oc = oc->next)
+    {
+      loc = oc->where;
 
-  for (list = OMP_LIST_DEVICE_RESIDENT;
-       list <= OMP_LIST_DEVICE_RESIDENT; list++)
-    for (n = ns->oacc_declare_clauses->lists[list]; n; n = n->next)
-      {
-	n->sym->mark = 0;
-	if (n->sym->attr.flavor == FL_PARAMETER)
-	  gfc_error ("PARAMETER object %qs is not allowed at %L", n->sym->name, &loc);
-      }
+      for (list = OMP_LIST_DEVICE_RESIDENT;
+	   list <= OMP_LIST_DEVICE_RESIDENT; list++)
+	for (n = oc->clauses->lists[list]; n; n = n->next)
+	  {
+	    n->sym->mark = 0;
+	    if (n->sym->attr.flavor == FL_PARAMETER)
+	      gfc_error ("PARAMETER object %qs is not allowed at %L",
+			 n->sym->name, &loc);
+	  }
 
-  for (list = OMP_LIST_DEVICE_RESIDENT;
-       list <= OMP_LIST_DEVICE_RESIDENT; list++)
-    for (n = ns->oacc_declare_clauses->lists[list]; n; n = n->next)
-      {
-	if (n->sym->mark)
-	  gfc_error ("Symbol %qs present on multiple clauses at %L",
-		     n->sym->name, &loc);
-	else
-	  n->sym->mark = 1;
-      }
+      for (list = OMP_LIST_DEVICE_RESIDENT;
+	   list <= OMP_LIST_DEVICE_RESIDENT; list++)
+	for (n = oc->clauses->lists[list]; n; n = n->next)
+	  {
+	    if (n->sym->mark)
+	      gfc_error ("Symbol %qs present on multiple clauses at %L",
+			 n->sym->name, &loc);
+	    else
+	      n->sym->mark = 1;
+	  }
 
-  for (n = ns->oacc_declare_clauses->lists[OMP_LIST_DEVICE_RESIDENT]; n;
-       n = n->next)
-    check_array_not_assumed (n->sym, loc, "DEVICE_RESIDENT");
+      for (n = oc->clauses->lists[OMP_LIST_DEVICE_RESIDENT]; n; n = n->next)
+	check_array_not_assumed (n->sym, loc, "DEVICE_RESIDENT");
+
+      for (n = oc->clauses->lists[OMP_LIST_MAP]; n; n = n->next)
+	{
+	  if (n->expr && n->expr->ref->type == REF_ARRAY)
+	    gfc_error ("Subarray: %qs not allowed in !$ACC DECLARE at %L",
+		       n->sym->name, &loc);
+	}
+    }
 }
 
 
@@ -4667,8 +4976,8 @@ gfc_resolve_oacc_directive (gfc_code *code, gfc_namespace *ns ATTRIBUTE_UNUSED)
     case EXEC_OACC_LOOP:
       resolve_oacc_loop (code);
       break;
-    case EXEC_OACC_CACHE:
-      resolve_oacc_cache (code);
+    case EXEC_OACC_ATOMIC:
+      resolve_omp_atomic (code);
       break;
     default:
       break;
diff --git gcc/fortran/parse.c gcc/fortran/parse.c
index 2c7c554..69217c0 100644
--- gcc/fortran/parse.c
+++ gcc/fortran/parse.c
@@ -615,6 +615,9 @@ decode_oacc_directive (void)
 
   switch (c)
     {
+    case 'a':
+      match ("atomic", gfc_match_oacc_atomic, ST_OACC_ATOMIC);
+      break;
     case 'c':
       match ("cache", gfc_match_oacc_cache, ST_OACC_CACHE);
       break;
@@ -623,6 +626,7 @@ decode_oacc_directive (void)
       match ("declare", gfc_match_oacc_declare, ST_OACC_DECLARE);
       break;
     case 'e':
+      match ("end atomic", gfc_match_omp_eos, ST_OACC_END_ATOMIC);
       match ("end data", gfc_match_omp_eos, ST_OACC_END_DATA);
       match ("end host_data", gfc_match_omp_eos, ST_OACC_END_HOST_DATA);
       match ("end kernels loop", gfc_match_omp_eos, ST_OACC_END_KERNELS_LOOP);
@@ -1351,7 +1355,8 @@ next_statement (void)
   case ST_OMP_DISTRIBUTE_PARALLEL_DO_SIMD: \
   case ST_CRITICAL: \
   case ST_OACC_PARALLEL_LOOP: case ST_OACC_PARALLEL: case ST_OACC_KERNELS: \
-  case ST_OACC_DATA: case ST_OACC_HOST_DATA: case ST_OACC_LOOP: case ST_OACC_KERNELS_LOOP
+  case ST_OACC_DATA: case ST_OACC_HOST_DATA: case ST_OACC_LOOP: \
+  case ST_OACC_KERNELS_LOOP: case ST_OACC_ATOMIC
 
 /* Declaration statements */
 
@@ -1359,7 +1364,7 @@ next_statement (void)
   case ST_EQUIVALENCE: case ST_NAMELIST: case ST_STATEMENT_FUNCTION: \
   case ST_TYPE: case ST_INTERFACE: case ST_OMP_THREADPRIVATE: \
   case ST_PROCEDURE: case ST_OMP_DECLARE_SIMD: case ST_OMP_DECLARE_REDUCTION: \
-  case ST_OMP_DECLARE_TARGET: case ST_OACC_ROUTINE
+  case ST_OMP_DECLARE_TARGET: case ST_OACC_ROUTINE: case ST_OACC_DECLARE
 
 /* Block end statements.  Errors associated with interchanging these
    are detected in gfc_match_end().  */
@@ -1380,7 +1385,7 @@ push_state (gfc_state_data *p, gfc_compile_state new_state, gfc_symbol *sym)
   p->head = p->tail = NULL;
   p->do_variable = NULL;
   if (p->state != COMP_DO && p->state != COMP_DO_CONCURRENT)
-    p->ext.oacc_declare_clauses = NULL;
+    p->ext.oacc_declare = NULL;
 
   /* If this the state of a construct like BLOCK, DO or IF, the corresponding
      construct statement was accepted right before pushing the state.  Thus,
@@ -1909,6 +1914,12 @@ gfc_ascii_statement (gfc_statement st)
     case ST_OACC_ROUTINE:
       p = "!$ACC ROUTINE";
       break;
+    case ST_OACC_ATOMIC:
+      p = "!$ACC ATOMIC";
+      break;
+    case ST_OACC_END_ATOMIC:
+      p = "!$ACC END ATOMIC";
+      break;
     case ST_OMP_ATOMIC:
       p = "!$OMP ATOMIC";
       break;
@@ -2410,7 +2421,6 @@ verify_st_order (st_state *p, gfc_statement st, bool silent)
     case ST_PUBLIC:
     case ST_PRIVATE:
     case ST_DERIVED_DECL:
-    case ST_OACC_DECLARE:
     case_decl:
       if (p->state >= ORDER_EXEC)
 	goto order;
@@ -3312,19 +3322,6 @@ declSt:
       st = next_statement ();
       goto loop;
 
-    case ST_OACC_DECLARE:
-      if (!verify_st_order(&ss, st, false))
-	{
-	  reject_statement ();
-	  st = next_statement ();
-	  goto loop;
-	}
-      if (gfc_state_stack->ext.oacc_declare_clauses == NULL)
-	gfc_state_stack->ext.oacc_declare_clauses = new_st.ext.omp_clauses;
-      accept_statement (st);
-      st = next_statement ();
-      goto loop;
-
     default:
       break;
     }
@@ -4190,14 +4187,24 @@ parse_omp_do (gfc_statement omp_st)
 /* Parse the statements of OpenMP atomic directive.  */
 
 static gfc_statement
-parse_omp_atomic (void)
+parse_omp_oacc_atomic (bool omp_p)
 {
-  gfc_statement st;
+  gfc_statement st, st_atomic, st_end_atomic;
   gfc_code *cp, *np;
   gfc_state_data s;
   int count;
 
-  accept_statement (ST_OMP_ATOMIC);
+  if (omp_p)
+    {
+      st_atomic = ST_OMP_ATOMIC;
+      st_end_atomic = ST_OMP_END_ATOMIC;
+    }
+  else
+    {
+      st_atomic = ST_OACC_ATOMIC;
+      st_end_atomic = ST_OACC_END_ATOMIC;
+    }
+  accept_statement (st_atomic);
 
   cp = gfc_state_stack->tail;
   push_state (&s, COMP_OMP_STRUCTURED_BLOCK, NULL);
@@ -4224,7 +4231,7 @@ parse_omp_atomic (void)
   pop_state ();
 
   st = next_statement ();
-  if (st == ST_OMP_END_ATOMIC)
+  if (st == st_end_atomic)
     {
       gfc_clear_new_st ();
       gfc_commit_symbols ();
@@ -4518,7 +4525,7 @@ parse_omp_structured_block (gfc_statement omp_st, bool workshare_stmts_only)
 		  continue;
 
 		case ST_OMP_ATOMIC:
-		  st = parse_omp_atomic ();
+		  st = parse_omp_oacc_atomic (true);
 		  continue;
 
 		default:
@@ -4737,8 +4744,12 @@ parse_executable (gfc_statement st)
 	    return st;
 	  continue;
 
+	case ST_OACC_ATOMIC:
+	  st = parse_omp_oacc_atomic (false);
+	  continue;
+
 	case ST_OMP_ATOMIC:
-	  st = parse_omp_atomic ();
+	  st = parse_omp_oacc_atomic (true);
 	  continue;
 
 	default:
@@ -5024,13 +5035,6 @@ contains:
 
 done:
   gfc_current_ns->code = gfc_state_stack->head;
-  if (gfc_state_stack->state == COMP_PROGRAM
-      || gfc_state_stack->state == COMP_MODULE 
-      || gfc_state_stack->state == COMP_SUBROUTINE 
-      || gfc_state_stack->state == COMP_FUNCTION
-      || gfc_state_stack->state == COMP_BLOCK)
-    gfc_current_ns->oacc_declare_clauses 
-      = gfc_state_stack->ext.oacc_declare_clauses;
 }
 
 
@@ -5568,6 +5572,7 @@ is_oacc (gfc_state_data *sd)
     case EXEC_OACC_CACHE:
     case EXEC_OACC_ENTER_DATA:
     case EXEC_OACC_EXIT_DATA:
+    case EXEC_OACC_ROUTINE:
       return true;
 
     default:
diff --git gcc/fortran/parse.h gcc/fortran/parse.h
index 8a1613f..11f1e20 100644
--- gcc/fortran/parse.h
+++ gcc/fortran/parse.h
@@ -49,7 +49,7 @@ typedef struct gfc_state_data
   union
   {
     gfc_st_label *end_do_label;
-    gfc_omp_clauses *oacc_declare_clauses;
+    struct gfc_oacc_declare *oacc_declare;
   }
   ext;
 }
diff --git gcc/fortran/resolve.c gcc/fortran/resolve.c
index 316b413..bfcb6be 100644
--- gcc/fortran/resolve.c
+++ gcc/fortran/resolve.c
@@ -9209,6 +9209,9 @@ gfc_resolve_blocks (gfc_code *b, gfc_namespace *ns)
 	case EXEC_OACC_CACHE:
 	case EXEC_OACC_ENTER_DATA:
 	case EXEC_OACC_EXIT_DATA:
+	case EXEC_OACC_ATOMIC:
+	case EXEC_OACC_ROUTINE:
+	case EXEC_OACC_DECLARE:
 	case EXEC_OMP_ATOMIC:
 	case EXEC_OMP_CRITICAL:
 	case EXEC_OMP_DISTRIBUTE:
@@ -10385,6 +10388,7 @@ gfc_resolve_code (gfc_code *code, gfc_namespace *ns)
 		       "expression", &code->expr1->where);
 	  break;
 
+	case EXEC_OACC_ATOMIC:
 	case EXEC_OACC_PARALLEL_LOOP:
 	case EXEC_OACC_PARALLEL:
 	case EXEC_OACC_KERNELS_LOOP:
@@ -10397,6 +10401,7 @@ gfc_resolve_code (gfc_code *code, gfc_namespace *ns)
 	case EXEC_OACC_CACHE:
 	case EXEC_OACC_ENTER_DATA:
 	case EXEC_OACC_EXIT_DATA:
+	case EXEC_OACC_DECLARE:
 	  gfc_resolve_oacc_directive (code, ns);
 	  break;
 
diff --git gcc/fortran/st.c gcc/fortran/st.c
index 116af15..78099b8 100644
--- gcc/fortran/st.c
+++ gcc/fortran/st.c
@@ -185,6 +185,11 @@ gfc_free_statement (gfc_code *p)
       gfc_free_forall_iterator (p->ext.forall_iterator);
       break;
 
+    case EXEC_OACC_DECLARE:
+      if (p->ext.oacc_declare)
+	gfc_free_oacc_declares (p->ext.oacc_declare);
+      break;
+
     case EXEC_OACC_PARALLEL_LOOP:
     case EXEC_OACC_PARALLEL:
     case EXEC_OACC_KERNELS_LOOP:
@@ -197,6 +202,7 @@ gfc_free_statement (gfc_code *p)
     case EXEC_OACC_CACHE:
     case EXEC_OACC_ENTER_DATA:
     case EXEC_OACC_EXIT_DATA:
+    case EXEC_OACC_ROUTINE:
     case EXEC_OMP_CANCEL:
     case EXEC_OMP_CANCELLATION_POINT:
     case EXEC_OMP_DISTRIBUTE:
@@ -240,6 +246,7 @@ gfc_free_statement (gfc_code *p)
       gfc_free_omp_namelist (p->ext.omp_namelist);
       break;
 
+    case EXEC_OACC_ATOMIC:
     case EXEC_OMP_ATOMIC:
     case EXEC_OMP_BARRIER:
     case EXEC_OMP_MASTER:
diff --git gcc/fortran/trans-decl.c gcc/fortran/trans-decl.c
index 4c18920..3dbf128 100644
--- gcc/fortran/trans-decl.c
+++ gcc/fortran/trans-decl.c
@@ -5750,6 +5750,61 @@ is_ieee_module_used (gfc_namespace *ns)
 }
 
 
+static gfc_code *
+find_end (gfc_code *code)
+{
+  gcc_assert (code);
+
+  if (code->op == EXEC_END_PROCEDURE)
+    return code;
+
+  if (code->next)
+    {
+      if (code->next->op == EXEC_END_PROCEDURE)
+	return code;
+      else
+	return find_end (code->next);
+    }
+
+  return NULL;
+}
+
+
+void
+insert_oacc_declare (gfc_namespace *ns)
+{
+  gfc_code *code;
+
+  code = XCNEW (gfc_code);
+  code->op = EXEC_OACC_DECLARE;
+  code->loc = ns->oacc_declare->where;
+
+  code->ext.oacc_declare = ns->oacc_declare;
+
+  code->block = XCNEW (gfc_code);
+  code->block->op = EXEC_OACC_DECLARE;
+  code->block->loc = ns->oacc_declare->where;
+
+  if (ns->code)
+    {
+      gfc_code *c;
+
+      c = find_end (ns->code);
+      if (c)
+	{
+	  code->next = c->next;
+	  c->next = NULL;
+	}
+
+      code->block->next = ns->code;
+      code->block->ext.oacc_declare = NULL;
+    }
+
+  ns->code = code;
+  ns->oacc_declare = NULL;
+}
+
+
 /* Generate code for a function.  */
 
 void
@@ -5887,11 +5942,8 @@ gfc_generate_function_code (gfc_namespace * ns)
     add_argument_checking (&body, sym);
 
   /* Generate !$ACC DECLARE directive. */
-  if (ns->oacc_declare_clauses)
-    {
-      tree tmp = gfc_trans_oacc_declare (&body, ns);
-      gfc_add_expr_to_block (&body, tmp);
-    }
+  if (ns->oacc_declare)
+    insert_oacc_declare (ns);
 
   tmp = gfc_trans_code (ns->code);
   gfc_add_expr_to_block (&body, tmp);
diff --git gcc/fortran/trans-openmp.c gcc/fortran/trans-openmp.c
index 9642a7d..60e06d2 100644
--- gcc/fortran/trans-openmp.c
+++ gcc/fortran/trans-openmp.c
@@ -563,7 +563,8 @@ gfc_omp_clause_copy_ctor (tree clause, tree dest, tree src)
   stmtblock_t block, cond_block;
 
   gcc_assert (OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_FIRSTPRIVATE
-	      || OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_LINEAR);
+	      || OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_LINEAR
+	      || OMP_CLAUSE_CODE (clause) == OMP_CLAUSE_REDUCTION);
 
   if ((! GFC_DESCRIPTOR_TYPE_P (type)
        || GFC_TYPE_ARRAY_AKIND (type) != GFC_ARRAY_ALLOCATABLE)
@@ -1725,7 +1726,7 @@ gfc_convert_expr_to_tree (stmtblock_t *block, gfc_expr *expr)
   gfc_se se;
   tree result;
 
-  gfc_init_se (&se, NULL );
+  gfc_init_se (&se, NULL);
   gfc_conv_expr (&se, expr);
   gfc_add_block_to_block (block, &se.pre);
   result = gfc_evaluate_now (se.expr, block);
@@ -2528,7 +2529,12 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
     }
   if (clauses->seq)
     {
-      c = build_omp_clause (where.lb->location, OMP_CLAUSE_ORDERED);
+      c = build_omp_clause (where.lb->location, OMP_CLAUSE_SEQ);
+      omp_clauses = gfc_trans_add_clause (c, omp_clauses);
+    }
+  if (clauses->par_auto)
+    {
+      c = build_omp_clause (where.lb->location, OMP_CLAUSE_AUTO);
       omp_clauses = gfc_trans_add_clause (c, omp_clauses);
     }
   if (clauses->independent)
@@ -2572,6 +2578,21 @@ gfc_trans_omp_clauses (stmtblock_t *block, gfc_omp_clauses *clauses,
       OMP_CLAUSE_VECTOR_LENGTH_EXPR (c) = vector_length_var;
       omp_clauses = gfc_trans_add_clause (c, omp_clauses);
     }
+  if (clauses->tile_list)
+    {
+      vec<tree, va_gc> *tvec;
+      gfc_expr_list *el;
+
+      vec_alloc (tvec, 4);
+
+      for (el = clauses->tile_list; el; el = el->next)
+	vec_safe_push (tvec, gfc_convert_expr_to_tree (block, el->expr));
+
+      c = build_omp_clause (where.lb->location, OMP_CLAUSE_TILE);
+      OMP_CLAUSE_TILE_LIST (c) = build_tree_list_vec (tvec);
+      omp_clauses = gfc_trans_add_clause (c, omp_clauses);
+      tvec->truncate (0);
+    }
   if (clauses->vector)
     {
       if (clauses->vector_expr)
@@ -2714,7 +2735,7 @@ gfc_trans_oacc_executable_directive (gfc_code *code)
   gfc_start_block (&block);
   oacc_clauses = gfc_trans_omp_clauses (&block, code->ext.omp_clauses,
 					code->loc);
-  stmt = build1_loc (input_location, construct_code, void_type_node, 
+  stmt = build1_loc (input_location, construct_code, void_type_node,
 		     oacc_clauses);
   gfc_add_expr_to_block (&block, stmt);
   return gfc_finish_block (&block);
@@ -3465,10 +3486,6 @@ gfc_trans_oacc_combined_directive (gfc_code *code)
     poplevel (0, 0);
   stmt = build2_loc (input_location, construct_code, void_type_node, stmt,
 		     oacc_clauses);
-  if (code->op == EXEC_OACC_KERNELS_LOOP)
-    OACC_KERNELS_COMBINED (stmt) = 1;
-  else
-    OACC_PARALLEL_COMBINED (stmt) = 1;
   gfc_add_expr_to_block (&block, stmt);
   return gfc_finish_block (&block);
 }
@@ -4363,13 +4380,30 @@ gfc_trans_omp_workshare (gfc_code *code, gfc_omp_clauses *clauses)
 }
 
 tree
-gfc_trans_oacc_declare (stmtblock_t *block, gfc_namespace *ns)
+gfc_trans_oacc_declare (gfc_code *code)
 {
-  tree oacc_clauses;
-  oacc_clauses = gfc_trans_omp_clauses (block, ns->oacc_declare_clauses,
-					ns->oacc_declare_clauses->loc);
-  return build1_loc (ns->oacc_declare_clauses->loc.lb->location,
-		     OACC_DECLARE, void_type_node, oacc_clauses);
+  stmtblock_t block;
+  struct gfc_oacc_declare *d;
+  tree stmt, clauses = NULL_TREE;
+
+  gfc_start_block (&block);
+
+  for (d = code->ext.oacc_declare; d; d = d->next)
+    {
+      tree t;
+
+      t = gfc_trans_omp_clauses (&block, d->clauses, d->clauses->loc);
+
+      if (clauses)
+	OMP_CLAUSE_CHAIN (clauses) = t;
+      else
+	clauses = t;
+    }
+
+  stmt = gfc_trans_omp_code (code->block->next, true);
+  stmt = build2_loc (input_location, OACC_DATA, void_type_node, stmt, clauses);
+  gfc_add_expr_to_block (&block, stmt);
+  return gfc_finish_block (&block);
 }
 
 tree
@@ -4395,6 +4429,10 @@ gfc_trans_oacc_directive (gfc_code *code)
       return gfc_trans_oacc_executable_directive (code);
     case EXEC_OACC_WAIT:
       return gfc_trans_oacc_wait_directive (code);
+    case EXEC_OACC_ATOMIC:
+      return gfc_trans_omp_atomic (code);
+    case EXEC_OACC_DECLARE:
+      return gfc_trans_oacc_declare (code);
     default:
       gcc_unreachable ();
     }
diff --git gcc/fortran/trans-stmt.c gcc/fortran/trans-stmt.c
index 53e9bcc..2b988d0 100644
--- gcc/fortran/trans-stmt.c
+++ gcc/fortran/trans-stmt.c
@@ -1588,11 +1588,8 @@ gfc_trans_block_construct (gfc_code* code)
   code->exit_label = exit_label;
 
   /* Generate !$ACC DECLARE directive. */
-  if (ns->oacc_declare_clauses)
-    {
-      tree tmp = gfc_trans_oacc_declare (&body, ns);
-      gfc_add_expr_to_block (&body, tmp);
-    }
+  if (ns->oacc_declare)
+    insert_oacc_declare (ns);
 
   gfc_add_expr_to_block (&body, gfc_trans_code (ns->code));
   gfc_add_expr_to_block (&body, build1_v (LABEL_EXPR, exit_label));
diff --git gcc/fortran/trans-stmt.h gcc/fortran/trans-stmt.h
index 2f2a0b3..0ff93c4 100644
--- gcc/fortran/trans-stmt.h
+++ gcc/fortran/trans-stmt.h
@@ -67,7 +67,7 @@ void gfc_trans_omp_declare_simd (gfc_namespace *);
 
 /* trans-openacc.c */
 tree gfc_trans_oacc_directive (gfc_code *);
-tree gfc_trans_oacc_declare (stmtblock_t *block, gfc_namespace *);
+tree gfc_trans_oacc_declare (gfc_code *);
 
 /* trans-io.c */
 tree gfc_trans_open (gfc_code *);
diff --git gcc/fortran/trans.c gcc/fortran/trans.c
index 2dabf08..b20ec37 100644
--- gcc/fortran/trans.c
+++ gcc/fortran/trans.c
@@ -1932,6 +1932,7 @@ trans_code (gfc_code * code, tree cond)
 	  res = gfc_trans_omp_directive (code);
 	  break;
 
+	case EXEC_OACC_ATOMIC:
 	case EXEC_OACC_CACHE:
 	case EXEC_OACC_WAIT:
 	case EXEC_OACC_UPDATE:
@@ -1944,6 +1945,7 @@ trans_code (gfc_code * code, tree cond)
 	case EXEC_OACC_PARALLEL_LOOP:
 	case EXEC_OACC_ENTER_DATA:
 	case EXEC_OACC_EXIT_DATA:
+	case EXEC_OACC_DECLARE:
 	  res = gfc_trans_oacc_directive (code);
 	  break;
 

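For readers who want to see the statement-chain splicing of insert_oacc_declare in isolation, here is a minimal standalone sketch. It is only a model under assumptions: `struct code`, `enum op` and `wrap_in_declare` are hypothetical stand-ins for gfc_code, the EXEC_* codes and insert_oacc_declare, and nothing here uses the real gfortran data structures. The idea it illustrates is the one in the trans-decl.c hunk above: the existing body is moved under a new declare node, and the END_PROCEDURE statement stays outside, following that node.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for gfortran's EXEC_* codes and gfc_code.  */
enum op { OP_ASSIGN, OP_END_PROCEDURE, OP_DECLARE };

struct code
{
  enum op op;
  struct code *next;   /* Next statement in the chain.  */
  struct code *block;  /* Nested body, if any.  */
};

/* Return the statement preceding END_PROCEDURE (or the END_PROCEDURE
   statement itself if it comes first), or NULL if there is none.  */
struct code *
find_end (struct code *code)
{
  assert (code);
  if (code->op == OP_END_PROCEDURE)
    return code;
  for (; code->next; code = code->next)
    if (code->next->op == OP_END_PROCEDURE)
      return code;
  return NULL;
}

/* Wrap BODY in a new declare node; the tail starting at END_PROCEDURE
   is detached from BODY and chained after the new node instead.  */
struct code *
wrap_in_declare (struct code *body)
{
  struct code *decl = calloc (1, sizeof *decl);
  struct code *inner = calloc (1, sizeof *inner);
  assert (decl && inner);
  decl->op = OP_DECLARE;
  inner->op = OP_DECLARE;
  decl->block = inner;
  if (body)
    {
      struct code *c = find_end (body);
      if (c)
        {
          decl->next = c->next;
          c->next = NULL;
        }
      inner->next = body;
    }
  return decl;
}
```

Given a chain `a -> b -> end`, `wrap_in_declare (&a)` returns a node whose `block->next` is `a` and whose `next` is `end`, and `b.next` becomes NULL.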

Regards,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

* Next set of OpenACC changes: Testsuite
  2015-05-05  8:54 Next set of OpenACC changes Thomas Schwinge
                   ` (2 preceding siblings ...)
  2015-05-05  8:59 ` Next set of OpenACC changes: Fortran Thomas Schwinge
@ 2015-05-05  9:00 ` Thomas Schwinge
  2015-05-11 16:35 ` [gomp4] Next set of OpenACC changes Thomas Schwinge
  4 siblings, 0 replies; 11+ messages in thread
From: Thomas Schwinge @ 2015-05-05  9:00 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek, fortran
  Cc: Bernd Schmidt, Cesar Philippidis, Chung-Lin Tang, James Norris,
	Joseph Myers, Julian Brown, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 255007 bytes --]

Hi!

On Tue, 05 May 2015 10:54:02 +0200, I wrote:
> In follow-up messages, I'll be posting the separated parts (for easier
> review) of a next set of OpenACC changes that we'd like to commit.
> ChangeLog updates not yet written; will do that before commit, obviously.

 gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c  |   46 +
 .../c-c++-common/goacc-gomp/nesting-fail-1.c       |   25 -
 gcc/testsuite/c-c++-common/goacc/asyncwait-1.c     |    4 +-
 gcc/testsuite/c-c++-common/goacc/data-2.c          |   12 +-
 gcc/testsuite/c-c++-common/goacc/declare-1.c       |   84 +
 gcc/testsuite/c-c++-common/goacc/declare-2.c       |   67 +
 gcc/testsuite/c-c++-common/goacc/dtype-1.c         |  113 ++
 gcc/testsuite/c-c++-common/goacc/dtype-2.c         |   31 +
 gcc/testsuite/c-c++-common/goacc/host_data-1.c     |   14 +
 gcc/testsuite/c-c++-common/goacc/host_data-2.c     |   14 +
 gcc/testsuite/c-c++-common/goacc/host_data-3.c     |   16 +
 gcc/testsuite/c-c++-common/goacc/host_data-4.c     |   15 +
 gcc/testsuite/c-c++-common/goacc/kernels-1.c       |    6 -
 gcc/testsuite/c-c++-common/goacc/kernels-empty.c   |    6 +
 gcc/testsuite/c-c++-common/goacc/kernels-eternal.c |   11 +
 .../c-c++-common/goacc/kernels-noreturn.c          |   12 +
 gcc/testsuite/c-c++-common/goacc/loop-1.c          |    2 -
 gcc/testsuite/c-c++-common/goacc/parallel-1.c      |    6 -
 gcc/testsuite/c-c++-common/goacc/parallel-empty.c  |    6 +
 .../c-c++-common/goacc/parallel-eternal.c          |   11 +
 .../c-c++-common/goacc/parallel-noreturn.c         |   12 +
 gcc/testsuite/c-c++-common/goacc/reduction-1.c     |   25 +-
 gcc/testsuite/c-c++-common/goacc/reduction-2.c     |   22 +-
 gcc/testsuite/c-c++-common/goacc/reduction-3.c     |   22 +-
 gcc/testsuite/c-c++-common/goacc/reduction-4.c     |   40 +-
 gcc/testsuite/c-c++-common/goacc/routine-1.c       |   35 +
 gcc/testsuite/c-c++-common/goacc/routine-2.c       |   36 +
 gcc/testsuite/c-c++-common/goacc/routine-3.c       |   52 +
 gcc/testsuite/c-c++-common/goacc/routine-4.c       |   87 ++
 gcc/testsuite/c-c++-common/goacc/tile.c            |   26 +
 gcc/testsuite/g++.dg/goacc/template-reduction.C    |  100 ++
 gcc/testsuite/g++.dg/goacc/template.C              |  131 ++
 gcc/testsuite/gfortran.dg/goacc/cache-1.f95        |    1 -
 gcc/testsuite/gfortran.dg/goacc/coarray.f95        |    2 +-
 gcc/testsuite/gfortran.dg/goacc/coarray_2.f90      |    1 +
 gcc/testsuite/gfortran.dg/goacc/combined_loop.f90  |    2 +-
 gcc/testsuite/gfortran.dg/goacc/cray.f95           |    1 -
 gcc/testsuite/gfortran.dg/goacc/declare-1.f95      |    3 +-
 gcc/testsuite/gfortran.dg/goacc/declare-2.f95      |   44 +
 gcc/testsuite/gfortran.dg/goacc/default.f95        |   17 +
 gcc/testsuite/gfortran.dg/goacc/dtype-1.f95        |  161 ++
 gcc/testsuite/gfortran.dg/goacc/dtype-2.f95        |   39 +
 gcc/testsuite/gfortran.dg/goacc/host_data-tree.f95 |    2 +-
 gcc/testsuite/gfortran.dg/goacc/loop-1.f95         |    1 -
 gcc/testsuite/gfortran.dg/goacc/loop-2.f95         |   26 +-
 gcc/testsuite/gfortran.dg/goacc/modules.f95        |   55 +
 gcc/testsuite/gfortran.dg/goacc/parameter.f95      |    1 -
 gcc/testsuite/gfortran.dg/goacc/update.f95         |    5 +
 libgomp/testsuite/
 .../libgomp.oacc-c++/template-reduction.C          |  102 ++
 .../libgomp.oacc-c-c++-common/atomic_capture-1.c   |  866 +++++++++++
 .../libgomp.oacc-c-c++-common/atomic_capture-2.c   | 1626 ++++++++++++++++++++
 .../libgomp.oacc-c-c++-common/atomic_update-1.c    |  760 +++++++++
 .../libgomp.oacc-c-c++-common/clauses-1.c          |   26 +
 .../testsuite/libgomp.oacc-c-c++-common/data-2.c   |   44 +-
 .../testsuite/libgomp.oacc-c-c++-common/data-3.c   |   18 +-
 .../libgomp.oacc-c-c++-common/data-clauses.h       |  202 +++
 .../libgomp.oacc-c-c++-common/kernels-1.c          |  182 +--
 .../testsuite/libgomp.oacc-c-c++-common/lib-69.c   |   70 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-70.c   |   79 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-71.c   |   55 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-72.c   |   60 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-73.c   |   64 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-74.c   |   91 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-75.c   |   89 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-76.c   |   88 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-77.c   |   91 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-78.c   |   91 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-79.c   |   91 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-80.c   |   95 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-81.c   |  106 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-82.c   |   43 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-83.c   |   22 +-
 .../libgomp.oacc-c-c++-common/parallel-1.c         |  204 +--
 .../libgomp.oacc-c-c++-common/routine-1.c          |   40 +
 .../libgomp.oacc-c-c++-common/routine-2.c          |   41 +
 libgomp/testsuite/libgomp.oacc-c-c++-common/subr.h |   44 +-
 .../testsuite/libgomp.oacc-c-c++-common/subr.ptx   |  222 +--
 .../testsuite/libgomp.oacc-c-c++-common/timer.h    |  103 --
 .../libgomp.oacc-fortran/atomic_capture-1.f90      |  784 ++++++++++
 .../libgomp.oacc-fortran/atomic_update-1.f90       |  338 ++++
 libgomp/testsuite/libgomp.oacc-fortran/cache-1.f90 |   26 +
 .../testsuite/libgomp.oacc-fortran/clauses-1.f90   |  290 ++++
 libgomp/testsuite/libgomp.oacc-fortran/data-1.f90  |  231 ++-
 libgomp/testsuite/libgomp.oacc-fortran/data-2.f90  |   50 +
 libgomp/testsuite/libgomp.oacc-fortran/data-3.f90  |   34 +-
 .../testsuite/libgomp.oacc-fortran/data-4-2.f90    |   19 +-
 libgomp/testsuite/libgomp.oacc-fortran/data-4.f90  |   19 +-
 .../testsuite/libgomp.oacc-fortran/declare-1.f90   |  229 +++
 libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90  |   24 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90  |   28 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-14.f90  |   79 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-15.f90  |   52 +
 .../testsuite/libgomp.oacc-fortran/routine-5.f90   |   27 +

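As orientation for the atomic-related entries above (the nesting tests below move `#pragma omp atomic` inside OpenACC regions from the failing to the passing file, and the libgomp atomic_* tests are new), here is a minimal sketch of the kind of construct involved. It is an illustration only and is not copied from any of the listed test files; the function name `count_even` is made up. Compiled without -fopenacc the pragmas are ignored and the loop runs serially with the same result.

```c
/* Count the even elements of V[0..N-1].  The increment of the shared
   counter is protected by an OpenACC atomic update.  */
int
count_even (const int *v, int n)
{
  int count = 0;
  int i;

#pragma acc parallel loop copyin (v[0:n]) copy (count)
  for (i = 0; i < n; i++)
    if (v[i] % 2 == 0)
      {
#pragma acc atomic update
        count++;
      }

  return count;
}
```

For example, calling `count_even` on the four-element array `{1, 2, 3, 4}` counts the two even values.
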
diff --git gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
index df45bcf..b38e181 100644
--- gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
+++ gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c
@@ -1,4 +1,50 @@
 void
+f_acc_data (void)
+{
+#pragma acc data
+  {
+    int i;
+#pragma omp atomic write
+    i = 0;
+  }
+}
+
+void
+f_acc_kernels (void)
+{
+#pragma acc kernels
+  {
+    int i;
+#pragma omp atomic write
+    i = 0;
+  }
+}
+
+void
+f_acc_loop (void)
+{
+  int i;
+
+#pragma acc loop
+  for (i = 0; i < 2; ++i)
+    {
+#pragma omp atomic write
+      i = 0;
+    }
+}
+
+void
+f_acc_parallel (void)
+{
+#pragma acc parallel
+  {
+    int i;
+#pragma omp atomic write
+    i = 0;
+  }
+}
+
+void
 f_omp_parallel (void)
 {
 #pragma omp parallel
diff --git gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
index 411fb5f..14c6aa6 100644
--- gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
+++ gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
@@ -216,12 +216,6 @@ f_acc_parallel (void)
 
 #pragma acc parallel
   {
-#pragma omp atomic write
-    i = 0; /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
-  }
-
-#pragma acc parallel
-  {
 #pragma omp ordered /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
     ;
   }
@@ -286,12 +280,6 @@ f_acc_kernels (void)
 
 #pragma acc kernels
   {
-#pragma omp atomic write
-    i = 0; /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
-  }
-
-#pragma acc kernels
-  {
 #pragma omp ordered /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
     ;
   }
@@ -356,12 +344,6 @@ f_acc_data (void)
 
 #pragma acc data
   {
-#pragma omp atomic write
-    i = 0; /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
-  }
-
-#pragma acc data
-  {
 #pragma omp ordered /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
     ;
   }
@@ -434,13 +416,6 @@ f_acc_loop (void)
 #pragma acc loop
   for (i = 0; i < 2; ++i)
     {
-#pragma omp atomic write
-      i = 0; /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
-    }
-
-#pragma acc loop
-  for (i = 0; i < 2; ++i)
-    {
 #pragma omp ordered /* { dg-error "non-OpenACC construct inside of OpenACC region" } */
       ;
     }
diff --git gcc/testsuite/c-c++-common/goacc/asyncwait-1.c gcc/testsuite/c-c++-common/goacc/asyncwait-1.c
index ccc0106..c6b81b1 100644
--- gcc/testsuite/c-c++-common/goacc/asyncwait-1.c
+++ gcc/testsuite/c-c++-common/goacc/asyncwait-1.c
@@ -116,7 +116,7 @@ f (int N, float *a, float *b)
     }
 
 #pragma acc parallel copyin (a[0:N]) copy (b[0:N]) wait (1 /* { dg-error "expected '\\\)' before end of line" } */
-    /* { dg-error "expected integer expression before '\\\)'" "" { target c++ } 118 } */
+    /* { dg-error "expected integer expression list before" "" { target c++ } 118 } */
     {
         for (ii = 0; ii < N; ii++)
             b[ii] = a[ii];
@@ -171,7 +171,7 @@ f (int N, float *a, float *b)
 #pragma acc wait (1,2,,) /* { dg-error "expected (primary-|)expression before" } */
 
 #pragma acc wait (1 /* { dg-error "expected '\\\)' before end of line" } */
-    /* { dg-error "expected integer expression before '\\\)'" "" { target c++ } 173 } */
+    /* { dg-error "expected integer expression list before" "" { target c++ } 173 } */
 
 #pragma acc wait (1,*) /* { dg-error "expected (primary-|)expression before" } */
 
diff --git gcc/testsuite/c-c++-common/goacc/data-2.c gcc/testsuite/c-c++-common/goacc/data-2.c
index a67d8a4..1043bf8a 100644
--- gcc/testsuite/c-c++-common/goacc/data-2.c
+++ gcc/testsuite/c-c++-common/goacc/data-2.c
@@ -10,12 +10,14 @@ foo (void)
 #pragma acc exit data delete (a) if (0)
 #pragma acc exit data copyout (b) if (a)
 #pragma acc exit data delete (b)
-#pragma acc enter /* { dg-error "expected 'data' in" } */
-#pragma acc exit /* { dg-error "expected 'data' in" } */
+#pragma acc enter /* { dg-error "expected 'data' after" } */
+#pragma acc exit /* { dg-error "expected 'data' after" } */
 #pragma acc enter data /* { dg-error "has no data movement clause" } */
-#pragma acc exit data /* { dg-error "has no data movement clause" } */
-#pragma acc enter Data /* { dg-error "invalid pragma before" } */
-#pragma acc exit copyout (b) /* { dg-error "invalid pragma before" } */
+#pragma acc exit data /* { dg-error "no data movement clause" } */
+#pragma acc enter Data /* { dg-error "expected 'data' after" } */
+#pragma acc exit copyout (b) /* { dg-error "expected 'data' after" } */
+#pragma acc enter for /* { dg-error "expected 'data' after" } */
+#pragma acc enter data2 /* { dg-error "expected 'data' after" } */
 }
 
 /* { dg-error "has no data movement clause" "" { target *-*-* } 8 } */
diff --git gcc/testsuite/c-c++-common/goacc/declare-1.c gcc/testsuite/c-c++-common/goacc/declare-1.c
new file mode 100644
index 0000000..cf50f02
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/declare-1.c
@@ -0,0 +1,84 @@
+/* Test valid uses of declare directive.  */
+/* { dg-do compile } */
+/* { dg-skip-if "not yet" { c++ } } */
+
+int v0;
+#pragma acc declare create(v0)
+
+int v1;
+#pragma acc declare copyin(v1)
+
+int *v2;
+#pragma acc declare deviceptr(v2)
+
+int v3;
+#pragma acc declare device_resident(v3)
+
+int v4;
+#pragma acc declare link(v4)
+
+int v5, v6, v7, v8;
+#pragma acc declare create(v5, v6) copyin(v7, v8)
+
+void
+f (void)
+{
+  int va0;
+#pragma acc declare create(va0)
+
+  int va1;
+#pragma acc declare copyin(va1)
+
+  int *va2;
+#pragma acc declare deviceptr(va2)
+
+  int va3;
+#pragma acc declare device_resident(va3)
+
+  extern int ve0;
+#pragma acc declare create(ve0)
+
+  extern int ve1;
+#pragma acc declare copyin(ve1)
+
+  extern int *ve2;
+#pragma acc declare deviceptr(ve2)
+
+  extern int ve3;
+#pragma acc declare device_resident(ve3)
+
+  extern int ve4;
+#pragma acc declare link(ve4)
+
+  int va5;
+#pragma acc declare copy(va5)
+
+  int va6;
+#pragma acc declare copyout(va6)
+
+  int va7;
+#pragma acc declare present(va7)
+
+  int va8;
+#pragma acc declare present_or_copy(va8)
+
+  int va9;
+#pragma acc declare present_or_copyin(va9)
+
+  int va10;
+#pragma acc declare present_or_copyout(va10)
+
+  int va11;
+#pragma acc declare present_or_create(va11)
+
+ a:
+  {
+    int va0;
+#pragma acc declare create(va0)
+    if (v1)
+      goto a;
+    else
+      goto b;
+  }
+ b:;
+}
diff --git gcc/testsuite/c-c++-common/goacc/declare-2.c gcc/testsuite/c-c++-common/goacc/declare-2.c
new file mode 100644
index 0000000..a2b5d6f
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/declare-2.c
@@ -0,0 +1,67 @@
+/* Test invalid uses of declare directive.  */
+/* { dg-do compile } */
+/* { dg-skip-if "not yet" { c++ } } */
+
+#pragma acc declare /* { dg-error "no valid clauses" } */
+
+#pragma acc declare create(undeclared) /* { dg-error "undeclared" } */
+/* { dg-error "no valid clauses" "second error" { target *-*-* } 7 } */
+
+int v0[10];
+#pragma acc declare create(v0[1:3]) /* { dg-error "subarray" } */
+
+int v1;
+#pragma acc declare create(v1, v1) /* { dg-error "more than once" } */
+
+int v2;
+#pragma acc declare create(v2) /* { dg-message "previous directive" } */
+#pragma acc declare copyin(v2) /* { dg-error "more than once" } */
+
+int v3;
+#pragma acc declare copy(v3) /* { dg-error "at file scope" } */
+
+int v4;
+#pragma acc declare copyout(v4) /* { dg-error "at file scope" } */
+
+int v5;
+#pragma acc declare present(v5) /* { dg-error "at file scope" } */
+
+int v6;
+#pragma acc declare present_or_copy(v6) /* { dg-error "at file scope" } */
+
+int v7;
+#pragma acc declare present_or_copyin(v7) /* { dg-error "at file scope" } */
+
+int v8;
+#pragma acc declare present_or_copyout(v8) /* { dg-error "at file scope" } */
+
+int v9;
+#pragma acc declare present_or_create(v9) /* { dg-error "at file scope" } */
+
+void
+f (void)
+{
+  int va0;
+#pragma acc declare link(va0) /* { dg-error "invalid variable" } */
+
+  extern int ve0;
+#pragma acc declare copy(ve0) /* { dg-error "invalid use of" } */
+
+  extern int ve1;
+#pragma acc declare copyout(ve1) /* { dg-error "invalid use of" } */
+
+  extern int ve2;
+#pragma acc declare present(ve2) /* { dg-error "invalid use of" } */
+
+  extern int ve3;
+#pragma acc declare present_or_copy(ve3) /* { dg-error "invalid use of" } */
+
+  extern int ve4;
+#pragma acc declare present_or_copyin(ve4) /* { dg-error "invalid use of" } */
+
+  extern int ve5;
+#pragma acc declare present_or_copyout(ve5) /* { dg-error "invalid use of" } */
+
+  extern int ve6;
+#pragma acc declare present_or_create(ve6) /* { dg-error "invalid use of" } */
+}
diff --git gcc/testsuite/c-c++-common/goacc/dtype-1.c gcc/testsuite/c-c++-common/goacc/dtype-1.c
new file mode 100644
index 0000000..2b4569e
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/dtype-1.c
@@ -0,0 +1,113 @@
+/* { dg-do compile } */
+/* { dg-options "-fopenacc -fdump-tree-omplower" } */
+
+void
+test ()
+{
+  int i1;
+
+  /* ACC PARALLEL DEVICE_TYPE: */
+
+#pragma acc parallel device_type (nVidia) async (1) num_gangs (100) num_workers (100) vector_length (32) wait (1)
+  {
+  }
+
+#pragma acc parallel async (1) num_gangs (1) num_workers (1) vector_length (1) wait (1) dtype (nvidia) async (2) num_gangs (200) num_workers (200) vector_length (64) wait (2)
+  {
+  }
+
+#pragma acc parallel async (1) num_gangs (1) num_workers (1) vector_length (1) wait (1) dtype (nvidia) async (3) num_gangs (300) num_workers (300) vector_length (128) wait (3) device_type (*) async (10) num_gangs (10) num_workers (10) vector_length (10) wait (10)
+  {
+  }
+
+#pragma acc parallel async (1) num_gangs (1) num_workers (1) vector_length (1) wait (1) device_type (nvidia_ptx) async (3) num_gangs (300) num_workers (300) vector_length (128) wait (3) dtype (*) async (10) num_gangs (10) num_workers (10) vector_length (10) wait (10)
+  {
+  }
+
+  /* ACC KERNELS DEVICE_TYPE: */
+
+#pragma acc kernels device_type (nvidia) async wait
+  {
+  }
+
+#pragma acc kernels async wait dtype (nvidia) async (1) wait (1)
+  {
+  }
+
+#pragma acc kernels async wait dtype (nvidia) async (2) wait (2) device_type (*) async (0) wait (0)
+  {
+  }
+
+#pragma acc kernels async wait device_type (nvidia_ptx) async (1) wait (1) dtype (*) async (0) wait (0)
+  {
+  }
+
+  /* ACC LOOP DEVICE_TYPE: */
+
+#pragma acc parallel
+#pragma acc loop dtype (nVidia) gang
+  for (i1 = 1; i1 < 10; i1++)
+    {
+    }
+
+#pragma acc parallel
+#pragma acc loop device_type (nVidia) gang dtype (*) worker
+  for (i1 = 1; i1 < 10; i1++)
+    {
+    }
+
+#pragma acc parallel
+#pragma acc loop dtype (nVidiaGPU) gang device_type (*) vector
+  for (i1 = 1; i1 < 10; i1++)
+    {
+    }
+
+  /* ACC UPDATE DEVICE_TYPE: */
+
+#pragma acc update host(i1) async(1) wait (1)
+
+#pragma acc update host(i1) device_type(nvidia) async(2) wait (2)
+
+#pragma acc update host(i1) async(1) wait (1) device_type(nvidia) async(3) wait (3)
+
+#pragma acc update host(i1) async(4) wait (4) device_type(nvidia) async(5) wait (5) dtype (*) async (6) wait (6)
+
+#pragma acc update host(i1) async(4) wait (4) dtype(nvidia1) async(5) wait (5) dtype (*) async (6) wait (6)
+}
+
+/* ACC ROUTINE DEVICE_TYPE: */
+
+#pragma acc routine (foo1) device_type (nvidia) gang
+#pragma acc routine (foo2) device_type (nvidia) worker
+#pragma acc routine (foo3) dtype (nvidia) vector
+#pragma acc routine (foo5) device_type (nvidia) bind (foo)
+#pragma acc routine (foo6) device_type (nvidia) gang device_type (*) worker
+#pragma acc routine (foo7) dtype (nvidia) worker dtype (*) vector
+#pragma acc routine (foo8) dtype (nvidia) vector device_type (*) gang
+#pragma acc routine (foo9) device_type (nvidia) vector device_type (*) worker
+#pragma acc routine (foo10) device_type (nvidia) bind (foo) dtype (*) gang
+#pragma acc routine (foo11) device_type (gpu) gang device_type (*) worker
+#pragma acc routine (foo12) device_type (gpu) worker dtype (*) worker
+#pragma acc routine (foo13) device_type (gpu) vector device_type (*) worker
+#pragma acc routine (foo14) dtype (gpu) worker dtype (*) worker
+#pragma acc routine (foo15) dtype (gpu) bind (foo) dtype (*) gang
+
+/* { dg-final { scan-tree-dump-times "oacc_parallel wait\\(1\\) vector_length\\(32\\) num_workers\\(100\\) num_gangs\\(100\\) async\\(1\\)" 1 "omplower" } } */
+
+/* { dg-final { scan-tree-dump-times "oacc_parallel wait\\(1\\) vector_length\\(1\\) num_workers\\(1\\) num_gangs\\(1\\) async\\(1\\) wait\\(2\\) vector_length\\(64\\) num_workers\\(200\\) num_gangs\\(200\\) async\\(2\\)" 1 "omplower" } } */
+
+/* { dg-final { scan-tree-dump-times "acc_parallel wait\\(1\\) vector_length\\(1\\) num_workers\\(1\\) num_gangs\\(1\\) async\\(1\\) wait\\(3\\) vector_length\\(128\\) num_workers\\(300\\) num_gangs\\(300\\) async\\(3" 1 "omplower" } } */
+
+/* { dg-final { scan-tree-dump-times "oacc_kernels async\\(-1\\)" 4 "omplower" } } */
+
+/* { dg-final { scan-tree-dump-times "oacc_kernels async\\(-1\\) wait\\(2\\) async\\(2\\)" 1 "omplower" } } */
+
+/* { dg-final { scan-tree-dump-times "oacc_kernels async\\(-1\\) wait\\(0\\) async\\(0\\)" 1 "omplower" } } */
+
+/* { dg-final { scan-tree-dump-times "acc loop gang private\\(i1.0\\) private\\(i1\\)" 1 "omplower" } } */
+
+/* { dg-final { scan-tree-dump-times "acc loop gang private\\(i1.1\\) private\\(i1\\)" 1 "omplower" } } */
+
+/* { dg-final { scan-tree-dump-times "acc loop vector private\\(i1.2\\) private\\(i1\\)" 1 "omplower" } } */
+
+/* { dg-final { cleanup-tree-dump "omplower" } } */
diff --git gcc/testsuite/c-c++-common/goacc/dtype-2.c gcc/testsuite/c-c++-common/goacc/dtype-2.c
new file mode 100644
index 0000000..b0bd247
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/dtype-2.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+
+void
+test ()
+{
+  int i1, i2;
+
+  /* ACC PARALLEL DEVICE_TYPE: */
+
+#pragma acc parallel dtype (nVidia) async (1) num_gangs (100) num_workers (100) vector_length (32) wait (1) copy (i1) /* { dg-error "not valid" } */
+  {
+  }
+
+  /* ACC KERNELS DEVICE_TYPE: */
+
+#pragma acc kernels device_type (nvidia) async wait copy (i1) /* { dg-error "not valid" } */
+  {
+  }
+
+  /* ACC LOOP DEVICE_TYPE: */
+
+#pragma acc parallel
+#pragma acc loop device_type (nVidia) gang private (i2) /* { dg-error "not valid" } */
+  for (i1 = 1; i1 < 10; i1++)
+    {
+    }
+
+  /* ACC UPDATE DEVICE_TYPE: */
+
+#pragma acc update host(i1) dtype (nvidia) async(1) wait (1) self (i2) /* { dg-error "not valid" } */
+}
diff --git gcc/testsuite/c-c++-common/goacc/host_data-1.c gcc/testsuite/c-c++-common/goacc/host_data-1.c
new file mode 100644
index 0000000..5e8240f
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/host_data-1.c
@@ -0,0 +1,14 @@
+/* Test valid use of host_data directive.  */
+/* { dg-do compile } */
+
+int v0;
+int v1[3][3];
+
+void
+f (void)
+{
+  int v2 = 3;
+#pragma acc host_data use_device(v2, v0, v1)
+  ;
+}
+/* { dg-bogus "sorry, unimplemented: directive not yet implemented" "host_data" { xfail *-*-* } 11 } */
diff --git gcc/testsuite/c-c++-common/goacc/host_data-2.c gcc/testsuite/c-c++-common/goacc/host_data-2.c
new file mode 100644
index 0000000..92fa97b
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/host_data-2.c
@@ -0,0 +1,14 @@
+/* Test invalid use of host_data directive.  */
+/* { dg-do compile } */
+
+int v0;
+#pragma acc host_data use_device(v0) /* { dg-error "expected" } */
+
+void
+f (void)
+{
+  int v2 = 3;
+#pragma acc host_data copy(v2) /* { dg-error "not valid for" } */
+  ;
+}
+/* { dg-bogus "sorry, unimplemented: directive not yet implemented" "host_data" { xfail *-*-* } 11 } */
diff --git gcc/testsuite/c-c++-common/goacc/host_data-3.c gcc/testsuite/c-c++-common/goacc/host_data-3.c
new file mode 100644
index 0000000..580f566
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/host_data-3.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+
+int main (int argc, char* argv[])
+{
+  int x = 5, y;
+
+  #pragma acc enter data copyin (x)
+  #pragma acc host_data use_device (x)
+  {
+    y = x;
+  }
+  #pragma acc exit data delete (x)
+
+  return y - 5;
+}
+/* { dg-bogus "sorry, unimplemented: directive not yet implemented" "host_data" { xfail *-*-* } 8 } */
diff --git gcc/testsuite/c-c++-common/goacc/host_data-4.c gcc/testsuite/c-c++-common/goacc/host_data-4.c
new file mode 100644
index 0000000..61b1c5b
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/host_data-4.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+int main (int argc, char* argv[])
+{
+  int x[100];
+
+  #pragma acc enter data copyin (x)
+  /* Specifying an array index is not valid for host_data/use_device.  */
+  #pragma acc host_data use_device (x[4]) /* { dg-error "expected \\\')' before '\\\[' token" } */
+    ;
+  #pragma acc exit data delete (x)
+
+  return 0;
+}
+/* { dg-bogus "sorry, unimplemented: directive not yet implemented" "host_data" { xfail *-*-* } 9 } */
diff --git gcc/testsuite/c-c++-common/goacc/kernels-1.c gcc/testsuite/c-c++-common/goacc/kernels-1.c
deleted file mode 100644
index e91b81c..0000000
--- gcc/testsuite/c-c++-common/goacc/kernels-1.c
+++ /dev/null
@@ -1,6 +0,0 @@
-void
-foo (void)
-{
-#pragma acc kernels
-  ;
-}
diff --git gcc/testsuite/c-c++-common/goacc/kernels-empty.c gcc/testsuite/c-c++-common/goacc/kernels-empty.c
new file mode 100644
index 0000000..e91b81c
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-empty.c
@@ -0,0 +1,6 @@
+void
+foo (void)
+{
+#pragma acc kernels
+  ;
+}
diff --git gcc/testsuite/c-c++-common/goacc/kernels-eternal.c gcc/testsuite/c-c++-common/goacc/kernels-eternal.c
new file mode 100644
index 0000000..edc17d2
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-eternal.c
@@ -0,0 +1,11 @@
+int
+main (void)
+{
+#pragma acc kernels
+  {
+    while (1)
+      ;
+  }
+
+  return 0;
+}
diff --git gcc/testsuite/c-c++-common/goacc/kernels-noreturn.c gcc/testsuite/c-c++-common/goacc/kernels-noreturn.c
new file mode 100644
index 0000000..1a8cc67
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/kernels-noreturn.c
@@ -0,0 +1,12 @@
+int
+main (void)
+{
+
+#pragma acc kernels
+  {
+    __builtin_abort ();
+  }
+
+  return 0;
+}
+
diff --git gcc/testsuite/c-c++-common/goacc/loop-1.c gcc/testsuite/c-c++-common/goacc/loop-1.c
index fea40e0..5e1a248 100644
--- gcc/testsuite/c-c++-common/goacc/loop-1.c
+++ gcc/testsuite/c-c++-common/goacc/loop-1.c
@@ -1,5 +1,3 @@
-/* { dg-skip-if "not yet" { c++ } } */
-
 int test1()
 {
   int i, j, k, b[10];
diff --git gcc/testsuite/c-c++-common/goacc/parallel-1.c gcc/testsuite/c-c++-common/goacc/parallel-1.c
deleted file mode 100644
index a860526..0000000
--- gcc/testsuite/c-c++-common/goacc/parallel-1.c
+++ /dev/null
@@ -1,6 +0,0 @@
-void
-foo (void)
-{
-#pragma acc parallel
-  ;
-}
diff --git gcc/testsuite/c-c++-common/goacc/parallel-empty.c gcc/testsuite/c-c++-common/goacc/parallel-empty.c
new file mode 100644
index 0000000..a860526
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/parallel-empty.c
@@ -0,0 +1,6 @@
+void
+foo (void)
+{
+#pragma acc parallel
+  ;
+}
diff --git gcc/testsuite/c-c++-common/goacc/parallel-eternal.c gcc/testsuite/c-c++-common/goacc/parallel-eternal.c
new file mode 100644
index 0000000..51eac76
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/parallel-eternal.c
@@ -0,0 +1,11 @@
+int
+main (void)
+{
+#pragma acc parallel
+  {
+    while (1)
+      ;
+  }
+
+  return 0;
+}
diff --git gcc/testsuite/c-c++-common/goacc/parallel-noreturn.c gcc/testsuite/c-c++-common/goacc/parallel-noreturn.c
new file mode 100644
index 0000000..ec840bd
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/parallel-noreturn.c
@@ -0,0 +1,12 @@
+int
+main (void)
+{
+
+#pragma acc parallel
+  {
+    __builtin_abort ();
+  }
+
+  return 0;
+}
+
diff --git gcc/testsuite/c-c++-common/goacc/reduction-1.c gcc/testsuite/c-c++-common/goacc/reduction-1.c
index 0f50082..8f7c70d 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-1.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-1.c
@@ -22,20 +22,17 @@ main(void)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-//   result = 0;
-//   vresult = 0;
-// 
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-//
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* 'max' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&' reductions.  */
 #pragma acc parallel vector_length (vl)
diff --git gcc/testsuite/c-c++-common/goacc/reduction-2.c gcc/testsuite/c-c++-common/goacc/reduction-2.c
index 1f95138..7ff125f 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-2.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-2.c
@@ -22,17 +22,17 @@ main(void)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-// 
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* 'max' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
diff --git gcc/testsuite/c-c++-common/goacc/reduction-3.c gcc/testsuite/c-c++-common/goacc/reduction-3.c
index 476e375..cd44559 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-3.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-3.c
@@ -22,17 +22,17 @@ main(void)
   for (i = 0; i < n; i++)
     result *= array[i];
 
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-// 
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* 'max' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+
+  /* 'min' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
diff --git gcc/testsuite/c-c++-common/goacc/reduction-4.c gcc/testsuite/c-c++-common/goacc/reduction-4.c
index 73dde86..ec3a9c9 100644
--- gcc/testsuite/c-c++-common/goacc/reduction-4.c
+++ gcc/testsuite/c-c++-common/goacc/reduction-4.c
@@ -16,25 +16,29 @@ main(void)
   for (i = 0; i < n; i++)
     result += array[i];
 
-  /* Needs support for complex multiplication.  */
+  /* '*' reductions.  */
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (*:result)
+  for (i = 0; i < n; i++)
+    result *= array[i];
 
-//   /* '*' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (*:result)
-//   for (i = 0; i < n; i++)
-//     result *= array[i];
-//
-//   /* 'max' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result > array[i] ? result : array[i];
-// 
-//   /* 'min' reductions.  */
-// #pragma acc parallel vector_length (vl)
-// #pragma acc loop reduction (+:result)
-//   for (i = 0; i < n; i++)
-//       result = result < array[i] ? result : array[i];
+  /* 'max' reductions.  */
+#if 0
+  // error: 'result' has invalid type for 'reduction(max)'
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (max:result)
+  for (i = 0; i < n; i++)
+    result = result > array[i] ? result : array[i];
+#endif
+
+  /* 'min' reductions.  */
+#if 0
+  // error: 'result' has invalid type for 'reduction(min)'
+#pragma acc parallel vector_length (vl)
+#pragma acc loop reduction (min:result)
+  for (i = 0; i < n; i++)
+    result = result < array[i] ? result : array[i];
+#endif
 
   /* '&&' reductions.  */
 #pragma acc parallel vector_length (vl)
diff --git gcc/testsuite/c-c++-common/goacc/routine-1.c gcc/testsuite/c-c++-common/goacc/routine-1.c
new file mode 100644
index 0000000..1f89fdb
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/routine-1.c
@@ -0,0 +1,35 @@
+void *malloc (__SIZE_TYPE__);
+void free (void *);
+
+#pragma acc routine
+int
+fact (int n)
+{
+  if (n == 0 || n == 1)
+    return 1;
+
+  return n * fact (n - 1);
+}
+
+int
+main(int argc, char **argv)
+{
+  int *a, i, n = 10;
+
+  a = (int *)malloc (sizeof (int) * n);
+
+#pragma acc parallel copy (a[0:n]) vector_length (5)
+  {
+#pragma acc loop
+    for (i = 0; i < n; i++)
+      a[i] = fact (i);
+  }
+
+  for (i = 0; i < n; i++)
+    if (fact (i) != a[i])
+      return -1;
+
+  free (a);
+
+  return 0;
+}
diff --git gcc/testsuite/c-c++-common/goacc/routine-2.c gcc/testsuite/c-c++-common/goacc/routine-2.c
new file mode 100644
index 0000000..fe2e7f7
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/routine-2.c
@@ -0,0 +1,36 @@
+void *malloc (__SIZE_TYPE__);
+void free (void *);
+
+#pragma acc routine (fact)
+
+int
+fact (int n)
+{
+  if (n == 0 || n == 1)
+    return 1;
+
+  return n * fact (n - 1);
+}
+
+int
+main(int argc, char **argv)
+{
+  int *a, i, n = 10;
+
+  a = (int *)malloc (sizeof (int) * n);
+
+#pragma acc parallel copy (a[0:n]) vector_length (5)
+  {
+#pragma acc loop
+    for (i = 0; i < n; i++)
+      a[i] = fact (i);
+  }
+
+  for (i = 0; i < n; i++)
+    if (fact (i) != a[i])
+      return -1;
+
+  free (a);
+
+  return 0;
+}
diff --git gcc/testsuite/c-c++-common/goacc/routine-3.c gcc/testsuite/c-c++-common/goacc/routine-3.c
new file mode 100644
index 0000000..e35dfc1
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/routine-3.c
@@ -0,0 +1,52 @@
+/* Test valid use of clauses with routine.  */
+/* { dg-do compile } */
+
+#pragma acc routine gang
+void
+f1 (void)
+{
+}
+
+#pragma acc routine worker
+void
+f2 (void)
+{
+}
+
+#pragma acc routine vector
+void
+f3 (void)
+{
+}
+
+#pragma acc routine seq
+void
+f4 (void)
+{
+}
+
+#pragma acc routine bind (f4a)
+void
+f5 (void)
+{
+}
+
+typedef int T;
+
+#pragma acc routine bind (T)
+void
+f6 (void)
+{
+}
+
+#pragma acc routine bind ("f7a")
+void
+f7 (void)
+{
+}
+
+#pragma acc routine nohost
+void
+f8 (void)
+{
+}
diff --git gcc/testsuite/c-c++-common/goacc/routine-4.c gcc/testsuite/c-c++-common/goacc/routine-4.c
new file mode 100644
index 0000000..682d901
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/routine-4.c
@@ -0,0 +1,87 @@
+/* Test invalid use of clauses with routine.  */
+/* { dg-do compile } */
+
+#pragma acc routine gang worker /* { dg-error "invalid combination" } */
+void
+f1 (void)
+{
+}
+
+#pragma acc routine worker gang /* { dg-error "invalid combination" } */
+void
+f1a (void)
+{
+}
+
+#pragma acc routine gang vector /* { dg-error "invalid combination" } */
+void
+f2 (void)
+{
+}
+
+#pragma acc routine vector gang /* { dg-error "invalid combination" } */
+void
+f2a (void)
+{
+}
+
+#pragma acc routine gang seq /* { dg-error "invalid combination" } */
+void
+f3 (void)
+{
+}
+
+#pragma acc routine seq gang /* { dg-error "invalid combination" } */
+void
+f3a (void)
+{
+}
+
+#pragma acc routine worker vector /* { dg-error "invalid combination" } */
+void
+f4 (void)
+{
+}
+
+#pragma acc routine vector worker /* { dg-error "invalid combination" } */
+void
+f4a (void)
+{
+}
+
+#pragma acc routine worker seq /* { dg-error "invalid combination" } */
+void
+f5 (void)
+{
+}
+
+#pragma acc routine seq worker /* { dg-error "invalid combination" } */
+void
+f5a (void)
+{
+}
+
+#pragma acc routine vector seq /* { dg-error "invalid combination" } */
+void
+f6 (void)
+{
+}
+
+#pragma acc routine seq vector /* { dg-error "invalid combination" } */
+void
+f6a (void)
+{
+}
+
+#pragma acc routine (g1) gang worker /* { dg-error "invalid combination" } */
+#pragma acc routine (g2) worker gang /* { dg-error "invalid combination" } */
+#pragma acc routine (g3) gang vector /* { dg-error "invalid combination" } */
+#pragma acc routine (g4) vector gang /* { dg-error "invalid combination" } */
+#pragma acc routine (g5) gang seq /* { dg-error "invalid combination" } */
+#pragma acc routine (g6) seq gang /* { dg-error "invalid combination" } */
+#pragma acc routine (g7) worker vector /* { dg-error "invalid combination" } */
+#pragma acc routine (g8) vector worker /* { dg-error "invalid combination" } */
+#pragma acc routine (g9) worker seq /* { dg-error "invalid combination" } */
+#pragma acc routine (g10) seq worker /* { dg-error "invalid combination" } */
+#pragma acc routine (g11) vector seq /* { dg-error "invalid combination" } */
+#pragma acc routine (g12) seq vector /* { dg-error "invalid combination" } */
diff --git gcc/testsuite/c-c++-common/goacc/tile.c gcc/testsuite/c-c++-common/goacc/tile.c
new file mode 100644
index 0000000..e127955
--- /dev/null
+++ gcc/testsuite/c-c++-common/goacc/tile.c
@@ -0,0 +1,26 @@
+int
+main ()
+{
+  int i;
+
+#pragma acc parallel loop tile (10)
+  for (i = 0; i < 100; i++)
+    ;
+
+#pragma acc parallel loop tile (*)
+  for (i = 0; i < 100; i++)
+    ;
+
+#pragma acc parallel loop tile (10, *)
+  for (i = 0; i < 100; i++)
+    ;
+
+#pragma acc parallel loop tile (10, *, i) /* { dg-error "positive constant integer expression" } */
+  for (i = 0; i < 100; i++)
+    ;
+
+  return 0;
+}
+/* { dg-bogus "sorry, unimplemented: Clause not supported yet" "tile" { xfail *-*-* } 6 } */
+/* { dg-bogus "sorry, unimplemented: Clause not supported yet" "tile" { xfail *-*-* } 10 } */
+/* { dg-bogus "sorry, unimplemented: Clause not supported yet" "tile" { xfail *-*-* } 14 } */
diff --git gcc/testsuite/g++.dg/goacc/template-reduction.C gcc/testsuite/g++.dg/goacc/template-reduction.C
new file mode 100644
index 0000000..3618c02
--- /dev/null
+++ gcc/testsuite/g++.dg/goacc/template-reduction.C
@@ -0,0 +1,100 @@
+extern void abort ();
+
+const int n = 100;
+
+// Check explicit template copy map
+
+template<typename T> T
+sum (T array[])
+{
+   T s = 0;
+
+#pragma acc parallel loop num_gangs (10) gang reduction (+:s) copy (s, array[0:n])
+  for (int i = 0; i < n; i++)
+    s += array[i];
+
+  return s;
+}
+
+// Check implicit template copy map
+
+template<typename T> T
+sum ()
+{
+  T s = 0;
+  T array[n];
+
+  for (int i = 0; i < n; i++)
+    array[i] = i+1;
+
+#pragma acc parallel loop num_gangs (10) gang reduction (+:s) copy (s)
+  for (int i = 0; i < n; i++)
+    s += array[i];
+
+  return s;
+}
+
+// Check present and async
+
+template<typename T> T
+async_sum (T array[])
+{
+   T s = 0;
+
+#pragma acc parallel loop num_gangs (10) gang async (1) present (array[0:n])
+   for (int i = 0; i < n; i++)
+     array[i] = i+1;
+
+#pragma acc parallel loop num_gangs (10) gang reduction (+:s) present (array[0:n]) copy (s) async wait (1)
+  for (int i = 0; i < n; i++)
+    s += array[i];
+
+#pragma acc wait
+
+  return s;
+}
+
+// Check async without present
+
+template<typename T> T
+async_sum (int c)
+{
+   T s = 0;
+
+#pragma acc parallel loop num_gangs (10) gang reduction (+:s) copy(s) async wait (1)
+  for (int i = 0; i < n; i++)
+    s += i;
+
+#pragma acc wait
+
+  return s;
+}
+
+int
+main()
+{
+  int a[n];
+  int result = 0;
+
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = i+1;
+      result += i+1;
+    }
+
+  if (sum (a) != result)
+    abort ();
+
+  if (sum<int> () != result)
+    abort ();
+
+#pragma acc enter data copyin (a)
+  if (async_sum (a) != result)
+    abort ();
+
+  if (async_sum<int> (1) != result)
+    abort ();
+#pragma acc exit data delete (a)
+
+  return 0;
+}
diff --git gcc/testsuite/g++.dg/goacc/template.C gcc/testsuite/g++.dg/goacc/template.C
new file mode 100644
index 0000000..497c004
--- /dev/null
+++ gcc/testsuite/g++.dg/goacc/template.C
@@ -0,0 +1,131 @@
+#include <cstdio>
+
+#pragma acc routine
+template <typename T> T
+accDouble(int val)
+{
+  return val * 2;
+}
+
+template<typename T> T
+oacc_parallel_copy (T a)
+{
+  T b = 0;
+  char w = 1;
+  int x = 2;
+  float y = 3;
+  double z = 4;
+
+#pragma acc parallel num_gangs (a) num_workers (a) vector_length (a) default (none) copyout (b) copyin (a)
+  {
+    b = a;
+  }
+
+#pragma acc parallel num_gangs (a) copy (w, x, y, z)
+  {
+    w = accDouble<char>(w);
+    x = accDouble<int>(x);
+    y = accDouble<float>(y);
+    z = accDouble<double>(z);
+  }
+
+#pragma acc parallel num_gangs (a) if (1)
+  {
+#pragma acc loop independent collapse (2) device_type (nvidia) gang
+  for (int i = 0; i < a; i++)
+    for (int j = 0; j < 5; j++)
+      b = a;
+  }
+
+  T c;
+
+#pragma acc parallel num_workers (10)
+  {
+#pragma acc atomic capture
+    c = b++;
+
+#pragma acc atomic update
+    c++;
+
+#pragma acc atomic read
+    b = a;
+
+#pragma acc atomic write
+    b = a;
+  }
+
+#pragma acc parallel reduction (+:c)
+  {
+    c = 1;
+  }
+
+#pragma acc data if (1) copy (b)
+  {
+    #pragma acc parallel
+    {
+      b = a;
+    }
+  }
+
+#pragma acc enter data copyin (b)
+#pragma acc parallel present (b)
+    {
+      b = a;
+    }
+
+#pragma acc update host (b)
+#pragma acc update self (b)
+#pragma acc update device (b)
+#pragma acc exit data delete (b)
+
+  return b;
+}
+
+template<typename T> T
+oacc_kernels_copy (T a)
+{
+  T b = 0;
+  T c = 0;
+  char w = 1;
+  int x = 2;
+  float y = 3;
+  double z = 4;
+
+#pragma acc kernels copy (w, x, y, z)
+  {
+    w = accDouble<char>(w);
+    x = accDouble<int>(x);
+    y = accDouble<float>(y);
+    z = accDouble<double>(z);
+  }
+
+#pragma acc kernels copyout (b) copyin (a)
+  b = a;
+
+#pragma acc data if (1) copy (b)
+  {
+    #pragma acc kernels
+    {
+      b = a;
+    }
+  }
+
+#pragma acc enter data copyin (b)
+#pragma acc kernels present (b)
+    {
+      b = a;
+    }
+  return b;
+}
+
+int
+main ()
+{
+  int b = oacc_parallel_copy<int> (5);
+  int c = oacc_kernels_copy<int> (5);
+
+  printf ("b = %d\n", b);
+  printf ("c = %d\n", c);
+
+  return 0;
+}
diff --git gcc/testsuite/gfortran.dg/goacc/cache-1.f95 gcc/testsuite/gfortran.dg/goacc/cache-1.f95
index 746cf02..74ab332 100644
--- gcc/testsuite/gfortran.dg/goacc/cache-1.f95
+++ gcc/testsuite/gfortran.dg/goacc/cache-1.f95
@@ -9,4 +9,3 @@ program test
     !$acc cache (d)
   enddo
 end
-! { dg-prune-output "unimplemented" }
diff --git gcc/testsuite/gfortran.dg/goacc/coarray.f95 gcc/testsuite/gfortran.dg/goacc/coarray.f95
index 4f1224e..08e4004 100644
--- gcc/testsuite/gfortran.dg/goacc/coarray.f95
+++ gcc/testsuite/gfortran.dg/goacc/coarray.f95
@@ -32,4 +32,4 @@ contains
     !$acc update self (a)
   end subroutine oacc1
 end module test
-! { dg-prune-output "ACC cache unimplemented" }
+! { dg-bogus "sorry, unimplemented: directive not yet implemented" "host_data" { xfail *-*-* } 19 }
diff --git gcc/testsuite/gfortran.dg/goacc/coarray_2.f90 gcc/testsuite/gfortran.dg/goacc/coarray_2.f90
index f35d4b9..06a2bed 100644
--- gcc/testsuite/gfortran.dg/goacc/coarray_2.f90
+++ gcc/testsuite/gfortran.dg/goacc/coarray_2.f90
@@ -2,6 +2,7 @@
 ! { dg-additional-options "-fcoarray=lib" }
 !
 ! PR fortran/63861
+! { dg-xfail-if "<http://gcc.gnu.org/PR63861>" { *-*-* } }
 
 module test
 contains
diff --git gcc/testsuite/gfortran.dg/goacc/combined_loop.f90 gcc/testsuite/gfortran.dg/goacc/combined_loop.f90
index b8be649..58aaa4f 100644
--- gcc/testsuite/gfortran.dg/goacc/combined_loop.f90
+++ gcc/testsuite/gfortran.dg/goacc/combined_loop.f90
@@ -6,7 +6,7 @@ subroutine oacc1()
   implicit none
   integer :: i
   integer  :: a
-  !$acc parallel loop reduction(+:a) ! { dg-excess-errors "sorry, unimplemented: directive not yet implemented" }
+  !$acc parallel loop reduction(+:a)
   do i = 1,5
   enddo
 end subroutine oacc1
diff --git gcc/testsuite/gfortran.dg/goacc/cray.f95 gcc/testsuite/gfortran.dg/goacc/cray.f95
index 8f2c077..28294ee 100644
--- gcc/testsuite/gfortran.dg/goacc/cray.f95
+++ gcc/testsuite/gfortran.dg/goacc/cray.f95
@@ -53,4 +53,3 @@ contains
     !$acc update self (ptr)
   end subroutine oacc1
 end module test
-! { dg-prune-output "unimplemented" }
diff --git gcc/testsuite/gfortran.dg/goacc/declare-1.f95 gcc/testsuite/gfortran.dg/goacc/declare-1.f95
index 03540f1..14190a7 100644
--- gcc/testsuite/gfortran.dg/goacc/declare-1.f95
+++ gcc/testsuite/gfortran.dg/goacc/declare-1.f95
@@ -15,6 +15,5 @@ contains
     END BLOCK
   end function foo
 end program test
-! { dg-prune-output "unimplemented" }
-! { dg-final { scan-tree-dump-times "pragma acc declare map\\(force_tofrom:i\\)" 2 "original" } } 
+! { dg-final { scan-tree-dump-times "pragma acc data map\\(force_tofrom:i\\)" 2 "original" } }
 ! { dg-final { cleanup-tree-dump "original" } } 
diff --git gcc/testsuite/gfortran.dg/goacc/declare-2.f95 gcc/testsuite/gfortran.dg/goacc/declare-2.f95
new file mode 100644
index 0000000..afdbe2e
--- /dev/null
+++ gcc/testsuite/gfortran.dg/goacc/declare-2.f95
@@ -0,0 +1,44 @@
+
+module amod
+
+contains
+
+subroutine asubr (b)
+  implicit none
+  integer :: b(8)
+
+  !$acc declare copy (b) ! { dg-error "Invalid clause in module" }
+  !$acc declare copyout (b) ! { dg-error "Invalid clause in module" }
+  !$acc declare present (b) ! { dg-error "Invalid clause in module" }
+  !$acc declare present_or_copy (b) ! { dg-error "Invalid clause in module" }
+  !$acc declare present_or_copyin (b) ! { dg-error "Invalid clause in module" }
+  !$acc declare present_or_copyout (b) ! { dg-error "Invalid clause in module" }
+  !$acc declare present_or_create (b) ! { dg-error "Invalid clause in module" }
+  !$acc declare deviceptr (b) ! { dg-error "Invalid clause in module" }
+  !$acc declare create (b) copyin (b) ! { dg-error "present on multiple clauses" }
+
+end subroutine
+
+end module
+
+subroutine bsubr (foo)
+  implicit none
+
+  integer, dimension (:) :: foo
+
+  !$acc declare copy (foo) ! { dg-error "assumed-size dummy array" }
+  !$acc declare copy (foo(1:2)) ! { dg-error "assumed-size dummy array" }
+
+end subroutine
+
+program test
+  integer :: a(8)
+  integer :: b(8)
+  integer :: c(8)
+
+  !$acc declare create (a) copyin (a) ! { dg-error "present on multiple clauses" }
+  !$acc declare copyin (b)
+  !$acc declare copyin (b) ! { dg-error "present on multiple clauses" }
+  !$acc declare copy (c(1:2)) ! { dg-error "Subarray: 'c' not allowed" }
+
+end program
diff --git gcc/testsuite/gfortran.dg/goacc/default.f95 gcc/testsuite/gfortran.dg/goacc/default.f95
new file mode 100644
index 0000000..c1fc52e
--- /dev/null
+++ gcc/testsuite/gfortran.dg/goacc/default.f95
@@ -0,0 +1,17 @@
+! { dg-do compile }
+
+program tile
+  integer i, j, a
+
+  !$acc parallel default (shared) ! { dg-error "Unclassifiable OpenACC directive" }
+  !$acc end parallel ! { dg-error "Unexpected" }
+
+  !$acc parallel default (private) ! { dg-error "Unclassifiable OpenACC directive" }
+  !$acc end parallel ! { dg-error "Unexpected" }
+
+  !$acc parallel default (none)
+  !$acc end parallel
+
+  !$acc parallel default (firstprivate) ! { dg-error "Unclassifiable OpenACC directive" }
+  !$acc end parallel ! { dg-error "Unexpected" }
+end program tile
diff --git gcc/testsuite/gfortran.dg/goacc/dtype-1.f95 gcc/testsuite/gfortran.dg/goacc/dtype-1.f95
new file mode 100644
index 0000000..350e443
--- /dev/null
+++ gcc/testsuite/gfortran.dg/goacc/dtype-1.f95
@@ -0,0 +1,161 @@
+! { dg-do compile }
+! { dg-options "-fopenacc -fdump-tree-omplower" }
+
+program dtype
+  integer i1
+
+!! ACC PARALLEL DEVICE_TYPE:
+
+!$acc parallel dtype (nVidia) async (1) num_gangs (100) &
+!$acc&  num_workers (100) vector_length (32) wait (1)
+!$acc end parallel
+
+!$acc parallel async (1) num_gangs (1) num_workers (1) vector_length (1) &
+!$acc& wait (1) device_type (nvidia) async (2) num_gangs (200) &
+!$acc&  num_workers (200) vector_length (64) wait (2)
+!$acc end parallel
+
+!$acc parallel async (1) num_gangs (1) num_workers (1) vector_length (1) &
+!$acc& wait (1) device_type (nvidia) async (3) num_gangs (300) &
+!$acc& num_workers (300) vector_length (128) wait (3) dtype (*) &
+!$acc& async (10) num_gangs (10) num_workers (10) vector_length (10) wait (10)
+!$acc end parallel
+
+!$acc parallel async (1) num_gangs (1) num_workers (1) vector_length (1) &
+!$acc& wait (1) dtype (nvidia_ptx) async (3) num_gangs (300) &
+!$acc& num_workers (300) vector_length (128) wait (3) device_type (*) &
+!$acc& async (10) num_gangs (10) num_workers (10) vector_length (10) wait (10)
+!$acc end parallel
+
+!! ACC KERNELS DEVICE_TYPE:
+
+!$acc kernels device_type (nvidia) async wait
+!$acc end kernels
+
+!$acc kernels async wait dtype (nvidia) async (1) wait (1)
+!$acc end kernels
+
+!$acc kernels async wait dtype (nvidia) async (2) wait (2) &
+!$acc& device_type (*) async (0) wait (0)
+!$acc end kernels
+
+!$acc kernels async wait device_type (nvidia_ptx) async (1) wait (1) &
+!$acc& dtype (*) async (0) wait (0)
+!$acc end kernels
+
+!! ACC LOOP DEVICE_TYPE:
+
+!$acc parallel
+!$acc loop device_type (nVidia) gang
+  do i1 = 1, 10
+  end do
+!$acc end parallel
+
+!$acc parallel
+!$acc loop dtype (nVidia) gang dtype (*) worker
+  do i1 = 1, 10
+  end do
+!$acc end parallel
+
+!$acc parallel
+!$acc loop dtype (nVidiaGPU) gang dtype (*) vector
+  do i1 = 1, 10
+  end do
+!$acc end parallel
+
+!! ACC UPDATE:
+
+!$acc update host(i1) async(1) wait (1)
+
+!$acc update host(i1) device_type(nvidia) async(2) wait (2)
+
+!$acc update host(i1) async(1) wait (1) dtype(nvidia) async(3) wait (3)
+
+!$acc update host(i1) async(4) wait (4) device_type(nvidia) async(5) wait (5) &
+!$acc& dtype (*) async (6) wait (6)
+
+!$acc update host(i1) async(4) wait (4) dtype(nvidia1) async(5) &
+!$acc& wait (5) device_type (*) async (6) wait (6)
+end program dtype
+
+!! ACC ROUTINE:
+
+subroutine sr1 ()
+  !$acc routine device_type (nvidia) gang
+end subroutine sr1
+
+subroutine sr2 ()
+  !$acc routine dtype (nvidia) worker
+end subroutine sr2
+
+subroutine sr3 ()
+  !$acc routine device_type (nvidia) vector
+end subroutine sr3
+
+subroutine sr5 ()
+  !$acc routine dtype (nvidia) bind (foo)
+end subroutine sr5
+
+subroutine sr1a ()
+  !$acc routine device_type (nvidia) gang device_type (*) worker
+end subroutine sr1a
+
+subroutine sr2a ()
+  !$acc routine dtype (nvidia) worker dtype (*) vector
+end subroutine sr2a
+
+subroutine sr3a ()
+  !$acc routine dtype (nvidia) vector device_type (*) gang
+end subroutine sr3a
+
+subroutine sr4a ()
+  !$acc routine device_type (nvidia) vector device_type (*) worker
+end subroutine sr4a
+
+subroutine sr5a ()
+  !$acc routine device_type (nvidia) bind (foo) dtype (*) gang
+end subroutine sr5a
+
+subroutine sr1b ()
+  !$acc routine dtype (gpu) gang dtype (*) worker
+end subroutine sr1b
+
+subroutine sr2b ()
+  !$acc routine dtype (gpu) worker device_type (*) worker
+end subroutine sr2b
+
+subroutine sr3b ()
+  !$acc routine device_type (gpu) vector device_type (*) worker
+end subroutine sr3b
+
+subroutine sr4b ()
+  !$acc routine device_type (gpu) worker device_type (*) worker
+end subroutine sr4b
+
+subroutine sr5b ()
+  !$acc routine dtype (gpu) bind (foo) device_type (*) gang
+end subroutine sr5b
+
+! { dg-final { scan-tree-dump-times "oacc_parallel async\\(1\\) wait\\(1\\) num_gangs\\(100\\) num_workers\\(100\\) vector_length\\(32\\)" 1 "omplower" } }
+
+! { dg-final { scan-tree-dump-times "oacc_parallel async\\(2\\) wait\\(2\\) num_gangs\\(200\\) num_workers\\(200\\) vector_length\\(64\\)" 1 "omplower" } }
+
+! { dg-final { scan-tree-dump-times "oacc_parallel async\\(3\\) wait\\(3\\) num_gangs\\(300\\) num_workers\\(300\\) vector_length\\(128\\)" 1 "omplower" } }
+
+! { dg-final { scan-tree-dump-times "oacc_parallel async\\(10\\) wait\\(10\\) num_gangs\\(10\\) num_workers\\(10\\) vector_length\\(10\\)" 1 "omplower" } }
+
+! { dg-final { scan-tree-dump-times "oacc_kernels async\\(-1\\)" 1 "omplower" } }
+
+! { dg-final { scan-tree-dump-times "oacc_kernels async\\(1\\) wait\\(1\\)" 1 "omplower" } }
+
+! { dg-final { scan-tree-dump-times "oacc_kernels async\\(2\\) wait\\(2\\)" 1 "omplower" } }
+
+! { dg-final { scan-tree-dump-times "oacc_kernels async\\(0\\) wait\\(0\\)" 1 "omplower" } }
+
+! { dg-final { scan-tree-dump-times "acc loop private\\(i1\\) gang private\\(i1\\.1\\)" 1 "omplower" } }
+
+! { dg-final { scan-tree-dump-times "acc loop private\\(i1\\) gang private\\(i1\\.2\\)" 1 "omplower" } }
+
+! { dg-final { scan-tree-dump-times "acc loop private\\(i1\\) vector private\\(i1\\.3\\)" 1 "omplower" } }
+
+! { dg-final { cleanup-tree-dump "omplower" } }
diff --git gcc/testsuite/gfortran.dg/goacc/dtype-2.f95 gcc/testsuite/gfortran.dg/goacc/dtype-2.f95
new file mode 100644
index 0000000..a4573e9
--- /dev/null
+++ gcc/testsuite/gfortran.dg/goacc/dtype-2.f95
@@ -0,0 +1,39 @@
+! { dg-do compile }
+
+program dtype
+  integer i1, i2, i3, i4, i5, i6
+
+!! ACC PARALLEL DEVICE_TYPE:
+
+!$acc parallel device_type (nVidia) async (1) num_gangs (100) &
+!$acc&  num_workers (100) vector_length (32) wait (1) copy (i1)
+!$acc end parallel
+
+!! ACC KERNELS DEVICE_TYPE:
+
+!$acc kernels dtype (nvidia) async wait copy (i1)
+!$acc end kernels
+
+!! ACC LOOP DEVICE_TYPE:
+
+!$acc parallel
+!$acc loop dtype (nVidia) gang tile (1) private (i1)
+  do i1 = 1, 10
+  end do
+!$acc end parallel
+
+!! ACC UPDATE:
+
+!$acc update host(i1) device_type(nvidia) async(2) wait (2) self(i2)
+
+end program dtype
+
+! { dg-error "Invalid character" "" { target *-*-* } 8 }
+! { dg-error "Unexpected" "" { target *-*-* } 10 }
+
+! { dg-error "Invalid character" "" { target *-*-* } 14 }
+! { dg-error "Unexpected" "" { target *-*-* } 15 }
+
+! { dg-error "Invalid character" "" { target *-*-* } 20 }
+
+! { dg-error "Invalid character" "" { target *-*-* } 27 }
diff --git gcc/testsuite/gfortran.dg/goacc/host_data-tree.f95 gcc/testsuite/gfortran.dg/goacc/host_data-tree.f95
index 19e7411..8a25829 100644
--- gcc/testsuite/gfortran.dg/goacc/host_data-tree.f95
+++ gcc/testsuite/gfortran.dg/goacc/host_data-tree.f95
@@ -8,6 +8,6 @@ program test
   !$acc host_data use_device(i)
   !$acc end host_data
 end program test
-! { dg-prune-output "unimplemented" }
+! { dg-bogus "sorry, unimplemented: directive not yet implemented" "host_data" { xfail *-*-* } 8 }
 ! { dg-final { scan-tree-dump-times "pragma acc host_data use_device\\(i\\)" 1 "original" } } 
 ! { dg-final { cleanup-tree-dump "original" } } 
diff --git gcc/testsuite/gfortran.dg/goacc/loop-1.f95 gcc/testsuite/gfortran.dg/goacc/loop-1.f95
index e1b2dfd..817039f 100644
--- gcc/testsuite/gfortran.dg/goacc/loop-1.f95
+++ gcc/testsuite/gfortran.dg/goacc/loop-1.f95
@@ -168,4 +168,3 @@ subroutine test1
 end subroutine test1
 end module test
 ! { dg-prune-output "Deleted" }
-! { dg-prune-output "ACC cache unimplemented" }
diff --git gcc/testsuite/gfortran.dg/goacc/loop-2.f95 gcc/testsuite/gfortran.dg/goacc/loop-2.f95
index f85691e..b5e6368 100644
--- gcc/testsuite/gfortran.dg/goacc/loop-2.f95
+++ gcc/testsuite/gfortran.dg/goacc/loop-2.f95
@@ -66,7 +66,7 @@ program test
     !$acc loop seq worker ! { dg-error "conflicts with" }
     DO i = 1,10
     ENDDO
-    !$acc loop gang worker ! { dg-error "conflicts with" }
+    !$acc loop gang worker
     DO i = 1,10
     ENDDO
 
@@ -94,10 +94,10 @@ program test
     !$acc loop seq vector ! { dg-error "conflicts with" }
     DO i = 1,10
     ENDDO
-    !$acc loop gang vector ! { dg-error "conflicts with" }
+    !$acc loop gang vector
     DO i = 1,10
     ENDDO
-    !$acc loop worker vector ! { dg-error "conflicts with" }
+    !$acc loop worker vector
     DO i = 1,10
     ENDDO
 
@@ -239,7 +239,7 @@ program test
     !$acc loop seq worker ! { dg-error "conflicts with" }
     DO i = 1,10
     ENDDO
-    !$acc loop gang worker ! { dg-error "conflicts with" }
+    !$acc loop gang worker
     DO i = 1,10
     ENDDO
 
@@ -267,10 +267,10 @@ program test
     !$acc loop seq vector ! { dg-error "conflicts with" }
     DO i = 1,10
     ENDDO
-    !$acc loop gang vector ! { dg-error "conflicts with" }
+    !$acc loop gang vector
     DO i = 1,10
     ENDDO
-    !$acc loop worker vector ! { dg-error "conflicts with" }
+    !$acc loop worker vector
     DO i = 1,10
     ENDDO
 
@@ -392,7 +392,7 @@ program test
   !$acc kernels loop seq worker ! { dg-error "conflicts with" }
   DO i = 1,10
   ENDDO
-  !$acc kernels loop gang worker ! { dg-error "conflicts with" }
+  !$acc kernels loop gang worker
   DO i = 1,10
   ENDDO
 
@@ -420,10 +420,10 @@ program test
   !$acc kernels loop seq vector ! { dg-error "conflicts with" }
   DO i = 1,10
   ENDDO
-  !$acc kernels loop gang vector ! { dg-error "conflicts with" }
+  !$acc kernels loop gang vector
   DO i = 1,10
   ENDDO
-  !$acc kernels loop worker vector ! { dg-error "conflicts with" }
+  !$acc kernels loop worker vector
   DO i = 1,10
   ENDDO
 
@@ -544,7 +544,7 @@ program test
   !$acc parallel loop seq worker ! { dg-error "conflicts with" }
   DO i = 1,10
   ENDDO
-  !$acc parallel loop gang worker ! { dg-error "conflicts with" }
+  !$acc parallel loop gang worker
   DO i = 1,10
   ENDDO
 
@@ -572,10 +572,10 @@ program test
   !$acc parallel loop seq vector ! { dg-error "conflicts with" }
   DO i = 1,10
   ENDDO
-  !$acc parallel loop gang vector ! { dg-error "conflicts with" }
+  !$acc parallel loop gang vector
   DO i = 1,10
   ENDDO
-  !$acc parallel loop worker vector ! { dg-error "conflicts with" }
+  !$acc parallel loop worker vector
   DO i = 1,10
   ENDDO
 
@@ -646,4 +646,4 @@ program test
   !$acc parallel loop gang worker tile(*) 
   DO i = 1,10
   ENDDO
-end
\ No newline at end of file
+end
diff --git gcc/testsuite/gfortran.dg/goacc/modules.f95 gcc/testsuite/gfortran.dg/goacc/modules.f95
new file mode 100644
index 0000000..19a2abe
--- /dev/null
+++ gcc/testsuite/gfortran.dg/goacc/modules.f95
@@ -0,0 +1,55 @@
+! { dg-do compile }
+
+MODULE reduction_test
+
+CONTAINS
+
+SUBROUTINE reduction_kernel(x_min,x_max,y_min,y_max,arr,sum)
+
+  IMPLICIT NONE
+
+  INTEGER      :: x_min,x_max,y_min,y_max
+  REAL(KIND=8), DIMENSION(x_min-2:x_max+2,y_min-2:y_max+2) :: arr
+  REAL(KIND=8) :: sum
+
+  INTEGER      :: j,k
+
+  sum=0.0
+
+!$ACC DATA PRESENT(arr) COPY(sum)
+!$ACC PARALLEL LOOP REDUCTION(+ : sum)
+  DO k=y_min,y_max
+    DO j=x_min,x_max
+      sum=sum+arr(j,k)
+    ENDDO
+  ENDDO
+!$ACC END PARALLEL LOOP
+!$ACC END DATA
+
+END SUBROUTINE reduction_kernel
+
+END MODULE reduction_test
+
+program main
+    use reduction_test
+
+    integer :: x_min,x_max,y_min,y_max
+    real(kind=8), dimension(1:10,1:10) :: arr
+    real(kind=8) :: sum
+
+    x_min = 5
+    x_max = 6
+    y_min = 5
+    y_max = 6
+
+    arr(:,:) = 1.0
+
+    sum = 1.0
+
+    !$acc data copy(arr)
+
+    call reduction_kernel(x_min,x_max,y_min,y_max,arr,sum)
+
+    !$acc end data
+
+end program
diff --git gcc/testsuite/gfortran.dg/goacc/parameter.f95 gcc/testsuite/gfortran.dg/goacc/parameter.f95
index 1364181..82c25ba 100644
--- gcc/testsuite/gfortran.dg/goacc/parameter.f95
+++ gcc/testsuite/gfortran.dg/goacc/parameter.f95
@@ -29,4 +29,3 @@ contains
     !$acc update self (a) ! { dg-error "not a variable" }
   end subroutine oacc1
 end module test
-! { dg-prune-output "unimplemented" }
diff --git gcc/testsuite/gfortran.dg/goacc/update.f95 gcc/testsuite/gfortran.dg/goacc/update.f95
new file mode 100644
index 0000000..ae23dfc
--- /dev/null
+++ gcc/testsuite/gfortran.dg/goacc/update.f95
@@ -0,0 +1,5 @@
+! { dg-do compile }
+
+program foo
+  !$acc update ! { dg-error "must contain at least one 'device' or 'host/self' clause" }
+end program foo
diff --git libgomp/testsuite/libgomp.oacc-c++/template-reduction.C libgomp/testsuite/libgomp.oacc-c++/template-reduction.C
new file mode 100644
index 0000000..c158b7a
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c++/template-reduction.C
@@ -0,0 +1,102 @@
+/* { dg-do run } */
+
+#include <cstdlib>
+
+const int n = 100;
+
+// Check explicit template copy map
+
+template<typename T> T
+sum (T array[])
+{
+  T s = 0;
+
+#pragma acc parallel loop vector_length (10) reduction (+:s) copy (s, array[0:n])
+  for (int i = 0; i < n; i++)
+    s += array[i];
+
+  return s;
+}
+
+// Check implicit template copy map
+
+template<typename T> T
+sum ()
+{
+  T s = 0;
+  T array[n];
+
+  for (int i = 0; i < n; i++)
+    array[i] = i+1;
+
+#pragma acc parallel loop vector_length (10) reduction (+:s) copy (s)
+  for (int i = 0; i < n; i++)
+    s += array[i];
+
+  return s;
+}
+
+// Check present and async
+
+template<typename T> T
+async_sum (T array[])
+{
+  T s = 0;
+
+#pragma acc parallel loop vector_length (10) async (1) present (array[0:n])
+  for (int i = 0; i < n; i++)
+    array[i] = i+1;
+
+#pragma acc parallel loop vector_length (10) reduction (+:s) present (array[0:n]) copy (s) async wait (1)
+  for (int i = 0; i < n; i++)
+    s += array[i];
+
+#pragma acc wait
+
+  return s;
+}
+
+// Check async without present
+
+template<typename T> T
+async_sum (int c)
+{
+  T s = 0;
+
+#pragma acc parallel loop vector_length (10) reduction (+:s) copy(s) async wait (1)
+  for (int i = 0; i < n; i++)
+    s += i+1;
+
+#pragma acc wait
+
+  return s;
+}
+
+int
+main()
+{
+  int a[n];
+  int result = 0;
+
+  for (int i = 0; i < n; i++)
+    {
+      a[i] = i+1;
+      result += i+1;
+    }
+
+  if (sum (a) != result)
+    abort ();
+
+  if (sum<int> () != result)
+    abort ();
+
+#pragma acc enter data copyin (a)
+  if (async_sum (a) != result)
+    abort ();
+
+  if (async_sum<int> (1) != result)
+    abort ();
+#pragma acc exit data delete (a)
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-1.c
new file mode 100644
index 0000000..ad958cd
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-1.c
@@ -0,0 +1,866 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+
+int
+main(int argc, char **argv)
+{
+  int   iexp, igot;
+  long long lexp, lgot;
+  int   N = 32;
+  int   idata[N];
+  long long   ldata[N];
+  float fexp, fgot;
+  float fdata[N];
+  int i;
+
+  igot = 0;
+  iexp = 32;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+#pragma acc atomic capture
+        idata[i] = igot++;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 32;
+  iexp = 0;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+#pragma acc atomic capture
+        idata[i] = igot--;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 0;
+  iexp = 32;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+#pragma acc atomic capture
+        idata[i] = ++igot;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 32;
+  iexp = 0;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+#pragma acc atomic capture
+        idata[i] = --igot;
+      }
+  }
+
+  /* BINOP = + */
+  igot = 0;
+  iexp = 32;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        idata[i] = igot += expr;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 0;
+  iexp = 32;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        idata[i] = igot = igot + expr;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 0;
+  iexp = 32;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        idata[i] = igot = expr + igot;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  /* BINOP = * */
+  lgot = 1LL;
+  lexp = 1LL << N;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 2LL;
+
+#pragma acc atomic capture
+        ldata[i] = lgot *= expr;
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 1LL << N;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 2LL;
+
+#pragma acc atomic capture
+        ldata[i] = lgot = lgot * expr;
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 1LL << N;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 2LL;
+
+#pragma acc atomic capture
+        ldata[i] = lgot = expr * lgot;
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  /* BINOP = - */
+  igot = 32;
+  iexp = 0;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        idata[i] = igot -= expr;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 32;
+  iexp = 0;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        idata[i] = igot = igot - expr;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 32;
+  iexp = 32;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        idata[i] = igot = expr - igot;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+
+  /* BINOP = / */
+  lgot = 1LL << 32;
+  lexp = 1LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 2LL;
+
+#pragma acc atomic capture
+        ldata[i] = lgot /= expr;
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL << 32;
+  lexp = 1LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 2LL;
+
+#pragma acc atomic capture
+        ldata[i] = lgot = lgot / expr;
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 2LL;
+  lexp = 2LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 1LL << N;
+
+#pragma acc atomic capture
+        ldata[i] = lgot = expr / lgot;
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  /* BINOP = & */
+  igot = ~0;
+  iexp = 0;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1 << i;
+
+#pragma acc atomic capture
+        idata[i] = igot &= expr;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = ~0;
+  iexp = 0;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1 << i;
+
+#pragma acc atomic capture
+        idata[i] = igot = igot & expr;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = ~0;
+  iexp = 0;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1 << i;
+
+#pragma acc atomic capture
+        idata[i] = igot = expr & igot;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  /* BINOP = ^ */
+  igot = ~0;
+  iexp = 0;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1 << i;
+
+#pragma acc atomic capture
+        idata[i] = igot ^= expr;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = ~0;
+  iexp = 0;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1 << i;
+
+#pragma acc atomic capture
+        idata[i] = igot = igot ^ expr;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = ~0;
+  iexp = 0;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1 << i;
+
+#pragma acc atomic capture
+        idata[i] = igot = expr ^ igot;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  /* BINOP = | */
+  igot = 0;
+  iexp = ~0;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1 << i;
+
+#pragma acc atomic capture
+        idata[i] = igot |= expr;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 0;
+  iexp = ~0;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1 << i;
+
+#pragma acc atomic capture
+        idata[i] = igot = igot | expr;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 0;
+  iexp = ~0;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1 << i;
+
+#pragma acc atomic capture
+        idata[i] = igot = expr | igot;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  /* BINOP = << */
+  lgot = 1LL;
+  lexp = 1LL << N;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        ldata[i] = lgot <<= expr;
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 1LL << N;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        ldata[i] = lgot = lgot << expr;
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 2LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel
+    {
+      long long expr = 1LL;
+
+#pragma acc atomic capture
+      ldata[0] = lgot = expr << lgot;
+    }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  /* BINOP = >> */
+  lgot = 1LL << N;
+  lexp = 1LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 1LL;
+
+#pragma acc atomic capture
+        ldata[i] = lgot >>= expr;
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL << N;
+  lexp = 1LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        ldata[i] = lgot = lgot >> expr;
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL << 63;
+  lexp = 1LL << 32;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel
+    {
+      long long expr = 1LL << 32;
+
+#pragma acc atomic capture
+      ldata[0] = lgot = expr >> lgot;
+    }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  fgot = 0.0;
+  fexp = 32.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+#pragma acc atomic capture
+        fdata[i] = fgot++;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 32.0;
+  fexp = 0.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+#pragma acc atomic capture
+        fdata[i] = fgot--;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 0.0;
+  fexp = 32.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+#pragma acc atomic capture
+        fdata[i] = ++fgot;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 32.0;
+  fexp = 0.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+#pragma acc atomic capture
+        fdata[i] = --fgot;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  /* BINOP = + */
+  fgot = 0.0;
+  fexp = 32.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+
+#pragma acc atomic capture
+        fdata[i] = fgot += expr;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 0.0;
+  fexp = 32.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+
+#pragma acc atomic capture
+        fdata[i] = fgot = fgot + expr;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 0.0;
+  fexp = 32.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+
+#pragma acc atomic capture
+        fdata[i] = fgot = expr + fgot;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  /* BINOP = * */
+  fgot = 1.0;
+  fexp = 8192.0*8192.0*64.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+
+#pragma acc atomic capture
+        fdata[i] = fgot *= expr;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1.0;
+  fexp = 8192.0*8192.0*64.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 2LL;
+
+#pragma acc atomic capture
+        fdata[i] = fgot = fgot * expr;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1.0;
+  fexp = 8192.0*8192.0*64.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+
+#pragma acc atomic capture
+        fdata[i] = fgot = expr * fgot;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  /* BINOP = - */
+  fgot = 32.0;
+  fexp = 0.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+
+#pragma acc atomic capture
+        fdata[i] = fgot -= expr;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 32.0;
+  fexp = 0.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+
+#pragma acc atomic capture
+        fdata[i] = fgot = fgot - expr;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1.0;
+  fexp = 0.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 32.0;
+
+#pragma acc atomic capture
+        fdata[i] = fgot = expr - fgot;
+      }
+  }
+
+  for (i = 0; i < N; i++)
+    if (i % 2 == 0)
+      {
+	if (fdata[i] != 31.0)
+	  abort ();
+      }
+    else
+      {
+	if (fdata[i] != 1.0)
+	  abort ();
+      }
+
+
+  /* BINOP = / */
+  fexp = 1.0;
+  fgot = 8192.0*8192.0*64.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+
+#pragma acc atomic capture
+        fdata[i] = fgot /= expr;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fexp = 1.0;
+  fgot = 8192.0*8192.0*64.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+
+#pragma acc atomic capture
+        fdata[i] = fgot = fgot / expr;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fexp = 1.0;
+  fgot = 8192.0*8192.0*64.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel
+    {
+      float expr = 8192.0*8192.0*64.0;
+
+#pragma acc atomic capture
+      fdata[0] = fgot = expr / fgot;
+    }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-2.c
new file mode 100644
index 0000000..842f2de
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_capture-2.c
@@ -0,0 +1,1626 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+
+int
+main(int argc, char **argv)
+{
+  int   iexp, igot, imax, imin;
+  long long lexp, lgot;
+  int   N = 32;
+  int   i;
+  int   idata[N];
+  long long ldata[N];
+  float fexp, fgot;
+  float fdata[N];
+
+  igot = 1234;
+  iexp = 31;
+
+  for (i = 0; i < N; i++)
+    idata[i] = i;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+#pragma acc atomic capture
+      { idata[i] = igot; igot = i; }
+  }
+
+  imax = 0;
+  imin = N;
+
+  for (i = 0; i < N; i++)
+    {
+      imax = idata[i] > imax ? idata[i] : imax;
+      imin = idata[i] < imin ? idata[i] : imin;
+    }
+
+  if (imax != 1234 || imin != 0)
+    abort ();
+
+  return 0;
+
+  igot = 0;
+  iexp = 32;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+#pragma acc atomic capture
+      { idata[i] = igot; igot++; }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 0;
+  iexp = 32;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+#pragma acc atomic capture
+      { idata[i] = igot; ++igot; }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 0;
+  iexp = 32;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+#pragma acc atomic capture
+      { ++igot; idata[i] = igot; }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 0;
+  iexp = 32;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+#pragma acc atomic capture
+      { igot++; idata[i] = igot; }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 32;
+  iexp = 0;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+#pragma acc atomic capture
+      { idata[i] = igot; igot--; }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 32;
+  iexp = 0;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+#pragma acc atomic capture
+      { idata[i] = igot; --igot; }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 32;
+  iexp = 0;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+#pragma acc atomic capture
+      { --igot; idata[i] = igot; }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 32;
+  iexp = 0;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+#pragma acc atomic capture
+      { igot--; idata[i] = igot; }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  /* BINOP = + */
+  igot = 0;
+  iexp = 32;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        { idata[i] = igot; igot += expr; }
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 0;
+  iexp = 32;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        { igot += expr; idata[i] = igot; }
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 0;
+  iexp = 32;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        { idata[i] = igot; igot = igot + expr; }
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 0;
+  iexp = 32;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        { idata[i] = igot; igot = expr + igot; }
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 0;
+  iexp = 32;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        { igot = igot + expr; idata[i] = igot; }
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+
+  igot = 0;
+  iexp = 32;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        { igot = expr + igot; idata[i] = igot; }
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  /* BINOP = * */
+  lgot = 1LL;
+  lexp = 1LL << 32;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 2LL;
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot *= expr; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 1LL << 32;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 2LL;
+
+#pragma acc atomic capture
+        { lgot *= expr; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 1LL << 32;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 2LL;
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot = lgot * expr; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 1LL << 32;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 2LL;
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot = expr * lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 1LL << 32;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 2LL;
+
+#pragma acc atomic capture
+        { lgot = lgot * expr; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 1LL << 32;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 2LL;
+
+#pragma acc atomic capture
+        { lgot = expr * lgot; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  /* BINOP = - */
+  igot = 32;
+  iexp = 0;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        { idata[i] = igot; igot -= expr; }
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 32;
+  iexp = 0;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        { igot -= expr; idata[i] = igot; }
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 32;
+  iexp = 0;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        { idata[i] = igot; igot = igot - expr; }
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 1;
+  iexp = 1;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        { idata[i] = igot; igot = expr - igot; }
+      }
+  }
+
+  for (i = 0; i < N; i++)
+    if (i % 2 == 0)
+      {
+	if (idata[i] != 1)
+	  abort ();
+      }
+    else
+      {
+	if (idata[i] != 0)
+	  abort ();
+      }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 1;
+  iexp = -31;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        { igot = igot - expr; idata[i] = igot; }
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 1;
+  iexp = 1;
+
+#pragma acc data copy (igot, idata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = 1;
+
+#pragma acc atomic capture
+        { igot = expr - igot; idata[i] = igot; }
+      }
+  }
+
+  for (i = 0; i < N; i++)
+    if (i % 2 == 0)
+      {
+	if (idata[i] != 0)
+	  abort ();
+      }
+    else
+      {
+	if (idata[i] != 1)
+	  abort ();
+      }
+
+  if (iexp != igot)
+    abort ();
+
+  /* BINOP = / */
+  lgot = 1LL << 32;
+  lexp = 1LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 2LL;
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot /= expr; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL << 32;
+  lexp = 1LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 2LL;
+
+#pragma acc atomic capture
+        { lgot /= expr; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL << 32;
+  lexp = 1LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 2LL;
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot = lgot / expr; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 2LL;
+  lexp = 2LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 1LL << N;
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot = expr / lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL << 32;
+  lexp = 1LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 2LL;
+
+#pragma acc atomic capture
+        { lgot = lgot / expr; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 2LL;
+  lexp = 2LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 1LL << N;
+
+#pragma acc atomic capture
+        { lgot = expr / lgot; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  /* BINOP = & */
+  lgot = ~0LL;
+  lexp = 0LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = ~(1 << i);
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot &= expr; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = ~0LL;
+  lexp = 0LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = ~(1 << i);
+
+#pragma acc atomic capture
+        { lgot &= expr; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = ~0LL;
+  lexp = 0LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = ~(1 << i);
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot = lgot & expr; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = ~0LL;
+  lexp = 0LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = ~(1 << i);
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot = expr & lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = ~0LL;
+  lexp = 0LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = ~(1 << i);
+
+#pragma acc atomic capture
+        { lgot = lgot & expr; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = ~0LL;
+  lexp = 0LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = ~(1 << i);
+
+#pragma acc atomic capture
+        { lgot = expr & lgot; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  /* BINOP = ^ */
+  lgot = ~0LL;
+  lexp = 0LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 1 << i;
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot ^= expr; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = ~0LL;
+  lexp = 0LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = ~(1 << i);
+
+#pragma acc atomic capture
+        { lgot ^= expr; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = ~0LL;
+  lexp = 0LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = ~(1 << i);
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot = lgot ^ expr; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = ~0LL;
+  lexp = 0LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = ~(1 << i);
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot = expr ^ lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = ~0LL;
+  lexp = 0LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = ~(1 << i);
+
+#pragma acc atomic capture
+        { lgot = lgot ^ expr; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = ~0LL;
+  lexp = 0LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = ~(1 << i);
+
+#pragma acc atomic capture
+        { lgot = expr ^ lgot; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  /* BINOP = | */
+  lgot = 0LL;
+  lexp = ~0LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 1 << i;
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot |= expr; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 0LL;
+  lexp = ~0LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = ~(1 << i);
+
+#pragma acc atomic capture
+        { lgot |= expr; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 0LL;
+  lexp = ~0LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = ~(1 << i);
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot = lgot | expr; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 0LL;
+  lexp = ~0LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = ~(1 << i);
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot = expr | lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 0LL;
+  lexp = ~0LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = ~(1 << i);
+
+#pragma acc atomic capture
+        { lgot = lgot | expr; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 0LL;
+  lexp = ~0LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = ~(1 << i);
+
+#pragma acc atomic capture
+        { lgot = expr | lgot; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  /* BINOP = << */
+  lgot = 1LL;
+  lexp = 1LL << N;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 1LL;
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot <<= expr; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 1LL << N;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 1LL;
+
+#pragma acc atomic capture
+        { lgot <<= expr; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 1LL << N;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 1LL;
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot = lgot << expr; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 2LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < 1; i++)
+      {
+        long long expr = 1LL;
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot = expr << lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 2LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < 1; i++)
+      {
+        long long expr = 1LL;
+
+#pragma acc atomic capture
+        { lgot = lgot << expr; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 2LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < 1; i++)
+      {
+        long long expr = 1LL;
+
+#pragma acc atomic capture
+        { lgot = expr << lgot; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  /* BINOP = >> */
+  lgot = 1LL << N;
+  lexp = 1LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 1LL;
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot >>= expr; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL << N;
+  lexp = 1LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 1LL;
+
+#pragma acc atomic capture
+        { lgot >>= expr; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL << N;
+  lexp = 1LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 1LL;
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot = lgot >> expr; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 1LL << (N - 1);
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < 1; i++)
+      {
+        long long expr = 1LL << N;
+
+#pragma acc atomic capture
+        { ldata[i] = lgot; lgot = expr >> lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL << N;
+  lexp = 1LL;
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 1LL;
+
+#pragma acc atomic capture
+        { lgot = lgot >> expr; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 1LL << (N - 1);
+
+#pragma acc data copy (lgot, ldata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < 1; i++)
+      {
+        long long expr = 1LL << N;
+
+#pragma acc atomic capture
+        { lgot = expr >> lgot; ldata[i] = lgot; }
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  // FLOAT FLOAT FLOAT
+
+  /* BINOP = + */
+  fgot = 0.0;
+  fexp = 32.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+
+#pragma acc atomic capture
+        { fdata[i] = fgot; fgot += expr; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 0.0;
+  fexp = 32.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+
+#pragma acc atomic capture
+        { fgot += expr; fdata[i] = fgot; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 0.0;
+  fexp = 32.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+
+#pragma acc atomic capture
+        { fdata[i] = fgot; fgot = fgot + expr; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 0.0;
+  fexp = 32.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+
+#pragma acc atomic capture
+        { fdata[i] = fgot; fgot = expr + fgot; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 0.0;
+  fexp = 32.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+
+#pragma acc atomic capture
+        { fgot = fgot + expr; fdata[i] = fgot; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 0.0;
+  fexp = 32.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+
+#pragma acc atomic capture
+        { fgot = expr + fgot; fdata[i] = fgot; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  /* BINOP = * */
+  fgot = 1.0;
+  fexp = 8192.0*8192.0*64.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+
+#pragma acc atomic capture
+        { fdata[i] = fgot; fgot *= expr; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1.0;
+  fexp = 8192.0*8192.0*64.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+
+#pragma acc atomic capture
+        { fgot *= expr; fdata[i] = fgot; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1.0;
+  fexp = 8192.0*8192.0*64.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+
+#pragma acc atomic capture
+        { fdata[i] = fgot; fgot = fgot * expr; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1.0;
+  fexp = 8192.0*8192.0*64.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+
+#pragma acc atomic capture
+        { fdata[i] = fgot; fgot = expr * fgot; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1.0;
+  fexp = 8192.0*8192.0*64.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+
+#pragma acc atomic capture
+        { fgot = fgot * expr; fdata[i] = fgot; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1.0;
+  fexp = 8192.0*8192.0*64.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 2;
+
+#pragma acc atomic capture
+        { fgot = expr * fgot; fdata[i] = fgot; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  /* BINOP = - */
+  fgot = 32.0;
+  fexp = 0.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+
+#pragma acc atomic capture
+        { fdata[i] = fgot; fgot -= expr; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 32.0;
+  fexp = 0.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+
+#pragma acc atomic capture
+        { fgot -= expr; fdata[i] = fgot; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 32.0;
+  fexp = 0.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+
+#pragma acc atomic capture
+        { fdata[i] = fgot; fgot = fgot - expr; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1.0;
+  fexp = 1.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+
+#pragma acc atomic capture
+        { fdata[i] = fgot; fgot = expr - fgot; }
+      }
+  }
+
+  for (i = 0; i < N; i++)
+    if (i % 2 == 0)
+      {
+	if (fdata[i] != 1.0)
+	  abort ();
+      }
+    else
+      {
+	if (fdata[i] != 0.0)
+	  abort ();
+      }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1.0;
+  fexp = -31.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+
+#pragma acc atomic capture
+        { fgot = fgot - expr; fdata[i] = fgot; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1.0;
+  fexp = 1.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+
+#pragma acc atomic capture
+        { fgot = expr - fgot; fdata[i] = fgot; }
+      }
+  }
+
+  for (i = 0; i < N; i++)
+    if (i % 2 == 0)
+      {
+	if (fdata[i] != 0.0)
+	  abort ();
+      }
+    else
+      {
+	if (fdata[i] != 1.0)
+	  abort ();
+      }
+
+  if (fexp != fgot)
+    abort ();
+
+  /* BINOP = / */
+  fgot = 8192.0*8192.0*64.0;
+  fexp = 1.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+
+#pragma acc atomic capture
+        { fdata[i] = fgot; fgot /= expr; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 8192.0*8192.0*64.0;
+  fexp = 1.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+
+#pragma acc atomic capture
+        { fgot /= expr; fdata[i] = fgot; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 8192.0*8192.0*64.0;
+  fexp = 1.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+
+#pragma acc atomic capture
+        { fdata[i] = fgot; fgot = fgot / expr; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 8192.0*8192.0*64.0;
+  fexp = 8192.0*8192.0*64.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+
+#pragma acc atomic capture
+        { fdata[i] = fgot; fgot = expr / fgot; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 8192.0*8192.0*64.0;
+  fexp = 1.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+
+#pragma acc atomic capture
+        { fgot = fgot / expr; fdata[i] = fgot; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 4.0;
+  fexp = 4.0;
+
+#pragma acc data copy (fgot, fdata[0:N])
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+
+#pragma acc atomic capture
+        { fgot = expr / fgot; fdata[i] = fgot; }
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_update-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_update-1.c
new file mode 100644
index 0000000..18ee3aa
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/atomic_update-1.c
@@ -0,0 +1,760 @@
+/* { dg-do run } */
+
+#include <stdlib.h>
+
+int
+main (int argc, char **argv)
+{
+  float fexp, fgot;
+  int   iexp, igot;
+  long long lexp, lgot;
+  int   N = 32;
+  int   i;
+
+  fgot = 1234.0;
+  fexp = 1235.0;
+
+#pragma acc data copy (fgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < 1; i++)
+#pragma acc atomic update
+      fgot++;
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1234.0;
+  fexp = fgot - N;
+
+#pragma acc data copy (fgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+#pragma acc atomic update
+        fgot--;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1234.0;
+  fexp = fgot + N;
+
+#pragma acc data copy (fgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+#pragma acc atomic update
+        ++fgot;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1234.0;
+  fexp = fgot - N;
+
+#pragma acc data copy (fgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+#pragma acc atomic update
+        --fgot;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  /* BINOP = + */
+
+  fgot = 1234.0;
+  fexp = fgot + N;
+
+#pragma acc data copy (fgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+#pragma acc atomic update
+        fgot += expr;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1234.0;
+  fexp = fgot + N;
+
+#pragma acc data copy (fgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+#pragma acc atomic update
+        fgot = fgot + expr;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1234.0;
+  fexp = fgot + N;
+
+#pragma acc data copy (fgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+#pragma acc atomic update
+        fgot = expr + fgot;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1234.0;
+
+#pragma acc data copy (fgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 0.5;
+#pragma acc atomic update
+        fgot = (expr + expr) + fgot;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  /* BINOP = * */
+
+  fgot = 1234.0;
+  fexp = 1234.0;
+
+  for (i = 0; i < N; i++)
+    fexp *= 2.0;
+
+#pragma acc data copy (fgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+#pragma acc atomic update
+        fgot *= expr;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1234.0;
+  fexp = 1234.0;
+
+  for (i = 0; i < N; i++)
+    fexp = fexp * 2.0;
+
+#pragma acc data copy (fgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+#pragma acc atomic update
+        fgot = fgot * expr;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1234.0;
+  fexp = 1234.0;
+
+  for (i = 0; i < N; i++)
+    fexp = 2.0 * fexp;
+
+#pragma acc data copy (fgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+#pragma acc atomic update
+        fgot = expr * fgot;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1234.0;
+
+#pragma acc data copy (fgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+#pragma acc atomic update
+        fgot = (expr + expr) * fgot;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  /* BINOP = - */
+
+  fgot = 1234.0;
+  fexp = 1234.0;
+
+  for (i = 0; i < N; i++)
+    fexp -= 2.0;
+
+#pragma acc data copy (fgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+#pragma acc atomic update
+        fgot -= expr;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1234.0;
+  fexp = 1234.0;
+
+  for (i = 0; i < N; i++)
+    fexp = fexp - 2.0;
+
+#pragma acc data copy (fgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+#pragma acc atomic update
+        fgot = fgot - expr;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1234.0;
+  fexp = 1234.0;
+
+  for (i = 0; i < N; i++)
+    fexp = 2.0 - fexp;
+
+#pragma acc data copy (fgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+
+#pragma acc atomic update
+        fgot = expr - fgot;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1234.0;
+
+#pragma acc data copy (fgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+#pragma acc atomic update
+        fgot = (expr + expr) - fgot;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  /* BINOP = / */
+
+  fgot = 1234.0;
+  fexp = 1234.0;
+
+  for (i = 0; i < N; i++)
+    fexp /= 2.0;
+
+#pragma acc data copy (fgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+#pragma acc atomic update
+        fgot /= expr;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1234.0;
+  fexp = 1234.0;
+
+  for (i = 0; i < N; i++)
+    fexp = fexp / 2.0;
+
+#pragma acc data copy (fgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+
+#pragma acc atomic update
+        fgot = fgot / expr;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1234.0;
+  fexp = 1234.0;
+
+  for (i = 0; i < N; i++)
+    fexp = 2.0 / fexp;
+
+#pragma acc data copy (fgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 2.0;
+
+#pragma acc atomic update
+        fgot = expr / fgot;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  fgot = 1234.0;
+  fexp = 1234.0;
+
+  for (i = 0; i < N; i++)
+    fexp = 2.0 / fexp;
+
+#pragma acc data copy (fgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        float expr = 1.0;
+#pragma acc atomic update
+        fgot = (expr + expr) / fgot;
+      }
+  }
+
+  if (fexp != fgot)
+    abort ();
+
+  /* BINOP = & */
+
+  igot = ~0;
+  iexp = 0;
+
+#pragma acc data copy (igot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = ~(1 << i);
+
+#pragma acc atomic update
+        igot &= expr;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = ~0;
+  iexp = 0;
+
+#pragma acc data copy (igot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = ~(1 << i);
+#pragma acc atomic update
+        igot = igot & expr;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = ~0;
+  iexp = 0;
+
+#pragma acc data copy (igot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = ~(1 << i);
+#pragma acc atomic update
+        igot = expr & igot;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = ~0;
+  iexp = 0;
+
+#pragma acc data copy (igot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = ~(1 << i);
+        int zero = 0;
+
+#pragma acc atomic update
+        igot = (expr + zero) & igot;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  /* BINOP = ^ */
+
+  igot = ~0;
+  iexp = 0;
+
+#pragma acc data copy (igot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = (1 << i);
+
+#pragma acc atomic update
+        igot ^= expr;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = ~0;
+  iexp = 0;
+
+#pragma acc data copy (igot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = (1 << i);
+
+#pragma acc atomic update
+        igot = igot ^ expr;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = ~0;
+  iexp = 0;
+
+#pragma acc data copy (igot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = (1 << i);
+
+#pragma acc atomic update
+        igot = expr ^ igot;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = ~0;
+  iexp = 0;
+
+#pragma acc data copy (igot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = (1 << i);
+        int zero = 0;
+
+#pragma acc atomic update
+        igot = (expr + zero) ^ igot;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  /* BINOP = | */
+
+  igot = 0;
+  iexp = ~0;
+
+#pragma acc data copy (igot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = (1 << i);
+
+#pragma acc atomic update
+        igot |= expr;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 0;
+  iexp = ~0;
+
+#pragma acc data copy (igot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = (1 << i);
+
+#pragma acc atomic update
+        igot = igot | expr;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 0;
+  iexp = ~0;
+
+#pragma acc data copy (igot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = (1 << i);
+
+#pragma acc atomic update
+        igot = expr | igot;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  igot = 0;
+  iexp = ~0;
+
+#pragma acc data copy (igot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        int expr = (1 << i);
+        int zero = 0;
+
+#pragma acc atomic update
+        igot = (expr + zero) | igot;
+      }
+  }
+
+  if (iexp != igot)
+    abort ();
+
+  /* BINOP = << */
+
+  lgot = 1LL;
+  lexp = 1LL << N;
+
+#pragma acc data copy (lgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 1LL;
+
+#pragma acc atomic update
+        lgot <<= expr;
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 1LL << N;
+
+#pragma acc data copy (lgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 1LL;
+
+#pragma acc atomic update
+        lgot = lgot << expr;
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 2LL;
+
+#pragma acc data copy (lgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < 1; i++)
+      {
+        long long expr = 1LL;
+
+#pragma acc atomic update
+        lgot = expr << lgot;
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 2LL;
+
+#pragma acc data copy (lgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < 1; i++)
+      {
+        long long expr = 1LL;
+        long long zero = 0LL;
+
+#pragma acc atomic update
+        lgot = (expr + zero) << lgot;
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  /* BINOP = >> */
+
+  lgot = 1LL << N;
+  lexp = 1LL;
+
+#pragma acc data copy (lgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 1LL;
+
+#pragma acc atomic update
+        lgot >>= expr;
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL << N;
+  lexp = 1LL;
+
+#pragma acc data copy (lgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < N; i++)
+      {
+        long long expr = 1LL;
+
+#pragma acc atomic update
+        lgot = lgot >> expr;
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 1LL << (N - 1);
+
+#pragma acc data copy (lgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < 1; i++)
+      {
+        long long expr = 1LL << N;
+
+#pragma acc atomic update
+        lgot = expr >> lgot;
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  lgot = 1LL;
+  lexp = 1LL << (N - 1);
+
+#pragma acc data copy (lgot)
+  {
+#pragma acc parallel loop
+    for (i = 0; i < 1; i++)
+      {
+        long long expr = 1LL << N;
+        long long zero = 0LL;
+
+#pragma acc atomic update
+        lgot = (expr + zero) >> lgot;
+      }
+  }
+
+  if (lexp != lgot)
+    abort ();
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-1.c
index 51c0cf5..410c46c 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/clauses-1.c
@@ -586,6 +586,32 @@ main (int argc, char **argv)
 
     for (i = 0; i < N; i++)
     {
+        a[i] = 6.0;
+        b[i] = 0.0;
+    }
+
+#pragma acc parallel pcopy (a[0:N], b[0:N])
+    {
+        int ii;
+
+        for (ii = 0; ii < N; ii++)
+            b[ii] = a[ii];
+    }
+
+    for (i = 0; i < N; i++)
+    {
+        if (b[i] != 6.0)
+            abort ();
+    }
+
+    if (acc_is_present (&a[0], (N * sizeof (float))))
+      abort ();
+
+    if (acc_is_present (&b[0], (N * sizeof (float))))
+      abort ();
+
+    for (i = 0; i < N; i++)
+    {
         a[i] = 5.0;
         b[i] = 7.0;
     }
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c
index f867a66..5fc9fb6 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c
@@ -25,7 +25,33 @@ main (int argc, char **argv)
     }
 
 #pragma acc enter data copyin (a[0:N]) copyin (b[0:N]) copyin (N) async
-#pragma acc parallel async wait
+#pragma acc parallel async wait present (a[0:N]) present (b[0:N]) present (N)
+#pragma acc loop
+  for (i = 0; i < N; i++)
+    b[i] = a[i];
+
+#pragma acc exit data copyout (a[0:N]) copyout (b[0:N]) delete (N) wait async
+#pragma acc wait
+
+  for (i = 0; i < N; i++)
+    {
+      if (a[i] != 3.0)
+	abort ();
+
+      if (b[i] != 3.0)
+	abort ();
+    }
+
+  for (i = 0; i < N; i++)
+    {
+      a[i] = 3.0;
+      b[i] = 0.0;
+    }
+
+#pragma acc enter data copyin (a[0:N]) async
+#pragma acc enter data copyin (b[0:N]) async wait
+#pragma acc enter data copyin (N) async wait
+#pragma acc parallel async wait present (a[0:N]) present (b[0:N]) present (N)
 #pragma acc loop
   for (i = 0; i < N; i++)
     b[i] = a[i];
@@ -49,7 +75,7 @@ main (int argc, char **argv)
     }
 
 #pragma acc enter data copyin (a[0:N]) copyin (b[0:N]) copyin (N) async (1)
-#pragma acc parallel async (1)
+#pragma acc parallel async (1) present (a[0:N]) present (b[0:N]) present (N)
 #pragma acc loop
   for (i = 0; i < N; i++)
     b[i] = a[i];
@@ -76,17 +102,17 @@ main (int argc, char **argv)
 
 #pragma acc enter data copyin (a[0:N]) copyin (b[0:N]) copyin (c[0:N]) copyin (d[0:N]) copyin (N) async (1)
 
-#pragma acc parallel async (1) wait (1)
+#pragma acc parallel async (1) wait (1) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (N)
 #pragma acc loop
   for (i = 0; i < N; i++)
     b[i] = (a[i] * a[i] * a[i]) / a[i];
 
-#pragma acc parallel async (2) wait (1)
+#pragma acc parallel async (2) wait (1) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (N)
 #pragma acc loop
   for (i = 0; i < N; i++)
     c[i] = (a[i] + a[i] + a[i] + a[i]) / a[i];
 
-#pragma acc parallel async (3) wait (1)
+#pragma acc parallel async (3) wait (1) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (N)
 #pragma acc loop
   for (i = 0; i < N; i++)
     d[i] = ((a[i] * a[i] + a[i]) / a[i]) - a[i];
@@ -120,19 +146,19 @@ main (int argc, char **argv)
 
 #pragma acc enter data copyin (a[0:N]) copyin (b[0:N]) copyin (c[0:N]) copyin (d[0:N]) copyin (e[0:N]) copyin (N) async (1)
 
-#pragma acc parallel async (1) wait (1)
+#pragma acc parallel async (1) wait (1) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (e[0:N]) present (N)
   for (int ii = 0; ii < N; ii++)
     b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii];
 
-#pragma acc parallel async (2) wait (1)
+#pragma acc parallel async (2) wait (1) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (e[0:N]) present (N)
   for (int ii = 0; ii < N; ii++)
     c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii];
 
-#pragma acc parallel async (3) wait (1)
+#pragma acc parallel async (3) wait (1) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (e[0:N]) present (N)
   for (int ii = 0; ii < N; ii++)
     d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii];
 
-#pragma acc parallel wait (1) async (4)
+#pragma acc parallel wait (1) async (4) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (e[0:N]) present (N)
   for (int ii = 0; ii < N; ii++)
     e[ii] = a[ii] + b[ii] + c[ii] + d[ii];
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c
index 747109f..6e173d3 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c
@@ -25,7 +25,7 @@ main (int argc, char **argv)
     }
 
 #pragma acc enter data copyin (a[0:N]) copyin (b[0:N]) copyin (N) async
-#pragma acc parallel async wait
+#pragma acc parallel async wait present (a[0:N]) present (b[0:N]) present (N)
 #pragma acc loop
   for (i = 0; i < N; i++)
     b[i] = a[i];
@@ -49,7 +49,7 @@ main (int argc, char **argv)
     }
 
 #pragma acc update device (a[0:N], b[0:N]) async (1)
-#pragma acc parallel async (1)
+#pragma acc parallel async (1) present (a[0:N]) present (b[0:N]) present (N)
 #pragma acc loop
   for (i = 0; i < N; i++)
     b[i] = a[i];
@@ -78,17 +78,17 @@ main (int argc, char **argv)
 #pragma acc update device (b[0:N]) async (2)
 #pragma acc enter data copyin (c[0:N], d[0:N]) async (3)
 
-#pragma acc parallel async (1) wait (1,2)
+#pragma acc parallel async (1) wait (1,2) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (N)
 #pragma acc loop
   for (i = 0; i < N; i++)
     b[i] = (a[i] * a[i] * a[i]) / a[i];
 
-#pragma acc parallel async (2) wait (1,3)
+#pragma acc parallel async (2) wait (1,3) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (N)
 #pragma acc loop
   for (i = 0; i < N; i++)
     c[i] = (a[i] + a[i] + a[i] + a[i]) / a[i];
 
-#pragma acc parallel async (3) wait (1,3)
+#pragma acc parallel async (3) wait (1,3) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (N)
 #pragma acc loop
   for (i = 0; i < N; i++)
     d[i] = ((a[i] * a[i] + a[i]) / a[i]) - a[i];
@@ -123,19 +123,19 @@ main (int argc, char **argv)
 #pragma acc update device (a[0:N], b[0:N], c[0:N], d[0:N]) async (1)
 #pragma acc enter data copyin (e[0:N]) async (5)
 
-#pragma acc parallel async (1) wait (1)
+#pragma acc parallel async (1) wait (1) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (e[0:N]) present (N)
   for (int ii = 0; ii < N; ii++)
     b[ii] = (a[ii] * a[ii] * a[ii]) / a[ii];
 
-#pragma acc parallel async (2) wait (1)
+#pragma acc parallel async (2) wait (1) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (e[0:N]) present (N)
   for (int ii = 0; ii < N; ii++)
     c[ii] = (a[ii] + a[ii] + a[ii] + a[ii]) / a[ii];
 
-#pragma acc parallel async (3) wait (1)
+#pragma acc parallel async (3) wait (1) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (e[0:N]) present (N)
   for (int ii = 0; ii < N; ii++)
     d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii];
 
-#pragma acc parallel wait (1,5) async (4)
+#pragma acc parallel wait (1,5) async (4) present (a[0:N]) present (b[0:N]) present (c[0:N]) present (d[0:N]) present (e[0:N]) present (N)
   for (int ii = 0; ii < N; ii++)
     e[ii] = a[ii] + b[ii] + c[ii] + d[ii];
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h
new file mode 100644
index 0000000..8341053
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h
@@ -0,0 +1,202 @@
+int i;
+
+int main(void)
+{
+  int j, v;
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) copyin (i, j)
+  {
+    if (i != -1 || j != -2)
+      abort ();
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) copyout (i, j)
+  {
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) copy (i, j)
+  {
+    if (i != -1 || j != -2)
+      abort ();
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) create (i, j)
+  {
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or_copyin (i, j)
+  {
+    if (i != -1 || j != -2)
+      abort ();
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1)
+    abort ();
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or_copyout (i, j)
+  {
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or_copy (i, j)
+  {
+    if (i != -1 || j != -2)
+      abort ();
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+
+  i = -1;
+  j = -2;
+  v = 0;
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present_or_create (i, j)
+  {
+    i = 2;
+    j = 1;
+    if (i != 2 || j != 1)
+      abort ();
+    v = 1;
+  }
+  if (v != 1)
+    abort ();
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  i = -1;
+  j = -2;
+  v = 0;
+
+#pragma acc data copyin (i, j)
+  {
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v) present (i, j)
+    {
+      if (i != -1 || j != -2)
+	abort ();
+      i = 2;
+      j = 1;
+      if (i != 2 || j != 1)
+	abort ();
+      v = 1;
+    }
+  }
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  i = -1;
+  j = -2;
+  v = 0;
+
+#pragma acc data copyin(i, j)
+  {
+#pragma acc EXEC_DIRECTIVE /* copyout */ present_or_copyout (v)
+    {
+      if (i != -1 || j != -2)
+	abort ();
+      i = 2;
+      j = 1;
+      if (i != 2 || j != 1)
+	abort ();
+      v = 1;
+    }
+  }
+#if ACC_MEM_SHARED
+  if (v != 1 || i != 2 || j != 1)
+    abort ();
+#else
+  if (v != 1 || i != -1 || j != -2)
+    abort ();
+#endif
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
index 3acfdf5..aeb0142 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/kernels-1.c
@@ -2,183 +2,5 @@
 
 #include <stdlib.h>
 
-int i;
-
-int main (void)
-{
-  int j, v;
-
-#if 0
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) copyin (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) copyout (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) copy (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) create (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copyin (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1)
-    abort ();
-#if ACC_MEM_SHARED
-  if (i != 2 || j != 1)
-    abort ();
-#else
-  if (i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copyout (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_copy (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) present_or_create (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1)
-    abort ();
-#if ACC_MEM_SHARED
-  if (i != 2 || j != 1)
-    abort ();
-#else
-  if (i != -1 || j != -2)
-    abort ();
-#endif
-
-#if 0
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v) present (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#endif
-
-#if 0
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc kernels /* copyout */ present_or_copyout (v)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#endif
-
-  return 0;
-}
+#define EXEC_DIRECTIVE kernels
+#include "data-clauses.h"
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-69.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-69.c
index 5462f12..78c834a 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-69.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-69.c
@@ -9,46 +9,14 @@
 int
 main (int argc, char **argv)
 {
-  CUdevice dev;
   CUfunction delay;
   CUmodule module;
   CUresult r;
   CUstream stream;
-  unsigned long *a, *d_a, dticks;
-  int nbytes;
-  float dtime;
-  void *kargs[2];
-  int clkrate;
-  int devnum, nprocs;
 
   acc_init (acc_device_nvidia);
 
-  devnum = acc_get_device_num (acc_device_nvidia);
-
-  r = cuDeviceGet (&dev, devnum);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
-      abort ();
-    }
-
-  r =
-    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
-			  dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
-  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
-  r = cuModuleLoad (&module, "subr.ptx");
+  r = cuModuleLoad (&module, "./subr.ptx");
   if (r != CUDA_SUCCESS)
     {
       fprintf (stderr, "cuModuleLoad failed: %d\n", r);
@@ -62,20 +30,6 @@ main (int argc, char **argv)
       abort ();
     }
 
-  nbytes = nprocs * sizeof (unsigned long);
-
-  dtime = 200.0;
-
-  dticks = (unsigned long) (dtime * clkrate);
-
-  a = (unsigned long *) malloc (nbytes);
-  d_a = (unsigned long *) acc_malloc (nbytes);
-
-  acc_map_data (a, d_a, nbytes);
-
-  kargs[0] = (void *) &d_a;
-  kargs[1] = (void *) &dticks;
-
   stream = (CUstream) acc_get_cuda_stream (0);
   if (stream != NULL)
     abort ();
@@ -90,31 +44,21 @@ main (int argc, char **argv)
   if (!acc_set_cuda_stream (0, stream))
     abort ();
 
-  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, NULL, 0);
   if (r != CUDA_SUCCESS)
     {
       fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
       abort ();
     }
 
-  if (acc_async_test (0) != 0)
-    {
-      fprintf (stderr, "asynchronous operation not running\n");
-      abort ();
-    }
+  if (acc_async_test (0) == 1)
+    fprintf (stderr, "expected asynchronous operation to be running\n");
 
-  sleep (1);
+  acc_wait_all ();
 
-  if (acc_async_test (0) != 1)
-    {
-      fprintf (stderr, "found asynchronous operation still running\n");
-      abort ();
-    }
+  if (acc_async_test (0) == 0)
+    fprintf (stderr, "expected asynchronous operation to have completed\n");
 
-  acc_unmap_data (a);
-
-  free (a);
-  acc_free (d_a);
 
   acc_shutdown (acc_device_nvidia);
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-70.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-70.c
index 912b266..ee06898 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-70.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-70.c
@@ -1,6 +1,7 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
 /* { dg-additional-options "-lcuda" } */
 
+#include <sys/time.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>
@@ -10,47 +11,17 @@
 int
 main (int argc, char **argv)
 {
-  CUdevice dev;
   CUfunction delay;
   CUmodule module;
   CUresult r;
-  const int N = 10;
+  const int N = 3;
   int i;
   CUstream streams[N];
-  unsigned long *a, *d_a, dticks;
-  int nbytes;
-  float dtime;
-  void *kargs[2];
-  int clkrate;
-  int devnum, nprocs;
+  struct timeval tv1, tv2;
+  time_t diff;
 
   acc_init (acc_device_nvidia);
 
-  devnum = acc_get_device_num (acc_device_nvidia);
-
-  r = cuDeviceGet (&dev, devnum);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
-      abort ();
-    }
-
-  r =
-    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
-			  dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
-  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
   r = cuModuleLoad (&module, "subr.ptx");
   if (r != CUDA_SUCCESS)
     {
@@ -65,20 +36,6 @@ main (int argc, char **argv)
       abort ();
     }
 
-  nbytes = nprocs * sizeof (unsigned long);
-
-  dtime = 200.0;
-
-  dticks = (unsigned long) (dtime * clkrate);
-
-  a = (unsigned long *) malloc (nbytes);
-  d_a = (unsigned long *) acc_malloc (nbytes);
-
-  acc_map_data (a, d_a, nbytes);
-
-  kargs[0] = (void *) &d_a;
-  kargs[1] = (void *) &dticks;
-
   for (i = 0; i < N; i++)
     {
       streams[i] = (CUstream) acc_get_cuda_stream (i);
@@ -96,9 +53,29 @@ main (int argc, char **argv)
 	  abort ();
     }
 
+  gettimeofday (&tv1, NULL);
+
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[0], NULL, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+      abort ();
+    }
+
+  r = cuCtxSynchronize ();
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuCtxSynchronize failed: %d\n", r);
+      abort ();
+    }
+
+  gettimeofday (&tv2, NULL);
+
+  diff = tv2.tv_sec - tv1.tv_sec;
+
   for (i = 0; i < N; i++)
     {
-      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], kargs, 0);
+      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], NULL, 0);
       if (r != CUDA_SUCCESS)
 	{
 	  fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
@@ -112,7 +89,7 @@ main (int argc, char **argv)
 	}
     }
 
-  sleep ((int) (dtime / 1000.0f) + 1);
+  sleep ((diff + 1) * N);
 
   for (i = 0; i < N; i++)
     {
@@ -123,10 +100,6 @@ main (int argc, char **argv)
 	}
     }
 
-  acc_unmap_data (a);
-
-  free (a);
-  acc_free (d_a);
 
   acc_shutdown (acc_device_nvidia);
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-71.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-71.c
index e8584db..8db6bcb 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-71.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-71.c
@@ -9,45 +9,13 @@
 int
 main (int argc, char **argv)
 {
-  CUdevice dev;
   CUfunction delay;
   CUmodule module;
   CUresult r;
   CUstream stream;
-  unsigned long *a, *d_a, dticks;
-  int nbytes;
-  float dtime;
-  void *kargs[2];
-  int clkrate;
-  int devnum, nprocs;
 
   acc_init (acc_device_nvidia);
 
-  devnum = acc_get_device_num (acc_device_nvidia);
-
-  r = cuDeviceGet (&dev, devnum);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
-      abort ();
-    }
-
-  r =
-    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
-			  dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
-  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
   r = cuModuleLoad (&module, "subr.ptx");
   if (r != CUDA_SUCCESS)
     {
@@ -62,20 +30,6 @@ main (int argc, char **argv)
       abort ();
     }
 
-  nbytes = nprocs * sizeof (unsigned long);
-
-  dtime = 200.0;
-
-  dticks = (unsigned long) (dtime * clkrate);
-
-  a = (unsigned long *) malloc (nbytes);
-  d_a = (unsigned long *) acc_malloc (nbytes);
-
-  acc_map_data (a, d_a, nbytes);
-
-  kargs[0] = (void *) &d_a;
-  kargs[1] = (void *) &dticks;
-
   r = cuStreamCreate (&stream, CU_STREAM_DEFAULT);
   if (r != CUDA_SUCCESS)
 	{
@@ -85,7 +39,7 @@ main (int argc, char **argv)
 
   acc_set_cuda_stream (0, stream);
 
-  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, NULL, 0);
   if (r != CUDA_SUCCESS)
     {
       fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
@@ -98,7 +52,7 @@ main (int argc, char **argv)
       abort ();
     }
 
-  sleep ((int) (dtime / 1000.0f) + 1);
+  sleep (1);
 
   if (acc_async_test (1) != 1)
     {
@@ -106,11 +60,6 @@ main (int argc, char **argv)
       abort ();
     }
 
-  acc_unmap_data (a);
-
-  free (a);
-  acc_free (d_a);
-
   acc_shutdown (acc_device_nvidia);
 
   return 0;
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-72.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-72.c
index e383ba0..920ff5f 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-72.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-72.c
@@ -10,45 +10,13 @@
 int
 main (int argc, char **argv)
 {
-  CUdevice dev;
   CUfunction delay;
   CUmodule module;
   CUresult r;
   CUstream stream;
-  unsigned long *a, *d_a, dticks;
-  int nbytes;
-  float dtime;
-  void *kargs[2];
-  int clkrate;
-  int devnum, nprocs;
 
   acc_init (acc_device_nvidia);
 
-  devnum = acc_get_device_num (acc_device_nvidia);
-
-  r = cuDeviceGet (&dev, devnum);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
-      abort ();
-    }
-
-  r =
-    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
-			  dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
-  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
   r = cuModuleLoad (&module, "subr.ptx");
   if (r != CUDA_SUCCESS)
     {
@@ -63,20 +31,6 @@ main (int argc, char **argv)
       abort ();
     }
 
-  nbytes = nprocs * sizeof (unsigned long);
-
-  dtime = 200.0;
-
-  dticks = (unsigned long) (dtime * clkrate);
-
-  a = (unsigned long *) malloc (nbytes);
-  d_a = (unsigned long *) acc_malloc (nbytes);
-
-  acc_map_data (a, d_a, nbytes);
-
-  kargs[0] = (void *) &d_a;
-  kargs[1] = (void *) &dticks;
-
   r = cuStreamCreate (&stream, CU_STREAM_DEFAULT);
   if (r != CUDA_SUCCESS)
     {
@@ -87,7 +41,7 @@ main (int argc, char **argv)
   if (!acc_set_cuda_stream (0, stream))
     abort ();
     
-  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, NULL, 0);
   if (r != CUDA_SUCCESS)
     {
       fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
@@ -100,7 +54,12 @@ main (int argc, char **argv)
       abort ();
     }
 
-  sleep ((int) (dtime / 1000.f) + 1);
+  r = cuCtxSynchronize ();
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuCtxSynchronize failed: %d\n", r);
+      abort ();
+    }
 
   if (acc_async_test_all () != 1)
     {
@@ -108,11 +67,6 @@ main (int argc, char **argv)
       abort ();
     }
 
-  acc_unmap_data (a);
-
-  free (a);
-  acc_free (d_a);
-
   acc_shutdown (acc_device_nvidia);
 
   exit (0);
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-73.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-73.c
index 43a8b7e..4fa9d5a 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-73.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-73.c
@@ -1,6 +1,7 @@
 /* { dg-do run { target openacc_nvidia_accel_selected } } */
 /* { dg-additional-options "-lcuda" } */
 
+#include <sys/time.h>
 #include <stdio.h>
 #include <unistd.h>
 #include <stdlib.h>
@@ -10,47 +11,15 @@
 int
 main (int argc, char **argv)
 {
-  CUdevice dev;
   CUfunction delay;
   CUmodule module;
   CUresult r;
-  const int N = 10;
+  const int N = 6;
   int i;
   CUstream streams[N];
-  unsigned long *a, *d_a, dticks;
-  int nbytes;
-  float dtime;
-  void *kargs[2];
-  int clkrate;
-  int devnum, nprocs;
 
   acc_init (acc_device_nvidia);
 
-  devnum = acc_get_device_num (acc_device_nvidia);
-
-  r = cuDeviceGet (&dev, devnum);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
-      abort ();
-    }
-
-  r =
-    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
-			  dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
-  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
   r = cuModuleLoad (&module, "subr.ptx");
   if (r != CUDA_SUCCESS)
     {
@@ -65,20 +34,6 @@ main (int argc, char **argv)
       abort ();
     }
 
-  nbytes = nprocs * sizeof (unsigned long);
-
-  dtime = 200.0;
-
-  dticks = (unsigned long) (dtime * clkrate);
-
-  a = (unsigned long *) malloc (nbytes);
-  d_a = (unsigned long *) acc_malloc (nbytes);
-
-  acc_map_data (a, d_a, nbytes);
-
-  kargs[0] = (void *) &d_a;
-  kargs[1] = (void *) &dticks;
-
   for (i = 0; i < N; i++)
     {
       streams[i] = (CUstream) acc_get_cuda_stream (i);
@@ -98,13 +53,12 @@ main (int argc, char **argv)
 
   for (i = 0; i < N; i++)
     {
-      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], kargs, 0);
+      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], NULL, 0);
       if (r != CUDA_SUCCESS)
 	{
 	  fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
 	  abort ();
 	}
-
     }
 
   if (acc_async_test_all () != 0)
@@ -113,7 +67,12 @@ main (int argc, char **argv)
       abort ();
     }
 
-  sleep ((int) (dtime / 1000.0f) + 1);
+  r = cuCtxSynchronize ();
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuCtxSynchronize failed: %d\n", r);
+      abort ();
+    }
 
   if (acc_async_test_all () != 1)
     {
@@ -121,11 +80,6 @@ main (int argc, char **argv)
       abort ();
     }
 
-  acc_unmap_data (a);
-
-  free (a);
-  acc_free (d_a);
-
   acc_shutdown (acc_device_nvidia);
 
   exit (0);
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-74.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-74.c
index 0726ee4..e25d894 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-74.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-74.c
@@ -5,50 +5,20 @@
 #include <stdlib.h>
 #include <openacc.h>
 #include <cuda.h>
-#include "timer.h"
+#include <sys/time.h>
 
 int
 main (int argc, char **argv)
 {
-  CUdevice dev;
   CUfunction delay;
   CUmodule module;
   CUresult r;
   CUstream stream;
-  unsigned long *a, *d_a, dticks;
-  int nbytes;
-  float atime, dtime;
-  void *kargs[2];
-  int clkrate;
-  int devnum, nprocs;
+  struct timeval tv1, tv2;
+  time_t t1, t2;
 
   acc_init (acc_device_nvidia);
 
-  devnum = acc_get_device_num (acc_device_nvidia);
-
-  r = cuDeviceGet (&dev, devnum);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
-      abort ();
-    }
-
-  r =
-    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
-			  dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
-  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
   r = cuModuleLoad (&module, "subr.ptx");
   if (r != CUDA_SUCCESS)
     {
@@ -63,19 +33,25 @@ main (int argc, char **argv)
       abort ();
     }
 
-  nbytes = nprocs * sizeof (unsigned long);
+  gettimeofday (&tv1, NULL);
 
-  dtime = 200.0;
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+      abort ();
+    }
 
-  dticks = (unsigned long) (dtime * clkrate);
+  r = cuCtxSynchronize ();
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuCtxSynchronize failed: %d\n", r);
+      abort ();
+    }
 
-  a = (unsigned long *) malloc (nbytes);
-  d_a = (unsigned long *) acc_malloc (nbytes);
+  gettimeofday (&tv2, NULL);
 
-  acc_map_data (a, d_a, nbytes);
-
-  kargs[0] = (void *) &d_a;
-  kargs[1] = (void *) &dticks;
+  t1 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
 
   stream = (CUstream) acc_get_cuda_stream (0);
   if (stream != NULL)
@@ -91,11 +67,9 @@ main (int argc, char **argv)
   if (!acc_set_cuda_stream (0, stream))
     abort ();
 
-  init_timers (1);
+  gettimeofday (&tv1, NULL);
 
-  start_timer (0);
-
-  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, NULL, 0);
   if (r != CUDA_SUCCESS)
     {
       fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
@@ -104,33 +78,30 @@ main (int argc, char **argv)
 
   acc_wait (0);
 
-  atime = stop_timer (0);
+  gettimeofday (&tv2, NULL);
 
-  if (atime < dtime)
+  t2 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
+
+  if (((abs (t2 - t1) * 100.0) / t1) > 1.0)
     {
-      fprintf (stderr, "actual time < delay time\n");
+      fprintf (stderr, "too long 1\n");
       abort ();
     }
 
-  start_timer (0);
+  gettimeofday (&tv1, NULL);
 
   acc_wait (0);
 
-  atime = stop_timer (0);
+  gettimeofday (&tv2, NULL);
 
-  if (0.010 < atime)
+  t2 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
+
+  if (t2 > 1000)
     {
-      fprintf (stderr, "actual time too long\n");
+      fprintf (stderr, "too long 2\n");
       abort ();
     }
 
-  acc_unmap_data (a);
-
-  fini_timers ();
-
-  free (a);
-  acc_free (d_a);
-
   acc_shutdown (acc_device_nvidia);
 
   exit (0);
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-75.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-75.c
index 1942211..53e285f 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-75.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-75.c
@@ -6,52 +6,22 @@
 #include <stdlib.h>
 #include <openacc.h>
 #include <cuda.h>
-#include "timer.h"
+#include <sys/time.h>
 
 int
 main (int argc, char **argv)
 {
-  CUdevice dev;
   CUfunction delay;
   CUmodule module;
   CUresult r;
-  int N;
+  const int N = 2;
   int i;
   CUstream stream;
-  unsigned long *a, *d_a, dticks;
-  int nbytes;
-  float atime, dtime, hitime, lotime;
-  void *kargs[2];
-  int clkrate;
-  int devnum, nprocs;
+  struct timeval tv1, tv2;
+  time_t t1, t2;
 
   acc_init (acc_device_nvidia);
 
-  devnum = acc_get_device_num (acc_device_nvidia);
-
-  r = cuDeviceGet (&dev, devnum);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
-      abort ();
-    }
-
-  r =
-    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
-			  dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
-  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
   r = cuModuleLoad (&module, "subr.ptx");
   if (r != CUDA_SUCCESS)
     {
@@ -66,18 +36,25 @@ main (int argc, char **argv)
       abort ();
     }
 
-  nbytes = nprocs * sizeof (unsigned long);
+  gettimeofday (&tv1, NULL);
 
-  dtime = 200.0;
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+      abort ();
+    }
 
-  dticks = (unsigned long) (dtime * clkrate);
+  r = cuCtxSynchronize ();
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuCtxSynchronize failed: %d\n", r);
+      abort ();
+    }
 
-  N = nprocs;
+  gettimeofday (&tv2, NULL);
 
-  a = (unsigned long *) malloc (nbytes);
-  d_a = (unsigned long *) acc_malloc (nbytes);
-
-  acc_map_data (a, d_a, nbytes);
+  t1 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
 
   stream = (CUstream) acc_get_cuda_stream (0);
   if (stream != NULL)
@@ -93,16 +70,11 @@ main (int argc, char **argv)
   if (!acc_set_cuda_stream (0, stream))
     abort ();
 
-  init_timers (1);
-
-  kargs[0] = (void *) &d_a;
-  kargs[1] = (void *) &dticks;
-
-  start_timer (0);
+  gettimeofday (&tv1, NULL);
 
   for (i = 0; i < N; i++)
     {
-      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);
+      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, NULL, 0);
       if (r != CUDA_SUCCESS)
 	{
 	  fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
@@ -112,27 +84,18 @@ main (int argc, char **argv)
       acc_wait (0);
     }
 
-  atime = stop_timer (0);
+  gettimeofday (&tv2, NULL);
 
-  hitime = dtime * N;
-  hitime += hitime * 0.02;
+  t2 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
 
-  lotime = dtime * N;
-  lotime -= lotime * 0.02;
+  t1 *= N;
 
-  if (atime > hitime || atime < lotime)
+  if (((abs (t2 - t1) * 100.0) / t1) > 1.0)
     {
-      fprintf (stderr, "actual time < delay time\n");
+      fprintf (stderr, "too long\n");
       abort ();
     }
 
-  acc_unmap_data (a);
-
-  fini_timers ();
-
-  free (a);
-  acc_free (d_a);
-
   acc_shutdown (acc_device_nvidia);
 
   exit (0);
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-76.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-76.c
index 11d9d62..787dcb8 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-76.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-76.c
@@ -6,52 +6,22 @@
 #include <unistd.h>
 #include <openacc.h>
 #include <cuda.h>
-#include "timer.h"
+#include <sys/time.h>
 
 int
 main (int argc, char **argv)
 {
-  CUdevice dev;
   CUfunction delay;
   CUmodule module;
   CUresult r;
-  int N;
+  const int N = 2;
   int i;
   CUstream *streams;
-  unsigned long *a, *d_a, dticks;
-  int nbytes;
-  float atime, dtime, hitime, lotime;
-  void *kargs[2];
-  int clkrate;
-  int devnum, nprocs;
+  struct timeval tv1, tv2;
+  time_t t1, t2;
 
   acc_init (acc_device_nvidia);
 
-  devnum = acc_get_device_num (acc_device_nvidia);
-
-  r = cuDeviceGet (&dev, devnum);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
-      abort ();
-    }
-
-  r =
-    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
-			  dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
-  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
   r = cuModuleLoad (&module, "subr.ptx");
   if (r != CUDA_SUCCESS)
     {
@@ -66,18 +36,25 @@ main (int argc, char **argv)
       abort ();
     }
 
-  nbytes = nprocs * sizeof (unsigned long);
+  gettimeofday (&tv1, NULL);
 
-  dtime = 200.0;
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+      abort ();
+    }
 
-  dticks = (unsigned long) (dtime * clkrate);
+  r = cuCtxSynchronize ();
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuCtxSynchronize failed: %d\n", r);
+      abort ();
+    }
 
-  N = nprocs;
+  gettimeofday (&tv2, NULL);
 
-  a = (unsigned long *) malloc (nbytes);
-  d_a = (unsigned long *) acc_malloc (nbytes);
-
-  acc_map_data (a, d_a, nbytes);
+  t1 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
 
   streams = (CUstream *) malloc (N * sizeof (void *));
 
@@ -98,16 +75,11 @@ main (int argc, char **argv)
 	  abort ();
     }
 
-  init_timers (1);
-
-  kargs[0] = (void *) &d_a;
-  kargs[1] = (void *) &dticks;
-
-  start_timer (0);
+  gettimeofday (&tv1, NULL);
 
   for (i = 0; i < N; i++)
     {
-      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], kargs, 0);
+      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], NULL, 0);
       if (r != CUDA_SUCCESS)
 	{
 	  fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
@@ -117,27 +89,19 @@ main (int argc, char **argv)
       acc_wait (i);
     }
 
-  atime = stop_timer (0);
+  gettimeofday (&tv2, NULL);
 
-  hitime = dtime * N;
-  hitime += hitime * 0.02;
+  t2 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
 
-  lotime = dtime * N;
-  lotime -= lotime * 0.02;
+  t1 *= N;
 
-  if (atime > hitime || atime < lotime)
+  if (((abs (t2 - t1) * 100.0) / t1) > 1.0)
     {
-      fprintf (stderr, "actual time < delay time\n");
+      fprintf (stderr, "too long\n");
       abort ();
     }
 
-  acc_unmap_data (a);
-
-  fini_timers ();
-
   free (streams);
-  free (a);
-  acc_free (d_a);
 
   acc_shutdown (acc_device_nvidia);
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-77.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-77.c
index 35a0980..5ef6fd9 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-77.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-77.c
@@ -6,50 +6,20 @@
 #include <unistd.h>
 #include <openacc.h>
 #include <cuda.h>
-#include "timer.h"
+#include <sys/time.h>
 
 int
 main (int argc, char **argv)
 {
-  CUdevice dev;
   CUfunction delay;
   CUmodule module;
   CUresult r;
   CUstream stream;
-  unsigned long *a, *d_a, dticks;
-  int nbytes;
-  float atime, dtime;
-  void *kargs[2];
-  int clkrate;
-  int devnum, nprocs;
+  struct timeval tv1, tv2;
+  time_t t1, t2;
 
   acc_init (acc_device_nvidia);
 
-  devnum = acc_get_device_num (acc_device_nvidia);
-
-  r = cuDeviceGet (&dev, devnum);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
-      abort ();
-    }
-
-  r =
-    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
-			  dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
-  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
   r = cuModuleLoad (&module, "subr.ptx");
   if (r != CUDA_SUCCESS)
     {
@@ -64,19 +34,25 @@ main (int argc, char **argv)
       abort ();
     }
 
-  nbytes = nprocs * sizeof (unsigned long);
+  gettimeofday (&tv1, NULL);
 
-  dtime = 200.0;
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+      abort ();
+    }
 
-  dticks = (unsigned long) (dtime * clkrate);
+  r = cuCtxSynchronize ();
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuCtxSynchronize failed: %d\n", r);
+      abort ();
+    }
 
-  a = (unsigned long *) malloc (nbytes);
-  d_a = (unsigned long *) acc_malloc (nbytes);
+  gettimeofday (&tv2, NULL);
 
-  acc_map_data (a, d_a, nbytes);
-
-  kargs[0] = (void *) &d_a;
-  kargs[1] = (void *) &dticks;
+  t1 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
 
   r = cuStreamCreate (&stream, CU_STREAM_DEFAULT);
   if (r != CUDA_SUCCESS)
@@ -87,11 +63,9 @@ main (int argc, char **argv)
 
   acc_set_cuda_stream (0, stream);
 
-  init_timers (1);
+  gettimeofday (&tv1, NULL);
 
-  start_timer (0);
-
-  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, NULL, 0);
   if (r != CUDA_SUCCESS)
     {
       fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
@@ -100,33 +74,30 @@ main (int argc, char **argv)
 
   acc_wait (1);
 
-  atime = stop_timer (0);
+  gettimeofday (&tv2, NULL);
 
-  if (atime < dtime)
+  t2 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
+
+  if (t2 > t1)
     {
-      fprintf (stderr, "actual time < delay time\n");
+      fprintf (stderr, "too long 1\n");
       abort ();
     }
 
-  start_timer (0);
+  gettimeofday (&tv1, NULL);
 
   acc_wait (1);
 
-  atime = stop_timer (0);
+  gettimeofday (&tv2, NULL);
 
-  if (0.010 < atime)
+  t2 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
+
+  if (t2 > 1000)
     {
-      fprintf (stderr, "actual time < delay time\n");
+      fprintf (stderr, "too long 2\n");
       abort ();
     }
 
-  acc_unmap_data (a);
-
-  fini_timers ();
-
-  free (a);
-  acc_free (d_a);
-
   acc_shutdown (acc_device_nvidia);
 
   return 0;
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-78.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-78.c
index 4f58fb2..0bed15f 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-78.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-78.c
@@ -6,50 +6,20 @@
 #include <unistd.h>
 #include <openacc.h>
 #include <cuda.h>
-#include "timer.h"
+#include <sys/time.h>
 
 int
 main (int argc, char **argv)
 {
-  CUdevice dev;
   CUfunction delay;
   CUmodule module;
   CUresult r;
   CUstream stream;
-  unsigned long *a, *d_a, dticks;
-  int nbytes;
-  float atime, dtime;
-  void *kargs[2];
-  int clkrate;
-  int devnum, nprocs;
+  struct timeval tv1, tv2;
+  time_t t1, t2;
 
   acc_init (acc_device_nvidia);
 
-  devnum = acc_get_device_num (acc_device_nvidia);
-
-  r = cuDeviceGet (&dev, devnum);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
-      abort ();
-    }
-
-  r =
-    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
-			  dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
-  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
   r = cuModuleLoad (&module, "subr.ptx");
   if (r != CUDA_SUCCESS)
     {
@@ -64,19 +34,25 @@ main (int argc, char **argv)
       abort ();
     }
 
-  nbytes = nprocs * sizeof (unsigned long);
+  gettimeofday (&tv1, NULL);
 
-  dtime = 200.0;
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+      abort ();
+    }
 
-  dticks = (unsigned long) (dtime * clkrate);
+  r = cuCtxSynchronize ();
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuCtxSynchronize failed: %d\n", r);
+      abort ();
+    }
 
-  a = (unsigned long *) malloc (nbytes);
-  d_a = (unsigned long *) acc_malloc (nbytes);
+  gettimeofday (&tv2, NULL);
 
-  acc_map_data (a, d_a, nbytes);
-
-  kargs[0] = (void *) &d_a;
-  kargs[1] = (void *) &dticks;
+  t1 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
 
   stream = (CUstream) acc_get_cuda_stream (0);
   if (stream != NULL)
@@ -92,11 +68,9 @@ main (int argc, char **argv)
   if (!acc_set_cuda_stream (0, stream))
     abort ();
 
-  init_timers (1);
+  gettimeofday (&tv1, NULL);
 
-  start_timer (0);
-
-  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, NULL, 0);
   if (r != CUDA_SUCCESS)
     {
       fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
@@ -105,33 +79,30 @@ main (int argc, char **argv)
 
   acc_wait_all ();
 
-  atime = stop_timer (0);
+  gettimeofday (&tv2, NULL);
 
-  if (atime < dtime)
+  t2 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
+
+  if (t2 > (t1 + (t1 * 0.10)))
     {
-      fprintf (stderr, "actual time < delay time\n");
+      fprintf (stderr, "too long 1\n");
       abort ();
     }
 
-  start_timer (0);
+  gettimeofday (&tv1, NULL);
 
   acc_wait_all ();
 
-  atime = stop_timer (0);
+  gettimeofday (&tv2, NULL);
 
-  if (0.010 < atime)
+  t2 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
+
+  if (t2 > 1000)
     {
-      fprintf (stderr, "actual time too long\n");
+      fprintf (stderr, "too long 2\n");
       abort ();
     }
 
-  acc_unmap_data (a);
-
-  fini_timers ();
-
-  free (a);
-  acc_free (d_a);
-
   acc_shutdown (acc_device_nvidia);
 
   exit (0);
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-79.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-79.c
index ef3df13..5723588 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-79.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-79.c
@@ -6,54 +6,22 @@
 #include <unistd.h>
 #include <openacc.h>
 #include <cuda.h>
-#include "timer.h"
+#include <sys/time.h>
 
 int
 main (int argc, char **argv)
 {
-  CUdevice dev;
   CUfunction delay;
   CUmodule module;
   CUresult r;
-  int N;
+  const int N = 2;
   int i;
   CUstream stream;
-  unsigned long *a, *d_a, dticks;
-  int nbytes;
-  float atime, dtime, hitime, lotime;
-  void *kargs[2];
-  int clkrate;
-  int devnum, nprocs;
-
-  devnum = 2;
+  struct timeval tv1, tv2;
+  time_t t1, t2;
 
   acc_init (acc_device_nvidia);
 
-  devnum = acc_get_device_num (acc_device_nvidia);
-
-  r = cuDeviceGet (&dev, devnum);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
-      abort ();
-    }
-
-  r =
-    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
-			  dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
-  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
   r = cuModuleLoad (&module, "subr.ptx");
   if (r != CUDA_SUCCESS)
     {
@@ -68,18 +36,25 @@ main (int argc, char **argv)
       abort ();
     }
 
-  nbytes = nprocs * sizeof (unsigned long);
+  gettimeofday (&tv1, NULL);
 
-  dtime = 200.0;
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+      abort ();
+    }
 
-  dticks = (unsigned long) (dtime * clkrate);
+  r = cuCtxSynchronize ();
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuCtxSynchronize failed: %d\n", r);
+      abort ();
+    }
 
-  N = nprocs;
+  gettimeofday (&tv2, NULL);
 
-  a = (unsigned long *) malloc (nbytes);
-  d_a = (unsigned long *) acc_malloc (nbytes);
-
-  acc_map_data (a, d_a, nbytes);
+  t1 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
 
   r = cuStreamCreate (&stream, CU_STREAM_DEFAULT);
   if (r != CUDA_SUCCESS)
@@ -105,16 +80,11 @@ main (int argc, char **argv)
   if (!acc_set_cuda_stream (0, stream))
     abort ();
 
-  init_timers (1);
-
-  kargs[0] = (void *) &d_a;
-  kargs[1] = (void *) &dticks;
-
-  start_timer (0);
+  gettimeofday (&tv1, NULL);
 
   for (i = 0; i < N; i++)
     {
-      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);
+      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, NULL, 0);
       if (r != CUDA_SUCCESS)
 	{
 	  fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
@@ -132,7 +102,7 @@ main (int argc, char **argv)
 
   acc_wait (1);
 
-  atime = stop_timer (0);
+  gettimeofday (&tv2, NULL);
 
   if (acc_async_test (0) != 1)
     abort ();
@@ -140,25 +110,16 @@ main (int argc, char **argv)
   if (acc_async_test (1) != 1)
     abort ();
 
-  hitime = dtime * N;
-  hitime += hitime * 0.02;
+  t2 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
 
-  lotime = dtime * N;
-  lotime -= lotime * 0.02;
+  t1 *= N;
 
-  if (atime > hitime || atime < lotime)
+  if (((abs (t2 - t1) * 100.0) / t1) > 1.0)
     {
-      fprintf (stderr, "actual time < delay time\n");
+      fprintf (stderr, "too long\n");
       abort ();
     }
 
-  acc_unmap_data (a);
-
-  fini_timers ();
-
-  free (a);
-  acc_free (d_a);
-
   acc_shutdown (acc_device_nvidia);
 
   exit (0);
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-80.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-80.c
index d521331..ec98119 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-80.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-80.c
@@ -6,52 +6,22 @@
 #include <unistd.h>
 #include <openacc.h>
 #include <cuda.h>
-#include "timer.h"
+#include <sys/time.h>
 
 int
 main (int argc, char **argv)
 {
-  CUdevice dev;
   CUfunction delay;
   CUmodule module;
   CUresult r;
   CUstream stream;
-  int N;
+  const int N = 2;
   int i;
-  unsigned long *a, *d_a, dticks;
-  int nbytes;
-  float atime, dtime;
-  void *kargs[2];
-  int clkrate;
-  int devnum, nprocs;
+  struct timeval tv1, tv2;
+  time_t t1, t2;
 
   acc_init (acc_device_nvidia);
 
-  devnum = acc_get_device_num (acc_device_nvidia);
-
-  r = cuDeviceGet (&dev, devnum);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
-      abort ();
-    }
-
-  r =
-    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
-			  dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
-  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
   r = cuModuleLoad (&module, "subr.ptx");
   if (r != CUDA_SUCCESS)
     {
@@ -66,38 +36,40 @@ main (int argc, char **argv)
       abort ();
     }
 
-  nbytes = nprocs * sizeof (unsigned long);
+  gettimeofday (&tv1, NULL);
 
-  dtime = 200.0;
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+      abort ();
+    }
 
-  dticks = (unsigned long) (dtime * clkrate);
+  r = cuCtxSynchronize ();
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuCtxSynchronize failed: %d\n", r);
+      abort ();
+    }
 
-  N = nprocs;
+  gettimeofday (&tv2, NULL);
 
-  a = (unsigned long *) malloc (nbytes);
-  d_a = (unsigned long *) acc_malloc (nbytes);
-
-  acc_map_data (a, d_a, nbytes);
+  t1 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
 
   r = cuStreamCreate (&stream, CU_STREAM_DEFAULT);
   if (r != CUDA_SUCCESS)
-	{
-	  fprintf (stderr, "cuStreamCreate failed: %d\n", r);
-	  abort ();
-	}
+    {
+      fprintf (stderr, "cuStreamCreate failed: %d\n", r);
+      abort ();
+    }
 
   acc_set_cuda_stream (1, stream);
 
-  init_timers (1);
-
-  kargs[0] = (void *) &d_a;
-  kargs[1] = (void *) &dticks;
-
-  start_timer (0);
+  gettimeofday (&tv1, NULL);
 
   for (i = 0; i < N; i++)
     {
-      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);
+      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, NULL, 0);
       if (r != CUDA_SUCCESS)
 	{
 	  fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
@@ -109,21 +81,18 @@ main (int argc, char **argv)
 
   acc_wait (1);
 
-  atime = stop_timer (0);
+  gettimeofday (&tv2, NULL);
 
-  if (atime < dtime)
+  t2 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
+
+  t1 *= N;
+
+  if (((abs (t2 - t1) * 100.0) / t1) > 1.0)
     {
-      fprintf (stderr, "actual time < delay time\n");
+      fprintf (stderr, "too long\n");
       abort ();
     }
 
-  acc_unmap_data (a);
-
-  fini_timers ();
-
-  free (a);
-  acc_free (d_a);
-
   acc_shutdown (acc_device_nvidia);
 
   return 0;
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-81.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-81.c
index d5f18f0..77de9ba 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-81.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-81.c
@@ -6,52 +6,22 @@
 #include <unistd.h>
 #include <openacc.h>
 #include <cuda.h>
-#include "timer.h"
+#include <sys/time.h>
 
 int
 main (int argc, char **argv)
 {
-  CUdevice dev;
   CUfunction delay;
   CUmodule module;
   CUresult r;
-  int N;
+  const int N = 2;
   int i;
   CUstream *streams, stream;
-  unsigned long *a, *d_a, dticks;
-  int nbytes;
-  float atime, dtime;
-  void *kargs[2];
-  int clkrate;
-  int devnum, nprocs;
+  struct timeval tv1, tv2;
+  time_t t1, t2;
 
   acc_init (acc_device_nvidia);
 
-  devnum = acc_get_device_num (acc_device_nvidia);
-
-  r = cuDeviceGet (&dev, devnum);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
-      abort ();
-    }
-
-  r =
-    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
-			  dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
-  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
   r = cuModuleLoad (&module, "subr.ptx");
   if (r != CUDA_SUCCESS)
     {
@@ -66,18 +36,25 @@ main (int argc, char **argv)
       abort ();
     }
 
-  nbytes = nprocs * sizeof (unsigned long);
+  gettimeofday (&tv1, NULL);
 
-  dtime = 500.0;
+  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, NULL, NULL, 0);
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
+      abort ();
+    }
 
-  dticks = (unsigned long) (dtime * clkrate);
+  r = cuCtxSynchronize ();
+  if (r != CUDA_SUCCESS)
+    {
+      fprintf (stderr, "cuCtxSynchronize failed: %d\n", r);
+      abort ();
+    }
 
-  N = nprocs;
+  gettimeofday (&tv2, NULL);
 
-  a = (unsigned long *) malloc (nbytes);
-  d_a = (unsigned long *) acc_malloc (nbytes);
-
-  acc_map_data (a, d_a, nbytes);
+  t1 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
 
   streams = (CUstream *) malloc (N * sizeof (void *));
 
@@ -98,11 +75,6 @@ main (int argc, char **argv)
 	  abort ();
     }
 
-  init_timers (1);
-
-  kargs[0] = (void *) &d_a;
-  kargs[1] = (void *) &dticks;
-
   stream = (CUstream) acc_get_cuda_stream (N);
   if (stream != NULL)
     abort ();
@@ -117,11 +89,11 @@ main (int argc, char **argv)
   if (!acc_set_cuda_stream (N, stream))
     abort ();
 
-  start_timer (0);
+  gettimeofday (&tv1, NULL);
 
   for (i = 0; i < N; i++)
     {
-      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], kargs, 0);
+      r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], NULL, 0);
       if (r != CUDA_SUCCESS)
 	{
 	  fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
@@ -129,6 +101,10 @@ main (int argc, char **argv)
 	}
     }
 
+  gettimeofday (&tv2, NULL);
+
+  t2 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
+
   acc_wait_all_async (N);
 
   for (i = 0; i <= N; i++)
@@ -145,15 +121,13 @@ main (int argc, char **argv)
 	abort ();
     }
 
-  atime = stop_timer (0);
-
-  if (atime < dtime)
+  if ((t1 * N) < t2)
     {
-      fprintf (stderr, "actual time < delay time\n");
+      fprintf (stderr, "too long 1\n");
       abort ();
     }
 
-  start_timer (0);
+  gettimeofday (&tv1, NULL);
 
   stream = (CUstream) acc_get_cuda_stream (N + 1);
   if (stream != NULL)
@@ -173,35 +147,33 @@ main (int argc, char **argv)
 
   acc_wait (N + 1);
 
-  atime = stop_timer (0);
+  gettimeofday (&tv2, NULL);
 
-  if (0.10 < atime)
+  t1 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
+
+  if (t1 > 1000)
     {
-      fprintf (stderr, "actual time too long\n");
+      fprintf (stderr, "too long 2\n");
       abort ();
     }
 
-  start_timer (0);
+  gettimeofday (&tv1, NULL);
 
   acc_wait_all_async (N);
 
   acc_wait (N);
 
-  atime = stop_timer (0);
+  gettimeofday (&tv2, NULL);
 
-  if (0.10 < atime)
+  t1 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
+
+  if (t1 > 1000)
     {
-      fprintf (stderr, "actual time too long\n");
+      fprintf (stderr, "too long 3\n");
       abort ();
     }
 
-  acc_unmap_data (a);
-
-  fini_timers ();
-
   free (streams);
-  free (a);
-  acc_free (d_a);
 
   acc_shutdown (acc_device_nvidia);
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-82.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-82.c
index be30a7f..ecf7488 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-82.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-82.c
@@ -10,46 +10,18 @@
 int
 main (int argc, char **argv)
 {
-  CUdevice dev;
   CUfunction delay2;
   CUmodule module;
   CUresult r;
-  int N;
+  const int N = 32;
   int i;
   CUstream *streams;
-  unsigned long **a, **d_a, *tid, ticks;
+  unsigned long **a, **d_a, *tid;
   int nbytes;
-  void *kargs[3];
-  int clkrate;
-  int devnum, nprocs;
+  void *kargs[2];
 
   acc_init (acc_device_nvidia);
 
-  devnum = acc_get_device_num (acc_device_nvidia);
-
-  r = cuDeviceGet (&dev, devnum);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGet failed: %d\n", r);
-      abort ();
-    }
-
-  r =
-    cuDeviceGetAttribute (&nprocs, CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
-			  dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
-  r = cuDeviceGetAttribute (&clkrate, CU_DEVICE_ATTRIBUTE_CLOCK_RATE, dev);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuDeviceGetAttribute failed: %d\n", r);
-      abort ();
-    }
-
   r = cuModuleLoad (&module, "subr.ptx");
   if (r != CUDA_SUCCESS)
     {
@@ -66,10 +38,6 @@ main (int argc, char **argv)
 
   nbytes = sizeof (int);
 
-  ticks = (unsigned long) (200.0 * clkrate);
-
-  N = nprocs;
-
   streams = (CUstream *) malloc (N * sizeof (void *));
 
   a = (unsigned long **) malloc (N * sizeof (unsigned long *));
@@ -103,8 +71,7 @@ main (int argc, char **argv)
   for (i = 0; i < N; i++)
     {
       kargs[0] = (void *) &d_a[i];
-      kargs[1] = (void *) &ticks;
-      kargs[2] = (void *) &tid[i];
+      kargs[1] = (void *) &tid[i];
 
       r = cuLaunchKernel (delay2, 1, 1, 1, 1, 1, 1, 0, streams[i], kargs, 0);
       if (r != CUDA_SUCCESS)
@@ -112,8 +79,6 @@ main (int argc, char **argv)
 	  fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
 	  abort ();
 	}
-
-      ticks = (unsigned long) (50.0 * clkrate);
     }
 
   acc_wait_all_async (0);
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/lib-83.c libgomp/testsuite/libgomp.oacc-c-c++-common/lib-83.c
index 1c2e52b..51b7ee7 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/lib-83.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/lib-83.c
@@ -5,21 +5,19 @@
 #include <stdlib.h>
 #include <unistd.h>
 #include <openacc.h>
-#include "timer.h"
+#include <cuda.h>
+#include <sys/time.h>
 
 int
 main (int argc, char **argv)
 {
-  float atime;
   CUstream stream;
   CUresult r;
+  struct timeval tv1, tv2;
+  time_t t1;
 
   acc_init (acc_device_nvidia);
 
-  (void) acc_get_device_num (acc_device_nvidia);
-
-  init_timers (1);
-
   stream = (CUstream) acc_get_cuda_stream (0);
   if (stream != NULL)
     abort ();
@@ -34,22 +32,22 @@ main (int argc, char **argv)
   if (!acc_set_cuda_stream (0, stream))
     abort ();
 
-  start_timer (0);
+  gettimeofday (&tv1, NULL);
 
   acc_wait_all_async (0);
 
   acc_wait (0);
 
-  atime = stop_timer (0);
+  gettimeofday (&tv2, NULL);
 
-  if (0.010 < atime)
+  t1 = ((tv2.tv_sec - tv1.tv_sec) * 1000000) + (tv2.tv_usec - tv1.tv_usec);
+
+  if (t1 > 1000)
     {
-      fprintf (stderr, "actual time too long\n");
+      fprintf (stderr, "too long\n");
       abort ();
     }
 
-  fini_timers ();
-
   acc_shutdown (acc_device_nvidia);
 
   exit (0);
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c
index fd9df33..9a411fe 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-1.c
@@ -2,205 +2,5 @@
 
 #include <stdlib.h>
 
-int i;
-
-int main(void)
-{
-  int j, v;
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) copyin (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) copyout (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) copy (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) create (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copyin (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1)
-    abort ();
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copyout (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_copy (i, j)
-  {
-    if (i != -1 || j != -2)
-      abort ();
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-
-  i = -1;
-  j = -2;
-  v = 0;
-#pragma acc parallel /* copyout */ present_or_copyout (v) present_or_create (i, j)
-  {
-    i = 2;
-    j = 1;
-    if (i != 2 || j != 1)
-      abort ();
-    v = 1;
-  }
-  if (v != 1)
-    abort ();
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-
-#pragma acc data copyin (i, j)
-  {
-#pragma acc parallel /* copyout */ present_or_copyout (v) present (i, j)
-    {
-      if (i != -1 || j != -2)
-        abort ();
-      i = 2;
-      j = 1;
-      if (i != 2 || j != 1)
-        abort ();
-      v = 1;
-    }
-  }
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  i = -1;
-  j = -2;
-  v = 0;
-
-#pragma acc data copyin(i, j)
-  {
-#pragma acc parallel /* copyout */ present_or_copyout (v)
-    {
-      if (i != -1 || j != -2)
-        abort ();
-      i = 2;
-      j = 1;
-      if (i != 2 || j != 1)
-        abort ();
-      v = 1;
-    }
-  }
-#if ACC_MEM_SHARED
-  if (v != 1 || i != 2 || j != 1)
-    abort ();
-#else
-  if (v != 1 || i != -1 || j != -2)
-    abort ();
-#endif
-
-  return 0;
-}
+#define EXEC_DIRECTIVE parallel
+#include "data-clauses.h"
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/routine-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/routine-1.c
new file mode 100644
index 0000000..a27d076
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-1.c
@@ -0,0 +1,40 @@
+/* FIXME: remove -fno-var-tracking and -fno-exceptions from dg-options.  */
+
+/* { dg-do run } */
+/* { dg-options "-fno-inline -fno-var-tracking -fno-exceptions" } */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#pragma acc routine
+int
+fact (int n)
+{
+  if (n == 0 || n == 1)
+    return 1;
+
+  return n * fact (n - 1);
+}
+
+int
+main()
+{
+  int *a, i, n = 10;
+
+  a = (int *)malloc (sizeof (int) * n);
+
+#pragma acc parallel copy (a[0:n]) vector_length (5)
+  {
+#pragma acc loop
+    for (i = 0; i < n; i++)
+      a[i] = fact (i);
+  }
+
+  for (i = 0; i < n; i++)
+    if (a[i] != fact (i))
+      abort ();
+
+  free (a);
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/routine-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/routine-2.c
new file mode 100644
index 0000000..8ec4d8b
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/routine-2.c
@@ -0,0 +1,41 @@
+/* FIXME: remove -fno-var-tracking and -fno-exceptions from dg-options.  */
+
+/* { dg-do run } */
+/* { dg-options "-fno-inline -fno-var-tracking -fno-exceptions" } */
+
+#include <stdio.h>
+#include <stdlib.h>
+
+#pragma acc routine (fact)
+
+
+int fact (int n)
+{
+  if (n == 0 || n == 1)
+    return 1;
+
+  return n * fact (n - 1);
+}
+
+int
+main()
+{
+  int *a, i, n = 10;
+
+  a = (int *)malloc (sizeof (int) * n);
+
+#pragma acc parallel copy (a[0:n]) vector_length (5)
+  {
+#pragma acc loop
+    for (i = 0; i < n; i++)
+      a[i] = fact (i);
+  }
+
+  for (i = 0; i < n; i++)
+    if (a[i] != fact (i))
+      abort ();
+
+  free (a);
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/subr.h libgomp/testsuite/libgomp.oacc-c-c++-common/subr.h
index 9db236c..0c9096f 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/subr.h
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/subr.h
@@ -1,46 +1,24 @@
 
-#if ACC_DEVICE_TYPE_nvidia
-
 #pragma acc routine nohost
-static int clock (void)
-{
-  int thetime;
-
-  asm __volatile__ ("mov.u32 %0, %%clock;" : "=r"(thetime));
-
-  return thetime;
-}
-
-#endif
-
 void
-delay (unsigned long *d_o, unsigned long delay)
+delay ()
 {
-  int start, ticks;
+  int i, sum = 0;
+  const int N = 500000;
 
-  start = clock ();
-
-  ticks = 0;
-
-  while (ticks < delay)
-    ticks = clock () - start;
-
-  return;
+  for (i = 0; i < N; i++)
+    sum = sum + 1;
 }
 
+#pragma acc routine nohost
 void
-delay2 (unsigned long *d_o, unsigned long delay, unsigned long tid)
+delay2 (unsigned long *d_o, unsigned long tid)
 {
-  int start, ticks;
-
-  start = clock ();
-
-  ticks = 0;
+  int i, sum = 0;
+  const int N = 500000;
 
-  while (ticks < delay)
-    ticks = clock () - start;
+  for (i = 0; i < N; i++)
+    sum = sum + 1;
 
   d_o[0] = tid;
-
-  return;
 }
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/subr.ptx libgomp/testsuite/libgomp.oacc-c-c++-common/subr.ptx
index 6f748fc..88b63bf 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/subr.ptx
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/subr.ptx
@@ -1,148 +1,90 @@
-// BEGIN PREAMBLE
-	.version	3.1
-	.target	sm_30
+	.version 3.1
+	.target sm_30
 	.address_size 64
-// END PREAMBLE
 
-// BEGIN FUNCTION DEF: clock
-.func (.param.u32 %out_retval)clock
-{
-.reg.u32 %retval;
-	.reg.u64 %hr10;
-	.reg.u32 %r22;
-	.reg.u32 %r23;
-	.reg.u32 %r24;
-	.local.align 8 .b8 %frame[8];
-	// #APP 
-// 7 "subr.c" 1
-	mov.u32 %r24, %clock;
-// 0 "" 2
-	// #NO_APP 
-		st.local.u32	[%frame], %r24;
-		ld.local.u32	%r22, [%frame];
-		mov.u32	%r23, %r22;
-		mov.u32	%retval, %r23;
-	st.param.u32	[%out_retval], %retval;
-	ret;
-	}
-// END FUNCTION DEF
-// BEGIN GLOBAL FUNCTION DEF: delay
-.visible .entry delay(.param.u64 %in_ar1, .param.u64 %in_ar2)
-{
-	.reg.u64 %ar1;
-	.reg.u64 %ar2;
-	.reg.u64 %hr10;
-	.reg.u64 %r22;
-	.reg.u32 %r23;
-	.reg.u64 %r24;
-	.reg.u64 %r25;
-	.reg.u32 %r26;
-	.reg.u32 %r27;
-	.reg.u32 %r28;
-	.reg.u32 %r29;
-	.reg.u32 %r30;
-	.reg.u64 %r31;
-	.reg.pred %r32;
-	.local.align 8 .b8 %frame[24];
-	ld.param.u64 %ar1, [%in_ar1];
-	ld.param.u64 %ar2, [%in_ar2];
-		mov.u64	%r24, %ar1;
-		st.u64	[%frame+8], %r24;
-		mov.u64	%r25, %ar2;
-		st.local.u64	[%frame+16], %r25;
+	.visible .entry delay
 	{
-		.param.u32 %retval_in;
-	{
-		call (%retval_in), clock;
-	}
-		ld.param.u32	%r26, [%retval_in];
-}
-		st.local.u32	[%frame+4], %r26;
-		mov.u32	%r27, 0;
-		st.local.u32	[%frame], %r27;
-		bra	$L4;
-$L5:
-	{
-		.param.u32 %retval_in;
-	{
-		call (%retval_in), clock;
-	}
-		ld.param.u32	%r28, [%retval_in];
-}
-		mov.u32	%r23, %r28;
-		ld.local.u32	%r30, [%frame+4];
-		sub.u32	%r29, %r23, %r30;
-		st.local.u32	[%frame], %r29;
-$L4:
-		ld.local.s32	%r22, [%frame];
-		ld.local.u64	%r31, [%frame+16];
-		setp.lo.u64 %r32,%r22,%r31;
-	@%r32	bra	$L5;
+	.reg .u64 %hr10;
+	.reg .u32 %r22;
+	.reg .u32 %r23;
+	.reg .u32 %r24;
+	.reg .u32 %r25;
+	.reg .u32 %r26;
+	.reg .u32 %r27;
+	.reg .u32 %r28;
+	.reg .u32 %r29;
+	.reg .pred %r30;
+	.reg .u64 %frame;
+	.local .align 8 .b8 %farray[16];
+	cvta.local.u64 %frame,%farray;
+	mov.u32 %r22,500000;
+	st.u32 [%frame+8],%r22;
+	mov.u32 %r23,0;
+	st.u32 [%frame],%r23;
+	bra $L2;
+	$L3:
+	ld.u32 %r25,[%frame+4];
+	add.u32 %r24,%r25,1;
+	st.u32 [%frame+4],%r24;
+	ld.u32 %r27,[%frame];
+	add.u32 %r26,%r27,1;
+	st.u32 [%frame],%r26;
+	$L2:
+	ld.u32 %r28,[%frame];
+	ld.u32 %r29,[%frame+8];
+	setp.lt.s32 %r30,%r28,%r29;
+	@%r30 
+	bra $L3;
 	ret;
 	}
-// END FUNCTION DEF
-// BEGIN GLOBAL FUNCTION DEF: delay2
-.visible .entry delay2(.param.u64 %in_ar1, .param.u64 %in_ar2, .param.u64 %in_ar3)
-{
-	.reg.u64 %ar1;
-	.reg.u64 %ar2;
-	.reg.u64 %ar3;
-	.reg.u64 %hr10;
-	.reg.u64 %r22;
-	.reg.u32 %r23;
-	.reg.u64 %r24;
-	.reg.u64 %r25;
-	.reg.u64 %r26;
-	.reg.u32 %r27;
-	.reg.u32 %r28;
-	.reg.u32 %r29;
-	.reg.u32 %r30;
-	.reg.u32 %r31;
-	.reg.u64 %r32;
-	.reg.pred %r33;
-	.reg.u64 %r34;
-	.reg.u64 %r35;
-	.local.align 8 .b8 %frame[32];
-	ld.param.u64 %ar1, [%in_ar1];
-	ld.param.u64 %ar2, [%in_ar2];
-	ld.param.u64 %ar3, [%in_ar3];
-		mov.u64	%r24, %ar1;
-		st.local.u64	[%frame+8], %r24;
-		mov.u64	%r25, %ar2;
-		st.local.u64	[%frame+16], %r25;
-		mov.u64	%r26, %ar3;
-		st.local.u64	[%frame+24], %r26;
-	{
-		.param.u32 %retval_in;
-	{
-		call (%retval_in), clock;
-	}
-		ld.param.u32	%r27, [%retval_in];
-}
-		st.local.u32	[%frame+4], %r27;
-		mov.u32	%r28, 0;
-		st.local.u32	[%frame], %r28;
-		bra	$L8;
-$L9:
-	{
-		.param.u32 %retval_in;
+
+	.visible .entry delay2 (.param .u64 %in_ar1, .param .u64 %in_ar2)
 	{
-		call (%retval_in), clock;
-	}
-		ld.param.u32	%r29, [%retval_in];
-}
-		mov.u32	%r23, %r29;
-		ld.local.u32	%r31, [%frame+4];
-		sub.u32	%r30, %r23, %r31;
-		st.local.u32	[%frame], %r30;
-$L8:
-		ld.local.s32	%r22, [%frame];
-		ld.local.u64	%r32, [%frame+16];
-		setp.lo.u64 %r33,%r22,%r32;
-	@%r33	bra	$L9;
-		ld.local.u64	%r34, [%frame+8];
-		ld.local.u64	%r35, [%frame+24];
-		st.u64	[%r34], %r35;
+	.reg .u64 %ar1;
+	.reg .u64 %ar2;
+	.reg .u64 %hr10;
+	.reg .u64 %r22;
+	.reg .u64 %r23;
+	.reg .u32 %r24;
+	.reg .u32 %r25;
+	.reg .u32 %r26;
+	.reg .u32 %r27;
+	.reg .u32 %r28;
+	.reg .u32 %r29;
+	.reg .u32 %r30;
+	.reg .u32 %r31;
+	.reg .pred %r32;
+	.reg .u64 %r33;
+	.reg .u64 %r34;
+	.reg .u64 %frame;
+	.local .align 8 .b8 %farray[32];
+	cvta.local.u64 %frame,%farray;
+	ld.param.u64 %ar1,[%in_ar1];
+	ld.param.u64 %ar2,[%in_ar2];
+	mov.u64 %r22,%ar1;
+	st.u64 [%frame+16],%r22;
+	mov.u64 %r23,%ar2;
+	st.u64 [%frame+24],%r23;
+	mov.u32 %r24,500000;
+	st.u32 [%frame+8],%r24;
+	mov.u32 %r25,0;
+	st.u32 [%frame],%r25;
+	bra $L5;
+	$L6:
+	ld.u32 %r27,[%frame+4];
+	add.u32 %r26,%r27,1;
+	st.u32 [%frame+4],%r26;
+	ld.u32 %r29,[%frame];
+	add.u32 %r28,%r29,1;
+	st.u32 [%frame],%r28;
+	$L5:
+	ld.u32 %r30,[%frame];
+	ld.u32 %r31,[%frame+8];
+	setp.lt.s32 %r32,%r30,%r31;
+	@%r32 
+	bra $L6;
+	ld.u64 %r33,[%frame+16];
+	ld.u64 %r34,[%frame+24];
+	st.u64 [%r33],%r34;
 	ret;
 	}
-// END FUNCTION DEF
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/timer.h libgomp/testsuite/libgomp.oacc-c-c++-common/timer.h
deleted file mode 100644
index 53749da..0000000
--- libgomp/testsuite/libgomp.oacc-c-c++-common/timer.h
+++ /dev/null
@@ -1,103 +0,0 @@
-
-#include <stdio.h>
-#include <cuda.h>
-
-static int _Tnum_timers;
-static CUevent *_Tstart_events, *_Tstop_events;
-static CUstream _Tstream;
-
-void
-init_timers (int ntimers)
-{
-  int i;
-  CUresult r;
-
-  _Tnum_timers = ntimers;
-
-  _Tstart_events = (CUevent *) malloc (_Tnum_timers * sizeof (CUevent));
-  _Tstop_events = (CUevent *) malloc (_Tnum_timers * sizeof (CUevent));
-
-  r = cuStreamCreate (&_Tstream, CU_STREAM_DEFAULT);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuStreamCreate failed: %d\n", r);
-      abort ();
-    }
-
-  for (i = 0; i < _Tnum_timers; i++)
-    {
-      r = cuEventCreate (&_Tstart_events[i], CU_EVENT_DEFAULT);
-      if (r != CUDA_SUCCESS)
-	{
-	  fprintf (stderr, "cuEventCreate failed: %d\n", r);
-	  abort ();
-	}
-
-      r = cuEventCreate (&_Tstop_events[i], CU_EVENT_DEFAULT);
-      if (r != CUDA_SUCCESS)
-	{
-	  fprintf (stderr, "cuEventCreate failed: %d\n", r);
-	  abort ();
-	}
-    }
-}
-
-void
-fini_timers (void)
-{
-  int i;
-
-  for (i = 0; i < _Tnum_timers; i++)
-    {
-      cuEventDestroy (_Tstart_events[i]);
-      cuEventDestroy (_Tstop_events[i]);
-    }
-
-  cuStreamDestroy (_Tstream);
-
-  free (_Tstart_events);
-  free (_Tstop_events);
-}
-
-void
-start_timer (int timer)
-{
-  CUresult r;
-
-  r = cuEventRecord (_Tstart_events[timer], _Tstream);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuEventRecord failed: %d\n", r);
-      abort ();
-    }
-}
-
-float
-stop_timer (int timer)
-{
-  CUresult r;
-  float etime;
-
-  r = cuEventRecord (_Tstop_events[timer], _Tstream);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuEventRecord failed: %d\n", r);
-      abort ();
-    }
-
-  r = cuEventSynchronize (_Tstop_events[timer]);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuEventSynchronize failed: %d\n", r);
-      abort ();
-    }
-
-  r = cuEventElapsedTime (&etime, _Tstart_events[timer], _Tstop_events[timer]);
-  if (r != CUDA_SUCCESS)
-    {
-      fprintf (stderr, "cuEventElapsedTime failed: %d\n", r);
-      abort ();
-    }
-
-  return etime;
-}
diff --git libgomp/testsuite/libgomp.oacc-fortran/atomic_capture-1.f90 libgomp/testsuite/libgomp.oacc-fortran/atomic_capture-1.f90
new file mode 100644
index 0000000..27c5c9e
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/atomic_capture-1.f90
@@ -0,0 +1,784 @@
+! { dg-do run }
+
+program main
+  integer igot, iexp, itmp
+  real fgot, fexp, ftmp
+  logical lgot, lexp, ltmp
+  integer, parameter :: N = 32
+
+  igot = 0
+  iexp = N * 2
+
+  !$acc parallel copy (igot, itmp)
+    do i = 1, N
+  !$acc atomic capture
+      itmp = igot
+      igot = i + i
+  !$acc end atomic
+    end do
+  !$acc end parallel
+
+  if (igot /= iexp) call abort
+  if (itmp /= iexp - 2) call abort
+
+  fgot = 1234.0
+  fexp = 1266.0
+
+  !$acc parallel loop copy (fgot, ftmp)
+    do i = 1, N
+  !$acc atomic capture
+      ftmp = fgot
+      fgot = fgot + 1.0
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (ftmp /= fexp - 1.0) call abort
+  if (fgot /= fexp) call abort
+
+  fgot = 1.0
+  fexp = 2.0**32
+
+  !$acc parallel loop copy (fgot, ftmp)
+    do i = 1, N
+  !$acc atomic capture
+      ftmp = fgot
+      fgot = fgot * 2.0
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (ftmp /= fexp / 2.0) call abort
+  if (fgot /= fexp) call abort
+
+  fgot = 32.0
+  fexp = fgot - N
+
+  !$acc parallel loop copy (fgot, ftmp)
+    do i = 1, N
+  !$acc atomic capture
+      ftmp = fgot
+      fgot = fgot - 1.0
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (ftmp /= fexp + 1.0) call abort
+  if (fgot /= fexp) call abort
+
+  fgot = 2**32.0
+  fexp = 1.0
+
+  !$acc parallel loop copy (fgot, ftmp)
+    do i = 1, N
+  !$acc atomic capture
+      ftmp = fgot
+      fgot = fgot / 2.0
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (ftmp /= fgot * 2.0) call abort
+  if (fgot /= fexp) call abort
+
+  lgot = .TRUE.
+  lexp = .FALSE.
+
+  !$acc parallel copy (lgot, ltmp)
+  !$acc atomic capture
+    ltmp = lgot
+    lgot = lgot .and. .FALSE.
+  !$acc end atomic
+  !$acc end parallel
+
+  if (ltmp .neqv. .not. lexp) call abort
+  if (lgot .neqv. lexp) call abort
+
+  lgot = .FALSE.
+  lexp = .FALSE.
+
+  !$acc parallel copy (lgot, ltmp)
+  !$acc atomic capture
+    ltmp = lgot
+    lgot = lgot .or. .FALSE.
+  !$acc end atomic
+  !$acc end parallel
+
+  if (ltmp .neqv. lexp) call abort
+  if (lgot .neqv. lexp) call abort
+
+  lgot = .FALSE.
+  lexp = .FALSE.
+
+  !$acc parallel copy (lgot, ltmp)
+  !$acc atomic capture
+    ltmp = lgot
+    lgot = lgot .eqv. .TRUE.
+  !$acc end atomic
+  !$acc end parallel
+
+  if (ltmp .neqv. lexp) call abort
+  if (lgot .neqv. lexp) call abort
+
+  lgot = .FALSE.
+  lexp = .TRUE.
+
+  !$acc parallel copy (lgot, ltmp)
+  !$acc atomic capture
+    ltmp = lgot
+    lgot = lgot .neqv. .TRUE.
+  !$acc end atomic
+  !$acc end parallel
+
+  if (ltmp .neqv. .not. lexp) call abort
+  if (lgot .neqv. lexp) call abort
+
+  fgot = 1234.0
+  fexp = 1266.0
+
+  !$acc parallel loop copy (fgot, ftmp)
+    do i = 1, N
+  !$acc atomic capture
+      ftmp = fgot
+      fgot = 1.0 + fgot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (ftmp /= fexp - 1.0) call abort
+  if (fgot /= fexp) call abort
+
+  fgot = 1.0
+  fexp = 2.0**32
+
+  !$acc parallel loop copy (fgot, ftmp)
+    do i = 1, N
+  !$acc atomic capture
+      ftmp = fgot
+      fgot = 2.0 * fgot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (ftmp /= fexp / 2.0) call abort
+  if (fgot /= fexp) call abort
+
+  fgot = 32.0
+  fexp = 32.0
+
+  !$acc parallel loop copy (fgot, ftmp)
+    do i = 1, N
+  !$acc atomic capture
+      ftmp = fgot
+      fgot = 2.0 - fgot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (ftmp /= 2.0 - fexp) call abort
+  if (fgot /= fexp) call abort
+
+  fgot = 2.0**16
+  fexp = 2.0**16
+
+  !$acc parallel loop copy (fgot, ftmp)
+    do i = 1, N
+  !$acc atomic capture
+      ftmp = fgot
+      fgot = 2.0 / fgot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (ftmp /= 2.0 / fexp) call abort
+  if (fgot /= fexp) call abort
+
+  lgot = .TRUE.
+  lexp = .FALSE.
+
+  !$acc parallel copy (lgot, ltmp)
+  !$acc atomic capture
+    ltmp = lgot
+    lgot = .FALSE. .and. lgot
+  !$acc end atomic
+  !$acc end parallel
+
+  if (ltmp .neqv. .not. lexp) call abort
+  if (lgot .neqv. lexp) call abort
+
+  lgot = .FALSE.
+  lexp = .FALSE.
+
+  !$acc parallel copy (lgot, ltmp)
+  !$acc atomic capture
+    ltmp = lgot
+    lgot = .FALSE. .or. lgot
+  !$acc end atomic
+  !$acc end parallel
+
+  if (ltmp .neqv. lexp) call abort
+  if (lgot .neqv. lexp) call abort
+
+  lgot = .FALSE.
+  lexp = .FALSE.
+
+  !$acc parallel copy (lgot, ltmp)
+  !$acc atomic capture
+    ltmp = lgot
+    lgot = .TRUE. .eqv. lgot
+  !$acc end atomic
+  !$acc end parallel
+
+  if (ltmp .neqv. lexp) call abort
+  if (lgot .neqv. lexp) call abort
+
+  lgot = .FALSE.
+  lexp = .TRUE.
+
+  !$acc parallel copy (lgot, ltmp)
+  !$acc atomic capture
+    ltmp = lgot
+    lgot = .TRUE. .neqv. lgot
+  !$acc end atomic
+  !$acc end parallel
+
+  if (ltmp .neqv. .not. lexp) call abort
+  if (lgot .neqv. lexp) call abort
+
+  igot = 1
+  iexp = N
+
+  !$acc parallel loop copy (igot, itmp)
+    do i = 1, N
+  !$acc atomic capture
+      itmp = igot
+      igot = max (igot, i)
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (itmp /= iexp - 1) call abort
+  if (igot /= iexp) call abort
+
+  igot = N
+  iexp = 1
+
+  !$acc parallel loop copy (igot, itmp)
+    do i = 1, N
+  !$acc atomic capture
+      itmp = igot
+      igot = min (igot, i)
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (itmp /= iexp) call abort
+  if (igot /= iexp) call abort
+
+  igot = -1
+  iexp = 0
+
+  !$acc parallel loop copy (igot, itmp)
+    do i = 0, N - 1
+      iexpr = ibclr (-2, i)
+  !$acc atomic capture
+      itmp = igot
+      igot = iand (igot, iexpr)
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (itmp /= ibset (iexp, N - 1)) call abort
+  if (igot /= iexp) call abort
+
+  igot = 0
+  iexp = -1
+
+  !$acc parallel loop copy (igot, itmp)
+    do i = 0, N - 1
+      iexpr = lshift (1, i)
+  !$acc atomic capture
+      itmp = igot
+      igot = ior (igot, iexpr)
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (itmp /= ieor (iexp, lshift (1, N - 1))) call abort
+  if (igot /= iexp) call abort
+
+  igot = -1
+  iexp = 0
+
+  !$acc parallel loop copy (igot, itmp)
+    do i = 0, N - 1
+      iexpr = lshift (1, i)
+  !$acc atomic capture
+      itmp = igot
+      igot = ieor (igot, iexpr)
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (itmp /= ior (iexp, lshift (1, N - 1))) call abort
+  if (igot /= iexp) call abort
+
+  igot = 1
+  iexp = N
+
+  !$acc parallel loop copy (igot, itmp)
+    do i = 1, N
+  !$acc atomic capture
+      itmp = igot
+      igot = max (i, igot)
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (itmp /= iexp - 1) call abort
+  if (igot /= iexp) call abort
+
+  igot = N
+  iexp = 1
+
+  !$acc parallel loop copy (igot, itmp)
+    do i = 1, N
+  !$acc atomic capture
+      itmp = igot
+      igot = min (i, igot)
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (itmp /= iexp) call abort
+  if (igot /= iexp) call abort
+
+  igot = -1
+  iexp = 0
+
+  !$acc parallel loop copy (igot, itmp)
+    do i = 0, N - 1
+      iexpr = ibclr (-2, i)
+  !$acc atomic capture
+      itmp = igot
+      igot = iand (iexpr, igot)
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (itmp /= ibset (iexp, N - 1)) call abort
+  if (igot /= iexp) call abort
+
+  igot = 0
+  iexp = -1
+
+  !$acc parallel loop copy (igot, itmp)
+    do i = 0, N - 1
+      iexpr = lshift (1, i)
+  !$acc atomic capture
+      itmp = igot
+      igot = ior (iexpr, igot)
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (itmp /= ieor (iexp, lshift (1, N - 1))) call abort
+  if (igot /= iexp) call abort
+
+  igot = -1
+  iexp = 0
+
+  !$acc parallel loop copy (igot, itmp)
+    do i = 0, N - 1
+      iexpr = lshift (1, i)
+  !$acc atomic capture
+      itmp = igot
+      igot = ieor (iexpr, igot)
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (itmp /= ior (iexp, lshift (1, N - 1))) call abort
+  if (igot /= iexp) call abort
+
+  fgot = 1234.0
+  fexp = 1266.0
+
+  !$acc parallel loop copy (fgot, ftmp)
+    do i = 1, N
+  !$acc atomic capture
+      fgot = fgot + 1.0
+      ftmp = fgot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (ftmp /= fexp) call abort
+  if (fgot /= fexp) call abort
+
+  fgot = 1.0
+  fexp = 2.0**32
+
+  !$acc parallel loop copy (fgot, ftmp)
+    do i = 1, N
+  !$acc atomic capture
+      fgot = fgot * 2.0
+      ftmp = fgot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (ftmp /= fexp) call abort
+  if (fgot /= fexp) call abort
+
+  fgot = 32.0
+  fexp = fgot - N
+
+  !$acc parallel loop copy (fgot, ftmp)
+    do i = 1, N
+  !$acc atomic capture
+      fgot = fgot - 1.0
+      ftmp = fgot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (ftmp /= fexp) call abort
+  if (fgot /= fexp) call abort
+
+  fgot = 2**32.0
+  fexp = 1.0
+
+  !$acc parallel loop copy (fgot, ftmp)
+    do i = 1, N
+  !$acc atomic capture
+      fgot = fgot / 2.0
+      ftmp = fgot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (ftmp /= fexp) call abort
+  if (fgot /= fexp) call abort
+
+  lgot = .TRUE.
+  lexp = .FALSE.
+
+  !$acc parallel copy (lgot, ltmp)
+  !$acc atomic capture
+    lgot = lgot .and. .FALSE.
+    ltmp = lgot
+  !$acc end atomic
+  !$acc end parallel
+
+  if (ltmp .neqv. lexp) call abort
+  if (lgot .neqv. lexp) call abort
+
+  lgot = .FALSE.
+  lexp = .FALSE.
+
+  !$acc parallel copy (lgot, ltmp)
+  !$acc atomic capture
+    lgot = lgot .or. .FALSE.
+    ltmp = lgot
+  !$acc end atomic
+  !$acc end parallel
+
+  if (ltmp .neqv. lexp) call abort
+  if (lgot .neqv. lexp) call abort
+
+  lgot = .FALSE.
+  lexp = .FALSE.
+
+  !$acc parallel copy (lgot, ltmp)
+  !$acc atomic capture
+    lgot = lgot .eqv. .TRUE.
+    ltmp = lgot
+  !$acc end atomic
+  !$acc end parallel
+
+  if (ltmp .neqv. lexp) call abort
+  if (lgot .neqv. lexp) call abort
+
+  lgot = .FALSE.
+  lexp = .TRUE.
+
+  !$acc parallel copy (lgot, ltmp)
+  !$acc atomic capture
+    lgot = lgot .neqv. .TRUE.
+    ltmp = lgot
+  !$acc end atomic
+  !$acc end parallel
+
+  if (ltmp .neqv. lexp) call abort
+  if (lgot .neqv. lexp) call abort
+
+  fgot = 1234.0
+  fexp = 1266.0
+
+  !$acc parallel loop copy (fgot, ftmp)
+    do i = 1, N
+  !$acc atomic capture
+      fgot = 1.0 + fgot
+      ftmp = fgot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (ftmp /= fexp) call abort
+  if (fgot /= fexp) call abort
+
+  fgot = 1.0
+  fexp = 2.0**32
+
+  !$acc parallel loop copy (fgot, ftmp)
+    do i = 1, N
+  !$acc atomic capture
+      fgot = 2.0 * fgot
+      ftmp = fgot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (ftmp /= fexp) call abort
+  if (fgot /= fexp) call abort
+
+  fgot = 32.0
+  fexp = 32.0
+
+  !$acc parallel loop copy (fgot, ftmp)
+    do i = 1, N
+  !$acc atomic capture
+      fgot = 2.0 - fgot
+      ftmp = fgot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (ftmp /= fexp) call abort
+  if (fgot /= fexp) call abort
+
+  fgot = 2.0**16
+  fexp = 2.0**16
+
+  !$acc parallel loop copy (fgot, ftmp)
+    do i = 1, N
+  !$acc atomic capture
+      fgot = 2.0 / fgot
+      ftmp = fgot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (ftmp /= fexp) call abort
+  if (fgot /= fexp) call abort
+
+  lgot = .TRUE.
+  lexp = .FALSE.
+
+  !$acc parallel copy (lgot, ltmp)
+  !$acc atomic capture
+    lgot = .FALSE. .and. lgot
+    ltmp = lgot
+  !$acc end atomic
+  !$acc end parallel
+
+  if (ltmp .neqv. lexp) call abort
+  if (lgot .neqv. lexp) call abort
+
+  lgot = .FALSE.
+  lexp = .FALSE.
+
+  !$acc parallel copy (lgot, ltmp)
+  !$acc atomic capture
+    lgot = .FALSE. .or. lgot
+    ltmp = lgot
+  !$acc end atomic
+  !$acc end parallel
+
+  if (ltmp .neqv. lexp) call abort
+  if (lgot .neqv. lexp) call abort
+
+  lgot = .FALSE.
+  lexp = .FALSE.
+
+  !$acc parallel copy (lgot, ltmp)
+  !$acc atomic capture
+    lgot = .TRUE. .eqv. lgot
+    ltmp = lgot
+  !$acc end atomic
+  !$acc end parallel
+
+  if (ltmp .neqv. lexp) call abort
+  if (lgot .neqv. lexp) call abort
+
+  lgot = .FALSE.
+  lexp = .TRUE.
+
+  !$acc parallel copy (lgot, ltmp)
+  !$acc atomic capture
+    lgot = .TRUE. .neqv. lgot
+    ltmp = lgot
+  !$acc end atomic
+  !$acc end parallel
+
+  if (ltmp .neqv. lexp) call abort
+  if (lgot .neqv. lexp) call abort
+
+  igot = 1
+  iexp = N
+
+  !$acc parallel loop copy (igot, itmp)
+    do i = 1, N
+  !$acc atomic capture
+      igot = max (igot, i)
+      itmp = igot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (itmp /= iexp) call abort
+  if (igot /= iexp) call abort
+
+  igot = N
+  iexp = 1
+
+  !$acc parallel loop copy (igot, itmp)
+    do i = 1, N
+  !$acc atomic capture
+      igot = min (igot, i)
+      itmp = igot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (itmp /= iexp) call abort
+  if (igot /= iexp) call abort
+
+  igot = -1
+  iexp = 0
+
+  !$acc parallel loop copy (igot, itmp)
+    do i = 0, N - 1
+      iexpr = ibclr (-2, i)
+  !$acc atomic capture
+      igot = iand (igot, iexpr)
+      itmp = igot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (itmp /= iexp) call abort
+  if (igot /= iexp) call abort
+
+  igot = 0
+  iexp = -1
+
+  !$acc parallel loop copy (igot, itmp)
+    do i = 0, N - 1
+      iexpr = lshift (1, i)
+  !$acc atomic capture
+      igot = ior (igot, iexpr)
+      itmp = igot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (itmp /= iexp) call abort
+  if (igot /= iexp) call abort
+
+  igot = -1
+  iexp = 0
+
+  !$acc parallel loop copy (igot, itmp)
+    do i = 0, N - 1
+      iexpr = lshift (1, i)
+  !$acc atomic capture
+      igot = ieor (igot, iexpr)
+      itmp = igot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (itmp /= iexp) call abort
+  if (igot /= iexp) call abort
+
+  igot = 1
+  iexp = N
+
+  !$acc parallel loop copy (igot, itmp)
+    do i = 1, N
+  !$acc atomic capture
+      igot = max (i, igot)
+      itmp = igot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (itmp /= iexp) call abort
+  if (igot /= iexp) call abort
+
+  igot = N
+  iexp = 1
+
+  !$acc parallel loop copy (igot, itmp)
+    do i = 1, N
+  !$acc atomic capture
+      igot = min (i, igot)
+      itmp = igot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (itmp /= iexp) call abort
+  if (igot /= iexp) call abort
+
+  igot = -1
+  iexp = 0
+
+  !$acc parallel loop copy (igot, itmp)
+    do i = 0, N - 1
+      iexpr = ibclr (-2, i)
+  !$acc atomic capture
+      igot = iand (iexpr, igot)
+      itmp = igot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (itmp /= iexp) call abort
+  if (igot /= iexp) call abort
+
+  igot = 0
+  iexp = -1
+
+  !$acc parallel loop copy (igot, itmp)
+    do i = 0, N - 1
+      iexpr = lshift (1, i)
+  !$acc atomic capture
+      igot = ior (iexpr, igot)
+      itmp = igot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (itmp /= iexp) call abort
+  if (igot /= iexp) call abort
+
+  igot = -1
+  iexp = 0
+
+  !$acc parallel loop copy (igot, itmp)
+    do i = 0, N - 1
+      iexpr = lshift (1, i)
+  !$acc atomic capture
+      igot = ieor (iexpr, igot)
+      itmp = igot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (itmp /= iexp) call abort
+  if (igot /= iexp) call abort
+
+end program
diff --git libgomp/testsuite/libgomp.oacc-fortran/atomic_update-1.f90 libgomp/testsuite/libgomp.oacc-fortran/atomic_update-1.f90
new file mode 100644
index 0000000..6607c77
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/atomic_update-1.f90
@@ -0,0 +1,338 @@
+! { dg-do run }
+
+program main
+  integer igot, iexp, iexpr
+  real fgot, fexp
+  integer i
+  integer, parameter :: N = 32
+  logical lgot, lexp
+
+  fgot = 1234.0
+  fexp = 1266.0
+
+  !$acc parallel loop copy (fgot)
+    do i = 1, N
+  !$acc atomic update
+      fgot = fgot + 1.0
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (fgot /= fexp) call abort
+
+  fgot = 1.0
+  fexp = 2.0**32
+
+  !$acc parallel loop copy (fgot)
+    do i = 1, N
+  !$acc atomic update
+      fgot = fgot * 2.0
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (fgot /= fexp) call abort
+
+  fgot = 32.0
+  fexp = fgot - N
+
+  !$acc parallel loop copy (fgot)
+    do i = 1, N
+  !$acc atomic update
+      fgot = fgot - 1.0
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (fgot /= fexp) call abort
+
+  fgot = 2**32.0
+  fexp = 1.0
+
+  !$acc parallel loop copy (fgot)
+    do i = 1, N
+  !$acc atomic update
+      fgot = fgot / 2.0
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (fgot /= fexp) call abort
+
+  lgot = .TRUE.
+  lexp = .FALSE.
+
+  !$acc parallel copy (lgot)
+  !$acc atomic update
+    lgot = lgot .and. .FALSE.
+  !$acc end atomic
+  !$acc end parallel
+
+  if (lgot .neqv. lexp) call abort
+
+  lgot = .FALSE.
+  lexp = .FALSE.
+
+  !$acc parallel copy (lgot)
+  !$acc atomic update
+    lgot = lgot .or. .FALSE.
+  !$acc end atomic
+  !$acc end parallel
+
+  if (lgot .neqv. lexp) call abort
+
+  lgot = .FALSE.
+  lexp = .FALSE.
+
+  !$acc parallel copy (lgot)
+  !$acc atomic update
+    lgot = lgot .eqv. .TRUE.
+  !$acc end atomic
+  !$acc end parallel
+
+  if (lgot .neqv. lexp) call abort
+
+  lgot = .FALSE.
+  lexp = .TRUE.
+
+  !$acc parallel copy (lgot)
+  !$acc atomic update
+    lgot = lgot .neqv. .TRUE.
+  !$acc end atomic
+  !$acc end parallel
+
+  if (lgot .neqv. lexp) call abort
+
+  fgot = 1234.0
+  fexp = 1266.0
+
+  !$acc parallel loop copy (fgot)
+    do i = 1, N
+  !$acc atomic update
+      fgot = 1.0 + fgot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (fgot /= fexp) call abort
+
+  fgot = 1.0
+  fexp = 2.0**32
+
+  !$acc parallel loop copy (fgot)
+    do i = 1, N
+  !$acc atomic update
+      fgot = 2.0 * fgot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (fgot /= fexp) call abort
+
+  fgot = 32.0
+  fexp = 32.0
+
+  !$acc parallel loop copy (fgot)
+    do i = 1, N
+  !$acc atomic update
+      fgot = 2.0 - fgot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (fgot /= fexp) call abort
+
+  fgot = 2.0**16
+  fexp = 2.0**16
+
+  !$acc parallel loop copy (fgot)
+    do i = 1, N
+  !$acc atomic update
+      fgot = 2.0 / fgot
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (fgot /= fexp) call abort
+
+  lgot = .TRUE.
+  lexp = .FALSE.
+
+  !$acc parallel copy (lgot)
+  !$acc atomic update
+    lgot = .FALSE. .and. lgot
+  !$acc end atomic
+  !$acc end parallel
+
+  if (lgot .neqv. lexp) call abort
+
+  lgot = .FALSE.
+  lexp = .FALSE.
+
+  !$acc parallel copy (lgot)
+  !$acc atomic update
+    lgot = .FALSE. .or. lgot
+  !$acc end atomic
+  !$acc end parallel
+
+  if (lgot .neqv. lexp) call abort
+
+  lgot = .FALSE.
+  lexp = .FALSE.
+
+  !$acc parallel copy (lgot)
+  !$acc atomic update
+    lgot = .TRUE. .eqv. lgot
+  !$acc end atomic
+  !$acc end parallel
+
+  if (lgot .neqv. lexp) call abort
+
+  lgot = .FALSE.
+  lexp = .TRUE.
+
+  !$acc parallel copy (lgot)
+  !$acc atomic update
+    lgot = .TRUE. .neqv. lgot
+  !$acc end atomic
+  !$acc end parallel
+
+  if (lgot .neqv. lexp) call abort
+
+  igot = 1
+  iexp = N
+
+  !$acc parallel loop copy (igot)
+    do i = 1, N
+  !$acc atomic update
+      igot = max (igot, i)
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (igot /= iexp) call abort
+
+  igot = N
+  iexp = 1
+
+  !$acc parallel loop copy (igot)
+    do i = 1, N
+  !$acc atomic update
+      igot = min (igot, i)
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (igot /= iexp) call abort
+
+  igot = -1
+  iexp = 0
+
+  !$acc parallel loop copy (igot)
+    do i = 0, N - 1
+      iexpr = ibclr (-2, i)
+  !$acc atomic update
+      igot = iand (igot, iexpr)
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (igot /= iexp) call abort
+
+  igot = 0
+  iexp = -1 
+
+  !$acc parallel loop copy (igot)
+    do i = 0, N - 1
+      iexpr = lshift (1, i)
+  !$acc atomic update
+      igot = ior (igot, iexpr)
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (igot /= iexp) call abort
+
+  igot = -1
+  iexp = 0 
+
+  !$acc parallel loop copy (igot)
+    do i = 0, N - 1
+      iexpr = lshift (1, i)
+  !$acc atomic update
+      igot = ieor (igot, iexpr)
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (igot /= iexp) call abort
+
+  igot = 1
+  iexp = N
+
+  !$acc parallel loop copy (igot)
+    do i = 1, N
+  !$acc atomic update
+      igot = max (i, igot)
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (igot /= iexp) call abort
+
+  igot = N
+  iexp = 1
+
+  !$acc parallel loop copy (igot)
+    do i = 1, N
+  !$acc atomic update
+      igot = min (i, igot)
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (igot /= iexp) call abort
+
+  igot = -1
+  iexp = 0
+
+  !$acc parallel loop copy (igot)
+    do i = 0, N - 1
+      iexpr = ibclr (-2, i)
+  !$acc atomic update
+      igot = iand (iexpr, igot)
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (igot /= iexp) call abort
+
+  igot = 0
+  iexp = -1 
+
+  !$acc parallel loop copy (igot)
+    do i = 0, N - 1
+      iexpr = lshift (1, i)
+  !$acc atomic update
+      igot = ior (iexpr, igot)
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (igot /= iexp) call abort
+
+  igot = -1
+  iexp = 0 
+
+  !$acc parallel loop copy (igot)
+    do i = 0, N - 1
+      iexpr = lshift (1, i)
+  !$acc atomic update
+      igot = ieor (iexpr, igot)
+  !$acc end atomic
+    end do
+  !$acc end parallel loop
+
+  if (igot /= iexp) call abort
+
+end program
diff --git libgomp/testsuite/libgomp.oacc-fortran/cache-1.f90 libgomp/testsuite/libgomp.oacc-fortran/cache-1.f90
new file mode 100644
index 0000000..f01b8e9
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/cache-1.f90
@@ -0,0 +1,26 @@
+
+program main
+    integer, parameter :: N = 8
+    integer, dimension (N) :: a, b
+    integer :: i
+    integer :: idx, len
+
+    idx = 1
+    len = 2
+
+    !$acc parallel copyin (a(1:N)) copyout (b(1:N))
+        do i = 1, N
+
+    !$acc cache (a(1:N))
+    !$acc cache (a(0:N))
+    !$acc cache (a(0:N), b(0:N))
+    !$acc cache (a(0))
+    !$acc cache (a(0), a(1), b(0:N))
+    !$acc cache (a(idx))
+    !$acc cache (a(idx:len))
+
+            b(i) = a(i)
+        end do
+    !$acc end parallel
+
+end program
diff --git libgomp/testsuite/libgomp.oacc-fortran/clauses-1.f90 libgomp/testsuite/libgomp.oacc-fortran/clauses-1.f90
new file mode 100644
index 0000000..e6ab78d
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/clauses-1.f90
@@ -0,0 +1,290 @@
+! { dg-do run }
+! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } }
+
+program main
+  use openacc
+  implicit none
+
+  integer, parameter :: N = 32
+  real, allocatable :: a(:), b(:), c(:)
+  integer i
+
+  i = 0
+
+  allocate (a(N))
+  allocate (b(N))
+  allocate (c(N))
+
+  a(:) = 3.0
+  b(:) = 0.0
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N))
+    do i = 1, N
+      b(i) = a(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 3.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 5.0
+  b(:) = 1.0
+
+  !$acc parallel copyin (a(1:N)) copyout (b(1:N))
+    do i = 1, N
+      b(i) = a(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 5.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 6.0
+  b(:) = 0.0
+
+  call acc_copyin (a, sizeof (a))
+
+  a(:) = 9.0
+
+  !$acc parallel present_or_copyin (a(1:N)) copyout (b(1:N))
+    do i = 1, N
+      b(i) = a(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 6.0) call abort
+  end do
+
+  call acc_copyout (a, sizeof (a))
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 6.0
+  b(:) = 0.0
+
+  !$acc parallel copyin (a(1:N)) present_or_copyout (b(1:N))
+     do i = 1, N
+       b(i) = a(i)
+     end do
+  !$acc end parallel
+
+  do i = 1, N
+     if (b(i) .ne. 6.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 5.0
+  b(:) = 2.0
+
+  call acc_copyin (b, sizeof (b))
+
+  !$acc parallel copyin (a(1:N)) present_or_copyout (b(1:N))
+     do i = 1, N
+       b(i) = a(i)
+     end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 5.0) call abort
+    if (b(i) .ne. 2.0) call abort
+  end do
+
+  call acc_copyout (b, sizeof (b))
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 3.0;
+  b(:) = 4.0;
+
+  !$acc parallel copy (a(1:N)) copyout (b(1:N))
+    do i = 1, N
+      a(i) = a(i) + 1
+      b(i) = a(i) + 2
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 4.0) call abort
+    if (b(i) .ne. 6.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 4.0
+  b(:) = 7.0
+
+  !$acc parallel present_or_copy (a(1:N)) present_or_copy (b(1:N))
+    do i = 1, N
+      a(i) = a(i) + 1
+      b(i) = b(i) + 2
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 5.0) call abort
+    if (b(i) .ne. 9.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 3.0
+  b(:) = 7.0
+
+  call acc_copyin (a, sizeof (a))
+  call acc_copyin (b, sizeof (b))
+
+  !$acc parallel present_or_copy (a(1:N)) present_or_copy (b(1:N))
+    do i = 1, N
+      a(i) = a(i) + 1
+      b(i) = b(i) + 2
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 3.0) call abort
+    if (b(i) .ne. 7.0) call abort
+  end do
+
+  call acc_copyout (a, sizeof (a))
+  call acc_copyout (b, sizeof (b))
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 3.0
+  b(:) = 7.0
+
+  !$acc parallel copyin (a(1:N)) create (c(1:N)) copyout (b(1:N))
+    do i = 1, N
+      c(i) = a(i)
+      b(i) = c(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 3.0) call abort
+    if (b(i) .ne. 3.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+  if (acc_is_present (c) .eqv. .TRUE.) call abort
+
+  a(:) = 4.0
+  b(:) = 8.0
+
+  !$acc parallel copyin (a(1:N)) present_or_create (c(1:N)) copyout (b(1:N))
+    do i = 1, N
+      c(i) = a(i)
+      b(i) = c(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 4.0) call abort
+    if (b(i) .ne. 4.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+  if (acc_is_present (c) .eqv. .TRUE.) call abort
+
+  a(:) = 4.0
+
+  call acc_copyin (a, sizeof (a))
+  call acc_copyin (b, sizeof (b))
+  call acc_copyin (c, sizeof (c))
+
+  !$acc parallel present (a(1:N)) present (c(1:N)) present (b(1:N))
+    do i = 1, N
+      c(i) = a(i)
+      b(i) = c(i)
+    end do
+  !$acc end parallel
+
+  call acc_copyout (a, sizeof (a))
+  call acc_copyout (b, sizeof (b))
+  call acc_copyout (c, sizeof (c))
+  
+  do i = 1, N
+    if (a(i) .ne. 4.0) call abort
+    if (b(i) .ne. 4.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+  if (acc_is_present (c) .eqv. .TRUE.) call abort
+
+  a(:) = 6.0
+  b(:) = 0.0
+
+  call acc_copyin (a, sizeof (a))
+
+  a(:) = 9.0
+
+  !$acc parallel pcopyin (a(1:N)) copyout (b(1:N))
+    do i = 1, N
+      b(i) = a(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 6.0) call abort
+  end do
+  
+  call acc_copyout (a, sizeof (a))
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 6.0
+  b(:) = 0.0
+
+  !$acc parallel copyin (a(1:N)) pcopyout (b(1:N))
+   do i = 1, N
+     b(i) = a(i)
+   end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. 6.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+
+  a(:) = 5.0
+  b(:) = 7.0
+
+  !$acc parallel copyin (a(1:N)) pcreate (c(1:N)) copyout (b(1:N))
+    do i = 1, N
+      c(i) = a(i)
+      b(i) = c(i)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 5.0) call abort
+    if (b(i) .ne. 5.0) call abort
+  end do
+
+  if (acc_is_present (a) .eqv. .TRUE.) call abort
+  if (acc_is_present (b) .eqv. .TRUE.) call abort
+  if (acc_is_present (c) .eqv. .TRUE.) call abort
+
+end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/data-1.f90 libgomp/testsuite/libgomp.oacc-fortran/data-1.f90
index 5e94e2d..bf323b3 100644
--- libgomp/testsuite/libgomp.oacc-fortran/data-1.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/data-1.f90
@@ -1,45 +1,212 @@
 ! { dg-do run }
+! { dg-additional-options "-cpp" }
 
-program test
-  integer, parameter :: N = 8
-  real, allocatable :: a(:), b(:)
+function is_mapped (n) result (rc)
+  use openacc
 
-  allocate (a(N))
-  allocate (b(N))
+  integer, intent (in) :: n
+  logical rc
 
-  a(:) = 3.0
-  b(:) = 0.0
+#if ACC_MEM_SHARED
+  integer i
 
-  !$acc enter data copyin (a(1:N), b(1:N))
+  rc = .TRUE.
+  i = n
+#else
+  rc = acc_is_present (n, sizeof (n))
+#endif
 
-  !$acc parallel
-  do i = 1, n
-    b(i) = a (i)
-  end do
-  !$acc end parallel
+end function is_mapped
 
-  !$acc exit data copyout (a(1:N), b(1:N))
+program main
+  integer i, j
+  logical is_mapped
 
-  do i = 1, n
-    if (a(i) .ne. 3.0) call abort
-    if (b(i) .ne. 3.0) call abort
-  end do
+  i = -1
+  j = -2
 
-  a(:) = 5.0
-  b(:) = 1.0
+  !$acc data copyin (i, j)
+    if (is_mapped (i) .eqv. .FALSE.) call abort
+    if (is_mapped (j) .eqv. .FALSE.) call abort
 
-  !$acc enter data copyin (a(1:N), b(1:N))
+    if (i .ne. -1 .or. j .ne. -2) call abort
 
-  !$acc parallel
-  do i = 1, n
-    b(i) = a (i)
-  end do
-  !$acc end parallel
+    i = 2
+    j = 1
 
-  !$acc exit data copyout (a(1:N), b(1:N))
+    if (i .ne. 2 .or. j .ne. 1) call abort
+  !$acc end data
 
-  do i = 1, n
-    if (a(i) .ne. 5.0) call abort
-    if (b(i) .ne. 5.0) call abort
-  end do
-end program test
+  if (i .ne. 2 .or. j .ne. 1) call abort
+
+  i = -1
+  j = -2
+
+  !$acc data copyout (i, j)
+    if (is_mapped (i) .eqv. .FALSE.) call abort
+    if (is_mapped (j) .eqv. .FALSE.) call abort
+
+    if (i .ne. -1 .or. j .ne. -2) call abort
+
+    i = 2
+    j = 1
+
+    if (i .ne. 2 .or. j .ne. 1) call abort
+
+    !$acc parallel present (i, j)
+      i = 4
+      j = 2
+    !$acc end parallel
+  !$acc end data
+
+  if (i .ne. 4 .or. j .ne. 2) call abort
+
+  i = -1
+  j = -2
+
+  !$acc data create (i, j)
+    if (is_mapped (i) .eqv. .FALSE.) call abort
+    if (is_mapped (j) .eqv. .FALSE.) call abort
+
+    if (i .ne. -1 .or. j .ne. -2) call abort
+
+    i = 2
+    j = 1
+
+    if (i .ne. 2 .or. j .ne. 1) call abort
+  !$acc end data
+
+  if (i .ne. 2 .or. j .ne. 1) call abort
+
+  i = -1
+  j = -2
+
+  !$acc data present_or_copyin (i, j)
+    if (is_mapped (i) .eqv. .FALSE.) call abort
+    if (is_mapped (j) .eqv. .FALSE.) call abort
+
+    if (i .ne. -1 .or. j .ne. -2) call abort
+
+    i = 2
+    j = 1
+
+    if (i .ne. 2 .or. j .ne. 1) call abort
+  !$acc end data
+
+  if (i .ne. 2 .or. j .ne. 1) call abort
+
+  i = -1
+  j = -2
+
+  !$acc data present_or_copyout (i, j)
+    if (is_mapped (i) .eqv. .FALSE.) call abort
+    if (is_mapped (j) .eqv. .FALSE.) call abort
+
+    if (i .ne. -1 .or. j .ne. -2) call abort
+
+    i = 2
+    j = 1
+
+    if (i .ne. 2 .or. j .ne. 1) call abort
+
+    !$acc parallel present (i, j)
+      i = 4
+      j = 2
+    !$acc end parallel
+  !$acc end data
+
+  if (i .ne. 4 .or. j .ne. 2) call abort
+
+  i = -1
+  j = -2
+
+  !$acc data present_or_copy (i, j)
+    if (is_mapped (i) .eqv. .FALSE.) call abort
+    if (is_mapped (j) .eqv. .FALSE.) call abort
+
+    if (i .ne. -1 .or. j .ne. -2) call abort
+
+    i = 2
+    j = 1
+
+    if (i .ne. 2 .or. j .ne. 1) call abort
+  !$acc end data
+
+#if ACC_MEM_SHARED
+  if (i .ne. 2 .or. j .ne. 1) call abort
+#else
+  if (i .ne. -1 .or. j .ne. -2) call abort
+#endif
+
+  i = -1
+  j = -2
+
+  !$acc data present_or_create (i, j)
+    if (is_mapped (i) .eqv. .FALSE.) call abort
+    if (is_mapped (j) .eqv. .FALSE.) call abort
+
+    i = 2
+    j = 1
+
+    if (i .ne. 2 .or. j .ne. 1) call abort
+  !$acc end data
+
+  if (i .ne. 2 .or. j .ne. 1) call abort
+
+  i = -1
+  j = -2
+
+  !$acc data copyin (i, j)
+    !$acc data present (i, j)
+      if (is_mapped (i) .eqv. .FALSE.) call abort
+      if (is_mapped (j) .eqv. .FALSE.) call abort
+
+      if (i .ne. -1 .or. j .ne. -2) call abort
+
+      i = 2
+      j = 1
+
+      if (i .ne. 2 .or. j .ne. 1) call abort
+    !$acc end data
+  !$acc end data
+
+  if (i .ne. 2 .or. j .ne. 1) call abort
+
+  i = -1
+  j = -2
+
+  !$acc data copyin (i, j)
+    !$acc data present (i, j)
+      if (is_mapped (i) .eqv. .FALSE.) call abort
+      if (is_mapped (j) .eqv. .FALSE.) call abort
+
+      if (i .ne. -1 .or. j .ne. -2) call abort
+
+      i = 2
+      j = 1
+
+      if (i .ne. 2 .or. j .ne. 1) call abort
+    !$acc end data
+  !$acc end data
+
+  if (i .ne. 2 .or. j .ne. 1) call abort
+
+  i = -1
+  j = -2
+
+  !$acc data
+#if !ACC_MEM_SHARED
+    if (is_mapped (i) .eqv. .TRUE.) call abort
+    if (is_mapped (j) .eqv. .TRUE.) call abort
+#endif
+    if (i .ne. -1 .or. j .ne. -2) call abort
+
+    i = 2
+    j = 1
+
+    if (i .ne. 2 .or. j .ne. 1) call abort
+  !$acc end data
+
+  if (i .ne. 2 .or. j .ne. 1) call abort
+
+end program main
diff --git libgomp/testsuite/libgomp.oacc-fortran/data-2.f90 libgomp/testsuite/libgomp.oacc-fortran/data-2.f90
index 8736c2a..d190700 100644
--- libgomp/testsuite/libgomp.oacc-fortran/data-2.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/data-2.f90
@@ -1,8 +1,14 @@
 ! { dg-do run }
 
 program test
+  use openacc
   integer, parameter :: N = 8
   real, allocatable :: a(:,:), b(:,:)
+  real, allocatable :: c(:), d(:)
+  integer i, j
+
+  i = 0
+  j = 0
 
   allocate (a(N,N))
   allocate (b(N,N))
@@ -28,4 +34,48 @@ program test
       if (b(j,i) .ne. 3.0) call abort
     end do
   end do
+
+  allocate (c(N))
+  allocate (d(N))
+
+  c(:) = 3.0
+  d(:) = 0.0
+
+  !$acc enter data copyin (c(1:N)) create (d(1:N)) async
+  !$acc wait
+  
+  !$acc parallel 
+    do i = 1, N
+      d(i) = c(i) + 1
+    end do
+  !$acc end parallel
+
+  !$acc exit data copyout (c(1:N), d(1:N)) async
+  !$acc wait
+
+  do i = 1, N
+    if (d(i) .ne. 4.0) call abort
+  end do
+
+  c(:) = 3.0
+  d(:) = 0.0
+
+  !$acc enter data copyin (c(1:N)) async
+  !$acc enter data create (d(1:N)) wait
+  !$acc wait
+
+  !$acc parallel 
+    do i = 1, N
+      d(i) = c(i) + 1
+    end do
+  !$acc end parallel
+  
+  !$acc exit data copyout (d(1:N)) async
+  !$acc exit data async
+  !$acc wait
+
+  do i = 1, N
+    if (d(i) .ne. 4.0) call abort
+  end do
+
 end program test
diff --git libgomp/testsuite/libgomp.oacc-fortran/data-3.f90 libgomp/testsuite/libgomp.oacc-fortran/data-3.f90
index 9868cb0..daf20a5 100644
--- libgomp/testsuite/libgomp.oacc-fortran/data-3.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/data-3.f90
@@ -17,7 +17,7 @@ program asyncwait
 
   !$acc enter data copyin (a(1:N)) copyin (b(1:N)) copyin (N) async
 
-  !$acc parallel async wait
+  !$acc parallel async wait present (a(1:N)) present (b(1:N)) present (N)
   do i = 1, N
      b(i) = a(i)
   end do
@@ -36,7 +36,7 @@ program asyncwait
 
   !$acc enter data copyin (a(1:N)) copyin (b(1:N)) async (1)
 
-  !$acc parallel async (1) wait (1)
+  !$acc parallel async (1) wait (1) present (a(1:N), b(1:N), N)
   do i = 1, N
      b(i) = a(i)
   end do
@@ -55,28 +55,30 @@ program asyncwait
   c(:) = 0.0
   d(:) = 0.0
 
-  !$acc enter data copyin (a(1:N)) create (b(1:N)) create (c(1:N)) create (d(1:N))
+  !$acc enter data copyin (a(1:N)) create (b(1:N)) create (c(1:N)) &
+  !$acc& create (d(1:N))
 
-  !$acc parallel async (1)
+  !$acc parallel async (1) present (a(1:N), b(1:N), c(1:N), N)
   do i = 1, N
      b(i) = (a(i) * a(i) * a(i)) / a(i)
   end do
   !$acc end parallel
 
-  !$acc parallel async (1)
+  !$acc parallel async (1) present (a(1:N), b(1:N), c(1:N), N)
   do i = 1, N
      c(i) = (a(i) * 4) / a(i)
   end do
   !$acc end parallel
 
-  !$acc parallel async (1)
+  !$acc parallel async (1) present (a(1:N), b(1:N), c(1:N), d(1:N), N)
   do i = 1, N
      d(i) = ((a(i) * a(i)  + a(i)) / a(i)) - a(i)
   end do
   !$acc end parallel
 
   !$acc wait (1)
-  !$acc exit data copyout (a(1:N)) copyout (b(1:N)) copyout (c(1:N)) copyout (d(1:N))
+  !$acc exit data copyout (a(1:N)) copyout (b(1:N)) copyout (c(1:N)) &
+  !$acc& copyout (d(1:N))
 
   do i = 1, N
      if (a(i) .ne. 3.0) call abort
@@ -91,34 +93,40 @@ program asyncwait
   d(:) = 0.0
   e(:) = 0.0
 
-  !$acc enter data copyin (a(1:N)) create (b(1:N)) create (c(1:N)) create (d(1:N)) copyin (e(1:N))
+  !$acc enter data copyin (a(1:N)) create (b(1:N)) create (c(1:N)) &
+  !$acc& create (d(1:N)) copyin (e(1:N))
 
-  !$acc parallel async (1)
+  !$acc parallel async (1) present (a(1:N), b(1:N), c(1:N), d(1:N)) &
+  !$acc& present (e(1:N), N)
   do i = 1, N
      b(i) = (a(i) * a(i) * a(i)) / a(i)
   end do
   !$acc end parallel
 
-  !$acc parallel async (1)
+  !$acc parallel async (1) present (a(1:N), b(1:N), c(1:N), d(1:N)) &
+  !$acc& present (e(1:N), N)
   do i = 1, N
      c(i) = (a(i) * 4) / a(i)
   end do
   !$acc end parallel
 
-  !$acc parallel async (1)
+  !$acc parallel async (1) present (a(1:N), b(1:N), c(1:N), d(1:N)) &
+  !$acc& present (e(1:N), N)
   do i = 1, N
      d(i) = ((a(i) * a(i) + a(i)) / a(i)) - a(i)
   end do
   !$acc end parallel
 
-  !$acc parallel wait (1) async (1)
+  !$acc parallel wait (1) async (1) present (a(1:N), b(1:N), c(1:N)) &
+  !$acc& present (d(1:N), e(1:N), N)
   do i = 1, N
      e(i) = a(i) + b(i) + c(i) + d(i)
   end do
   !$acc end parallel
 
   !$acc wait (1)
-  !$acc exit data copyout (a(1:N)) copyout (b(1:N)) copyout (c(1:N)) copyout (d(1:N)) copyout (e(1:N))
+  !$acc exit data copyout (a(1:N)) copyout (b(1:N)) copyout (c(1:N)) &
+  !$acc& copyout (d(1:N)) copyout (e(1:N))
   !$acc exit data delete (N)
 
   do i = 1, N
diff --git libgomp/testsuite/libgomp.oacc-fortran/data-4-2.f90 libgomp/testsuite/libgomp.oacc-fortran/data-4-2.f90
index 16a8598..d1ecf0a 100644
--- libgomp/testsuite/libgomp.oacc-fortran/data-4-2.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/data-4-2.f90
@@ -19,7 +19,7 @@ program asyncwait
 
   !$acc enter data copyin (a(1:N)) copyin (b(1:N)) copyin (N) async
 
-  !$acc parallel async wait
+  !$acc parallel async wait present (a(1:N), b(1:N), N)
   !$acc loop
   do i = 1, N
      b(i) = a(i)
@@ -39,7 +39,7 @@ program asyncwait
 
   !$acc update device (a(1:N), b(1:N)) async (1)
 
-  !$acc parallel async (1) wait (1)
+  !$acc parallel async (1) wait (1) present (a(1:N), b(1:N), N)
   !$acc loop
   do i = 1, N
      b(i) = a(i)
@@ -62,19 +62,19 @@ program asyncwait
   !$acc enter data copyin (c(1:N), d(1:N)) async (1)
   !$acc update device (a(1:N), b(1:N)) async (1)
 
-  !$acc parallel async (1)
+  !$acc parallel async (1) present (a(1:N), b(1:N), N)
   do i = 1, N
      b(i) = (a(i) * a(i) * a(i)) / a(i)
   end do
   !$acc end parallel
 
-  !$acc parallel async (1)
+  !$acc parallel async (1) present (a(1:N), c(1:N), N)
   do i = 1, N
      c(i) = (a(i) * 4) / a(i)
   end do
   !$acc end parallel
 
-  !$acc parallel async (1)
+  !$acc parallel async (1) present (a(1:N), d(1:N), N)
   do i = 1, N
      d(i) = ((a(i) * a(i)  + a(i)) / a(i)) - a(i)
   end do
@@ -100,25 +100,26 @@ program asyncwait
   !$acc enter data copyin (e(1:N)) async (1)
   !$acc update device (a(1:N), b(1:N), c(1:N), d(1:N)) async (1)
 
-  !$acc parallel async (1)
+  !$acc parallel async (1) present (a(1:N), b(1:N), N)
   do i = 1, N
      b(i) = (a(i) * a(i) * a(i)) / a(i)
   end do
   !$acc end parallel
 
-  !$acc parallel async (1)
+  !$acc parallel async (1) present (a(1:N), c(1:N), N)
   do i = 1, N
      c(i) = (a(i) * 4) / a(i)
   end do
   !$acc end parallel
 
-  !$acc parallel async (1)
+  !$acc parallel async (1) present (a(1:N), d(1:N), N)
   do i = 1, N
      d(i) = ((a(i) * a(i) + a(i)) / a(i)) - a(i)
   end do
   !$acc end parallel
 
-  !$acc parallel wait (1) async (1)
+  !$acc parallel wait (1) async (1) present (a(1:N), b(1:N), c(1:N)) &
+  !$acc& present (d(1:N), e(1:N), N)
   do i = 1, N
      e(i) = a(i) + b(i) + c(i) + d(i)
   end do
diff --git libgomp/testsuite/libgomp.oacc-fortran/data-4.f90 libgomp/testsuite/libgomp.oacc-fortran/data-4.f90
index f6886b0..4e95a9c 100644
--- libgomp/testsuite/libgomp.oacc-fortran/data-4.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/data-4.f90
@@ -17,7 +17,7 @@ program asyncwait
 
   !$acc enter data copyin (a(1:N)) copyin (b(1:N)) copyin (N) async
 
-  !$acc parallel async wait
+  !$acc parallel async wait present (a(1:N), b(1:N), N)
   !$acc loop
   do i = 1, N
      b(i) = a(i)
@@ -37,7 +37,7 @@ program asyncwait
 
   !$acc update device (a(1:N), b(1:N)) async (1)
 
-  !$acc parallel async (1) wait (1)
+  !$acc parallel async (1) wait (1) present (a(1:N), b(1:N), N)
   !$acc loop
   do i = 1, N
      b(i) = a(i)
@@ -60,19 +60,19 @@ program asyncwait
   !$acc enter data copyin (c(1:N), d(1:N)) async (1)
   !$acc update device (a(1:N), b(1:N)) async (1)
 
-  !$acc parallel async (1)
+  !$acc parallel async (1) present (a(1:N), b(1:N), N)
   do i = 1, N
      b(i) = (a(i) * a(i) * a(i)) / a(i)
   end do
   !$acc end parallel
 
-  !$acc parallel async (1)
+  !$acc parallel async (1) present (a(1:N), c(1:N), N)
   do i = 1, N
      c(i) = (a(i) * 4) / a(i)
   end do
   !$acc end parallel
 
-  !$acc parallel async (1)
+  !$acc parallel async (1) present (a(1:N), d(1:N), N)
   do i = 1, N
      d(i) = ((a(i) * a(i)  + a(i)) / a(i)) - a(i)
   end do
@@ -98,25 +98,26 @@ program asyncwait
   !$acc enter data copyin (e(1:N)) async (1)
   !$acc update device (a(1:N), b(1:N), c(1:N), d(1:N)) async (1)
 
-  !$acc parallel async (1)
+  !$acc parallel async (1) present (a(1:N), b(1:N), N)
   do i = 1, N
      b(i) = (a(i) * a(i) * a(i)) / a(i)
   end do
   !$acc end parallel
 
-  !$acc parallel async (1)
+  !$acc parallel async (1) present (a(1:N), c(1:N), N)
   do i = 1, N
      c(i) = (a(i) * 4) / a(i)
   end do
   !$acc end parallel
 
-  !$acc parallel async (1)
+  !$acc parallel async (1) present (a(1:N), d(1:N), N)
   do i = 1, N
      d(i) = ((a(i) * a(i) + a(i)) / a(i)) - a(i)
   end do
   !$acc end parallel
 
-  !$acc parallel wait (1) async (1)
+  !$acc parallel wait (1) async (1) present (a(1:N), b(1:N), c(1:N)) &
+  !$acc& present (d(1:N), e(1:N), N)
   do i = 1, N
      e(i) = a(i) + b(i) + c(i) + d(i)
   end do
diff --git libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90 libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
new file mode 100644
index 0000000..0bab5bd
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/declare-1.f90
@@ -0,0 +1,229 @@
+! { dg-do run  { target openacc_nvidia_accel_selected } }
+
+subroutine subr6 (a, d)
+  integer, parameter :: N = 8
+  integer :: i
+  integer :: a(N)
+  !$acc declare deviceptr (a)
+  integer :: d(N)
+
+  i = 0
+
+  !$acc parallel copy (d)
+    do i = 1, N
+      d(i) = a(i) + a(i)
+    end do
+  !$acc end parallel
+
+end subroutine
+
+subroutine subr5 (a, b, c, d)
+  integer, parameter :: N = 8
+  integer :: i
+  integer :: a(N)
+  !$acc declare present_or_copyin (a)
+  integer :: b(N)
+  !$acc declare present_or_create (b)
+  integer :: c(N)
+  !$acc declare present_or_copyout (c)
+  integer :: d(N)
+  !$acc declare present_or_copy (d)
+
+  i = 0
+
+  !$acc parallel
+    do i = 1, N
+      b(i) = a(i)
+      c(i) = b(i)
+      d(i) = d(i) + b(i)
+    end do
+  !$acc end parallel
+
+end subroutine
+
+subroutine subr4 (a, b)
+  integer, parameter :: N = 8
+  integer :: i
+  integer :: a(N)
+  !$acc declare present (a)
+  integer :: b(N)
+  !$acc declare copyout (b)
+
+  i = 0
+
+  !$acc parallel
+  do i = 1, N
+    b(i) = a(i)
+  end do
+  !$acc end parallel
+
+end subroutine
+
+subroutine subr3 (a, c)
+  integer, parameter :: N = 8
+  integer :: i
+  integer :: a(N)
+  !$acc declare present (a)
+  integer :: c(N)
+  !$acc declare copyin (c)
+
+  i = 0
+
+  !$acc parallel
+  do i = 1, N
+    a(i) = c(i)
+    c(i) = 0
+  end do
+  !$acc end parallel
+
+end subroutine
+
+subroutine subr2 (a, b, c)
+  integer, parameter :: N = 8
+  integer :: i
+  integer :: a(N)
+  !$acc declare present (a)
+  integer :: b(N)
+  !$acc declare create (b)
+  integer :: c(N)
+  !$acc declare copy (c)
+
+  i = 0
+
+  !$acc parallel
+  do i = 1, N
+    b(i) = a(i)
+    c(i) = b(i) + c(i) + 1
+  end do
+  !$acc end parallel
+
+end subroutine
+
+subroutine subr1 (a, b, c)
+  integer, parameter :: N = 8
+  integer :: i
+  integer :: a(N)
+  !$acc declare present (a)
+  integer :: b(N)
+  integer :: c(N)
+
+  i = 0
+
+  !$acc parallel
+  do i = 1, N
+    a(i) = a(i) + 1
+  end do
+  !$acc end parallel
+
+end subroutine
+
+subroutine test (a, e)
+  use openacc
+  logical :: e
+  integer, parameter :: N = 8
+  integer :: a(N)
+
+  if (acc_is_present (a) .neqv. e) call abort
+
+end subroutine
+
+subroutine subr0 (a, b, c, d)
+  integer, parameter :: N = 8
+  integer :: a(N)
+  !$acc declare copy (a)
+  integer :: b(N)
+  integer :: c(N)
+  integer :: d(N)
+
+  call test (a, .true.)
+  call test (b, .false.)
+  call test (c, .false.)
+
+  call subr1 (a, b, c)
+
+  call test (a, .true.)
+  call test (b, .false.)
+  call test (c, .false.)
+
+  call subr2 (a, b, c)
+
+  call test (a, .true.)
+  call test (b, .false.)
+  call test (c, .false.)
+
+  do i = 1, N
+    if (c(i) .ne. 8) call abort
+  end do
+
+  call subr3 (a, c)
+
+  call test (a, .true.)
+  call test (b, .false.)
+  call test (c, .false.)
+
+  do i = 1, N
+    if (a(i) .ne. 2) call abort
+    if (c(i) .ne. 8) call abort
+  end do
+
+  call subr4 (a, b)
+
+  call test (a, .true.)
+  call test (b, .false.)
+  call test (c, .false.)
+
+  do i = 1, N
+    if (b(i) .ne. 8) call abort
+  end do
+
+  call subr5 (a, b, c, d)
+
+  call test (a, .true.)
+  call test (b, .false.)
+  call test (c, .false.)
+  call test (d, .false.)
+
+  do i = 1, N
+    if (c(i) .ne. 8) call abort
+    if (d(i) .ne. 13) call abort
+  end do
+
+  call subr6 (a, d)
+
+  call test (a, .true.)
+  call test (d, .false.)
+
+  do i = 1, N
+    if (d(i) .ne. 16) call abort
+  end do
+
+end subroutine
+
+program main
+  use openacc
+  integer, parameter :: N = 8
+  integer :: a(N)
+  integer :: b(N)
+  integer :: c(N)
+  integer :: d(N)
+
+  a(:) = 2
+  b(:) = 3
+  c(:) = 4
+  d(:) = 5
+
+  call subr0 (a, b, c, d)
+
+  call test (a, .false.)
+  call test (b, .false.)
+  call test (c, .false.)
+  call test (d, .false.)
+
+  do i = 1, N
+    if (a(i) .ne. 8) call abort
+    if (b(i) .ne. 8) call abort
+    if (c(i) .ne. 8) call abort
+    if (d(i) .ne. 16) call abort
+  end do
+
+end program
diff --git libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90 libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90
new file mode 100644
index 0000000..593cde6
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90
@@ -0,0 +1,24 @@
+! { dg-do run }
+
+program main
+  use openacc
+  implicit none
+
+  integer :: i, n
+
+  n = 1000000
+
+  !$acc parallel async (0)
+    do i = 1, 1000000
+    end do
+  !$acc end parallel
+
+  call acc_wait_async (0, 1)
+
+  if (acc_async_test (0) .neqv. .TRUE.) call abort
+
+  if (acc_async_test (1) .neqv. .TRUE.) call abort
+
+  call acc_wait (1)
+
+end program
diff --git libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90 libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90
new file mode 100644
index 0000000..cffda87
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90
@@ -0,0 +1,28 @@
+! { dg-do run }
+
+program main
+  use openacc
+  implicit none
+
+  integer :: i, j, nprocs
+  integer, parameter :: N = 1000000
+
+  nprocs = 2
+
+  do j = 1, nprocs
+    !$acc parallel async (j)
+      do i = 1, N
+      end do
+    !$acc end parallel
+  end do
+
+  if (acc_async_test (1) .neqv. .TRUE.) call abort
+  if (acc_async_test (2) .neqv. .TRUE.) call abort
+
+  call acc_wait_all_async (nprocs + 1)
+
+  if (acc_async_test (nprocs + 1) .neqv. .TRUE.) call abort
+
+  call acc_wait_all ()
+
+end program
diff --git libgomp/testsuite/libgomp.oacc-fortran/lib-14.f90 libgomp/testsuite/libgomp.oacc-fortran/lib-14.f90
new file mode 100644
index 0000000..72a2b49
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/lib-14.f90
@@ -0,0 +1,79 @@
+! { dg-do run }
+
+program main
+  use openacc
+  implicit none
+
+  integer, parameter :: N = 256
+  integer, allocatable :: h(:)
+  integer :: i
+
+  allocate (h(N))
+
+  do i = 1, N
+    h(i) = i
+  end do 
+
+  call acc_present_or_copyin (h)
+
+  if (acc_is_present (h) .neqv. .TRUE.) call abort
+
+  call acc_copyout (h)
+
+  if (acc_is_present (h) .neqv. .FALSE.) call abort
+
+  do i = 1, N
+    if (h(i) /= i) call abort
+  end do
+
+  do i = 1, N
+    h(i) = i + i
+  end do 
+
+  call acc_pcopyin (h, sizeof (h))
+
+  if (acc_is_present (h) .neqv. .TRUE.) call abort
+
+  call acc_copyout (h)
+
+  if (acc_is_present (h) .neqv. .FALSE.) call abort
+
+  do i = 1, N
+    if (h(i) /= i + i) call abort
+  end do
+
+  call acc_create (h)
+
+  if (acc_is_present (h) .neqv. .TRUE.) call abort
+
+  !$acc parallel loop
+    do i = 1, N
+      h(i) = i
+    end do
+  !$acc end parallel loop
+
+  call acc_copyout (h)
+
+  if (acc_is_present (h) .neqv. .FALSE.) call abort
+
+  do i = 1, N
+    if (h(i) /= i) call abort
+  end do
+
+  call acc_present_or_create (h, sizeof (h))
+
+  if (acc_is_present (h) .neqv. .TRUE.) call abort
+
+  call acc_delete (h)
+
+  if (acc_is_present (h) .neqv. .FALSE.) call abort
+
+  call acc_pcreate (h)
+
+  if (acc_is_present (h) .neqv. .TRUE.) call abort
+
+  call acc_delete (h)
+
+  if (acc_is_present (h) .neqv. .FALSE.) call abort
+
+end program
diff --git libgomp/testsuite/libgomp.oacc-fortran/lib-15.f90 libgomp/testsuite/libgomp.oacc-fortran/lib-15.f90
new file mode 100644
index 0000000..3a834db
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/lib-15.f90
@@ -0,0 +1,52 @@
+! { dg-do run }
+! { dg-skip-if "" { *-*-* } { "*" } { "-DACC_MEM_SHARED=0" } }
+
+program main
+  use openacc
+  implicit none
+
+  integer, parameter :: N = 256
+  integer, allocatable :: h(:)
+  integer :: i
+
+  allocate (h(N))
+
+  do i = 1, N
+    h(i) = i
+  end do 
+
+  call acc_copyin (h)
+
+  do i = 1, N
+    h(i) = i + i
+  end do 
+
+  call acc_update_device (h, sizeof (h))
+
+  if (acc_is_present (h) .neqv. .TRUE.) call abort
+
+  h(:) = 0
+
+  call acc_copyout (h, sizeof (h))
+
+  do i = 1, N
+    if (h(i) /= i + i) call abort
+  end do 
+
+  call acc_copyin (h, sizeof (h))
+
+  h(:) = 0
+
+  call acc_update_self (h, sizeof (h))
+  
+  if (acc_is_present (h) .neqv. .TRUE.) call abort
+
+  do i = 1, N
+    if (h(i) /= i + i) call abort
+  end do 
+
+  call acc_delete (h)
+
+  if (acc_is_present (h) .neqv. .FALSE.) call abort
+  
+end program
diff --git libgomp/testsuite/libgomp.oacc-fortran/routine-5.f90 libgomp/testsuite/libgomp.oacc-fortran/routine-5.f90
new file mode 100644
index 0000000..aaeb994
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-fortran/routine-5.f90
@@ -0,0 +1,27 @@
+! { dg-do run }
+! { dg-options "-fno-inline" }
+
+program main
+    integer :: n
+
+    n = 5
+
+    !$acc parallel copy (n)
+      n = func (n)
+    !$acc end parallel
+
+    if (n .ne. 6) call abort
+
+contains
+
+    function func (n) result (rc)
+    !$acc routine gang worker vector seq nohost
+    integer, intent (in) :: n
+    integer :: rc
+
+    rc = n
+    rc = rc + 1
+
+    end function
+
+end program


Grüße,
 Thomas



* Re: Next set of OpenACC changes: Fortran
  2015-05-05  8:59 ` Next set of OpenACC changes: Fortran Thomas Schwinge
@ 2015-05-05 10:42   ` Bernhard Reutner-Fischer
  0 siblings, 0 replies; 11+ messages in thread
From: Bernhard Reutner-Fischer @ 2015-05-05 10:42 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: GCC Patches, Jakub Jelinek, gfortran, Bernd Schmidt,
	Cesar Philippidis, Chung-Lin Tang, James Norris, Joseph Myers,
	Julian Brown, Tom de Vries

On 5 May 2015 at 10:58, Thomas Schwinge <thomas@codesourcery.com> wrote:
> Hi!

> +/* Node in the linked list used for storing !$oacc declare constructs.  */

The clause is called $ACC declare, isn't it?


> +  for (oc = new_oc; oc; oc = oc->next)
> +    {
> +      c = oc->clauses;
> +      for (n = c->lists[OMP_LIST_MAP]; n != NULL; n = n->next)
> +       n->sym->mark = 0;
> +    }
> +
> +  for (oc = new_oc; oc; oc = oc->next)
> +    {
> +      c = oc->clauses;
> +      for (n = c->lists[OMP_LIST_MAP]; n != NULL; n = n->next)
> +       {
> +         if (n->sym->mark)
> +           {
> +             gfc_error ("Symbol %qs present on multiple clauses at %C",
> +                        n->sym->name);
> +             return MATCH_ERROR;
> +           }
> +         else
> +           n->sym->mark = 1;
> +       }
> +    }
> +
> +  for (oc = new_oc; oc; oc = oc->next)
> +    {
> +      c = oc->clauses;
> +      for (n = c->lists[OMP_LIST_MAP]; n != NULL; n = n->next)
> +       n->sym->mark = 1;
> +    }

Much code for setting n->sym->mark = 1. What am I missing?
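
For reference, the clear-then-check idiom that the first two loops appear
to implement can be sketched as below. This is an illustration only, not
code from the patch: sketch_sym and sketch_node are hypothetical
stand-ins for gfc_symbol and gfc_omp_namelist, and the sketch handles a
single list rather than the chain of declare nodes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for gfc_symbol and gfc_omp_namelist.  */
struct sketch_sym { const char *name; int mark; };
struct sketch_node { struct sketch_sym *sym; struct sketch_node *next; };

/* Return the first symbol that occurs more than once in LIST, or NULL
   if every symbol occurs exactly once.  */
struct sketch_sym *
sketch_find_duplicate (struct sketch_node *list)
{
  struct sketch_node *n;

  /* Pass 1: clear every mark so stale state cannot cause a false hit.  */
  for (n = list; n != NULL; n = n->next)
    {
      assert (n->sym != NULL);
      n->sym->mark = 0;
    }

  /* Pass 2: a symbol that is already marked has been seen before.  */
  for (n = list; n != NULL; n = n->next)
    {
      if (n->sym->mark)
        return n->sym;
      n->sym->mark = 1;
    }

  return NULL;
}
```

After the second pass every symbol in the list already has mark set, so
in this sketch a further pass setting mark to 1 would change nothing.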

> +
> +  ns->oacc_declare = new_oc;
> +
>    return MATCH_YES;
>  }
>
> @@ -1304,10 +1580,21 @@ match
>  gfc_match_oacc_update (void)
>  {
>    gfc_omp_clauses *c;
> -  if (gfc_match_omp_clauses (&c, OACC_UPDATE_CLAUSES, false, false, true)
> +  locus here = gfc_current_locus;
> +
> +  if (gfc_match_omp_clauses (&c, OACC_UPDATE_CLAUSES,
> +                            OACC_UPDATE_CLAUSE_DEVICE_TYPE_MASK, false,
> +                            false, true)
>        != MATCH_YES)
>      return MATCH_ERROR;
>
> +  if (!c->lists[OMP_LIST_MAP])
> +    {
> +      gfc_error ("%<acc update%> must contain at least one "
> +                "%<device%> or %<host/self%> clause at %L", &here);
> +      return MATCH_ERROR;

$ACC UPDATE instead of %<acc update %> ?

> -  else if (code->ext.omp_clauses->gang
> -          && code->ext.omp_clauses->worker
> -          && code->ext.omp_clauses->vector)
> +  if (code->ext.omp_clauses->tile_list && code->ext.omp_clauses->gang
> +      && code->ext.omp_clauses->worker && code->ext.omp_clauses->vector)


conditions on separate lines, please.

> -  for (list = OMP_LIST_DEVICE_RESIDENT;
> -       list <= OMP_LIST_DEVICE_RESIDENT; list++)
> -    for (n = ns->oacc_declare_clauses->lists[list]; n; n = n->next)
> -      {
> -       n->sym->mark = 0;
> -       if (n->sym->attr.flavor == FL_PARAMETER)
> -         gfc_error ("PARAMETER object %qs is not allowed at %L", n->sym->name, &loc);
> -      }
> +      for (list = OMP_LIST_DEVICE_RESIDENT;
> +          list <= OMP_LIST_DEVICE_RESIDENT; list++)
> +       for (n = oc->clauses->lists[list]; n; n = n->next)
> +         {
> +           n->sym->mark = 0;
> +           if (n->sym->attr.flavor == FL_PARAMETER)
> +             gfc_error ("PARAMETER object %qs is not allowed at %L",
> +                        n->sym->name, &loc);
> +         }
>
> -  for (list = OMP_LIST_DEVICE_RESIDENT;
> -       list <= OMP_LIST_DEVICE_RESIDENT; list++)
> -    for (n = ns->oacc_declare_clauses->lists[list]; n; n = n->next)
> -      {
> -       if (n->sym->mark)
> -         gfc_error ("Symbol %qs present on multiple clauses at %L",
> -                    n->sym->name, &loc);
> -       else
> -         n->sym->mark = 1;
> -      }
> +      for (list = OMP_LIST_DEVICE_RESIDENT;
> +           list <= OMP_LIST_DEVICE_RESIDENT; list++)
> +       for (n = oc->clauses->lists[list]; n; n = n->next)
> +         {
> +           if (n->sym->mark)
> +             gfc_error ("Symbol %qs present on multiple clauses at %L",
> +                        n->sym->name, &loc);
> +           else
> +             n->sym->mark = 1;
> +         }
>
> -  for (n = ns->oacc_declare_clauses->lists[OMP_LIST_DEVICE_RESIDENT]; n;
> -       n = n->next)
> -    check_array_not_assumed (n->sym, loc, "DEVICE_RESIDENT");
> +      for (n = oc->clauses->lists[OMP_LIST_DEVICE_RESIDENT]; n; n = n->next)
> +       check_array_not_assumed (n->sym, loc, "DEVICE_RESIDENT");
> +
> +      for (n = oc->clauses->lists[OMP_LIST_MAP]; n; n = n->next)
> +       {
> +         if (n->expr && n->expr->ref->type == REF_ARRAY)
> +             gfc_error ("Subarray: %qs not allowed in $!ACC DECLARE at %L",
> +                        n->sym->name, &loc);
> +       }
> +    }
>  }

The ->mark setting looks complicated (as noted above)?

thanks,


* Re: Next set of OpenACC changes: C family
  2015-05-05  8:58 ` Next set of OpenACC changes: C family Thomas Schwinge
@ 2015-05-05 14:19   ` Jakub Jelinek
  2015-05-05 15:40     ` Cesar Philippidis
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Jelinek @ 2015-05-05 14:19 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: gcc-patches, Bernd Schmidt, Cesar Philippidis, Chung-Lin Tang,
	James Norris, Joseph Myers, Julian Brown, Tom de Vries

On Tue, May 05, 2015 at 10:57:28AM +0200, Thomas Schwinge wrote:
> --- gcc/c-family/c-common.c
> +++ gcc/c-family/c-common.c
> @@ -809,7 +809,7 @@ const struct attribute_spec c_common_attribute_table[] =
>  			      handle_omp_declare_simd_attribute, false },
>    { "cilk simd function",     0, -1, true,  false, false,
>  			      handle_omp_declare_simd_attribute, false },
> -  { "omp declare target",     0, 0, true, false, false,
> +  { "omp declare target",     0, -1, true, false, false,
>  			      handle_omp_declare_target_attribute, false },
>    { "alloc_align",	      1, 1, false, true, true,
>  			      handle_alloc_align_attribute, false },

Can you explain this change?  "omp declare target" doesn't take any
arguments, so "0, 0," looks right to me.

> @@ -823,6 +823,7 @@ const struct attribute_spec c_common_attribute_table[] =
>  			      handle_bnd_legacy, false },
>    { "bnd_instrument",         0, 0, true, false, false,
>  			      handle_bnd_instrument, false },
> +  { "oacc declare",           0, -1, true,  false, false, NULL, false },
>    { NULL,                     0, 0, false, false, false, NULL, false }

If "oacc declare" is different, then supposedly you shouldn't reuse
"omp declare target" attribute for the OpenACC thingie.

> --- gcc/c-family/c-omp.c
> +++ gcc/c-family/c-omp.c
> @@ -1087,3 +1087,108 @@ c_omp_predetermined_sharing (tree decl)
>  
>    return OMP_CLAUSE_DEFAULT_UNSPECIFIED;
>  }
> +
> +/* Return a numerical code representing the device_type.  Currently,
> +   only device_type(nvidia) is supported.  All device_type parameters
> +   are treated as case-insensitive keywords.  */
> +
> +int
> +oacc_extract_device_id (const char *device)
> +{
> +  if (!strcasecmp (device, "nvidia"))
> +    return GOMP_DEVICE_NVIDIA_PTX;
> +  return GOMP_DEVICE_NONE;
> +}

Why do you support just one particular device_type?  That sounds broken.
You should just have some table with names <-> GOMP_DEVICE_* mappings.
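
A minimal sketch of such a table-driven lookup, for illustration only.
The enumerators and the table rows are hypothetical stand-ins, not the
real GOMP_DEVICE_* values, and the device_type spellings are assumed:

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Stand-ins for the GOMP_DEVICE_* constants; values are illustrative.  */
enum sketch_device
{
  SKETCH_DEVICE_NONE = 0,
  SKETCH_DEVICE_HOST,
  SKETCH_DEVICE_NVIDIA_PTX,
  SKETCH_DEVICE_INTEL_MIC
};

struct sketch_device_name
{
  const char *name;
  enum sketch_device id;
};

/* Hypothetical name table: supporting another device_type means adding
   one row here, with no change to the lookup code.  */
static const struct sketch_device_name sketch_device_names[] =
{
  { "host", SKETCH_DEVICE_HOST },
  { "nvidia", SKETCH_DEVICE_NVIDIA_PTX },
  { "intel_mic", SKETCH_DEVICE_INTEL_MIC }
};

/* Return the device code for DEVICE, compared case-insensitively, or
   SKETCH_DEVICE_NONE if the name is not in the table.  */
enum sketch_device
sketch_extract_device_id (const char *device)
{
  size_t i;
  size_t n = sizeof sketch_device_names / sizeof sketch_device_names[0];

  assert (device != NULL);
  for (i = 0; i < n; i++)
    if (strcasecmp (device, sketch_device_names[i].name) == 0)
      return sketch_device_names[i].id;

  return SKETCH_DEVICE_NONE;
}
```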

> +	  if (code & (1 << GOMP_DEVICE_NVIDIA_PTX))
> +	    {
> +	      if (seen_nvidia)
> +		{
> +		  seen_nvidia = NULL_TREE;
> +		  error_at (OMP_CLAUSE_LOCATION (c),
> +			    "duplicate device_type (nvidia)");
> +		  goto filter_error;
> +		}
> +	      else
> +		seen_nvidia = OMP_CLAUSE_DEVICE_TYPE_CLAUSES (c);

Again, I must say I don't like the hardcoding of one particular
device type here.
Doesn't Intel want to support OpenACC for XeonPhi?  What about HSA
eventually, etc.?

> @@ -4624,7 +4657,7 @@ c_parser_compound_statement_nostart (c_parser *parser)
>  	  last_label = false;
>  	  mark_valid_location_for_stdc_pragma (false);
>  	  c_parser_declaration_or_fndef (parser, true, true, true, true,
> -					 true, NULL, vNULL);
> +					 true, NULL, vNULL, NULL_TREE, false);

Wouldn't default arguments be in order here?  Though, even those will mean
compile time cost of passing all the zeros almost all the time.

> -/* OpenMP 2.5:
> +/* OpenACC:
> +   num_gangs ( expression )
> +   num_workers ( expression )
> +   vector_length ( expression )
> +
> +   OpenMP 2.5:
>     num_threads ( expression ) */
>  
>  static tree
> -c_parser_omp_clause_num_threads (c_parser *parser, tree list)
> +c_parser_omp_positive_int_clause (c_parser *parser, pragma_omp_clause c_kind,
> +				  const char *str, tree list)
>  {
> -  location_t num_threads_loc = c_parser_peek_token (parser)->location;
> -  if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
> +  omp_clause_code kind;
> +  switch (c_kind)

This is undesirable, to add new clauses to the same handler you'd need
to add them both in the caller and to this switch.  Perhaps pass
omp_clause_code kind argument instead of pragma_omp_clause c_kind?
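
Roughly what I have in mind, as a self-contained sketch rather than
parser code. sketch_clause_code and sketch_clause are hypothetical
stand-ins for omp_clause_code and the clause tree, and the already
parsed expression is reduced to a plain long:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for omp_clause_code.  */
enum sketch_clause_code
{
  SKETCH_CLAUSE_NUM_GANGS,
  SKETCH_CLAUSE_NUM_WORKERS,
  SKETCH_CLAUSE_VECTOR_LENGTH,
  SKETCH_CLAUSE_NUM_THREADS
};

/* Stand-in for a clause chain.  */
struct sketch_clause
{
  enum sketch_clause_code code;
  long value;
  struct sketch_clause *next;
};

/* Prepend a clause of kind CODE with value VALUE to LIST.  The caller
   passes the final clause code directly, so this handler needs no
   switch and does not change when a new clause starts using it.
   Return LIST unchanged if VALUE is not positive.  */
struct sketch_clause *
sketch_positive_int_clause (enum sketch_clause_code code, const char *str,
			    long value, struct sketch_clause *list)
{
  struct sketch_clause *c;

  assert (str != NULL);
  if (value <= 0)
    {
      fprintf (stderr, "'%s' value must be positive\n", str);
      return list;
    }

  c = malloc (sizeof *c);
  if (c == NULL)
    return list;

  c->code = code;
  c->value = value;
  c->next = list;
  return c;
}
```

The caller would then map its pragma_omp_clause case label to the clause
code once, at the call site.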

>  static tree
> -c_parser_omp_clause_num_workers (c_parser *parser, tree list)
> +c_parser_oacc_shape_clause (c_parser *parser, pragma_omp_clause c_kind,
> +			    const char *str, tree list)
>  {
> -  location_t num_workers_loc = c_parser_peek_token (parser)->location;
> -  if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
> +  omp_clause_code kind;
> +  const char *id = "num";
> +
> +  switch (c_kind)

Likewise.

> +/* Split the 'clauses' into a set of 'loop' clauses and a set of
> +   'not-loop' clauses.  */
>  
>  static tree
> -c_parser_oacc_kernels (location_t loc, c_parser *parser, char *p_name)
> +oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses)

Is this really C specific?  I mean, for OpenMP I'm sharing the clause
splitting code between C and C++ FEs in c-omp.c.

	Jakub


* Re: Next set of OpenACC changes: C family
  2015-05-05 14:19   ` Jakub Jelinek
@ 2015-05-05 15:40     ` Cesar Philippidis
  0 siblings, 0 replies; 11+ messages in thread
From: Cesar Philippidis @ 2015-05-05 15:40 UTC (permalink / raw)
  To: Jakub Jelinek, Thomas Schwinge
  Cc: gcc-patches, Bernd Schmidt, Chung-Lin Tang, James Norris,
	Joseph Myers, Julian Brown, Tom de Vries

On 05/05/2015 07:18 AM, Jakub Jelinek wrote:
> On Tue, May 05, 2015 at 10:57:28AM +0200, Thomas Schwinge wrote:
>> --- gcc/c-family/c-common.c
>> +++ gcc/c-family/c-common.c
>> @@ -809,7 +809,7 @@ const struct attribute_spec c_common_attribute_table[] =
>>  			      handle_omp_declare_simd_attribute, false },
>>    { "cilk simd function",     0, -1, true,  false, false,
>>  			      handle_omp_declare_simd_attribute, false },
>> -  { "omp declare target",     0, 0, true, false, false,
>> +  { "omp declare target",     0, -1, true, false, false,
>>  			      handle_omp_declare_target_attribute, false },
>>    { "alloc_align",	      1, 1, false, true, true,
>>  			      handle_alloc_align_attribute, false },
> 
> Can you explain this change?  "omp declare target" doesn't take any
> arguments, so "0, 0," looks right to me.

Because we are using that attribute for oacc routines, and routines may
contain clauses.

Thinking about this some more, we could probably revert this change. I
have another patch to disable exception handling inside openacc
accelerated regions because the nvptx target doesn't support them. In
that patch I introduced a new "oacc function" attribute. Maybe we should
attach the acc routine clauses on that "oacc function" attribute.

>> @@ -823,6 +823,7 @@ const struct attribute_spec c_common_attribute_table[] =
>>  			      handle_bnd_legacy, false },
>>    { "bnd_instrument",         0, 0, true, false, false,
>>  			      handle_bnd_instrument, false },
>> +  { "oacc declare",           0, -1, true,  false, false, NULL, false },
>>    { NULL,                     0, 0, false, false, false, NULL, false }
> 
> If "oacc declare" is different, then supposedly you shouldn't reuse
> "omp declare target" attribute for the OpenACC thingie.

I'm not sure about this one. Oacc has enough quirks where it may be
justifiable though. I'll find out who wrote this patch.

>> --- gcc/c-family/c-omp.c
>> +++ gcc/c-family/c-omp.c
>> @@ -1087,3 +1087,108 @@ c_omp_predetermined_sharing (tree decl)
>>  
>>    return OMP_CLAUSE_DEFAULT_UNSPECIFIED;
>>  }
>> +
>> +/* Return a numerical code representing the device_type.  Currently,
>> +   only device_type(nvidia) is supported.  All device_type parameters
>> +   are treated as case-insensitive keywords.  */
>> +
>> +int
>> +oacc_extract_device_id (const char *device)
>> +{
>> +  if (!strcasecmp (device, "nvidia"))
>> +    return GOMP_DEVICE_NVIDIA_PTX;
>> +  return GOMP_DEVICE_NONE;
>> +}
> 
> Why do you support just one particular device_type?  That sounds broken.
> You should just have some table with names <-> GOMP_DEVICE_* mappings.

I kind of wanted to keep this patch local in gomp-4_0-branch until it
was a little more functional. Adding proper support for device_type is
going to be more involved. For instance, the tile clause changes the
shape of a loop, so if you have

  #pragma acc loop tile (2, 4) device_type (nvidia) tile (5, 5) \
     device_type (something_else) tile (1, 4)

we're going to have to generate three different versions of that
parallel region. Then we'd have to teach the compiler to launch the offload
regions with the proper number of gangs, workers and vectors, etc.
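
To make the selection concrete, here is a small sketch, not the
oacc_filter_device_types code from the patch. It assumes the usual
reading of device_type: clauses before any device_type are defaults,
and clauses after device_type (X) apply only to X. sketch_tile and
sketch_select_tile are hypothetical names, and a tile clause is reduced
to two integers:

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* One tile clause.  DEVICE_TYPE is NULL for the default clause that
   precedes any device_type.  */
struct sketch_tile
{
  const char *device_type;
  int rows;
  int cols;
};

/* Return the tile clause that applies to DEVICE: the device-specific
   one if present, otherwise the default, otherwise NULL.  */
const struct sketch_tile *
sketch_select_tile (const struct sketch_tile *tiles, size_t n,
		    const char *device)
{
  const struct sketch_tile *fallback = NULL;
  size_t i;

  assert (device != NULL);
  for (i = 0; i < n; i++)
    {
      if (tiles[i].device_type == NULL)
	fallback = &tiles[i];
      else if (strcasecmp (tiles[i].device_type, device) == 0)
	return &tiles[i];
    }

  return fallback;
}
```

With the three clauses from the pragma above, "nvidia" would select
tile (5, 5), "something_else" tile (1, 4), and any other device the
default tile (2, 4), which is why three versions of the region are
needed.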

For our initial implementation, we just decided to support device_type
(nvidia), since openacc is really only working on nvptx and host
devices. And the runtime is rigged up to ignore num_gangs, num_workers
and vector_length for the host anyway. So that's why I filtered out the
device_type clauses in the front end.

Also, for full disclosure, we're parsing the tile clause, but we're not
actually tiling the loops yet. We're still in the process of getting the
oacc execution model working on the nvptx target. Things which are
"easy" to do in cpu threads (barriers and synchronization, global
memory, etc.) are not as straightforward on gpus, unfortunately.

>> +	  if (code & (1 << GOMP_DEVICE_NVIDIA_PTX))
>> +	    {
>> +	      if (seen_nvidia)
>> +		{
>> +		  seen_nvidia = NULL_TREE;
>> +		  error_at (OMP_CLAUSE_LOCATION (c),
>> +			    "duplicate device_type (nvidia)");
>> +		  goto filter_error;
>> +		}
>> +	      else
>> +		seen_nvidia = OMP_CLAUSE_DEVICE_TYPE_CLAUSES (c);
> 
> Again, I must say I don't like the hardcoding of one particular
> device type here.
> Doesn't Intel want to support OpenACC for XeonPhi?  What about HSA
> eventually, etc.?
> 
>> @@ -4624,7 +4657,7 @@ c_parser_compound_statement_nostart (c_parser *parser)
>>  	  last_label = false;
>>  	  mark_valid_location_for_stdc_pragma (false);
>>  	  c_parser_declaration_or_fndef (parser, true, true, true, true,
>> -					 true, NULL, vNULL);
>> +					 true, NULL, vNULL, NULL_TREE, false);
> 
> Wouldn't default arguments be in order here?  Though, even those will mean
> compile time cost of passing all the zeros almost all the time.

I'll check who did this.

>> -/* OpenMP 2.5:
>> +/* OpenACC:
>> +   num_gangs ( expression )
>> +   num_workers ( expression )
>> +   vector_length ( expression )
>> +
>> +   OpenMP 2.5:
>>     num_threads ( expression ) */
>>  
>>  static tree
>> -c_parser_omp_clause_num_threads (c_parser *parser, tree list)
>> +c_parser_omp_positive_int_clause (c_parser *parser, pragma_omp_clause c_kind,
>> +				  const char *str, tree list)
>>  {
>> -  location_t num_threads_loc = c_parser_peek_token (parser)->location;
>> -  if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
>> +  omp_clause_code kind;
>> +  switch (c_kind)
> 
> This is undesirable, to add new clauses to the same handler you'd need
> to add them both in the caller and to this switch.  Perhaps pass
> omp_clause_code kind argument instead of pragma_omp_clause c_kind?
> 
>>  static tree
>> -c_parser_omp_clause_num_workers (c_parser *parser, tree list)
>> +c_parser_oacc_shape_clause (c_parser *parser, pragma_omp_clause c_kind,
>> +			    const char *str, tree list)
>>  {
>> -  location_t num_workers_loc = c_parser_peek_token (parser)->location;
>> -  if (c_parser_require (parser, CPP_OPEN_PAREN, "expected %<(%>"))
>> +  omp_clause_code kind;
>> +  const char *id = "num";
>> +
>> +  switch (c_kind)
> 
> Likewise.
> 
>> +/* Split the 'clauses' into a set of 'loop' clauses and a set of
>> +   'not-loop' clauses.  */
>>  
>>  static tree
>> -c_parser_oacc_kernels (location_t loc, c_parser *parser, char *p_name)
>> +oacc_split_loop_clauses (tree clauses, tree *not_loop_clauses)
> 
> Is this really C specific?  I mean, for OpenMP I'm sharing the clause
> splitting code between C and C++ FEs in c-omp.c.

Probably not. C++ support was added late. We'll clean up this patch.

Cesar


* [gomp4] Next set of OpenACC changes
  2015-05-05  8:54 Next set of OpenACC changes Thomas Schwinge
                   ` (3 preceding siblings ...)
  2015-05-05  9:00 ` Next set of OpenACC changes: Testsuite Thomas Schwinge
@ 2015-05-11 16:35 ` Thomas Schwinge
  2015-05-13 20:57   ` [gomp4] Assorted OpenACC changes (was: Next set of OpenACC changes) Thomas Schwinge
  4 siblings, 1 reply; 11+ messages in thread
From: Thomas Schwinge @ 2015-05-11 16:35 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek
  Cc: Bernd Schmidt, Cesar Philippidis, Chung-Lin Tang, James Norris,
	Joseph Myers, Julian Brown, Tom de Vries

[-- Attachment #1: Type: text/plain, Size: 30715 bytes --]

Hi!

On Tue, 05 May 2015 10:54:02 +0200, I wrote:
> In follow-up messages, I'll be posting the separated parts (for easier
> review) of a next set of OpenACC changes that we'd like to commit.
> ChangeLog updates not yet written; will do that before commit, obviously.

In order for us to be able to make progress with staging our other
OpenACC changes in gomp-4_0-branch, I have now committed to
gomp-4_0-branch r223007, which is these patches as posted plus a tiny
last-minute typo fix (see below), and we shall then work on addressing
the review comments already provided (thanks!) (as well as those which I
found myself, upon reviewing our changes), before later re-submitting for
trunk.

commit cd00a35cd24a3ac05cac0061639ce631e52f2f49
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon May 11 16:29:03 2015 +0000

    Next set of OpenACC changes
    
    	gcc/c-family/
    	* c-common.c (c_common_attribute_table): Set min_len to -1 for
    	"omp declare target".  Add "oacc declare".
    	* c-common.h (oacc_extract_device_id, oacc_filter_device_types):
    	New prototypes.
    	* c-omp.c (oacc_extract_device_id, oacc_filter_device_types): New
    	functions.
    	* c-pragma.c (oacc_pragmas): Add "atomic", "declare", "host_data",
    	"routine".
    	* c-pragma.h (pragma_kind): Add PRAGMA_OACC_ATOMIC,
    	PRAGMA_OACC_DECLARE, PRAGMA_OACC_HOST_DATA, PRAGMA_OACC_ROUTINE.
    	(pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_BIND,
    	PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT,
    	PRAGMA_OACC_CLAUSE_DEVICE_TYPE, PRAGMA_OACC_CLAUSE_INDEPENDENT,
    	PRAGMA_OACC_CLAUSE_LINK, PRAGMA_OACC_CLAUSE_NOHOST,
    	PRAGMA_OACC_CLAUSE_TILE, PRAGMA_OACC_CLAUSE_USE_DEVICE,
    	PRAGMA_OACC_CLAUSE_DEFAULT, remove PRAGMA_OACC_CLAUSE_SELF.
    	gcc/c/
    	* c-parser.c (c_parser): Add oacc_routines member.
    	(c_parse_file): Initialize it.
    	(c_parser_declaration_or_fndef): Add oacc_routine_clauses, and
    	oacc_routine_named formal parameters.  Adjust all users.  Support
    	OpenACC routines.
    	(c_parser_pragma): Handle PRAGMA_OACC_DECLARE,
    	PRAGMA_OACC_ROUTINE, PRAGMA_OACC_WAIT.  Add pragma context
    	checking for PRAGMA_OACC_ENTER_DATA, PRAGMA_OACC_EXIT_DATA.
    	(c_parser_omp_clause_name): Add consume_token formal parameter.
    	Handle "bind", "device_resident", "device_type", "dtype",
    	"independent", "link", "nohost", "tile", "use_device".
    	(c_parser_oacc_wait_list): Change an error message.
    	(c_parser_oacc_data_clause): Handle
    	PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT, PRAGMA_OACC_CLAUSE_LINK.
    	Don't handle PRAGMA_OACC_CLAUSE_SELF.
    	(c_parser_omp_clause_default): Add only_none formal parameter.
    	(c_parser_omp_clause_num_gangs, c_parser_omp_clause_num_threads)
    	(c_parser_omp_clause_num_workers)
    	(c_parser_omp_clause_vector_length): Replace functions by...
    	(require_positive_expr, c_parser_omp_positive_int_clause):
    	... these new functions.  Adjust all users.
    	(c_parser_omp_clause_untied, c_parser_omp_clause_branch): Replace
    	functions by...
    	(c_parser_omp_simple_clause): ... this new function.  Adjust all
    	users.
    	(c_parser_oacc_shape_clause, c_parser_oacc_clause_bind)
    	(c_parser_oacc_clause_device_type, c_parser_oacc_clause_tile)
    	(c_parser_oacc_clause_use_device): New functions.
    	(c_parser_oacc_all_clauses): Add dtype_mask, and scan_dtype formal
    	parameters.  Adjust all users.  Handle PRAGMA_OACC_CLAUSE_AUTO,
    	PRAGMA_OACC_CLAUSE_BIND, PRAGMA_OMP_CLAUSE_DEFAULT,
    	PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT,
    	PRAGMA_OACC_CLAUSE_DEVICE_TYPE, PRAGMA_OACC_CLAUSE_GANG,
    	PRAGMA_OACC_CLAUSE_INDEPENDENT, PRAGMA_OACC_CLAUSE_LINK,
    	PRAGMA_OACC_CLAUSE_NOHOST, PRAGMA_OACC_CLAUSE_SEQ,
    	PRAGMA_OACC_CLAUSE_TILE, PRAGMA_OACC_CLAUSE_USE_DEVICE,
    	PRAGMA_OACC_CLAUSE_VECTOR, PRAGMA_OACC_CLAUSE_WORKER.  Don't
    	handle PRAGMA_OACC_CLAUSE_SELF.
    	(c_parser_oacc_declare, oacc_split_loop_clauses)
    	(c_parser_oacc_host_data, c_parser_oacc_routine)
    	(c_finish_oacc_routine): New functions.
    	(c_parser_oacc_enter_exit_data): Change error reporting.
    	(c_parser_oacc_loop): Add mask, and cclauses formal parameters.
    	Handle PRAGMA_OACC_CLAUSE_DEVICE_TYPE, PRAGMA_OACC_CLAUSE_GANG,
    	PRAGMA_OACC_CLAUSE_WORKER, PRAGMA_OACC_CLAUSE_VECTOR,
    	PRAGMA_OACC_CLAUSE_AUTO, PRAGMA_OACC_CLAUSE_INDEPENDENT,
    	PRAGMA_OACC_CLAUSE_SEQ, PRAGMA_OACC_CLAUSE_TILE,
    	PRAGMA_OACC_CLAUSE_PRIVATE.
    	(c_parser_oacc_kernels): Support combined directives.  Handle
    	PRAGMA_OACC_CLAUSE_DEFAULT, PRAGMA_OACC_CLAUSE_DEVICE_TYPE.
    	(c_parser_oacc_parallel): Likewise.  Handle
    	PRAGMA_OACC_CLAUSE_PRIVATE.
    	(c_parser_omp_construct): Handle PRAGMA_OACC_ATOMIC,
    	PRAGMA_OACC_HOST_DATA.
    	(c_parser_oacc_update): Handle PRAGMA_OACC_CLAUSE_DEVICE_TYPE,
    	PRAGMA_OACC_CLAUSE_WAIT.  Don't handle PRAGMA_OACC_CLAUSE_SELF.
    	* c-tree.h (c_finish_oacc_host_data): New prototype.
    	* c-typeck.c (c_finish_oacc_host_data): New function.
    	(c_finish_omp_clauses): Add oacc formal parameter.  Adjust all
    	users.  Handle OMP_CLAUSE_INDEPENDENT, OMP_CLAUSE_USE_DEVICE,
    	OMP_CLAUSE_BIND, OMP_CLAUSE_NOHOST, OMP_CLAUSE_TILE.
    	gcc/cp/
    	* cp-gimplify.c (cxx_omp_clause_copy_ctor): Handle OMP_CLAUSE_MAP.
    	* cp-tree.h (finish_oacc_host_data): New prototype.
    	* parser.h (cp_parser): Add oacc_routine, named_oacc_routines
    	members.
    	* parser.c (cp_parser_new): Initialize them.
    	(cp_ensure_no_omp_declare_simd, cp_parser_init_declarator)
    	(cp_parser_late_return_type_opt, cp_parser_member_declaration)
    	(cp_parser_function_definition_from_specifiers_and_declarator)
    	(cp_parser_save_member_function_body, cp_parser_omp_declare_simd)
    	(cp_parser_omp_declare, cp_parser_pragma): Extend for OpenACC routines.
    	(cp_finalize_oacc_routine)
    	(cp_parser_oacc_routine_check_parallelism, cp_parser_oacc_routine)
    	(cp_parser_late_parsing_oacc_routine): New functions.
    	(cp_parser_omp_clause_name): Add consume_token formal parameter.
    	Handle "auto", "bind", "device_resident", "device_type", "dtype",
    	"gang", "independent", "link", "nohost", "seq", "tile",
    	"use_device", "vector", "worker".
    	(cp_parser_oacc_data_clause): Handle
    	PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT, PRAGMA_OACC_CLAUSE_LINK.
    	Don't handle PRAGMA_OACC_CLAUSE_SELF.
    	(cp_parser_oacc_clause_vector_length)
    	(cp_parser_omp_clause_num_gangs, cp_parser_omp_clause_num_threads)
    	(cp_parser_omp_clause_num_workers): Replace functions by...
    	(require_positive_expr, cp_parser_omp_positive_int_clause):
    	... these new functions.  Adjust all users.
    	(cp_parser_oacc_shape_clause, cp_parser_oacc_clause_device_type)
    	(cp_parser_oacc_clause_tile, cp_parser_oacc_clause_bind): New
    	functions.
    	(cp_parser_oacc_wait_list): Change an error message.
    	(cp_parser_omp_clause_default): Add is_omp formal parameter.
    	(cp_parser_omp_clause_untied, cp_parser_omp_clause_branch):
    	Replace functions by...
    	(cp_parser_omp_simple_clause): ... this new function.  Adjust all
    	users.
    	(cp_parser_oacc_all_clauses): Add dtype_mask, and scan_dtype
    	formal parameters.  Adjust all users.  Handle
    	PRAGMA_OACC_CLAUSE_AUTO, PRAGMA_OACC_CLAUSE_BIND,
    	PRAGMA_OMP_CLAUSE_DEFAULT, PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT,
    	PRAGMA_OACC_CLAUSE_DEVICE_TYPE, PRAGMA_OACC_CLAUSE_GANG,
    	PRAGMA_OACC_CLAUSE_INDEPENDENT, PRAGMA_OACC_CLAUSE_LINK,
    	PRAGMA_OACC_CLAUSE_NOHOST, PRAGMA_OACC_CLAUSE_PRIVATE,
    	PRAGMA_OACC_CLAUSE_SEQ, PRAGMA_OACC_CLAUSE_TILE,
    	PRAGMA_OACC_CLAUSE_USE_DEVICE, PRAGMA_OACC_CLAUSE_VECTOR,
    	PRAGMA_OACC_CLAUSE_WORKER.  Don't handle PRAGMA_OACC_CLAUSE_SELF.
    	(cp_parser_oacc_host_data, oacc_split_loop_clauses): New functions.
    	(cp_parser_oacc_enter_exit_data): Change error reporting.
    	(cp_parser_oacc_loop): Add p_name, mask, and cclauses formal
    	parameters.  Handle PRAGMA_OACC_CLAUSE_DEVICE_TYPE,
    	PRAGMA_OACC_CLAUSE_GANG, PRAGMA_OACC_CLAUSE_PRIVATE,
    	PRAGMA_OACC_CLAUSE_VECTOR, PRAGMA_OACC_CLAUSE_WORKER,
    	PRAGMA_OACC_CLAUSE_AUTO, PRAGMA_OACC_CLAUSE_INDEPENDENT,
    	PRAGMA_OACC_CLAUSE_SEQ, PRAGMA_OACC_CLAUSE_TILE.  Support combined
    	directives.
    	(cp_parser_oacc_kernels, cp_parser_oacc_parallel): Replace
    	functions by...
    	(cp_parser_oacc_parallel_kernels): ... this new function.  Adjust
    	all users.  Support combined directives.  For "kernels", handle
    	PRAGMA_OACC_CLAUSE_DEFAULT, PRAGMA_OACC_CLAUSE_DEVICE_TYPE.  For
    	"parallel", handle PRAGMA_OACC_CLAUSE_DEFAULT,
    	PRAGMA_OACC_CLAUSE_DEVICE_TYPE, PRAGMA_OACC_CLAUSE_GANG,
    	PRAGMA_OACC_CLAUSE_PRIVATE.
    	(cp_parser_oacc_update): Handle PRAGMA_OACC_CLAUSE_DEVICE_TYPE.
    	Don't handle PRAGMA_OACC_CLAUSE_SELF.
    	(cp_parser_omp_construct): Handle PRAGMA_OACC_ATOMIC,
    	PRAGMA_OACC_HOST_DATA.
    	(cp_parser_pragma): Handle PRAGMA_OACC_ATOMIC,
    	PRAGMA_OACC_HOST_DATA, PRAGMA_OACC_ROUTINE.  Add pragma context
    	checking for PRAGMA_OACC_ENTER_DATA, PRAGMA_OACC_EXIT_DATA,
    	PRAGMA_OACC_UPDATE, PRAGMA_OACC_WAIT.
    	* pt.c (tsubst_omp_clauses): Handle OMP_CLAUSE_NUM_GANGS,
    	OMP_CLAUSE_NUM_WORKERS, OMP_CLAUSE_VECTOR_LENGTH, OMP_CLAUSE_GANG,
    	OMP_CLAUSE_WORKER, OMP_CLAUSE_VECTOR, OMP_CLAUSE_ASYNC,
    	OMP_CLAUSE_WAIT, OMP_CLAUSE_INDEPENDENT, OMP_CLAUSE_AUTO,
    	OMP_CLAUSE_SEQ, OMP_CLAUSE_TILE.
    	(tsubst_expr): Handle OACC_PARALLEL, OACC_KERNELS, OACC_LOOP,
    	OACC_DATA, OACC_ENTER_DATA, OACC_EXIT_DATA, OACC_UPDATE.
    	* semantics.c (finish_omp_clauses): Add oacc formal parameter.
    	Adjust all users.  Handle OMP_CLAUSE_GANG, OMP_CLAUSE_VECTOR,
    	OMP_CLAUSE_WORKER, OMP_CLAUSE_NUM_GANGS, OMP_CLAUSE_NUM_WORKERS,
    	OMP_CLAUSE_USE_DEVICE, OMP_CLAUSE_AUTO, OMP_CLAUSE_INDEPENDENT,
    	OMP_CLAUSE_SEQ, OMP_CLAUSE_BIND, OMP_CLAUSE_NOHOST,
    	OMP_CLAUSE_TILE.
    	(finish_oacc_host_data): New function.
    	gcc/fortran/
    	* dump-parse-tree.c (show_namespace): Rewrite handling of OpenACC
    	declare.
    	* gfortran.h (gfc_statement): Add ST_OACC_ATOMIC,
    	ST_OACC_END_ATOMIC.
    	(gfc_omp_clauses): Add routine_bind, dtype, dtype_clauses, nohost,
    	acc_collapse, bind, num_gangs, num_workers, vector_length, tile
    	members.
    	(gfc_oacc_declare, gfc_oacc_routine_name): New typedefs.
    	(gfc_get_oacc_declare, gfc_get_oacc_routine_name): New macros.
    	(gfc_namespace): Add oacc_declare, oacc_routine_clauses,
    	oacc_routine_names, oacc_routine members, remove
    	oacc_declare_clauses member.
    	(gfc_exec_op): Add EXEC_OACC_ROUTINE, EXEC_OACC_ATOMIC,
    	EXEC_OACC_DECLARE.
    	(gfc_code): Add oacc_declare member.
    	(gfc_free_oacc_declares, insert_oacc_declare): New prototypes.
    	* match.h (gfc_match_oacc_atomic): New prototype.
    	* openmp.c (OMP_CLAUSE_HOST_SELF): Rename to...
    	(OMP_CLAUSE_HOST): ... this.  Adjust all users.
    	(OMP_CLAUSE_BIND, OMP_CLAUSE_NOHOST, OMP_CLAUSE_DEVICE_TYPE): New
    	macros.
    	(gfc_free_oacc_declares): New function.
    	(gfc_match_omp_map_clause): Add allow_sections formal parameter.
    	Adjust all users.
    	(gfc_match_omp_clauses): Add dtype_mask formal parameter.  Adjust
    	all users.  Change handling of OMP_CLAUSE_VECTOR_LENGTH,
    	OMP_CLAUSE_NUM_GANGS, OMP_CLAUSE_NUM_WORKERS, OMP_CLAUSE_TILE,
    	OMP_CLAUSE_DEFAULT, OMP_CLAUSE_COLLAPSE.  Handle OMP_CLAUSE_BIND,
    	OMP_CLAUSE_NOHOST, OMP_CLAUSE_DEVICE_TYPE.
    	(OACC_PARALLEL_CLAUSES, OACC_KERNELS_CLAUSES, OACC_LOOP_CLAUSES)
    	(OACC_UPDATE_CLAUSES): Add OMP_CLAUSE_DEVICE_TYPE.
    	(OACC_ROUTINE_CLAUSES, OACC_LOOP_CLAUSE_DEVICE_TYPE_MASK)
    	(OACC_KERNELS_CLAUSE_DEVICE_TYPE_MASK)
    	(OACC_PARALLEL_CLAUSE_DEVICE_TYPE_MASK)
    	(OACC_ROUTINE_CLAUSE_DEVICE_TYPE_MASK)
    	(OACC_UPDATE_CLAUSE_DEVICE_TYPE_MASK): New macros.
    	(gfc_match_oacc_declare, gfc_match_oacc_routine): Rewrite
    	functions.
    	(gfc_match_oacc_update): Add error reporting.
    	(gfc_match_omp_atomic, gfc_match_oacc_atomic): New wrapper
    	functions around...
    	(gfc_match_omp_oacc_atomic): ... this new function.
    	(check_array_not_assumed): Remove pointer check.
    	(oacc_code_to_statement): Handle EXEC_OACC_ATOMIC.
    	(resolve_oacc_loop_blocks): Don't error out for combined OpenACC
    	gang, worker, and vector clauses.
    	(resolve_oacc_cache): Remove function.
    	(gfc_resolve_oacc_declare): Rewrite function.
    	(gfc_resolve_oacc_directive): Handle EXEC_OACC_ATOMIC.  Don't
    	handle EXEC_OACC_CACHE.
    	* parse.c (decode_oacc_directive): Handle "atomic", "end atomic".
    	(case_exec_markers): Add ST_OACC_ATOMIC.
    	(case_decl): Add ST_OACC_DECLARE.
    	(gfc_ascii_statement): Handle ST_OACC_ATOMIC, ST_OACC_END_ATOMIC.
    	(verify_st_order, parse_spec): Remove handling of ST_OACC_DECLARE.
    	(parse_omp_atomic): Rename to...
    	(parse_omp_oacc_atomic): ... this new function.  Add omp_p formal
    	parameter.  Adjust all users.
    	(parse_executable): Handle ST_OACC_ATOMIC.
    	(parse_progunit): Remove handling of OpenACC declare.
    	(is_oacc): Handle EXEC_OACC_ROUTINE.
    	* parse.h (gfc_state_data): Add ext.oacc_declare member.  Remove
    	ext.oacc_declare_clauses member.
    	* resolve.c (gfc_resolve_blocks): Handle EXEC_OACC_ATOMIC,
    	EXEC_OACC_ROUTINE, EXEC_OACC_DECLARE.
    	(gfc_resolve_code): Handle EXEC_OACC_ATOMIC, EXEC_OACC_DECLARE.
    	* st.c (gfc_free_statement): Handle EXEC_OACC_DECLARE,
    	EXEC_OACC_ROUTINE, EXEC_OACC_ATOMIC.
    	* trans-decl.c (find_end, insert_oacc_declare): New functions.
    	(gfc_generate_function_code): Change handling of OpenACC declare.
    	* trans-openmp.c (gfc_omp_clause_copy_ctor): Handle
    	OMP_CLAUSE_REDUCTION.
    	(gfc_trans_omp_clauses): As appropriate, generate OMP_CLAUSE_SEQ
    	(instead of OMP_CLAUSE_ORDERED), OMP_CLAUSE_AUTO, or
    	OMP_CLAUSE_TILE.
    	(gfc_trans_oacc_combined_directive): Don't set
    	OACC_KERNELS_COMBINED, and OACC_PARALLEL_COMBINED.
    	(gfc_trans_oacc_declare): Rewrite function.
    	(gfc_trans_oacc_directive): Handle EXEC_OACC_ATOMIC,
    	EXEC_OACC_DECLARE.
    	* trans-stmt.c (gfc_trans_block_construct): Change handling of
    	OpenACC declare.
    	* trans.c (trans_code): Handle EXEC_OACC_ATOMIC,
    	EXEC_OACC_DECLARE.
    	gcc/
    	* gimplify.c (gimplify_scan_omp_clauses)
    	(gimplify_adjust_omp_clauses): Handle OMP_CLAUSE_TILE.
    	(gimplify_expr): Don't verify OACC_KERNELS_COMBINED, and
    	OACC_PARALLEL_COMBINED.
    	* omp-low.c (scan_sharing_clauses): Handle OMP_CLAUSE_BIND,
    	OMP_CLAUSE_NOHOST, OMP_CLAUSE_TILE.
    	(check_omp_nesting_restrictions): Support GIMPLE_OMP_ATOMIC_LOAD,
    	GIMPLE_OMP_ATOMIC_STORE inside OpenACC contexts.
    	* tree-core.h (omp_clause_code): Add OMP_CLAUSE_BIND,
    	OMP_CLAUSE_NOHOST, OMP_CLAUSE_TILE, OMP_CLAUSE_DEVICE_TYPE.
    	* tree.c (omp_clause_num_ops, omp_clause_code_name, walk_tree_1):
    	Update for these.
    	* tree-pretty-print.c (dump_omp_clause): Handle OMP_CLAUSE_TILE.
    	* tree.h (OACC_KERNELS_COMBINED, OACC_PARALLEL_COMBINED): Remove
    	macros.
    	(OMP_CLAUSE_BIND_NAME, OMP_CLAUSE_TILE_LIST)
    	(OMP_CLAUSE_DEVICE_TYPE_DEVICES, OMP_CLAUSE_DEVICE_TYPE_CLAUSES):
    	Add macros.
    	gcc/testsuite/
    	* c-c++-common/goacc-gomp/nesting-1.c: Update.
    	* c-c++-common/goacc-gomp/nesting-fail-1.c: Likewise.
    	* c-c++-common/goacc/asyncwait-1.c: Likewise.
    	* c-c++-common/goacc/data-2.c: Likewise.
    	* c-c++-common/goacc/reduction-1.c: Likewise.
    	* c-c++-common/goacc/reduction-2.c: Likewise.
    	* c-c++-common/goacc/reduction-3.c: Likewise.
    	* c-c++-common/goacc/reduction-4.c: Likewise.
    	* gfortran.dg/goacc/cache-1.f95: Likewise.
    	* gfortran.dg/goacc/coarray.f95: Likewise.
    	* gfortran.dg/goacc/coarray_2.f90: Likewise.
    	* gfortran.dg/goacc/combined_loop.f90: Likewise.
    	* gfortran.dg/goacc/cray.f95: Likewise.
    	* gfortran.dg/goacc/declare-1.f95: Likewise.
    	* gfortran.dg/goacc/host_data-tree.f95: Likewise.
    	* gfortran.dg/goacc/loop-1.f95: Likewise.
    	* gfortran.dg/goacc/loop-2.f95: Likewise.
    	* gfortran.dg/goacc/parameter.f95: Likewise.
    	* c-c++-common/goacc/loop-1.c: Enable for C++.
    	* c-c++-common/goacc/kernels-1.c: Rename to...
    	* c-c++-common/goacc/kernels-empty.c: ... this new file.
    	* c-c++-common/goacc/parallel-1.c: Rename to...
    	* c-c++-common/goacc/parallel-empty.c: ... this new file.
    	* c-c++-common/goacc/declare-1.c: New file.
    	* c-c++-common/goacc/declare-2.c: Likewise.
    	* c-c++-common/goacc/dtype-1.c: Likewise.
    	* c-c++-common/goacc/dtype-2.c: Likewise.
    	* c-c++-common/goacc/host_data-1.c: Likewise.
    	* c-c++-common/goacc/host_data-2.c: Likewise.
    	* c-c++-common/goacc/host_data-3.c: Likewise.
    	* c-c++-common/goacc/host_data-4.c: Likewise.
    	* c-c++-common/goacc/kernels-eternal.c: Likewise.
    	* c-c++-common/goacc/kernels-noreturn.c: Likewise.
    	* c-c++-common/goacc/parallel-eternal.c: Likewise.
    	* c-c++-common/goacc/parallel-noreturn.c: Likewise.
    	* c-c++-common/goacc/routine-1.c: Likewise.
    	* c-c++-common/goacc/routine-2.c: Likewise.
    	* c-c++-common/goacc/routine-3.c: Likewise.
    	* c-c++-common/goacc/routine-4.c: Likewise.
    	* c-c++-common/goacc/tile.c: Likewise.
    	* g++.dg/goacc/template-reduction.C: Likewise.
    	* g++.dg/goacc/template.C: Likewise.
    	* gfortran.dg/goacc/declare-2.f95: Likewise.
    	* gfortran.dg/goacc/default.f95: Likewise.
    	* gfortran.dg/goacc/dtype-1.f95: Likewise.
    	* gfortran.dg/goacc/dtype-2.f95: Likewise.
    	* gfortran.dg/goacc/modules.f95: Likewise.
    	* gfortran.dg/goacc/update.f95: Likewise.
    	include/
    	* gomp-constants.h (gomp_map_kind): Add GOMP_MAP_DEVICE_RESIDENT,
    	GOMP_MAP_LINK.
    	libgomp/
    	* oacc-mem.c (update_dev_host): Add missing initialization.
    	* oacc-ptx.h (GOMP_ATOMIC_PTX): New macro.
    	* plugin/plugin-nvptx.c (link_ptx): Link it in.
    	* testsuite/libgomp.oacc-c-c++-common/clauses-1.c: Update.
    	* testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/data-3.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-69.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-70.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-71.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-72.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-73.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-74.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-75.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-76.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-77.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-78.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-79.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-80.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-81.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-82.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/lib-83.c: Likewise.
    	* testsuite/libgomp.oacc-fortran/data-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/data-2.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/data-3.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/data-4-2.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/data-4.f90: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/subr.h: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/subr.ptx: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/timer.h: Remove file.
    	* testsuite/libgomp.oacc-c-c++-common/kernels-1.c: Move common
    	code from here...
    	* testsuite/libgomp.oacc-c-c++-common/parallel-1.c: ..., and
    	here...
    	* testsuite/libgomp.oacc-c-c++-common/data-clauses.h: ... into
    	this new file.
    	* testsuite/libgomp.oacc-c++/template-reduction.C: New test.
    	* testsuite/libgomp.oacc-c-c++-common/atomic_capture-1.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/atomic_capture-2.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/atomic_update-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/routine-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/routine-2.c: Likewise.
    	* testsuite/libgomp.oacc-fortran/atomic_capture-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/atomic_update-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/cache-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/clauses-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/declare-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/lib-12.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/lib-13.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/lib-14.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/lib-15.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/routine-5.f90: Likewise.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@223007 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp                                 |   28 +
 gcc/c-family/ChangeLog.gomp                        |   26 +
 gcc/c-family/c-common.c                            |    3 +-
 gcc/c-family/c-common.h                            |    2 +
 gcc/c-family/c-omp.c                               |  105 ++
 gcc/c-family/c-pragma.c                            |    4 +
 gcc/c-family/c-pragma.h                            |   14 +-
 gcc/c/ChangeLog.gomp                               |   71 +
 gcc/c/c-parser.c                                   | 1353 ++++++++++++----
 gcc/c/c-tree.h                                     |    3 +-
 gcc/c/c-typeck.c                                   |  112 +-
 gcc/cp/ChangeLog.gomp                              |   93 ++
 gcc/cp/cp-gimplify.c                               |    3 +-
 gcc/cp/cp-tree.h                                   |    3 +-
 gcc/cp/parser.c                                    | 1382 +++++++++++++----
 gcc/cp/parser.h                                    |    4 +
 gcc/cp/pt.c                                        |   43 +-
 gcc/cp/semantics.c                                 |  151 +-
 gcc/fortran/ChangeLog.gomp                         |   94 ++
 gcc/fortran/dump-parse-tree.c                      |   12 +-
 gcc/fortran/gfortran.h                             |   50 +-
 gcc/fortran/match.h                                |    1 +
 gcc/fortran/openmp.c                               |  581 +++++--
 gcc/fortran/parse.c                                |   65 +-
 gcc/fortran/parse.h                                |    2 +-
 gcc/fortran/resolve.c                              |    5 +
 gcc/fortran/st.c                                   |    7 +
 gcc/fortran/trans-decl.c                           |   62 +-
 gcc/fortran/trans-openmp.c                         |   66 +-
 gcc/fortran/trans-stmt.c                           |    7 +-
 gcc/fortran/trans-stmt.h                           |    2 +-
 gcc/fortran/trans.c                                |    2 +
 gcc/gimplify.c                                     |   16 +-
 gcc/omp-low.c                                      |   11 +-
 gcc/testsuite/ChangeLog.gomp                       |   58 +
 gcc/testsuite/c-c++-common/goacc-gomp/nesting-1.c  |   46 +
 .../c-c++-common/goacc-gomp/nesting-fail-1.c       |   25 -
 gcc/testsuite/c-c++-common/goacc/asyncwait-1.c     |    4 +-
 gcc/testsuite/c-c++-common/goacc/data-2.c          |   12 +-
 gcc/testsuite/c-c++-common/goacc/declare-1.c       |   84 +
 gcc/testsuite/c-c++-common/goacc/declare-2.c       |   67 +
 gcc/testsuite/c-c++-common/goacc/dtype-1.c         |  113 ++
 gcc/testsuite/c-c++-common/goacc/dtype-2.c         |   31 +
 gcc/testsuite/c-c++-common/goacc/host_data-1.c     |   14 +
 gcc/testsuite/c-c++-common/goacc/host_data-2.c     |   14 +
 gcc/testsuite/c-c++-common/goacc/host_data-3.c     |   16 +
 gcc/testsuite/c-c++-common/goacc/host_data-4.c     |   15 +
 .../goacc/{kernels-1.c => kernels-empty.c}         |    0
 gcc/testsuite/c-c++-common/goacc/kernels-eternal.c |   11 +
 .../c-c++-common/goacc/kernels-noreturn.c          |   12 +
 gcc/testsuite/c-c++-common/goacc/loop-1.c          |    2 -
 .../goacc/{parallel-1.c => parallel-empty.c}       |    0
 .../c-c++-common/goacc/parallel-eternal.c          |   11 +
 .../c-c++-common/goacc/parallel-noreturn.c         |   12 +
 gcc/testsuite/c-c++-common/goacc/reduction-1.c     |   25 +-
 gcc/testsuite/c-c++-common/goacc/reduction-2.c     |   22 +-
 gcc/testsuite/c-c++-common/goacc/reduction-3.c     |   22 +-
 gcc/testsuite/c-c++-common/goacc/reduction-4.c     |   40 +-
 gcc/testsuite/c-c++-common/goacc/routine-1.c       |   35 +
 gcc/testsuite/c-c++-common/goacc/routine-2.c       |   36 +
 gcc/testsuite/c-c++-common/goacc/routine-3.c       |   52 +
 gcc/testsuite/c-c++-common/goacc/routine-4.c       |   87 ++
 gcc/testsuite/c-c++-common/goacc/tile.c            |   26 +
 gcc/testsuite/g++.dg/goacc/template-reduction.C    |  100 ++
 gcc/testsuite/g++.dg/goacc/template.C              |  131 ++
 gcc/testsuite/gfortran.dg/goacc/cache-1.f95        |    1 -
 gcc/testsuite/gfortran.dg/goacc/coarray.f95        |    2 +-
 gcc/testsuite/gfortran.dg/goacc/coarray_2.f90      |    1 +
 gcc/testsuite/gfortran.dg/goacc/combined_loop.f90  |    2 +-
 gcc/testsuite/gfortran.dg/goacc/cray.f95           |    1 -
 gcc/testsuite/gfortran.dg/goacc/declare-1.f95      |    3 +-
 gcc/testsuite/gfortran.dg/goacc/declare-2.f95      |   44 +
 gcc/testsuite/gfortran.dg/goacc/default.f95        |   17 +
 gcc/testsuite/gfortran.dg/goacc/dtype-1.f95        |  161 ++
 gcc/testsuite/gfortran.dg/goacc/dtype-2.f95        |   39 +
 gcc/testsuite/gfortran.dg/goacc/host_data-tree.f95 |    2 +-
 gcc/testsuite/gfortran.dg/goacc/loop-1.f95         |    1 -
 gcc/testsuite/gfortran.dg/goacc/loop-2.f95         |   26 +-
 gcc/testsuite/gfortran.dg/goacc/modules.f95        |   55 +
 gcc/testsuite/gfortran.dg/goacc/parameter.f95      |    1 -
 gcc/testsuite/gfortran.dg/goacc/update.f95         |    5 +
 gcc/tree-core.h                                    |   14 +-
 gcc/tree-pretty-print.c                            |    6 +
 gcc/tree.c                                         |   13 +-
 gcc/tree.h                                         |   21 +-
 include/ChangeLog.gomp                             |    6 +
 include/gomp-constants.h                           |    4 +
 libgomp/ChangeLog.gomp                             |   70 +
 libgomp/oacc-mem.c                                 |    3 +
 libgomp/oacc-ptx.h                                 |   28 +
 libgomp/plugin/plugin-nvptx.c                      |   10 +
 .../libgomp.oacc-c++/template-reduction.C          |  102 ++
 .../libgomp.oacc-c-c++-common/atomic_capture-1.c   |  866 +++++++++++
 .../libgomp.oacc-c-c++-common/atomic_capture-2.c   | 1626 ++++++++++++++++++++
 .../libgomp.oacc-c-c++-common/atomic_update-1.c    |  760 +++++++++
 .../libgomp.oacc-c-c++-common/clauses-1.c          |   26 +
 .../testsuite/libgomp.oacc-c-c++-common/data-2.c   |   44 +-
 .../testsuite/libgomp.oacc-c-c++-common/data-3.c   |   18 +-
 .../{parallel-1.c => data-clauses.h}               |   32 +-
 .../libgomp.oacc-c-c++-common/kernels-1.c          |  182 +--
 .../testsuite/libgomp.oacc-c-c++-common/lib-69.c   |   70 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-70.c   |   79 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-71.c   |   55 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-72.c   |   60 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-73.c   |   64 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-74.c   |   91 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-75.c   |   89 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-76.c   |   88 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-77.c   |   91 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-78.c   |   91 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-79.c   |   91 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-80.c   |   95 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-81.c   |  106 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-82.c   |   43 +-
 .../testsuite/libgomp.oacc-c-c++-common/lib-83.c   |   22 +-
 .../libgomp.oacc-c-c++-common/parallel-1.c         |  204 +--
 .../libgomp.oacc-c-c++-common/routine-1.c          |   40 +
 .../libgomp.oacc-c-c++-common/routine-2.c          |   41 +
 libgomp/testsuite/libgomp.oacc-c-c++-common/subr.h |   44 +-
 .../testsuite/libgomp.oacc-c-c++-common/subr.ptx   |  222 +--
 .../testsuite/libgomp.oacc-c-c++-common/timer.h    |  103 --
 .../libgomp.oacc-fortran/atomic_capture-1.f90      |  784 ++++++++++
 .../libgomp.oacc-fortran/atomic_update-1.f90       |  338 ++++
 libgomp/testsuite/libgomp.oacc-fortran/cache-1.f90 |   26 +
 .../testsuite/libgomp.oacc-fortran/clauses-1.f90   |  290 ++++
 libgomp/testsuite/libgomp.oacc-fortran/data-1.f90  |  231 ++-
 libgomp/testsuite/libgomp.oacc-fortran/data-2.f90  |   50 +
 libgomp/testsuite/libgomp.oacc-fortran/data-3.f90  |   34 +-
 .../testsuite/libgomp.oacc-fortran/data-4-2.f90    |   19 +-
 libgomp/testsuite/libgomp.oacc-fortran/data-4.f90  |   19 +-
 .../testsuite/libgomp.oacc-fortran/declare-1.f90   |  229 +++
 libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90  |   24 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-13.f90  |   28 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-14.f90  |   79 +
 libgomp/testsuite/libgomp.oacc-fortran/lib-15.f90  |   52 +
 .../testsuite/libgomp.oacc-fortran/routine-5.f90   |   27 +
 136 files changed, 11216 insertions(+), 2501 deletions(-)

These are the patches as posted, plus the following last-minute typo
fix:

--- a/gcc/testsuite/c-c++-common/goacc/host_data-4.c
+++ b/gcc/testsuite/c-c++-common/goacc/host_data-4.c
@@ -6,7 +6,7 @@ int main (int argc, char* argv[])
 
   #pragma acc enter data copyin (x)
   /* Specifying an array index is not valid for host_data/use_device.  */
-  #pragma acc host_data use_device (x[4]) /* { dg-error "expected \\\')' before '\\\[' token" } */
+  #pragma acc host_data use_device (x[4]) /* { dg-error "expected '\\\)' before '\\\[' token" } */
     ;
   #pragma acc exit data delete (x)
 

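For context, here is a minimal sketch of the construct that test exercises.
It is not taken from the testsuite: the helper name have_device_address and
the array size N are made up for illustration.  The assumption is that
use_device accepts only whole variable names, so naming a single element such
as x[4] is rejected by the parser, which is what the dg-error directive above
checks for.  Built without -fopenacc, the pragmas are ignored and the code
simply runs on the host.

```c
#include <stddef.h>

#define N 8  /* Illustrative size, not from the testsuite.  */

/* Hypothetical helper: returns nonzero if an address for the whole
   array was obtained inside the host_data region.  */
int
have_device_address (void)
{
  int x[N] = { 0 };
  void *dev = NULL;

  #pragma acc enter data copyin (x)
  /* Valid form: use_device names the whole variable.  Writing x[4]
     here instead is what host_data-4.c expects to be diagnosed.  */
  #pragma acc host_data use_device (x)
  {
    /* Inside the region, x may denote the device copy, so only take
       its address; do not dereference it on the host.  */
    dev = x;
  }
  #pragma acc exit data delete (x)

  return dev != NULL;
}
```

Calling have_device_address () should return 1 whether or not offloading is
actually in effect, since the function only records the address and never
reads through it.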

Regards,
 Thomas

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]


* [gomp4] Assorted OpenACC changes (was: Next set of OpenACC changes)
  2015-05-11 16:35 ` [gomp4] Next set of OpenACC changes Thomas Schwinge
@ 2015-05-13 20:57   ` Thomas Schwinge
  2015-05-14  8:37     ` Jakub Jelinek
  0 siblings, 1 reply; 11+ messages in thread
From: Thomas Schwinge @ 2015-05-13 20:57 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek
  Cc: Bernd Schmidt, Cesar Philippidis, Chung-Lin Tang, James Norris,
	Joseph Myers, Julian Brown, Tom de Vries


[-- Attachment #1.1: Type: text/plain, Size: 18939 bytes --]

Hi!

On Mon, 11 May 2015 18:35:12 +0200, I wrote:
> On Tue, 05 May 2015 10:54:02 +0200, I wrote:
> > In follow-up messages, I'll be posting the separated parts (for easier
> > review) of a next set of OpenACC changes that we'd like to commit.
> > ChangeLog updates not yet written; will do that before commit, obviously.
> 
> In order for us to be able to make progress with staging our other
> OpenACC changes in gomp-4_0-branch, I have now committed to
> gomp-4_0-branch r223007, which is these patches as posted plus a tiny
> last-minute typo fix (see below), and we shall then work on addressing
> the review comments already provided (thanks!) (as well as those which I
> found myself, upon reviewing our changes), before later re-submitting for
> trunk.

In a similar vein, I have now committed the following to gomp-4_0-branch
in r223178.  This is not meant to be integrated into trunk as-is: there
are incompatible libgomp ABI changes, for example.  We'd still appreciate
any review comments, of course.

To avoid running into mailing list size limits, the patch is attached as
gcc-gomp-4_0-branch-r223178.patch.gz, so here's just the commit log:

commit 631438a29e8d275570ba2e881ffcdea2dfe7b5e1
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Wed May 13 20:25:48 2015 +0000

    Assorted OpenACC changes
    
    	gcc/ada/
    	* gcc-interface/utils.c (DEF_FUNCTION_TYPE_VAR_11): Remove.
    	(DEF_FUNCTION_TYPE_VAR_12): New macro.
    	gcc/c-family/
    	* c-common.c (DEF_FUNCTION_TYPE_VAR_11): Remove.
    	(DEF_FUNCTION_TYPE_VAR_12): New macro.
    	gcc/c/
    	* c-parser.c (c_parser_oacc_data_clause, oacc_split_loop_clauses)
    	(c_parser_oacc_parallel): Handle PRAGMA_OACC_CLAUSE_FIRSTPRIVATE.
    	(c_parser_oacc_all_clauses): Update handling of
    	PRAGMA_OACC_CLAUSE_FIRSTPRIVATE.
    	* c-typeck.c (c_finish_omp_clauses): Add error checking for
    	GOMP_MAP_FORCE_TO_GANGLOCAL.
    	gcc/cp/
    	* parser.c (cp_parser_oacc_data_clause)
    	(cp_parser_oacc_all_clauses, oacc_split_loop_clauses)
    	(cp_parser_oacc_parallel): Handle PRAGMA_OACC_CLAUSE_FIRSTPRIVATE.
    	* semantics.c (finish_omp_clauses): Add error checking for
    	GOMP_MAP_FORCE_TO_GANGLOCAL.
    	gcc/fortran/
    	* f95-lang.c (DEF_FUNCTION_TYPE_VAR_11): Remove.
    	(DEF_FUNCTION_TYPE_VAR_12): New macro.
    	* gfortran.h (gfc_omp_map_op): Add OMP_MAP_GANGLOCAL,
    	OMP_MAP_FORCE_TO_GANGLOCAL.
    	* trans-openmp.c (gfc_trans_omp_clauses): Handle them.
    	* openmp.c (gfc_match_omp_clauses): Update handling of
    	OMP_CLAUSE_FIRSTPRIVATE.
    	* types.def
    	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR):
    	Remove.
    	(BT_FN_INT_INT_INT_INT)
    	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_SIZE_INT_INT_VAR):
    	New function types.
    	gcc/jit/
    	* jit-builtins.c (DEF_FUNCTION_TYPE_VAR_11): Remove.
    	(DEF_FUNCTION_TYPE_VAR_12): New macro.
    	* jit-builtins.h (DEF_FUNCTION_TYPE_VAR_11): Remove.
    	(DEF_FUNCTION_TYPE_VAR_12): New macro.
    	gcc/lto/
    	* lto-lang.c (DEF_FUNCTION_TYPE_VAR_11): Remove.
    	(DEF_FUNCTION_TYPE_VAR_12): New macro.
    	gcc/testsuite/
    	* c-c++-common/goacc/dtype-1.c: Update.
    	* c-c++-common/goacc/dtype-2.c: Likewise.
    	* c-c++-common/goacc/host_data-1.c: Likewise.
    	* c-c++-common/goacc/host_data-2.c: Likewise.
    	* c-c++-common/goacc/host_data-3.c: Likewise.
    	* c-c++-common/goacc/host_data-4.c: Likewise.
    	* c-c++-common/goacc/sb-3.c: Likewise.
    	* c-c++-common/goacc/tile.c: Likewise.
    	* g++.dg/goacc/template-reduction.C: Likewise.
    	* g++.dg/goacc/template.C: Likewise.
    	* gfortran.dg/goacc/coarray.f95: Likewise.
    	* gfortran.dg/goacc/dtype-1.f95: Likewise.
    	* gfortran.dg/goacc/host_data-tree.f95: Likewise.
    	* gfortran.dg/goacc/list.f95: Likewise.
    	* gfortran.dg/goacc/loop-tree-1.f90: Likewise.
    	* gfortran.dg/goacc/parallel-tree.f95: Likewise.
    	* c-c++-common/goacc/executeables-1.c: New file.
    	* c-c++-common/goacc/firstprivate.c: Likewise.
    	* c-c++-common/goacc/private-reduction-1.c: Likewise.
    	* g++.dg/goacc/loop-1.c: Likewise.
    	* g++.dg/goacc/loop-2.c: Likewise.
    	* g++.dg/goacc/loop-3.c: Likewise.
    	* gcc.dg/goacc/sb-1.c: Likewise.
    	* gcc.dg/goacc/sb-2.c: Likewise.
    	* gcc.dg/goacc/sb-3.c: Likewise.
    	* gfortran.dg/goacc/firstprivate-1.f95: Likewise.
    	gcc/
    	* builtin-types.def
    	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR):
    	Remove.
    	(BT_FN_INT_INT_INT_INT)
    	(BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_SIZE_INT_INT_VAR):
    	New function types.
    	* builtins.c (expand_oacc_builtin, expand_oacc_ganglocal_ptr): New
    	functions.
    	(expand_builtin, is_simple_builtin): Use them.
    	* config/nvptx/mkoffload.c (process): Protect
    	GOMP_offload_register prototype from C++ name mangling.
    	* config/nvptx/nvptx.c (nvptx_file_start): Print declaration of
    	sdata.
    	* config/nvptx/nvptx.md (UNSPEC_NCTAID, UNSPEC_CTAID)
    	(UNSPEC_SHARED_DATA): New constants.
    	(oacc_nctaid_insn, oacc_nctaid, oacc_ctaid_insn, oacc_ctaid)
    	(ganglocal_ptr<mode>, ganglocal_ptr): New patterns.
    	* doc/md.texi (oacc_ntid, oacc_tid): Document.
    	* gimple.h (gimple_statement_omp_parallel_layout): Add
    	ganglocal_size member.
    	(gimple_omp_target_ganglocal_size)
    	(gimple_omp_target_set_ganglocal_size): New functions.
    	* gimplify.c (gimplify_omp_var_data): Add GOVD_USE_DEVICE,
    	GOVD_FORCE_MAP, GOVD_GANGLOCAL.
    	(omp_region_type): Add ORT_HOST_DATA.
    	(omp_region_kind, acc_region_kind): New enum types.
    	(gimplify_omp_ctx): Add region_kind, acc_region_kind members.
    	(new_omp_context, omp_add_variable, omp_notice_variable)
    	(gimplify_scan_omp_clauses, gimplify_adjust_omp_clauses_1): Update
    	for OpenACC.
    	(gimplify_scan_omp_clauses): Add region_kind formal parameter.
    	Adjust all users.
    	(gimplify_oacc_host_data_1, gimplify_oacc_host_data): New
    	functions.
    	(gimplify_expr): Update handling of OACC_HOST_DATA.
    	* omp-builtins.def (BUILT_IN_GOACC_KERNELS_INTERNAL)
    	(BUILT_IN_GOACC_KERNELS, BUILT_IN_GOACC_PARALLEL): Change type
    	from BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_INT_INT_VAR
    	to
    	BT_FN_VOID_INT_OMPFN_SIZE_PTR_PTR_PTR_INT_INT_INT_SIZE_INT_INT_VAR.
    	Adjust all users.
    	(BUILT_IN_GOACC_GET_THREAD_NUM, BUILT_IN_GOACC_GET_NUM_THREADS):
    	Change type from BT_FN_INT to BT_FN_INT_INT_INT_INT.  Adjust
    	all users.
    	(BUILT_IN_GOACC_NTID, BUILT_IN_GOACC_TID, BUILT_IN_GOACC_NCTAID)
    	(BUILT_IN_GOACC_CTAID, BUILT_IN_GOACC_GET_GANGLOCAL_PTR)
    	(BUILT_IN_GOACC_DEVICEPTR): New builtins.
    	* omp-low.c (omp_context): Add oacc_reduction_set, ganglocal_init,
    	ganglocal_ptr, ganglocal_size, ganglocal_size_host members.
    	(new_omp_context, delete_omp_context): Initialize/deinitialize
    	these, respectively.
    	(omp_for_data): Add gang, worker, vector members.
    	(extract_omp_for_data): Populate these.
    	(oacc_max_threads, oacc_finalize_reduction_data): Rewrite
    	functions.
    	(is_oacc_parallel, oacc_parallel_max_reduction_array_size)
    	(align_and_expand, alloc_var_ganglocal, install_var_ganglocal)
    	(install_array_var_ganglocal)
    	(oacc_outermost_parallel_kernels_context, oacc_inside_routine)
    	(is_oacc_multithreaded, oacc_needs_global_memory)
    	(is_atomic_compatible_reduction, oacc_serial_reduction)
    	(oacc_process_reduction_data_helper): New functions.
    	(build_outer_var_ref, fixup_remapped_decl, scan_sharing_clauses)
    	(check_omp_nesting_restrictions, lower_rec_input_clauses)
    	(lower_reduction_clauses, oacc_initialize_reduction_data)
    	(oacc_process_reduction_data, lower_omp_target)
    	(lower_omp_regimplify_p): Update for OpenACC.
    	* tree-parloops.c (create_parallel_loop): For OpenACC, switch from
    	vector to gang parallelism.
    	* tree-pretty-print.c (dump_omp_clause): Handle
    	GOMP_MAP_FORCE_TO_GANGLOCAL.
    	include/
    	* gomp-constants.h (GOMP_MAP_FLAG_GANGLOCAL): New macro.
    	(gomp_map_kind): Add GOMP_MAP_GANGLOCAL,
    	GOMP_MAP_FORCE_TO_GANGLOCAL.
    	libgomp/
    	* libgomp.h (splay_tree_key_s): Add dealloc_host member.  Adjust
    	all users.
    	* libgomp.map (GOACC_2.0.GOMP_4_BRANCH): Add GOACC_deviceptr,
    	GOACC_get_ganglocal_ptr.
    	* libgomp_g.h (GOACC_get_ganglocal_ptr): New prototype.
    	* oacc-mem.c (GOACC_deviceptr): New function.
    	* oacc-parallel.c (__goacc_host_ganglocal_ptr): New static
    	variable.
    	(GOACC_get_ganglocal_ptr, alloc_host_shared_mem)
    	(free_host_shared_mem, alloc_ganglocal_addrs): New functions.
    	(GOACC_parallel, GOACC_kernels): Use them.  Add shared_size formal
    	parameter.  Adjust all users.
    	(GOACC_parallel): Remove num_workers check.
    	(GOACC_enter_exit_data, GOACC_update): Handle more mapping kinds.
    	(GOACC_get_num_threads, GOACC_get_thread_num): Add gang, worker,
    	vector formal parameters.  Adjust all users.
    	* plugin/plugin-host.c (GOMP_OFFLOAD_openacc_parallel): Add
    	shared_size formal parameter.  Adjust all users.
    	* plugin/plugin-nvptx.c (nvptx_exec)
    	(GOMP_OFFLOAD_openacc_parallel): Add shared_size formal parameter.
    	Adjust all users.
    	* target.c (gomp_map_vars, gomp_unmap_vars)
    	(gomp_offload_image_to_device): Update for OpenACC.
    	* testsuite/libgomp.oacc-c++/template-reduction.C: Update.
    	* testsuite/libgomp.oacc-c-c++-common/collapse-2.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/reduction-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/reduction-2.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/reduction-3.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/reduction-4.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Likewise.
    	* testsuite/libgomp.oacc-fortran/collapse-5.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/collapse-6.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/collapse-7.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/collapse-8.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/data-2.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/reduction-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/reduction-2.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/reduction-3.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/reduction-4.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/reduction-5.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/reduction-6.f90: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/enter-data.c: New file.
    	* testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/firstprivate-2.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/host_data-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/host_data-2.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/parallel-loop-1.c: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/parallel-loop-1.h: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/parallel-loop-2.h: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c:
    	Likewise.
    	* testsuite/libgomp.oacc-c/gwv.c: Likewise.
    	* testsuite/libgomp.oacc-fortran/firstprivate-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/host_data-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/if-1.f90: Likewise.
    	* testsuite/libgomp.oacc-fortran/implicit-firstprivate-ref.f90:
    	Likewise.
    	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Likewise.
    	* testsuite/libgomp.oacc-c-c++-common/reduction-initial-1.c:
    	Remove file.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@223178 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp                                 |   81 ++
 gcc/ada/ChangeLog.gomp                             |   12 +
 gcc/ada/gcc-interface/utils.c                      |   17 +-
 gcc/builtin-types.def                              |    5 +-
 gcc/builtins.c                                     |   90 ++
 gcc/c-family/ChangeLog.gomp                        |   12 +
 gcc/c-family/c-common.c                            |   17 +-
 gcc/c/ChangeLog.gomp                               |   16 +
 gcc/c/c-parser.c                                   |    7 +-
 gcc/c/c-typeck.c                                   |    4 +
 gcc/config/nvptx/mkoffload.c                       |    8 +
 gcc/config/nvptx/nvptx.c                           |    1 +
 gcc/config/nvptx/nvptx.md                          |   52 +
 gcc/cp/ChangeLog.gomp                              |   15 +
 gcc/cp/parser.c                                    |   10 +
 gcc/cp/semantics.c                                 |    4 +
 gcc/doc/md.texi                                    |   18 +
 gcc/fortran/ChangeLog.gomp                         |   23 +
 gcc/fortran/f95-lang.c                             |   14 +-
 gcc/fortran/gfortran.h                             |    4 +-
 gcc/fortran/openmp.c                               |   20 +-
 gcc/fortran/trans-openmp.c                         |    6 +
 gcc/fortran/types.def                              |    5 +-
 gcc/gimple.h                                       |   23 +
 gcc/gimplify.c                                     |  347 +++++-
 gcc/jit/ChangeLog.gomp                             |   19 +
 gcc/jit/jit-builtins.c                             |   10 +-
 gcc/jit/jit-builtins.h                             |    7 +-
 gcc/lto/ChangeLog.gomp                             |   12 +
 gcc/lto/lto-lang.c                                 |   17 +-
 gcc/omp-builtins.def                               |   22 +-
 gcc/omp-low.c                                      | 1145 ++++++++++++++++----
 gcc/testsuite/ChangeLog.gomp                       |   36 +
 gcc/testsuite/c-c++-common/goacc/dtype-1.c         |   93 +-
 gcc/testsuite/c-c++-common/goacc/dtype-2.c         |    2 +-
 gcc/testsuite/c-c++-common/goacc/executeables-1.c  |   74 ++
 gcc/testsuite/c-c++-common/goacc/firstprivate.c    |    9 +
 gcc/testsuite/c-c++-common/goacc/host_data-1.c     |    1 -
 gcc/testsuite/c-c++-common/goacc/host_data-2.c     |    1 -
 gcc/testsuite/c-c++-common/goacc/host_data-3.c     |    6 +-
 gcc/testsuite/c-c++-common/goacc/host_data-4.c     |    1 -
 .../c-c++-common/goacc/private-reduction-1.c       |   10 +
 gcc/testsuite/c-c++-common/goacc/sb-3.c            |    2 +-
 gcc/testsuite/c-c++-common/goacc/tile.c            |    3 -
 gcc/testsuite/g++.dg/goacc/loop-1.c                |   23 +
 gcc/testsuite/g++.dg/goacc/loop-2.c                |   70 ++
 gcc/testsuite/g++.dg/goacc/loop-3.c                |   43 +
 gcc/testsuite/g++.dg/goacc/template-reduction.C    |    6 +-
 gcc/testsuite/g++.dg/goacc/template.C              |   15 +
 gcc/testsuite/gcc.dg/goacc/sb-1.c                  |   73 ++
 gcc/testsuite/gcc.dg/goacc/sb-2.c                  |   20 +
 .../{c-c++-common => gcc.dg}/goacc/sb-3.c          |    4 +-
 gcc/testsuite/gfortran.dg/goacc/coarray.f95        |    1 -
 gcc/testsuite/gfortran.dg/goacc/dtype-1.f95        |  105 +-
 gcc/testsuite/gfortran.dg/goacc/firstprivate-1.f95 |   11 +
 gcc/testsuite/gfortran.dg/goacc/host_data-tree.f95 |    1 -
 gcc/testsuite/gfortran.dg/goacc/list.f95           |    6 +-
 gcc/testsuite/gfortran.dg/goacc/loop-tree-1.f90    |    2 +-
 gcc/testsuite/gfortran.dg/goacc/parallel-tree.f95  |    2 +-
 gcc/tree-parloops.c                                |   18 +-
 gcc/tree-pretty-print.c                            |   12 +
 include/ChangeLog.gomp                             |   13 +
 include/gomp-constants.h                           |    7 +-
 libgomp/ChangeLog.gomp                             |   71 ++
 libgomp/libgomp.h                                  |    4 +-
 libgomp/libgomp.map                                |    2 +
 libgomp/libgomp_g.h                                |    9 +-
 libgomp/oacc-mem.c                                 |   38 +-
 libgomp/oacc-parallel.c                            |  140 ++-
 libgomp/plugin/plugin-host.c                       |    1 +
 libgomp/plugin/plugin-nvptx.c                      |   35 +-
 libgomp/target.c                                   |   11 +
 .../libgomp.oacc-c++/template-reduction.C          |   14 +-
 .../libgomp.oacc-c-c++-common/collapse-2.c         |    2 +-
 .../testsuite/libgomp.oacc-c-c++-common/data-2.c   |  161 ++-
 .../libgomp.oacc-c-c++-common/enter-data.c         |   23 +
 .../libgomp.oacc-c-c++-common/firstprivate-1.c     |   32 +
 .../libgomp.oacc-c-c++-common/firstprivate-2.c     |   55 +
 .../libgomp.oacc-c-c++-common/host_data-1.c        |  125 +++
 .../libgomp.oacc-c-c++-common/host_data-2.c        |   50 +
 .../libgomp.oacc-c-c++-common/parallel-loop-1.c    |   37 +
 .../libgomp.oacc-c-c++-common/parallel-loop-1.h    |   20 +
 .../libgomp.oacc-c-c++-common/parallel-loop-2.h    |  282 +++++
 .../libgomp.oacc-c-c++-common/parallel-reduction.c |   67 ++
 .../libgomp.oacc-c-c++-common/reduction-1.c        |   13 +-
 .../libgomp.oacc-c-c++-common/reduction-2.c        |   93 +-
 .../libgomp.oacc-c-c++-common/reduction-3.c        |   93 +-
 .../libgomp.oacc-c-c++-common/reduction-4.c        |   69 +-
 .../libgomp.oacc-c-c++-common/reduction-5.c        |    6 +-
 .../reduction-initial-1.c                          |   25 -
 libgomp/testsuite/libgomp.oacc-c/gwv.c             |   34 +
 .../testsuite/libgomp.oacc-fortran/collapse-5.f90  |    2 +-
 .../testsuite/libgomp.oacc-fortran/collapse-6.f90  |    2 +-
 .../testsuite/libgomp.oacc-fortran/collapse-7.f90  |    2 +-
 .../testsuite/libgomp.oacc-fortran/collapse-8.f90  |    2 +-
 libgomp/testsuite/libgomp.oacc-fortran/data-2.f90  |  126 ++-
 .../libgomp.oacc-fortran/firstprivate-1.f90        |   42 +
 .../testsuite/libgomp.oacc-fortran/host_data-1.f90 |   28 +
 libgomp/testsuite/libgomp.oacc-fortran/if-1.f90    |  453 ++++++++
 .../implicit-firstprivate-ref.f90                  |   42 +
 .../libgomp.oacc-fortran/parallel-reduction.f90    |   38 +
 .../testsuite/libgomp.oacc-fortran/reduction-1.f90 |   54 +-
 .../testsuite/libgomp.oacc-fortran/reduction-2.f90 |   46 +-
 .../testsuite/libgomp.oacc-fortran/reduction-3.f90 |   46 +-
 .../testsuite/libgomp.oacc-fortran/reduction-4.f90 |   36 +-
 .../testsuite/libgomp.oacc-fortran/reduction-5.f90 |    4 +-
 .../testsuite/libgomp.oacc-fortran/reduction-6.f90 |    4 +-
 107 files changed, 4430 insertions(+), 722 deletions(-)


Regards,
 Thomas



[-- Attachment #1.2: gcc-gomp-4_0-branch-r223178.patch.gz --]
[-- Type: application/x-gzip, Size: 48988 bytes --]

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]

* Re: [gomp4] Assorted OpenACC changes (was: Next set of OpenACC changes)
  2015-05-13 20:57   ` [gomp4] Assorted OpenACC changes (was: Next set of OpenACC changes) Thomas Schwinge
@ 2015-05-14  8:37     ` Jakub Jelinek
  0 siblings, 0 replies; 11+ messages in thread
From: Jakub Jelinek @ 2015-05-14  8:37 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: gcc-patches, Bernd Schmidt, Cesar Philippidis, Chung-Lin Tang,
	James Norris, Joseph Myers, Julian Brown, Tom de Vries

On Wed, May 13, 2015 at 10:52:08PM +0200, Thomas Schwinge wrote:
> In a similar vein, I have now committed the following to gomp-4_0-branch
> in r223178.  This is not meant to be integrated into trunk as-is: there
> are incompatible libgomp ABI changes, for example.  We'd still appreciate

ABI-incompatible libgomp changes are an absolute no-no.  You need to provide
backwards compatibility...

	Jakub
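
To illustrate the backwards-compatibility point above, here is a minimal
sketch of one common pattern: keep the old entry point with its original
signature and forward it to a new entry point that takes the extra
argument.  All names here (runtime_launch, runtime_launch_v2, the flags
parameter) are hypothetical and are not libgomp's actual interface; a real
shared library would typically also need matching entries in its symbol
version map.

```c
/* Hypothetical names throughout; this is NOT libgomp's real interface.  */

typedef void (*region_fn) (void *);

/* New entry point: same job as before, plus an extra 'flags' argument
   (assumed here only for illustration).  */
void
runtime_launch_v2 (region_fn fn, void *data, unsigned flags)
{
  (void) flags;			/* Unused in this sketch.  */
  fn (data);
}

/* Old entry point: signature unchanged, so binaries built against the
   previous library version keep working.  It forwards to the new one
   with a default for the added argument.  */
void
runtime_launch (region_fn fn, void *data)
{
  runtime_launch_v2 (fn, data, 0);
}

/* Example region body: increments the int that DATA points to.  */
static void
bump (void *data)
{
  ++*(int *) data;
}

/* Calls both entry points once each and returns the resulting count.  */
int
demo (void)
{
  int n = 0;
  runtime_launch (bump, &n);		/* As emitted by an "old" compiler.  */
  runtime_launch_v2 (bump, &n, 1);	/* As emitted by a "new" compiler.  */
  return n;
}
```

The point of the wrapper is that the old symbol never disappears or changes
meaning; only new code needs to know about the new one.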

Thread overview: 11+ messages
2015-05-05  8:54 Next set of OpenACC changes Thomas Schwinge
2015-05-05  8:56 ` Next set of OpenACC changes: middle end, libgomp Thomas Schwinge
2015-05-05  8:58 ` Next set of OpenACC changes: C family Thomas Schwinge
2015-05-05 14:19   ` Jakub Jelinek
2015-05-05 15:40     ` Cesar Philippidis
2015-05-05  8:59 ` Next set of OpenACC changes: Fortran Thomas Schwinge
2015-05-05 10:42   ` Bernhard Reutner-Fischer
2015-05-05  9:00 ` Next set of OpenACC changes: Testsuite Thomas Schwinge
2015-05-11 16:35 ` [gomp4] Next set of OpenACC changes Thomas Schwinge
2015-05-13 20:57   ` [gomp4] Assorted OpenACC changes (was: Next set of OpenACC changes) Thomas Schwinge
2015-05-14  8:37     ` Jakub Jelinek