public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [Patch] OpenMP: Support reverse offload (middle end part)
@ 2022-07-21 10:55 Tobias Burnus
  2022-07-21 12:33 ` Tobias Burnus
  0 siblings, 1 reply; 3+ messages in thread
From: Tobias Burnus @ 2022-07-21 10:55 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 2736 bytes --]

This patch does three things:

(a) It removes a 'sorry' for 'device(ancestor:1)' and passes
     GOMP_DEVICE_HOST_FALLBACK as device number.

This is sufficient for full "reverse" offload support with ENABLE_OFFLOADING
being false - and -foffload=disable. And for simple hello-world cases.


On the libgomp side, the 'requires reverse_offload' currently implies that
the initial device is the only device. While that's all fine, this change
is insufficient if offloading devices are enabled during compilation as:


(b.1) The offload-device lto1 should not see the content of the ancestor:1 target
region and all the calls it does. If it does, there will be link errors for
functions not available and it also would pointlessly increase the code size.

Thus, the second part is to create an empty function for devices and a full
version for the host.

The general idea is: The device version can be used as lookup pointer in the
offload_funcs table; thus, we both need a function on the device and a call to
GOMP_target_ext.

It turned out to be quite difficult as late in the processing changing a
FUNCTION_DECL is not that easy – nor removing it after all analysis has been
done. I hope the current version is not too hackish – and maybe someone has
an idea how to best not to assembly the 'nonhost' version on the host.
(Not critical as it is small (having an empty body) - but still it would be
nicer not to write it to .s file.)


(b.2) The omp-offload.cc assert showed that cloning and inlining happened
for the included libgomp example. While inlining should be okay (of
'subroutine m2_tg_fn' (and for C/C++ 'tg_fn')) - cloning will break
the offload_func table lookup - and, hence, had to be excluded → "noclone".
I think it could also affect non-anchestor:1 code - but did not try to
create an example.


(c) Prepare for actual reverse offloading
While (b) already does some prep work for real offloading, at least one more
step is needed: In order to allow that the function pointer can be used for
offload_func table lookup, it has to be passed to libgomp.

Currently, the 'fn' argument is nullified in on-device calls to GOMP_target_ext.
The third part of this patch nullifies it now only for non-reverse offloads.

OK for mainline?

  * * *

Next steps: Implement reverse offloading for devices. In theory, this only
requires libgomp work, but let's see what else will be required.

Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: omp-ancestor-ME-v2.diff --]
[-- Type: text/x-patch, Size: 23012 bytes --]

OpenMP: Support reverse offload (middle end part)

gcc/ChangeLog:

	* internal-fn.cc (expand_GOMP_TARGET_REV): New.
	* internal-fn.def (GOMP_TARGET_REV): New.
	* lto-cgraph.cc (lto_output_node, verify_node_partition): Mark
	'omp target device_ancestor_host' as in_other_partition and don't
	error if absent.
	* omp-low.cc (create_omp_child_function): Mark as 'noclone'.
	* omp-expand.cc (expand_omp_target): For reverse offload, remove
	sorry, use device = GOMP_DEVICE_HOST_FALLBACK and create
	empty-body nohost function.
	* omp-offload.cc (execute_omp_device_lower): Handle
	IFN_GOMP_TARGET_REV.
	(pass_omp_target_link::execute): For ACCEL_COMPILER, don't
	nullify fn argument for reverse offload

libgomp/ChangeLog:

	* libgomp.texi (OpenMP 5.0): Mark 'ancestor' as implemented but
	refer to 'requires'.
	* testsuite/libgomp.c-c++-common/reverse-offload-1-aux.c: New test.
	* testsuite/libgomp.c-c++-common/reverse-offload-1.c: New test.
	* testsuite/libgomp.fortran/reverse-offload-1-aux.f90: New test.
	* testsuite/libgomp.fortran/reverse-offload-1.f90: New test.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/reverse-offload-1.c: Remove dg-sorry.
	* c-c++-common/gomp/target-device-ancestor-4.c: Likewise.
	* gfortran.dg/gomp/target-device-ancestor-4.f90: Likewise.
	* gfortran.dg/gomp/target-device-ancestor-5.f90: Likewise.

 gcc/internal-fn.cc                                 |   8 ++
 gcc/internal-fn.def                                |   1 +
 gcc/lto-cgraph.cc                                  |  20 +++-
 gcc/omp-expand.cc                                  | 107 +++++++++++++++++++--
 gcc/omp-low.cc                                     |   4 +-
 gcc/omp-offload.cc                                 |  50 ++++++++++
 .../c-c++-common/gomp/reverse-offload-1.c          |   2 +-
 .../c-c++-common/gomp/target-device-ancestor-4.c   |   2 +-
 .../gfortran.dg/gomp/target-device-ancestor-4.f90  |   2 +-
 .../gfortran.dg/gomp/target-device-ancestor-5.f90  |   2 +-
 libgomp/libgomp.texi                               |   2 +-
 .../libgomp.c-c++-common/reverse-offload-1-aux.c   |  10 ++
 .../libgomp.c-c++-common/reverse-offload-1.c       |  83 ++++++++++++++++
 .../libgomp.fortran/reverse-offload-1-aux.f90      |  12 +++
 .../libgomp.fortran/reverse-offload-1.f90          |  88 +++++++++++++++++
 15 files changed, 375 insertions(+), 18 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 28973d957fb..44530142340 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -368,6 +368,14 @@ expand_GOMP_SIMT_VF (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* This should get expanded in omp_device_lower pass.  */
+
+static void
+expand_GOMP_TARGET_REV (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
 /* Lane index of the first SIMT lane that supplies a non-zero argument.
    This is a SIMT counterpart to GOMP_SIMD_LAST_LANE, used to represent the
    lane that executed the last iteration for handling OpenMP lastprivate.  */
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 7c398baadc8..891bb8c363b 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -336,6 +336,7 @@ DEF_INTERNAL_INT_FN (FFS, ECF_CONST | ECF_NOTHROW, ffs, unary)
 DEF_INTERNAL_INT_FN (PARITY, ECF_CONST | ECF_NOTHROW, parity, unary)
 DEF_INTERNAL_INT_FN (POPCOUNT, ECF_CONST | ECF_NOTHROW, popcount, unary)
 
+DEF_INTERNAL_FN (GOMP_TARGET_REV, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_USE_SIMT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_ENTER, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_ENTER_ALLOC, ECF_LEAF | ECF_NOTHROW, NULL)
diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index 6d9c36ea8b6..062677a32eb 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -430,6 +430,13 @@ lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node,
 	 after reading back.  */
       in_other_partition = 1;
     }
+  else if (UNLIKELY (lto_stream_offload_p
+		     && lookup_attribute ("omp target device_ancestor_host",
+					  DECL_ATTRIBUTES (node->decl))))
+    /* This symbol is only used as argument to IFN_GOMP_TARGET_REV; this IFN
+       is ignored on ACCEL_COMPILER.  Thus, mark it as in_other_partition to silence
+       verify_node_partition diagnostic.  */
+    in_other_partition = 1;
 
   clone_of = node->clone_of;
   while (clone_of
@@ -1140,10 +1147,15 @@ verify_node_partition (symtab_node *node)
   if (node->in_other_partition)
     {
       if (TREE_CODE (node->decl) == FUNCTION_DECL)
-	error_at (DECL_SOURCE_LOCATION (node->decl),
-		  "function %qs has been referenced in offloaded code but"
-		  " hasn%'t been marked to be included in the offloaded code",
-		  node->name ());
+	{
+	  if (lookup_attribute ("omp target device_ancestor_host",
+				DECL_ATTRIBUTES (node->decl)) != NULL)
+	    return;
+	  error_at (DECL_SOURCE_LOCATION (node->decl),
+		    "function %qs has been referenced in offloaded code but"
+		    " hasn%'t been marked to be included in the offloaded code",
+		    node->name ());
+	}
       else if (VAR_P (node->decl))
 	error_at (DECL_SOURCE_LOCATION (node->decl),
 		  "variable %qs has been referenced in offloaded code but"
diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
index 1023c56fc3d..74b1588e35e 100644
--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -9651,7 +9651,7 @@ expand_omp_target (struct omp_region *region)
 {
   basic_block entry_bb, exit_bb, new_bb;
   struct function *child_cfun;
-  tree child_fn, block, t;
+  tree child_fn, child_fn2, block, t, c;
   gimple_stmt_iterator gsi;
   gomp_target *entry_stmt;
   gimple *stmt;
@@ -9688,10 +9688,16 @@ expand_omp_target (struct omp_region *region)
       gcc_unreachable ();
     }
 
-  child_fn = NULL_TREE;
+  tree clauses = gimple_omp_target_clauses (entry_stmt);
+
+  bool is_ancestor = false;
+  child_fn = child_fn2 = NULL_TREE;
   child_cfun = NULL;
   if (offloaded)
     {
+      c = omp_find_clause (clauses, OMP_CLAUSE_DEVICE);
+      if (ENABLE_OFFLOADING && c)
+	is_ancestor = OMP_CLAUSE_DEVICE_ANCESTOR (c);
       child_fn = gimple_omp_target_child_fn (entry_stmt);
       child_cfun = DECL_STRUCT_FUNCTION (child_fn);
     }
@@ -9879,7 +9885,8 @@ expand_omp_target (struct omp_region *region)
 	{
 	  if (in_lto_p)
 	    DECL_PRESERVE_P (child_fn) = 1;
-	  vec_safe_push (offload_funcs, child_fn);
+	  if (!is_ancestor)
+	    vec_safe_push (offload_funcs, child_fn);
 	}
 
       bool need_asm = DECL_ASSEMBLER_NAME_SET_P (current_function_decl)
@@ -9918,11 +9925,88 @@ expand_omp_target (struct omp_region *region)
 	}
 
       adjust_context_and_scope (region, gimple_block (entry_stmt), child_fn);
+
+      /* Handle the case that an inner ancestor:1 target is called by an outer
+	 target region. */
+      if (!is_ancestor)
+	cgraph_node::get (child_fn)->calls_declare_variant_alt
+	  |= cgraph_node::get (cfun->decl)->calls_declare_variant_alt;
+      else  /* Duplicate function to create empty nonhost variant. */
+	{
+	  /* Enable pass_omp_device_lower pass.  */
+	  cgraph_node::get (cfun->decl)->calls_declare_variant_alt = 1;
+	  cgraph_node *fn2_node;
+	  child_fn2 = build_decl (DECL_SOURCE_LOCATION (child_fn),
+				  FUNCTION_DECL,
+				  clone_function_name (child_fn, "nohost"),
+				  TREE_TYPE (child_fn));
+	  if (in_lto_p)
+	    DECL_PRESERVE_P (child_fn2) = 1;
+	  TREE_STATIC (child_fn2) = 1;
+	  DECL_ARTIFICIAL (child_fn2) = 1;
+	  DECL_IGNORED_P (child_fn2) = 0;
+	  TREE_PUBLIC (child_fn2) = 0;
+	  DECL_UNINLINABLE (child_fn2) = 1;
+	  DECL_EXTERNAL (child_fn2) = 0;
+	  DECL_CONTEXT (child_fn2) = NULL_TREE;
+	  DECL_INITIAL (child_fn2) = make_node (BLOCK);
+	  BLOCK_SUPERCONTEXT (DECL_INITIAL (child_fn2)) = child_fn2;
+	  DECL_ATTRIBUTES (child_fn)
+	    = remove_attribute ("omp target entrypoint",
+				DECL_ATTRIBUTES (child_fn));
+	  DECL_ATTRIBUTES (child_fn2)
+	    = tree_cons (get_identifier ("omp target device_ancestor_nohost"),
+			 NULL_TREE, copy_list (DECL_ATTRIBUTES (child_fn)));
+	  DECL_ATTRIBUTES (child_fn)
+	    = tree_cons (get_identifier ("omp target device_ancestor_host"),
+			 NULL_TREE, DECL_ATTRIBUTES (child_fn));
+	  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (child_fn2)
+	    = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (current_function_decl);
+	  DECL_FUNCTION_SPECIFIC_TARGET (child_fn2)
+	    = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl);
+	  DECL_FUNCTION_VERSIONED (child_fn2)
+	    = DECL_FUNCTION_VERSIONED (current_function_decl);
+
+	  fn2_node = cgraph_node::get_create (child_fn2);
+	  fn2_node->offloadable = 1;
+	  fn2_node->force_output = 1;
+	  node->offloadable = 0;
+
+	  t = build_decl (DECL_SOURCE_LOCATION (child_fn),
+			  RESULT_DECL, NULL_TREE, void_type_node);
+	  DECL_ARTIFICIAL (t) = 1;
+	  DECL_IGNORED_P (t) = 1;
+	  DECL_CONTEXT (t) = child_fn2;
+	  DECL_RESULT (child_fn2) = t;
+	  DECL_SAVED_TREE (child_fn2) = build1 (RETURN_EXPR,
+						void_type_node, NULL);
+	  tree tmp = DECL_ARGUMENTS (child_fn);
+	  t = build_decl (DECL_SOURCE_LOCATION (child_fn), PARM_DECL,
+			  DECL_NAME (tmp), TREE_TYPE (tmp));
+	  DECL_ARTIFICIAL (t) = 1;
+	  DECL_NAMELESS (t) = 1;
+	  DECL_ARG_TYPE (t) = ptr_type_node;
+	  DECL_CONTEXT (t) = current_function_decl;
+	  TREE_USED (t) = 1;
+	  TREE_READONLY (t) = 1;
+	  DECL_ARGUMENTS (child_fn2) = t;
+	  gcc_assert (TREE_CHAIN (tmp) == NULL_TREE);
+
+	  gimplify_function_tree (child_fn2);
+	  cgraph_node::add_new_function (child_fn2, true);
+
+	  vec_safe_push (offload_funcs, child_fn2);
+	  if (dump_file && !gimple_in_ssa_p (cfun))
+	    {
+	      dump_function_header (dump_file, child_fn2, dump_flags);
+	      dump_function_to_file (child_fn2, dump_file, dump_flags);
+	    }
+	}
     }
 
   /* Emit a library call to launch the offloading region, or do data
      transfers.  */
-  tree t1, t2, t3, t4, depend, c, clauses;
+  tree t1, t2, t3, t4, depend;
   enum built_in_function start_ix;
   unsigned int flags_i = 0;
 
@@ -9972,8 +10056,6 @@ expand_omp_target (struct omp_region *region)
       gcc_unreachable ();
     }
 
-  clauses = gimple_omp_target_clauses (entry_stmt);
-
   tree device = NULL_TREE;
   location_t device_loc = UNKNOWN_LOCATION;
   tree goacc_flags = NULL_TREE;
@@ -10005,7 +10087,8 @@ expand_omp_target (struct omp_region *region)
 	    need_device_adjustment = true;
 	  device_loc = OMP_CLAUSE_LOCATION (c);
 	  if (OMP_CLAUSE_DEVICE_ANCESTOR (c))
-	    sorry_at (device_loc, "%<ancestor%> not yet supported");
+	    device = build_int_cst (integer_type_node,
+				    GOMP_DEVICE_HOST_FALLBACK);
 	}
       else
 	{
@@ -10182,7 +10265,7 @@ expand_omp_target (struct omp_region *region)
   else
     args.quick_push (device);
   if (offloaded)
-    args.quick_push (build_fold_addr_expr (child_fn));
+    args.quick_push (build_fold_addr_expr (child_fn2 ? child_fn2 : child_fn));
   args.quick_push (t1);
   args.quick_push (t2);
   args.quick_push (t3);
@@ -10304,6 +10387,14 @@ expand_omp_target (struct omp_region *region)
     /*  Push terminal marker - zero.  */
     args.safe_push (oacc_launch_pack (0, NULL_TREE, 0));
 
+  if (child_fn2)
+    {
+      g = gimple_build_call_internal (IFN_GOMP_TARGET_REV, 1,
+				      build_fold_addr_expr (child_fn));
+      gimple_set_location (g, gimple_location (entry_stmt));
+      gsi_insert_before (&gsi, g, GSI_SAME_STMT);
+    }
+
   g = gimple_build_call_vec (builtin_decl_explicit (start_ix), args);
   gimple_set_location (g, gimple_location (entry_stmt));
   gsi_insert_before (&gsi, g, GSI_SAME_STMT);
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index d73c165f029..64a8a1ac07b 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -2104,7 +2104,9 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
       if (target_attr)
 	DECL_ATTRIBUTES (decl)
 	  = tree_cons (get_identifier (target_attr),
-		       NULL_TREE, DECL_ATTRIBUTES (decl));
+		       NULL_TREE,
+		       tree_cons (get_identifier ("noclone"), NULL_TREE,
+				  DECL_ATTRIBUTES (decl)));
     }
 
   t = build_decl (DECL_SOURCE_LOCATION (decl),
diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc
index 3a89119371c..77be0665267 100644
--- a/gcc/omp-offload.cc
+++ b/gcc/omp-offload.cc
@@ -2627,6 +2627,47 @@ execute_omp_device_lower ()
 	tree type = lhs ? TREE_TYPE (lhs) : integer_type_node;
 	switch (gimple_call_internal_fn (stmt))
 	  {
+	  case IFN_GOMP_TARGET_REV:
+	    {
+#ifndef ACCEL_COMPILER
+	      gimple_stmt_iterator gsi2 = gsi;
+	      gsi_next (&gsi2);
+	      gcc_assert (!gsi_end_p (gsi2));
+	      gcc_assert (gimple_call_builtin_p (gsi_stmt (gsi2),
+						 BUILT_IN_GOMP_TARGET));
+	      tree old_decl
+		= TREE_OPERAND (gimple_call_arg (gsi_stmt (gsi2), 1), 0);
+	      tree new_decl = gimple_call_arg (gsi_stmt (gsi), 0);
+	      gimple_call_set_arg (gsi_stmt (gsi2), 1, new_decl);
+	      update_stmt (gsi_stmt (gsi2));
+	      new_decl = TREE_OPERAND (new_decl, 0);
+	      unsigned i;
+	      unsigned num_funcs = vec_safe_length (offload_funcs);
+	      for (i = 0; i < num_funcs; i++)
+		{
+		  if ((*offload_funcs)[i] == old_decl)
+		    {
+		      (*offload_funcs)[i] = new_decl;
+		      break;
+		    }
+		  else if ((*offload_funcs)[i] == new_decl)
+		    break;  /* This can happen due to inlining.  */
+		}
+	      gcc_assert (i < num_funcs);
+#else
+	      tree old_decl = TREE_OPERAND (gimple_call_arg (gsi_stmt (gsi), 0),
+					    0);
+#endif
+	      /* FIXME: Find a way to actually prevent outputting the empty-body
+		 old_decl as debug symbol + function in the assembly file.  */
+	      cgraph_node *node = cgraph_node::get (old_decl);
+	      node->address_taken = false;
+	      node->need_lto_streaming = false;
+	      node->offloadable = false;
+
+	      unlink_stmt_vdef (stmt);
+	    }
+	    break;
 	  case IFN_GOMP_USE_SIMT:
 	    rhs = vf == 1 ? integer_zero_node : integer_one_node;
 	    break;
@@ -2803,6 +2844,15 @@ pass_omp_target_link::execute (function *fun)
 	{
 	  if (gimple_call_builtin_p (gsi_stmt (gsi), BUILT_IN_GOMP_TARGET))
 	    {
+	      tree dev = gimple_call_arg (gsi_stmt (gsi), 0);
+	      tree fn = gimple_call_arg (gsi_stmt (gsi), 1);
+	      if (POINTER_TYPE_P (TREE_TYPE (fn)))
+		fn = TREE_OPERAND (fn, 0);
+	      if (TREE_CODE (dev) == INTEGER_CST
+		  && wi::to_wide (dev) == GOMP_DEVICE_HOST_FALLBACK
+		  && lookup_attribute ("omp target device_ancestor_nohost",
+				       DECL_ATTRIBUTES (fn)) != NULL_TREE)
+		continue;  /* ancestor:1  */
 	      /* Nullify the second argument of __builtin_GOMP_target_ext.  */
 	      gimple_call_set_arg (gsi_stmt (gsi), 1, null_pointer_node);
 	      update_stmt (gsi_stmt (gsi));
diff --git a/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c b/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
index 3452156f948..9a3fa5230f8 100644
--- a/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
@@ -43,7 +43,7 @@ tg_fn (int *x, int *y)
   x2 = x2 + 2 + called_in_target1 ();
   y2 = y2 + 7;
 
-  #pragma omp target device(ancestor : 1) map(tofrom: x2)  /* { dg-message "sorry, unimplemented: 'ancestor' not yet supported" } */
+  #pragma omp target device(ancestor : 1) map(tofrom: x2)
     check_offload(&x2, &y2);
 
   if (x2 != 2+2+3+42 || y2 != 3 + 7)
diff --git a/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-4.c b/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-4.c
index 241234f8daf..87ac7548c23 100644
--- a/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-4.c
+++ b/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-4.c
@@ -9,7 +9,7 @@
 void
 foo (void)
 {
-  #pragma omp target device (ancestor: 1) /* { dg-message "sorry, unimplemented: 'ancestor' not yet supported" } */
+  #pragma omp target device (ancestor: 1)
   ;
 
 }
diff --git a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90 b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90
index ab56e2d1d52..d73adf2c5a7 100644
--- a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90
@@ -6,7 +6,7 @@
 
 !$omp requires reverse_offload
 
-!$omp target device (ancestor : 1)  ! { dg-message "sorry, unimplemented: 'ancestor' not yet supported" }
+!$omp target device (ancestor : 1)
 !$omp end target
 
 end
diff --git a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-5.f90 b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-5.f90
index ca8d4b282a0..9596d61f6fa 100644
--- a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-5.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-5.f90
@@ -17,7 +17,7 @@ contains
     block
       block
         block
-          !$omp target device(ancestor:1)  ! { dg-message "sorry, unimplemented: 'ancestor' not yet supported" }
+          !$omp target device(ancestor:1)
           !$omp end target
         end block
       end block
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index e88fe89a5b1..0f2998cf8f1 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -225,7 +225,7 @@ The OpenMP 4.5 specification is fully supported.
 @item @code{allocate} clause @tab P @tab Initial support
 @item @code{use_device_addr} clause on @code{target data} @tab Y @tab
 @item @code{ancestor} modifier on @code{device} clause
-      @tab P @tab Reverse offload unsupported
+      @tab Y @tab See comment for @code{requires}
 @item Implicit declare target directive @tab Y @tab
 @item Discontiguous array section with @code{target update} construct
       @tab N @tab
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1-aux.c b/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1-aux.c
new file mode 100644
index 00000000000..b3a331d12da
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1-aux.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target skip-all-targets } }  */
+
+/* Declare the following function in a separare translation unit
+   to ensure it won't have a device version.  */
+
+int
+add_3 (int x)
+{
+  return x + 3;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c b/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c
new file mode 100644
index 00000000000..976e129f560
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c
@@ -0,0 +1,83 @@
+/* { dg-do run }  */
+/* { dg-additional-sources reverse-offload-1-aux.c } */
+
+/* Check that reverse offload works in particular:
+   - no code is generated on the device side (i.e. no
+     implicit declare target of called functions and no
+     code gen for the target-region body)
+     -> would otherwise fail due to 'add_3' symbol
+   - Plus the usual (compiles, runs, produces correct result)
+
+   Note: Running also the non-reverse-offload target regions
+   on the host (host fallback) is valid and will pass.  */
+
+#pragma omp requires reverse_offload
+
+extern int add_3 (int);
+
+static int global_var = 5;
+
+void
+check_offload (int *x, int *y)
+{
+  *x = add_3 (*x);
+  *y = add_3 (*y);
+}
+
+#pragma omp declare target
+void
+tg_fn (int *x, int *y)
+{
+  int x2 = *x, y2 = *y;
+  if (x2 != 2 || y2 != 3)
+    __builtin_abort ();
+  x2 = x2 + 2;
+  y2 = y2 + 7;
+
+  #pragma omp target device(ancestor : 1) map(tofrom: x2)
+    check_offload(&x2, &y2);
+
+  if (x2 != 2+2+3 || y2 != 3 + 7)
+    __builtin_abort ();
+  *x = x2, *y = y2;
+}
+#pragma omp end declare target
+
+void
+my_func (int *x, int *y)
+{
+  if (global_var != 5)
+    __builtin_abort ();
+  global_var = 242;
+  *x = 2*add_3(*x);
+  *y = 3*add_3(*y);
+}
+
+int
+main ()
+{
+  #pragma omp target
+  {
+     int x = 2, y = 3;
+     tg_fn (&x, &y);
+  }
+
+  #pragma omp target
+  {
+     int x = -2, y = -1;
+     #pragma omp target device ( ancestor:1 ) firstprivate(y) map(tofrom:x)
+     {
+       if (x != -2 || y != -1)
+         __builtin_abort ();
+       my_func (&x, &y);
+       if (x != 2*(3-2) || y != 3*(3-1))
+         __builtin_abort ();
+     }
+     if (x != 2*(3-2) || y != -1)
+       __builtin_abort ();
+  }
+
+  if (global_var != 242)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.fortran/reverse-offload-1-aux.f90 b/libgomp/testsuite/libgomp.fortran/reverse-offload-1-aux.f90
new file mode 100644
index 00000000000..1807f063d5a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/reverse-offload-1-aux.f90
@@ -0,0 +1,12 @@
+! { dg-do compile { target skip-all-targets } }
+
+! Declare the following function in a separare translation unit
+! to ensure it won't have a device version.
+
+
+integer function add_3 (x)
+  implicit none
+  integer, value :: x
+
+  add_3 = x + 3
+end function
diff --git a/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90 b/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90
new file mode 100644
index 00000000000..7cfb8b6552e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90
@@ -0,0 +1,88 @@
+! { dg-do run }
+! { dg-additional-sources reverse-offload-1-aux.f90 }
+
+! Check that reverse offload works in particular:
+! - no code is generated on the device side (i.e. no
+!   implicit declare target of called functions and no
+!   code gen for the target-region body)
+!   -> would otherwise fail due to 'add_3' symbol
+! - Plus the usual (compiles, runs, produces correct result)
+
+! Note: Running also the non-reverse-offload target regions
+! on the host (host fallback) is valid and will pass.
+
+module m
+  interface
+    integer function add_3 (x)
+      implicit none
+      integer, value :: x
+    end function
+  end interface
+  integer :: global_var = 5
+end module m
+
+module m2
+  use m
+  !$omp requires reverse_offload
+  implicit none (type, external)
+contains
+  subroutine check_offload (x, y)
+    integer :: x, y
+    x = add_3(x)
+    y = add_3(y)
+  end subroutine check_offload
+  subroutine m2_tg_fn(x, y)
+    integer :: x, y
+    !$omp declare target
+    if (x /= 2 .or. y /= 3) stop 1
+    x = x + 2
+    y = y + 7
+    !$omp target device(ancestor : 1) map(tofrom: x)
+      call check_offload(x, y)
+    !$omp end target
+    if (x /= 2+2+3 .or. y /= 3 + 7) stop 2
+  end subroutine
+end module m2
+
+program main
+  use m
+  !$omp requires reverse_offload
+  implicit none (type, external)
+
+  integer :: prog_var = 99
+
+  !$omp target
+  block
+    use m2
+    integer :: x, y
+    x = 2; y = 3
+    call m2_tg_fn (x, y)
+  end block
+
+  !$omp target
+  block
+    use m2
+    integer :: x, y
+    x = -2; y = -1
+    !$omp target device ( ancestor:1 ) firstprivate(y) map(tofrom:x)
+      if (x /= -2 .or. y /= -1) stop 3
+      call my_func (x, y)
+      if (x /= 2*(3-2) .or. y /= 3*(3-1)) stop 5
+    !$omp end target
+    if (x /= 2*(3-2) .or. y /= -1) stop 6
+  end block
+
+  if (prog_var /= 41 .or. global_var /= 242) stop 7
+
+contains
+
+  subroutine my_func(x, y)
+    integer :: x, y
+    if (prog_var /= 99) stop 8
+    if (global_var /= 5) stop 9
+    prog_var = 41
+    global_var = 242
+    x = 2*add_3(x)
+    y = 3*add_3(y)
+  end subroutine my_func
+end

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Patch] OpenMP: Support reverse offload (middle end part)
  2022-07-21 10:55 [Patch] OpenMP: Support reverse offload (middle end part) Tobias Burnus
@ 2022-07-21 12:33 ` Tobias Burnus
  2022-08-26  9:53   ` Jakub Jelinek
  0 siblings, 1 reply; 3+ messages in thread
From: Tobias Burnus @ 2022-07-21 12:33 UTC (permalink / raw)
  To: gcc-patches, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 3182 bytes --]

Ups to quick/wrong patch file. I had found an issue related to 'noclone'
(duplicated entries, dg-scan-dump issues with OpenACC) – but ended up to
attach the wrong file...  Changes: omp-low.cc and
gcc/testsuite/*/goacc/. The rest is the same.

Tobias

On 21.07.22 12:55, Tobias Burnus wrote:
> This patch does three things:
>
> (a) It removes a 'sorry' for 'device(ancestor:1)' and passes
>     GOMP_DEVICE_HOST_FALLBACK as device number.
>
> This is sufficient for full "reverse" offload support with
> ENABLE_OFFLOADING
> being false - and -foffload=disable. And for simple hello-world cases.
>
>
> On the libgomp side, the 'requires reverse_offload' currently implies
> that
> the initial device is the only device. While that's all fine, this change
> is insufficient if offloading devices are enabled during compilation as:
>
>
> (b.1) The offload-device lto1 should not see the content of the
> ancestor:1 target
> region and all the calls it does. If it does, there will be link
> errors for
> functions not available and it also would pointlessly increase the
> code size.
>
> Thus, the second part is to create an empty function for devices and a
> full
> version for the host.
>
> The general idea is: The device version can be used as lookup pointer
> in the
> offload_funcs table; thus, we both need a function on the device and a
> call to
> GOMP_target_ext.
>
> It turned out to be quite difficult as late in the processing changing a
> FUNCTION_DECL is not that easy – nor removing it after all analysis
> has been
> done. I hope the current version is not too hackish – and maybe
> someone has
> an idea how to best not to assembly the 'nonhost' version on the host.
> (Not critical as it is small (having an empty body) - but still it
> would be
> nicer not to write it to .s file.)
>
>
> (b.2) The omp-offload.cc assert showed that cloning and inlining happened
> for the included libgomp example. While inlining should be okay (of
> 'subroutine m2_tg_fn' (and for C/C++ 'tg_fn')) - cloning will break
> the offload_func table lookup - and, hence, had to be excluded →
> "noclone".
> I think it could also affect non-anchestor:1 code - but did not try to
> create an example.
>
>
> (c) Prepare for actual reverse offloading
> While (b) already does some prep work for real offloading, at least
> one more
> step is needed: In order to allow that the function pointer can be
> used for
> offload_func table lookup, it has to be passed to libgomp.
>
> Currently, the 'fn' argument is nullified in on-device calls to
> GOMP_target_ext.
> The third part of this patch nullifies it now only for non-reverse
> offloads.
>
> OK for mainline?
>
>  * * *
>
> Next steps: Implement reverse offloading for devices. In theory, this
> only
> requires libgomp work, but let's see what else will be required.
>
> Tobias
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955

[-- Attachment #2: omp-ancestor-ME-v2.diff --]
[-- Type: text/x-patch, Size: 72200 bytes --]

OpenMP: Support reverse offload (middle end part)

gcc/ChangeLog:

	* internal-fn.cc (expand_GOMP_TARGET_REV): New.
	* internal-fn.def (GOMP_TARGET_REV): New.
	* lto-cgraph.cc (lto_output_node, verify_node_partition): Mark
	'omp target device_ancestor_host' as in_other_partition and don't
	error if absent.
	* omp-low.cc (create_omp_child_function): Mark as 'noclone'.
	* omp-expand.cc (expand_omp_target): For reverse offload, remove
	sorry, use device = GOMP_DEVICE_HOST_FALLBACK and create
	empty-body nohost function.
	* omp-offload.cc (execute_omp_device_lower): Handle
	IFN_GOMP_TARGET_REV.
	(pass_omp_target_link::execute): For ACCEL_COMPILER, don't
	nullify fn argument for reverse offload

libgomp/ChangeLog:

	* libgomp.texi (OpenMP 5.0): Mark 'ancestor' as implemented but
	refer to 'requires'.
	* testsuite/libgomp.c-c++-common/reverse-offload-1-aux.c: New test.
	* testsuite/libgomp.c-c++-common/reverse-offload-1.c: New test.
	* testsuite/libgomp.fortran/reverse-offload-1-aux.f90: New test.
	* testsuite/libgomp.fortran/reverse-offload-1.f90: New test.

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/reverse-offload-1.c: Remove dg-sorry.
	* c-c++-common/gomp/target-device-ancestor-4.c: Likewise.
	* gfortran.dg/gomp/target-device-ancestor-4.f90: Likewise.
	* gfortran.dg/gomp/target-device-ancestor-5.f90: Likewise.
	* c-c++-common/goacc/classify-kernels-parloops.c: Add 'noclone' to
	scan-tree-dump-times.
	* c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
	Likewise.
	* c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/classify-parallel.c: Likewise.
	* c-c++-common/goacc/classify-serial.c: Likewise.
	* c-c++-common/goacc/kernels-counter-vars-function-scope.c: Likewise.
	* c-c++-common/goacc/kernels-loop-2.c: Likewise.
	* c-c++-common/goacc/kernels-loop-3.c: Likewise.
	* c-c++-common/goacc/kernels-loop-data-2.c: Likewise.
	* c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: Likewise.
	* c-c++-common/goacc/kernels-loop-data-enter-exit.c: Likewise.
	* c-c++-common/goacc/kernels-loop-data-update.c: Likewise.
	* c-c++-common/goacc/kernels-loop-data.c: Likewise.
	* c-c++-common/goacc/kernels-loop-g.c: Likewise.
	* c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
	* c-c++-common/goacc/kernels-loop-n.c: Likewise.
	* c-c++-common/goacc/kernels-loop-nest.c: Likewise.
	* c-c++-common/goacc/kernels-loop.c: Likewise.
	* c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
	* c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c: Likewise.
	* gfortran.dg/goacc/classify-kernels-parloops.f95: Likewise.
	* gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
	Likewise.
	* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Likewise.
	* gfortran.dg/goacc/classify-kernels.f95: Likewise.
	* gfortran.dg/goacc/classify-parallel.f95: Likewise.
	* gfortran.dg/goacc/classify-serial.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-2.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-data.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop-n.f95: Likewise.
	* gfortran.dg/goacc/kernels-loop.f95: Likewise.
	* gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95: Likewise.

 gcc/internal-fn.cc                                 |   8 ++
 gcc/internal-fn.def                                |   1 +
 gcc/lto-cgraph.cc                                  |  20 +++-
 gcc/omp-expand.cc                                  | 107 +++++++++++++++++++--
 gcc/omp-low.cc                                     |   5 +
 gcc/omp-offload.cc                                 |  50 ++++++++++
 .../c-c++-common/goacc/classify-kernels-parloops.c |   6 +-
 .../classify-kernels-unparallelized-parloops.c     |   6 +-
 .../goacc/classify-kernels-unparallelized.c        |   6 +-
 .../c-c++-common/goacc/classify-kernels.c          |   6 +-
 .../c-c++-common/goacc/classify-parallel.c         |   4 +-
 gcc/testsuite/c-c++-common/goacc/classify-serial.c |   4 +-
 .../goacc/kernels-counter-vars-function-scope.c    |   2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c  |   2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c  |   2 +-
 .../c-c++-common/goacc/kernels-loop-data-2.c       |   2 +-
 .../goacc/kernels-loop-data-enter-exit-2.c         |   2 +-
 .../goacc/kernels-loop-data-enter-exit.c           |   2 +-
 .../c-c++-common/goacc/kernels-loop-data-update.c  |   2 +-
 .../c-c++-common/goacc/kernels-loop-data.c         |   2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c  |   2 +-
 .../c-c++-common/goacc/kernels-loop-mod-not-zero.c |   2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c  |   2 +-
 .../c-c++-common/goacc/kernels-loop-nest.c         |   2 +-
 gcc/testsuite/c-c++-common/goacc/kernels-loop.c    |   2 +-
 .../c-c++-common/goacc/kernels-one-counter-var.c   |   2 +-
 .../goacc/kernels-parallel-loop-data-enter-exit.c  |   2 +-
 .../c-c++-common/gomp/reverse-offload-1.c          |   2 +-
 .../c-c++-common/gomp/target-device-ancestor-4.c   |   2 +-
 .../goacc/classify-kernels-parloops.f95            |   6 +-
 .../classify-kernels-unparallelized-parloops.f95   |   6 +-
 .../goacc/classify-kernels-unparallelized.f95      |   6 +-
 .../gfortran.dg/goacc/classify-kernels.f95         |   6 +-
 .../gfortran.dg/goacc/classify-parallel.f95        |   4 +-
 .../gfortran.dg/goacc/classify-serial.f95          |   4 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 |   2 +-
 .../gfortran.dg/goacc/kernels-loop-data-2.f95      |   2 +-
 .../goacc/kernels-loop-data-enter-exit-2.f95       |   2 +-
 .../goacc/kernels-loop-data-enter-exit.f95         |   2 +-
 .../gfortran.dg/goacc/kernels-loop-data-update.f95 |   2 +-
 .../gfortran.dg/goacc/kernels-loop-data.f95        |   2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95 |   2 +-
 gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95   |   2 +-
 .../kernels-parallel-loop-data-enter-exit.f95      |   2 +-
 .../gfortran.dg/gomp/target-device-ancestor-4.f90  |   2 +-
 .../gfortran.dg/gomp/target-device-ancestor-5.f90  |   2 +-
 libgomp/libgomp.texi                               |   2 +-
 .../libgomp.c-c++-common/reverse-offload-1-aux.c   |  10 ++
 .../libgomp.c-c++-common/reverse-offload-1.c       |  83 ++++++++++++++++
 .../libgomp.fortran/reverse-offload-1-aux.f90      |  12 +++
 .../libgomp.fortran/reverse-offload-1.f90          |  88 +++++++++++++++++
 51 files changed, 433 insertions(+), 73 deletions(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 28973d957fb..44530142340 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -368,6 +368,14 @@ expand_GOMP_SIMT_VF (internal_fn, gcall *)
   gcc_unreachable ();
 }
 
+/* This should get expanded in omp_device_lower pass.  */
+
+static void
+expand_GOMP_TARGET_REV (internal_fn, gcall *)
+{
+  gcc_unreachable ();
+}
+
 /* Lane index of the first SIMT lane that supplies a non-zero argument.
    This is a SIMT counterpart to GOMP_SIMD_LAST_LANE, used to represent the
    lane that executed the last iteration for handling OpenMP lastprivate.  */
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 7c398baadc8..891bb8c363b 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -336,6 +336,7 @@ DEF_INTERNAL_INT_FN (FFS, ECF_CONST | ECF_NOTHROW, ffs, unary)
 DEF_INTERNAL_INT_FN (PARITY, ECF_CONST | ECF_NOTHROW, parity, unary)
 DEF_INTERNAL_INT_FN (POPCOUNT, ECF_CONST | ECF_NOTHROW, popcount, unary)
 
+DEF_INTERNAL_FN (GOMP_TARGET_REV, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_USE_SIMT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_ENTER, ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (GOMP_SIMT_ENTER_ALLOC, ECF_LEAF | ECF_NOTHROW, NULL)
diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc
index 6d9c36ea8b6..062677a32eb 100644
--- a/gcc/lto-cgraph.cc
+++ b/gcc/lto-cgraph.cc
@@ -430,6 +430,13 @@ lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node,
 	 after reading back.  */
       in_other_partition = 1;
     }
+  else if (UNLIKELY (lto_stream_offload_p
+		     && lookup_attribute ("omp target device_ancestor_host",
+					  DECL_ATTRIBUTES (node->decl))))
+    /* This symbol is only used as argument to IFN_GOMP_TARGET_REV; this IFN
+       is ignored on ACCEL_COMPILER.  Thus, mark it as in_other_partition to silence
+       verify_node_partition diagnostic.  */
+    in_other_partition = 1;
 
   clone_of = node->clone_of;
   while (clone_of
@@ -1140,10 +1147,15 @@ verify_node_partition (symtab_node *node)
   if (node->in_other_partition)
     {
       if (TREE_CODE (node->decl) == FUNCTION_DECL)
-	error_at (DECL_SOURCE_LOCATION (node->decl),
-		  "function %qs has been referenced in offloaded code but"
-		  " hasn%'t been marked to be included in the offloaded code",
-		  node->name ());
+	{
+	  if (lookup_attribute ("omp target device_ancestor_host",
+				DECL_ATTRIBUTES (node->decl)) != NULL)
+	    return;
+	  error_at (DECL_SOURCE_LOCATION (node->decl),
+		    "function %qs has been referenced in offloaded code but"
+		    " hasn%'t been marked to be included in the offloaded code",
+		    node->name ());
+	}
       else if (VAR_P (node->decl))
 	error_at (DECL_SOURCE_LOCATION (node->decl),
 		  "variable %qs has been referenced in offloaded code but"
diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc
index 1023c56fc3d..74b1588e35e 100644
--- a/gcc/omp-expand.cc
+++ b/gcc/omp-expand.cc
@@ -9651,7 +9651,7 @@ expand_omp_target (struct omp_region *region)
 {
   basic_block entry_bb, exit_bb, new_bb;
   struct function *child_cfun;
-  tree child_fn, block, t;
+  tree child_fn, child_fn2, block, t, c;
   gimple_stmt_iterator gsi;
   gomp_target *entry_stmt;
   gimple *stmt;
@@ -9688,10 +9688,16 @@ expand_omp_target (struct omp_region *region)
       gcc_unreachable ();
     }
 
-  child_fn = NULL_TREE;
+  tree clauses = gimple_omp_target_clauses (entry_stmt);
+
+  bool is_ancestor = false;
+  child_fn = child_fn2 = NULL_TREE;
   child_cfun = NULL;
   if (offloaded)
     {
+      c = omp_find_clause (clauses, OMP_CLAUSE_DEVICE);
+      if (ENABLE_OFFLOADING && c)
+	is_ancestor = OMP_CLAUSE_DEVICE_ANCESTOR (c);
       child_fn = gimple_omp_target_child_fn (entry_stmt);
       child_cfun = DECL_STRUCT_FUNCTION (child_fn);
     }
@@ -9879,7 +9885,8 @@ expand_omp_target (struct omp_region *region)
 	{
 	  if (in_lto_p)
 	    DECL_PRESERVE_P (child_fn) = 1;
-	  vec_safe_push (offload_funcs, child_fn);
+	  if (!is_ancestor)
+	    vec_safe_push (offload_funcs, child_fn);
 	}
 
       bool need_asm = DECL_ASSEMBLER_NAME_SET_P (current_function_decl)
@@ -9918,11 +9925,88 @@ expand_omp_target (struct omp_region *region)
 	}
 
       adjust_context_and_scope (region, gimple_block (entry_stmt), child_fn);
+
+      /* Handle the case that an inner ancestor:1 target is called by an outer
+	 target region. */
+      if (!is_ancestor)
+	cgraph_node::get (child_fn)->calls_declare_variant_alt
+	  |= cgraph_node::get (cfun->decl)->calls_declare_variant_alt;
+      else  /* Duplicate function to create empty nonhost variant. */
+	{
+	  /* Enable pass_omp_device_lower pass.  */
+	  cgraph_node::get (cfun->decl)->calls_declare_variant_alt = 1;
+	  cgraph_node *fn2_node;
+	  child_fn2 = build_decl (DECL_SOURCE_LOCATION (child_fn),
+				  FUNCTION_DECL,
+				  clone_function_name (child_fn, "nohost"),
+				  TREE_TYPE (child_fn));
+	  if (in_lto_p)
+	    DECL_PRESERVE_P (child_fn2) = 1;
+	  TREE_STATIC (child_fn2) = 1;
+	  DECL_ARTIFICIAL (child_fn2) = 1;
+	  DECL_IGNORED_P (child_fn2) = 0;
+	  TREE_PUBLIC (child_fn2) = 0;
+	  DECL_UNINLINABLE (child_fn2) = 1;
+	  DECL_EXTERNAL (child_fn2) = 0;
+	  DECL_CONTEXT (child_fn2) = NULL_TREE;
+	  DECL_INITIAL (child_fn2) = make_node (BLOCK);
+	  BLOCK_SUPERCONTEXT (DECL_INITIAL (child_fn2)) = child_fn2;
+	  DECL_ATTRIBUTES (child_fn)
+	    = remove_attribute ("omp target entrypoint",
+				DECL_ATTRIBUTES (child_fn));
+	  DECL_ATTRIBUTES (child_fn2)
+	    = tree_cons (get_identifier ("omp target device_ancestor_nohost"),
+			 NULL_TREE, copy_list (DECL_ATTRIBUTES (child_fn)));
+	  DECL_ATTRIBUTES (child_fn)
+	    = tree_cons (get_identifier ("omp target device_ancestor_host"),
+			 NULL_TREE, DECL_ATTRIBUTES (child_fn));
+	  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (child_fn2)
+	    = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (current_function_decl);
+	  DECL_FUNCTION_SPECIFIC_TARGET (child_fn2)
+	    = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl);
+	  DECL_FUNCTION_VERSIONED (child_fn2)
+	    = DECL_FUNCTION_VERSIONED (current_function_decl);
+
+	  fn2_node = cgraph_node::get_create (child_fn2);
+	  fn2_node->offloadable = 1;
+	  fn2_node->force_output = 1;
+	  node->offloadable = 0;
+
+	  t = build_decl (DECL_SOURCE_LOCATION (child_fn),
+			  RESULT_DECL, NULL_TREE, void_type_node);
+	  DECL_ARTIFICIAL (t) = 1;
+	  DECL_IGNORED_P (t) = 1;
+	  DECL_CONTEXT (t) = child_fn2;
+	  DECL_RESULT (child_fn2) = t;
+	  DECL_SAVED_TREE (child_fn2) = build1 (RETURN_EXPR,
+						void_type_node, NULL);
+	  tree tmp = DECL_ARGUMENTS (child_fn);
+	  t = build_decl (DECL_SOURCE_LOCATION (child_fn), PARM_DECL,
+			  DECL_NAME (tmp), TREE_TYPE (tmp));
+	  DECL_ARTIFICIAL (t) = 1;
+	  DECL_NAMELESS (t) = 1;
+	  DECL_ARG_TYPE (t) = ptr_type_node;
+	  DECL_CONTEXT (t) = current_function_decl;
+	  TREE_USED (t) = 1;
+	  TREE_READONLY (t) = 1;
+	  DECL_ARGUMENTS (child_fn2) = t;
+	  gcc_assert (TREE_CHAIN (tmp) == NULL_TREE);
+
+	  gimplify_function_tree (child_fn2);
+	  cgraph_node::add_new_function (child_fn2, true);
+
+	  vec_safe_push (offload_funcs, child_fn2);
+	  if (dump_file && !gimple_in_ssa_p (cfun))
+	    {
+	      dump_function_header (dump_file, child_fn2, dump_flags);
+	      dump_function_to_file (child_fn2, dump_file, dump_flags);
+	    }
+	}
     }
 
   /* Emit a library call to launch the offloading region, or do data
      transfers.  */
-  tree t1, t2, t3, t4, depend, c, clauses;
+  tree t1, t2, t3, t4, depend;
   enum built_in_function start_ix;
   unsigned int flags_i = 0;
 
@@ -9972,8 +10056,6 @@ expand_omp_target (struct omp_region *region)
       gcc_unreachable ();
     }
 
-  clauses = gimple_omp_target_clauses (entry_stmt);
-
   tree device = NULL_TREE;
   location_t device_loc = UNKNOWN_LOCATION;
   tree goacc_flags = NULL_TREE;
@@ -10005,7 +10087,8 @@ expand_omp_target (struct omp_region *region)
 	    need_device_adjustment = true;
 	  device_loc = OMP_CLAUSE_LOCATION (c);
 	  if (OMP_CLAUSE_DEVICE_ANCESTOR (c))
-	    sorry_at (device_loc, "%<ancestor%> not yet supported");
+	    device = build_int_cst (integer_type_node,
+				    GOMP_DEVICE_HOST_FALLBACK);
 	}
       else
 	{
@@ -10182,7 +10265,7 @@ expand_omp_target (struct omp_region *region)
   else
     args.quick_push (device);
   if (offloaded)
-    args.quick_push (build_fold_addr_expr (child_fn));
+    args.quick_push (build_fold_addr_expr (child_fn2 ? child_fn2 : child_fn));
   args.quick_push (t1);
   args.quick_push (t2);
   args.quick_push (t3);
@@ -10304,6 +10387,14 @@ expand_omp_target (struct omp_region *region)
     /*  Push terminal marker - zero.  */
     args.safe_push (oacc_launch_pack (0, NULL_TREE, 0));
 
+  if (child_fn2)
+    {
+      g = gimple_build_call_internal (IFN_GOMP_TARGET_REV, 1,
+				      build_fold_addr_expr (child_fn));
+      gimple_set_location (g, gimple_location (entry_stmt));
+      gsi_insert_before (&gsi, g, GSI_SAME_STMT);
+    }
+
   g = gimple_build_call_vec (builtin_decl_explicit (start_ix), args);
   gimple_set_location (g, gimple_location (entry_stmt));
   gsi_insert_before (&gsi, g, GSI_SAME_STMT);
diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc
index d73c165f029..a26aab677b9 100644
--- a/gcc/omp-low.cc
+++ b/gcc/omp-low.cc
@@ -2101,6 +2101,11 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
 	  else
 	    target_attr = NULL;
 	}
+      if (target_attr
+	  && is_gimple_omp_offloaded (ctx->stmt)
+	  && lookup_attribute ("noclone", DECL_ATTRIBUTES (decl)) == NULL_TREE)
+	DECL_ATTRIBUTES (decl) = tree_cons (get_identifier ("noclone"),
+					   NULL_TREE, DECL_ATTRIBUTES (decl));
       if (target_attr)
 	DECL_ATTRIBUTES (decl)
 	  = tree_cons (get_identifier (target_attr),
diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc
index 3a89119371c..77be0665267 100644
--- a/gcc/omp-offload.cc
+++ b/gcc/omp-offload.cc
@@ -2627,6 +2627,47 @@ execute_omp_device_lower ()
 	tree type = lhs ? TREE_TYPE (lhs) : integer_type_node;
 	switch (gimple_call_internal_fn (stmt))
 	  {
+	  case IFN_GOMP_TARGET_REV:
+	    {
+#ifndef ACCEL_COMPILER
+	      gimple_stmt_iterator gsi2 = gsi;
+	      gsi_next (&gsi2);
+	      gcc_assert (!gsi_end_p (gsi2));
+	      gcc_assert (gimple_call_builtin_p (gsi_stmt (gsi2),
+						 BUILT_IN_GOMP_TARGET));
+	      tree old_decl
+		= TREE_OPERAND (gimple_call_arg (gsi_stmt (gsi2), 1), 0);
+	      tree new_decl = gimple_call_arg (gsi_stmt (gsi), 0);
+	      gimple_call_set_arg (gsi_stmt (gsi2), 1, new_decl);
+	      update_stmt (gsi_stmt (gsi2));
+	      new_decl = TREE_OPERAND (new_decl, 0);
+	      unsigned i;
+	      unsigned num_funcs = vec_safe_length (offload_funcs);
+	      for (i = 0; i < num_funcs; i++)
+		{
+		  if ((*offload_funcs)[i] == old_decl)
+		    {
+		      (*offload_funcs)[i] = new_decl;
+		      break;
+		    }
+		  else if ((*offload_funcs)[i] == new_decl)
+		    break;  /* This can happen due to inlining.  */
+		}
+	      gcc_assert (i < num_funcs);
+#else
+	      tree old_decl = TREE_OPERAND (gimple_call_arg (gsi_stmt (gsi), 0),
+					    0);
+#endif
+	      /* FIXME: Find a way to actually prevent outputting the empty-body
+		 old_decl as debug symbol + function in the assembly file.  */
+	      cgraph_node *node = cgraph_node::get (old_decl);
+	      node->address_taken = false;
+	      node->need_lto_streaming = false;
+	      node->offloadable = false;
+
+	      unlink_stmt_vdef (stmt);
+	    }
+	    break;
 	  case IFN_GOMP_USE_SIMT:
 	    rhs = vf == 1 ? integer_zero_node : integer_one_node;
 	    break;
@@ -2803,6 +2844,15 @@ pass_omp_target_link::execute (function *fun)
 	{
 	  if (gimple_call_builtin_p (gsi_stmt (gsi), BUILT_IN_GOMP_TARGET))
 	    {
+	      tree dev = gimple_call_arg (gsi_stmt (gsi), 0);
+	      tree fn = gimple_call_arg (gsi_stmt (gsi), 1);
+	      if (POINTER_TYPE_P (TREE_TYPE (fn)))
+		fn = TREE_OPERAND (fn, 0);
+	      if (TREE_CODE (dev) == INTEGER_CST
+		  && wi::to_wide (dev) == GOMP_DEVICE_HOST_FALLBACK
+		  && lookup_attribute ("omp target device_ancestor_nohost",
+				       DECL_ATTRIBUTES (fn)) != NULL_TREE)
+		continue;  /* ancestor:1  */
 	      /* Nullify the second argument of __builtin_GOMP_target_ext.  */
 	      gimple_call_set_arg (gsi_stmt (gsi), 1, null_pointer_node);
 	      update_stmt (gsi_stmt (gsi));
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-parloops.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-parloops.c
index 5f470eb86bc..fc2b6375002 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels-parloops.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-parloops.c
@@ -27,16 +27,16 @@ void KERNELS ()
 }
 
 /* Check the offloaded function's attributes.
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint\\)\\)" 1 "ompexp" } } */
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "ompexp" } } */
 
 /* Check that exactly one OpenACC kernels construct is analyzed, and that it
    can be parallelized.
    { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "parloops1" } }
    { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
    { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccloops" } }
    { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } } */
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "oaccloops" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-parloops.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-parloops.c
index 06c70fb9d9f..08e69c4062c 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-parloops.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-parloops.c
@@ -31,16 +31,16 @@ void KERNELS ()
 }
 
 /* Check the offloaded function's attributes.
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint\\)\\)" 1 "ompexp" } } */
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "ompexp" } } */
 
 /* Check that exactly one OpenACC kernels construct is analyzed, and that it
    can't be parallelized.
    { dg-final { scan-tree-dump-times "FAILED:" 1 "parloops1" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "parloops1" } }
    { dg-final { scan-tree-dump-not "SUCCESS: may be parallelized" "parloops1" } } */
 
 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
    { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccloops" } }
    { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } } */
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "oaccloops" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
index 61871d118a9..ce38079f2de 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
@@ -40,16 +40,16 @@ void KERNELS ()
 }
 
 /* Check the offloaded function's attributes.
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint\\)\\)" 1 "ompexp" } } */
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "ompexp" } } */
 
 /* Check that exactly one OpenACC kernels construct is analyzed, and that it
    can't be parallelized.
    { dg-final { scan-tree-dump-times "FAILED:" 1 "parloops1" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "parloops1" } }
    { dg-final { scan-tree-dump-not "SUCCESS: may be parallelized" "parloops1" } } */
 
 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
    { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccloops" } }
    { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } } */
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "oaccloops" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
index 1473337986f..31c12005749 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-kernels.c
@@ -36,16 +36,16 @@ void KERNELS ()
 }
 
 /* Check the offloaded function's attributes.
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint\\)\\)" 1 "ompexp" } } */
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "ompexp" } } */
 
 /* Check that exactly one OpenACC kernels construct is analyzed, and that it
    can be parallelized.
    { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "parloops1" } }
    { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
    { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccloops" } }
    { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } } */
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "oaccloops" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-parallel.c b/gcc/testsuite/c-c++-common/goacc/classify-parallel.c
index 61d03c0a5c4..19101fa9505 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-parallel.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-parallel.c
@@ -24,10 +24,10 @@ void PARALLEL ()
 }
 
 /* Check the offloaded function's attributes.
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc parallel, omp target entrypoint\\)\\)" 1 "ompexp" } } */
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc parallel, omp target entrypoint, noclone\\)\\)" 1 "ompexp" } } */
 
 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
    { dg-final { scan-tree-dump-times "(?n)Function is OpenACC parallel offload" 1 "oaccloops" } }
    { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc parallel, omp target entrypoint\\)\\)" 1 "oaccloops" } } */
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc parallel, omp target entrypoint, noclone\\)\\)" 1 "oaccloops" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/classify-serial.c b/gcc/testsuite/c-c++-common/goacc/classify-serial.c
index 71b8c727cdf..6480fe30046 100644
--- a/gcc/testsuite/c-c++-common/goacc/classify-serial.c
+++ b/gcc/testsuite/c-c++-common/goacc/classify-serial.c
@@ -29,10 +29,10 @@ void SERIAL ()
 }
 
 /* Check the offloaded function's attributes.
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc serial, omp target entrypoint\\)\\)" 1 "ompexp" } } */
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc serial, omp target entrypoint, noclone\\)\\)" 1 "ompexp" } } */
 
 /* Check the offloaded function's classification and compute dimensions (will
    always be 1 x 1 x 1 for non-offloading compilation).
    { dg-final { scan-tree-dump-times "(?n)Function is OpenACC serial offload" 1 "oaccloops" } }
    { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc serial, omp target entrypoint\\)\\)" 1 "oaccloops" } } */
+   { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc serial, omp target entrypoint, noclone\\)\\)" 1 "oaccloops" } } */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c b/gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
index c475333f1ae..bce17a1011c 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-counter-vars-function-scope.c
@@ -45,7 +45,7 @@ main (void)
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
-/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
index acef6a1a179..456a95451fa 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-2.c
@@ -59,7 +59,7 @@ main (void)
 /* Check that only three loops are analyzed, and that all can be
    parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } } */
-/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 3 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 3 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
index 75e2bb78cea..ef0c481dea0 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-3.c
@@ -39,7 +39,7 @@ main (void)
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
-/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c
index 71800217991..a64007815c1 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-2.c
@@ -59,7 +59,7 @@ main (void)
 /* Check that only three loops are analyzed, and that all can be
    parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } } */
-/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 3 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 3 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c
index 0c9f8331240..d46bba8ec3a 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit-2.c
@@ -57,7 +57,7 @@ main (void)
 /* Check that only three loops are analyzed, and that all can be
    parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } } */
-/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 3 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 3 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c
index 0bd21b68d31..6698980a86f 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-enter-exit.c
@@ -54,7 +54,7 @@ main (void)
 /* Check that only three loops are analyzed, and that all can be
    parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } } */
-/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 3 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 3 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c
index dd5a84146a8..9c6997f86b0 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data-update.c
@@ -55,7 +55,7 @@ main (void)
 /* Check that only two loops are analyzed, and that both can be
    parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops1" } } */
-/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 2 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 2 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c
index a658182de90..0ef576521b0 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-data.c
@@ -53,7 +53,7 @@ main (void)
 /* Check that only three loops are analyzed, and that all can be
    parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } } */
-/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 3 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 3 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
index 5bdaa40b02c..cf9f1ee6b28 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-g.c
@@ -10,7 +10,7 @@
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
-/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
index 55926230d57..eea9a735e8b 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-mod-not-zero.c
@@ -43,7 +43,7 @@ main (void)
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
-/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
index e86be1b1cdc..a63ed8ab1fd 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-n.c
@@ -46,7 +46,7 @@ foo (COUNTERTYPE n)
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
-/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
index 2b0e186ae29..b3bb5caaa5f 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop-nest.c
@@ -30,7 +30,7 @@ main (void)
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
-/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-loop.c b/gcc/testsuite/c-c++-common/goacc/kernels-loop.c
index 9619d53b43d..2602081f2c6 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-loop.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-loop.c
@@ -46,7 +46,7 @@ main (void)
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
-/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c b/gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
index 69539b24a78..0d9c875f3d9 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-one-counter-var.c
@@ -44,7 +44,7 @@ main (void)
 
 /* Check that only one loop is analyzed, and that it can be parallelized.  */
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } } */
-/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "parloops1" } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } } */
 
 /* Check that the loop has been split off into a function.  */
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c b/gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c
index 81b0fee5a44..1dcd32fb8dc 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c
@@ -57,7 +57,7 @@ main (void)
 // FIXME: OpenACC kernels stopped working with the firstprivate subarray
 // changes.
 /* { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops1" { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 2 "parloops1" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 2 "parloops1" { xfail *-*-* } } } */
 /* { dg-final { scan-tree-dump-not "FAILED:" "parloops1" { xfail *-*-* } } } */
 
 /* Check that the loop has been split off into a function.  */
diff --git a/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c b/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
index 3452156f948..9a3fa5230f8 100644
--- a/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
@@ -43,7 +43,7 @@ tg_fn (int *x, int *y)
   x2 = x2 + 2 + called_in_target1 ();
   y2 = y2 + 7;
 
-  #pragma omp target device(ancestor : 1) map(tofrom: x2)  /* { dg-message "sorry, unimplemented: 'ancestor' not yet supported" } */
+  #pragma omp target device(ancestor : 1) map(tofrom: x2)
     check_offload(&x2, &y2);
 
   if (x2 != 2+2+3+42 || y2 != 3 + 7)
diff --git a/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-4.c b/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-4.c
index 241234f8daf..87ac7548c23 100644
--- a/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-4.c
+++ b/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-4.c
@@ -9,7 +9,7 @@
 void
 foo (void)
 {
-  #pragma omp target device (ancestor: 1) /* { dg-message "sorry, unimplemented: 'ancestor' not yet supported" } */
+  #pragma omp target device (ancestor: 1)
   ;
 
 }
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-parloops.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-parloops.f95
index 96814a1697d..5dd763faffe 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-parloops.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-parloops.f95
@@ -29,16 +29,16 @@ program main
 end program main
 
 ! Check the offloaded function's attributes.
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint\\)\\)" 1 "ompexp" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "ompexp" } }
 
 ! Check that exactly one OpenACC kernels construct is analyzed, and that it
 ! can be parallelized.
 ! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "parloops1" } }
 ! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
 
 ! Check the offloaded function's classification and compute dimensions (will
 ! always be 1 x 1 x 1 for non-offloading compilation).
 ! { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccloops" } }
 ! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "oaccloops" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95
index e858617bbc6..eb165e552c6 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95
@@ -33,16 +33,16 @@ program main
 end program main
 
 ! Check the offloaded function's attributes.
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint\\)\\)" 1 "ompexp" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "ompexp" } }
 
 ! Check that exactly one OpenACC kernels construct is analyzed, and that it
 ! can't be parallelized.
 ! { dg-final { scan-tree-dump-times "FAILED:" 1 "parloops1" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "parloops1" } }
 ! { dg-final { scan-tree-dump-not "SUCCESS: may be parallelized" "parloops1" } }
 
 ! Check the offloaded function's classification and compute dimensions (will
 ! always be 1 x 1 x 1 for non-offloading compilation).
 ! { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccloops" } }
 ! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "oaccloops" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
index f8897fc5b34..e8ceda9b877 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized.f95
@@ -34,16 +34,16 @@ program main
 end program main
 
 ! Check the offloaded function's attributes.
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint\\)\\)" 1 "ompexp" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "ompexp" } }
 
 ! Check that exactly one OpenACC kernels construct is analyzed, and that it
 ! can't be parallelized.
 ! { dg-final { scan-tree-dump-times "FAILED:" 1 "parloops1" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "parloops1" } }
 ! { dg-final { scan-tree-dump-not "SUCCESS: may be parallelized" "parloops1" } }
 
 ! Check the offloaded function's classification and compute dimensions (will
 ! always be 1 x 1 x 1 for non-offloading compilation).
 ! { dg-final { scan-tree-dump-times "(?n)Function is unparallelized OpenACC kernels offload" 1 "oaccloops" } }
 ! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "oaccloops" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
index 2ed6cdb6115..7eb79188b82 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-kernels.f95
@@ -32,16 +32,16 @@ program main
 end program main
 
 ! Check the offloaded function's attributes.
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint\\)\\)" 1 "ompexp" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "ompexp" } }
 
 ! Check that exactly one OpenACC kernels construct is analyzed, and that it
 ! can be parallelized.
 ! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "parloops1" } }
 ! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
 
 ! Check the offloaded function's classification and compute dimensions (will
 ! always be 1 x 1 x 1 for non-offloading compilation).
 ! { dg-final { scan-tree-dump-times "(?n)Function is parallelized OpenACC kernels offload" 1 "oaccloops" } }
 ! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "oaccloops" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "oaccloops" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-parallel.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-parallel.f95
index adc38465d52..8fa1e946e8c 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-parallel.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-parallel.f95
@@ -26,10 +26,10 @@ program main
 end program main
 
 ! Check the offloaded function's attributes.
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc parallel, omp target entrypoint\\)\\)" 1 "ompexp" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc parallel, omp target entrypoint, noclone\\)\\)" 1 "ompexp" } }
 
 ! Check the offloaded function's classification and compute dimensions (will
 ! always be 1 x 1 x 1 for non-offloading compilation).
 ! { dg-final { scan-tree-dump-times "(?n)Function is OpenACC parallel offload" 1 "oaccloops" } }
 ! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc parallel, omp target entrypoint\\)\\)" 1 "oaccloops" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc parallel, omp target entrypoint, noclone\\)\\)" 1 "oaccloops" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/classify-serial.f95 b/gcc/testsuite/gfortran.dg/goacc/classify-serial.f95
index 21015879703..996b2ae2270 100644
--- a/gcc/testsuite/gfortran.dg/goacc/classify-serial.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/classify-serial.f95
@@ -29,10 +29,10 @@ program main
 end program main
 
 ! Check the offloaded function's attributes.
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc serial, omp target entrypoint\\)\\)" 1 "ompexp" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc serial, omp target entrypoint, noclone\\)\\)" 1 "ompexp" } }
 
 ! Check the offloaded function's classification and compute dimensions (will
 ! always be 1 x 1 x 1 for non-offloading compilation).
 ! { dg-final { scan-tree-dump-times "(?n)Function is OpenACC serial offload" 1 "oaccloops" } }
 ! { dg-final { scan-tree-dump-times "(?n)Compute dimensions \\\[1, 1, 1\\\]" 1 "oaccloops" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc serial, omp target entrypoint\\)\\)" 1 "oaccloops" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc function \\(1, 1, 1\\), oacc serial, omp target entrypoint, noclone\\)\\)" 1 "oaccloops" } }
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
index ef53324dd2a..8f54db7d654 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-2.f95
@@ -34,7 +34,7 @@ end program main
 
 ! Check that only three loops are analyzed, and that all can be parallelized.
 ! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 3 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 3 "parloops1" } }
 ! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
 
 ! Check that the loop has been split off into a function.
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
index 2f1dcd603a1..22282ec4640 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-2.f95
@@ -40,7 +40,7 @@ end program main
 
 ! Check that only three loops are analyzed, and that all can be parallelized.
 ! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 3 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 3 "parloops1" } }
 ! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
 
 ! Check that the loop has been split off into a function.
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
index 447e85d6448..9d057f3db17 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95
@@ -40,7 +40,7 @@ end program main
 
 ! Check that only three loops are analyzed, and that all can be parallelized.
 ! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 3 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 3 "parloops1" } }
 ! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
 
 ! Check that the loop has been split off into a function.
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
index 4edb2889b7b..e0e856451f7 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-enter-exit.f95
@@ -38,7 +38,7 @@ end program main
 
 ! Check that only three loops are analyzed, and that all can be parallelized.
 ! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 3 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 3 "parloops1" } }
 ! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
 
 ! Check that the loop has been split off into a function.
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
index fc113e1f660..53283715e8f 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-update.f95
@@ -38,7 +38,7 @@ end program main
 
 ! Check that only three loops are analyzed, and that all can be parallelized.
 ! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops1" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 2 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 2 "parloops1" } }
 ! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
 
 ! Check that the loop has been split off into a function.
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
index 94522f58636..da9fe2e35b3 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-data.f95
@@ -38,7 +38,7 @@ end program main
 
 ! Check that only three loops are analyzed, and that all can be parallelized.
 ! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 3 "parloops1" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 3 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 3 "parloops1" } }
 ! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
 
 ! Check that the loop has been split off into a function.
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95
index b9c4aea074d..0104c086032 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop-n.f95
@@ -33,7 +33,7 @@ end module test
 ! Check that only one loop is analyzed, and that it can be parallelized.
 ! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } }
 ! TODO, PR70545.
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" { xfail *-*-* } } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "parloops1" { xfail *-*-* } } }
 ! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
 
 ! Check that the loop has been split off into a function.
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
index 6dc7b2e0f28..e3e74e8d789 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-loop.f95
@@ -30,7 +30,7 @@ end program main
 
 ! Check that only one loop is analyzed, and that it can be parallelized.
 ! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 1 "parloops1" } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 1 "parloops1" } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 1 "parloops1" } }
 ! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" } }
 
 ! Check that the loop has been split off into a function.
diff --git a/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95 b/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95
index 48c20b99942..5b6ae0573f5 100644
--- a/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95
+++ b/gcc/testsuite/gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95
@@ -39,7 +39,7 @@ end program main
 
 ! Check that only three loops are analyzed, and that all can be parallelized.
 ! { dg-final { scan-tree-dump-times "SUCCESS: may be parallelized" 2 "parloops1" { xfail *-*-* } } }
-! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint\\)\\)" 2 "parloops1" { xfail *-*-* } } }
+! { dg-final { scan-tree-dump-times "(?n)__attribute__\\(\\(oacc kernels parallelized, oacc function \\(, , \\), oacc kernels, omp target entrypoint, noclone\\)\\)" 2 "parloops1" { xfail *-*-* } } }
 ! { dg-final { scan-tree-dump-not "FAILED:" "parloops1" { xfail *-*-* } } }
 
 ! Check that the loop has been split off into a function.
diff --git a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90 b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90
index ab56e2d1d52..d73adf2c5a7 100644
--- a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90
@@ -6,7 +6,7 @@
 
 !$omp requires reverse_offload
 
-!$omp target device (ancestor : 1)  ! { dg-message "sorry, unimplemented: 'ancestor' not yet supported" }
+!$omp target device (ancestor : 1)
 !$omp end target
 
 end
diff --git a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-5.f90 b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-5.f90
index ca8d4b282a0..9596d61f6fa 100644
--- a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-5.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-5.f90
@@ -17,7 +17,7 @@ contains
     block
       block
         block
-          !$omp target device(ancestor:1)  ! { dg-message "sorry, unimplemented: 'ancestor' not yet supported" }
+          !$omp target device(ancestor:1)
           !$omp end target
         end block
       end block
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index e88fe89a5b1..0f2998cf8f1 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -225,7 +225,7 @@ The OpenMP 4.5 specification is fully supported.
 @item @code{allocate} clause @tab P @tab Initial support
 @item @code{use_device_addr} clause on @code{target data} @tab Y @tab
 @item @code{ancestor} modifier on @code{device} clause
-      @tab P @tab Reverse offload unsupported
+      @tab Y @tab See comment for @code{requires}
 @item Implicit declare target directive @tab Y @tab
 @item Discontiguous array section with @code{target update} construct
       @tab N @tab
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1-aux.c b/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1-aux.c
new file mode 100644
index 00000000000..b3a331d12da
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1-aux.c
@@ -0,0 +1,10 @@
+/* { dg-do compile { target skip-all-targets } }  */
+
+/* Declare the following function in a separare translation unit
+   to ensure it won't have a device version.  */
+
+int
+add_3 (int x)
+{
+  return x + 3;
+}
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c b/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c
new file mode 100644
index 00000000000..976e129f560
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c
@@ -0,0 +1,83 @@
+/* { dg-do run }  */
+/* { dg-additional-sources reverse-offload-1-aux.c } */
+
+/* Check that reverse offload works in particular:
+   - no code is generated on the device side (i.e. no
+     implicit declare target of called functions and no
+     code gen for the target-region body)
+     -> would otherwise fail due to 'add_3' symbol
+   - Plus the usual (compiles, runs, produces correct result)
+
+   Note: Running also the non-reverse-offload target regions
+   on the host (host fallback) is valid and will pass.  */
+
+#pragma omp requires reverse_offload
+
+extern int add_3 (int);
+
+static int global_var = 5;
+
+void
+check_offload (int *x, int *y)
+{
+  *x = add_3 (*x);
+  *y = add_3 (*y);
+}
+
+#pragma omp declare target
+void
+tg_fn (int *x, int *y)
+{
+  int x2 = *x, y2 = *y;
+  if (x2 != 2 || y2 != 3)
+    __builtin_abort ();
+  x2 = x2 + 2;
+  y2 = y2 + 7;
+
+  #pragma omp target device(ancestor : 1) map(tofrom: x2)
+    check_offload(&x2, &y2);
+
+  if (x2 != 2+2+3 || y2 != 3 + 7)
+    __builtin_abort ();
+  *x = x2, *y = y2;
+}
+#pragma omp end declare target
+
+void
+my_func (int *x, int *y)
+{
+  if (global_var != 5)
+    __builtin_abort ();
+  global_var = 242;
+  *x = 2*add_3(*x);
+  *y = 3*add_3(*y);
+}
+
+int
+main ()
+{
+  #pragma omp target
+  {
+     int x = 2, y = 3;
+     tg_fn (&x, &y);
+  }
+
+  #pragma omp target
+  {
+     int x = -2, y = -1;
+     #pragma omp target device ( ancestor:1 ) firstprivate(y) map(tofrom:x)
+     {
+       if (x != -2 || y != -1)
+         __builtin_abort ();
+       my_func (&x, &y);
+       if (x != 2*(3-2) || y != 3*(3-1))
+         __builtin_abort ();
+     }
+     if (x != 2*(3-2) || y != -1)
+       __builtin_abort ();
+  }
+
+  if (global_var != 242)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.fortran/reverse-offload-1-aux.f90 b/libgomp/testsuite/libgomp.fortran/reverse-offload-1-aux.f90
new file mode 100644
index 00000000000..1807f063d5a
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/reverse-offload-1-aux.f90
@@ -0,0 +1,12 @@
+! { dg-do compile { target skip-all-targets } }
+
+! Declare the following function in a separare translation unit
+! to ensure it won't have a device version.
+
+
+integer function add_3 (x)
+  implicit none
+  integer, value :: x
+
+  add_3 = x + 3
+end function
diff --git a/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90 b/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90
new file mode 100644
index 00000000000..7cfb8b6552e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90
@@ -0,0 +1,88 @@
+! { dg-do run }
+! { dg-additional-sources reverse-offload-1-aux.f90 }
+
+! Check that reverse offload works in particular:
+! - no code is generated on the device side (i.e. no
+!   implicit declare target of called functions and no
+!   code gen for the target-region body)
+!   -> would otherwise fail due to 'add_3' symbol
+! - Plus the usual (compiles, runs, produces correct result)
+
+! Note: Running also the non-reverse-offload target regions
+! on the host (host fallback) is valid and will pass.
+
+module m
+  interface
+    integer function add_3 (x)
+      implicit none
+      integer, value :: x
+    end function
+  end interface
+  integer :: global_var = 5
+end module m
+
+module m2
+  use m
+  !$omp requires reverse_offload
+  implicit none (type, external)
+contains
+  subroutine check_offload (x, y)
+    integer :: x, y
+    x = add_3(x)
+    y = add_3(y)
+  end subroutine check_offload
+  subroutine m2_tg_fn(x, y)
+    integer :: x, y
+    !$omp declare target
+    if (x /= 2 .or. y /= 3) stop 1
+    x = x + 2
+    y = y + 7
+    !$omp target device(ancestor : 1) map(tofrom: x)
+      call check_offload(x, y)
+    !$omp end target
+    if (x /= 2+2+3 .or. y /= 3 + 7) stop 2
+  end subroutine
+end module m2
+
+program main
+  use m
+  !$omp requires reverse_offload
+  implicit none (type, external)
+
+  integer :: prog_var = 99
+
+  !$omp target
+  block
+    use m2
+    integer :: x, y
+    x = 2; y = 3
+    call m2_tg_fn (x, y)
+  end block
+
+  !$omp target
+  block
+    use m2
+    integer :: x, y
+    x = -2; y = -1
+    !$omp target device ( ancestor:1 ) firstprivate(y) map(tofrom:x)
+      if (x /= -2 .or. y /= -1) stop 3
+      call my_func (x, y)
+      if (x /= 2*(3-2) .or. y /= 3*(3-1)) stop 5
+    !$omp end target
+    if (x /= 2*(3-2) .or. y /= -1) stop 6
+  end block
+
+  if (prog_var /= 41 .or. global_var /= 242) stop 7
+
+contains
+
+  subroutine my_func(x, y)
+    integer :: x, y
+    if (prog_var /= 99) stop 8
+    if (global_var /= 5) stop 9
+    prog_var = 41
+    global_var = 242
+    x = 2*add_3(x)
+    y = 3*add_3(y)
+  end subroutine my_func
+end

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Patch] OpenMP: Support reverse offload (middle end part)
  2022-07-21 12:33 ` Tobias Burnus
@ 2022-08-26  9:53   ` Jakub Jelinek
  0 siblings, 0 replies; 3+ messages in thread
From: Jakub Jelinek @ 2022-08-26  9:53 UTC (permalink / raw)
  To: Tobias Burnus; +Cc: gcc-patches

On Thu, Jul 21, 2022 at 02:33:32PM +0200, Tobias Burnus wrote:
> OpenMP: Support reverse offload (middle end part)
> 
> gcc/ChangeLog:
> 
> 	* internal-fn.cc (expand_GOMP_TARGET_REV): New.
> 	* internal-fn.def (GOMP_TARGET_REV): New.
> 	* lto-cgraph.cc (lto_output_node, verify_node_partition): Mark
> 	'omp target device_ancestor_host' as in_other_partition and don't
> 	error if absent.
> 	* omp-low.cc (create_omp_child_function): Mark as 'noclone'.
> 	* omp-expand.cc (expand_omp_target): For reverse offload, remove
> 	sorry, use device = GOMP_DEVICE_HOST_FALLBACK and create
> 	empty-body nohost function.
> 	* omp-offload.cc (execute_omp_device_lower): Handle
> 	IFN_GOMP_TARGET_REV.
> 	(pass_omp_target_link::execute): For ACCEL_COMPILER, don't
> 	nullify fn argument for reverse offload
> 
> libgomp/ChangeLog:
> 
> 	* libgomp.texi (OpenMP 5.0): Mark 'ancestor' as implemented but
> 	refer to 'requires'.
> 	* testsuite/libgomp.c-c++-common/reverse-offload-1-aux.c: New test.
> 	* testsuite/libgomp.c-c++-common/reverse-offload-1.c: New test.
> 	* testsuite/libgomp.fortran/reverse-offload-1-aux.f90: New test.
> 	* testsuite/libgomp.fortran/reverse-offload-1.f90: New test.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* c-c++-common/gomp/reverse-offload-1.c: Remove dg-sorry.
> 	* c-c++-common/gomp/target-device-ancestor-4.c: Likewise.
> 	* gfortran.dg/gomp/target-device-ancestor-4.f90: Likewise.
> 	* gfortran.dg/gomp/target-device-ancestor-5.f90: Likewise.
> 	* c-c++-common/goacc/classify-kernels-parloops.c: Add 'noclone' to
> 	scan-tree-dump-times.
> 	* c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
> 	Likewise.
> 	* c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise.
> 	* c-c++-common/goacc/classify-kernels.c: Likewise.
> 	* c-c++-common/goacc/classify-parallel.c: Likewise.
> 	* c-c++-common/goacc/classify-serial.c: Likewise.
> 	* c-c++-common/goacc/kernels-counter-vars-function-scope.c: Likewise.
> 	* c-c++-common/goacc/kernels-loop-2.c: Likewise.
> 	* c-c++-common/goacc/kernels-loop-3.c: Likewise.
> 	* c-c++-common/goacc/kernels-loop-data-2.c: Likewise.
> 	* c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: Likewise.
> 	* c-c++-common/goacc/kernels-loop-data-enter-exit.c: Likewise.
> 	* c-c++-common/goacc/kernels-loop-data-update.c: Likewise.
> 	* c-c++-common/goacc/kernels-loop-data.c: Likewise.
> 	* c-c++-common/goacc/kernels-loop-g.c: Likewise.
> 	* c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
> 	* c-c++-common/goacc/kernels-loop-n.c: Likewise.
> 	* c-c++-common/goacc/kernels-loop-nest.c: Likewise.
> 	* c-c++-common/goacc/kernels-loop.c: Likewise.
> 	* c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
> 	* c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c: Likewise.
> 	* gfortran.dg/goacc/classify-kernels-parloops.f95: Likewise.
> 	* gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
> 	Likewise.
> 	* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Likewise.
> 	* gfortran.dg/goacc/classify-kernels.f95: Likewise.
> 	* gfortran.dg/goacc/classify-parallel.f95: Likewise.
> 	* gfortran.dg/goacc/classify-serial.f95: Likewise.
> 	* gfortran.dg/goacc/kernels-loop-2.f95: Likewise.
> 	* gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise.
> 	* gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise.
> 	* gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise.
> 	* gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise.
> 	* gfortran.dg/goacc/kernels-loop-data.f95: Likewise.
> 	* gfortran.dg/goacc/kernels-loop-n.f95: Likewise.
> 	* gfortran.dg/goacc/kernels-loop.f95: Likewise.
> 	* gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95: Likewise.

Ok for trunk, just a comment regarding the FIXME below (can be handled
incrementally).

> +	  case IFN_GOMP_TARGET_REV:
> +	    {
> +#ifndef ACCEL_COMPILER
> +	      gimple_stmt_iterator gsi2 = gsi;
> +	      gsi_next (&gsi2);
> +	      gcc_assert (!gsi_end_p (gsi2));
> +	      gcc_assert (gimple_call_builtin_p (gsi_stmt (gsi2),
> +						 BUILT_IN_GOMP_TARGET));
> +	      tree old_decl
> +		= TREE_OPERAND (gimple_call_arg (gsi_stmt (gsi2), 1), 0);
> +	      tree new_decl = gimple_call_arg (gsi_stmt (gsi), 0);
> +	      gimple_call_set_arg (gsi_stmt (gsi2), 1, new_decl);
> +	      update_stmt (gsi_stmt (gsi2));
> +	      new_decl = TREE_OPERAND (new_decl, 0);
> +	      unsigned i;
> +	      unsigned num_funcs = vec_safe_length (offload_funcs);
> +	      for (i = 0; i < num_funcs; i++)
> +		{
> +		  if ((*offload_funcs)[i] == old_decl)
> +		    {
> +		      (*offload_funcs)[i] = new_decl;
> +		      break;
> +		    }
> +		  else if ((*offload_funcs)[i] == new_decl)
> +		    break;  /* This can happen due to inlining.  */
> +		}
> +	      gcc_assert (i < num_funcs);
> +#else
> +	      tree old_decl = TREE_OPERAND (gimple_call_arg (gsi_stmt (gsi), 0),
> +					    0);
> +#endif
> +	      /* FIXME: Find a way to actually prevent outputting the empty-body
> +		 old_decl as debug symbol + function in the assembly file.  */

The debug stuff ought to be through DECL_IGNORED_P on the FUNCTION_DECL.
If you want it set just on one side and clear on the other side, perhaps set
or clear it during lto streaming it in in offload lto1?
As for emitting it, perhaps turning it into an external declaration from
definition afterwards?

	Jakub


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2022-08-26  9:53 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-21 10:55 [Patch] OpenMP: Support reverse offload (middle end part) Tobias Burnus
2022-07-21 12:33 ` Tobias Burnus
2022-08-26  9:53   ` Jakub Jelinek

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).