public inbox for gcc-patches@gcc.gnu.org
* [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
@ 2014-09-27 18:17 Ilya Verbin
  2014-09-29  1:10 ` Jan Hubicka
From: Ilya Verbin @ 2014-09-27 18:17 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener, Jan Hubicka, gcc-patches
  Cc: Kirill Yukhin, Ilya Tocar, Andrey Turetskiy, Bernd Schmidt,
	Thomas Schwinge

Hello,

This patch enables streaming of the LTO bytecode needed by the offload target,
using the existing LTO infrastructure.  It introduces a new prefix for section
names (.gnu.target_lto_) and streams out the functions and variables that have
the "omp declare target" attribute, including the functions outlined from
'#pragma omp target' regions.  The offload compiler (under #ifdef
ACCEL_COMPILER) reads and compiles these new sections.

However, I have doubts about the offload_lto_mode switch.  Here is why I added
it: an outlined target region (say omp_fn0) is referenced from its parent
function.  That is correct when we stream out the host-side version of omp_fn0.
But the target version has no parent function, so
node->used_from_other_partition gets an incorrect value (always 1), and the
offload compiler crashes on streaming in.

An alternative is to leave referenced_from_other_partition_p and
reachable_from_other_partition_p unchanged.  Then used_from_other_partition
will have an incorrect value for target regions, but the offload compiler will
simply ignore it.  Which approach is better?
Anyway, the patch is bootstrapped and regtested on i686-linux and x86_64-linux.


2014-09-27  Ilya Verbin  <ilya.verbin@intel.com>
	    Ilya Tocar  <ilya.tocar@intel.com>
	    Andrey Turetskiy  <andrey.turetskiy@intel.com>
	    Bernd Schmidt  <bernds@codesourcery.com>
gcc/
	* cgraph.h (symtab_node): Add need_dump flag.
	* cgraphunit.c: Include lto-section-names.h.
	(initialize_offload): New function.
	(ipa_passes): Initialize offload and call ipa_write_summaries if there
	is something to write to OMP_SECTION_NAME_PREFIX sections.
	(symbol_table::compile): Call lto_streamer_hooks_init under flag_openmp.
	* ipa-inline-analysis.c (inline_generate_summary): Do not exit under
	flag_openmp.
	(inline_free_summary): Always remove hooks.
	* lto-cgraph.c (lto_set_symtab_encoder_in_partition): Exit if there is
	no need to encode the node.
	(referenced_from_other_partition_p, reachable_from_other_partition_p):
	Ignore references from non-target functions to target functions if we
	are streaming out target-side bytecode (offload lto mode).
	(select_what_to_dump): New function.
	* lto-section-names.h (OMP_SECTION_NAME_PREFIX): Define.
	(section_name_prefix): Declare.
	* lto-streamer.c (offload_lto_mode): New variable.
	(section_name_prefix): New variable.
	(lto_get_section_name): Use section_name_prefix instead of
	LTO_SECTION_NAME_PREFIX.
	* lto-streamer.h (select_what_to_dump): Declare.
	(offload_lto_mode): Declare.
	* omp-low.c (is_targetreg_ctx): New function.
	(create_omp_child_function, check_omp_nesting_restrictions): Use it.
	(expand_omp_target): Set mark_force_output for the target functions.
	(lower_omp_critical): Add target attribute for omp critical symbol.
	* passes.c (ipa_write_summaries): Call select_what_to_dump.
gcc/lto/
	* lto-object.c (lto_obj_add_section): Use section_name_prefix instead of
	LTO_SECTION_NAME_PREFIX.
	* lto-partition.c (add_symbol_to_partition_1): Always set
	node->need_dump to true.
	(lto_promote_cross_file_statics): Call select_what_to_dump.
	* lto.c (lto_section_with_id): Use section_name_prefix instead of
	LTO_SECTION_NAME_PREFIX.
	(read_cgraph_and_symbols): Read OMP_SECTION_NAME_PREFIX sections, if
	being built as an offload compiler.

Thanks,
  -- Ilya

---

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 7481906..9ab970d 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -444,6 +444,11 @@ public:
   /* Set when init priority is set.  */
   unsigned in_init_priority_hash : 1;
 
+  /* Set when symbol needs to be dumped into LTO bytecode for LTO,
+     or in pragma omp target case, for separate compilation targeting
+     a different architecture.  */
+  unsigned need_dump : 1;
+
 
   /* Ordering of all symtab entries.  */
   int order;
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index b854e4b..4ab4c57 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -211,6 +211,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-nested.h"
 #include "gimplify.h"
 #include "dbgcnt.h"
+#include "lto-section-names.h"
 
 /* Queue of cgraph nodes scheduled to be added into cgraph.  This is a
    secondary queue used during optimization to accommodate passes that
@@ -1994,9 +1995,40 @@ output_in_order (bool no_reorder)
   free (nodes);
 }
 
+/* Check whether there is at least one function or global variable to offload.
+   */
+
+static bool
+initialize_offload (void)
+{
+  bool have_offload = false;
+  struct cgraph_node *node;
+  struct varpool_node *vnode;
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (node->decl)))
+      {
+	have_offload = true;
+	break;
+      }
+
+  FOR_EACH_DEFINED_VARIABLE (vnode)
+    {
+      if (!lookup_attribute ("omp declare target",
+			     DECL_ATTRIBUTES (vnode->decl))
+	  || TREE_CODE (vnode->decl) != VAR_DECL
+	  || DECL_SIZE (vnode->decl) == 0)
+	continue;
+      have_offload = true;
+    }
+
+  return have_offload;
+}
+
 static void
 ipa_passes (void)
 {
+  bool have_offload = false;
   gcc::pass_manager *passes = g->get_passes ();
 
   set_cfun (NULL);
@@ -2004,6 +2036,14 @@ ipa_passes (void)
   gimple_register_cfg_hooks ();
   bitmap_obstack_initialize (NULL);
 
+  if (!in_lto_p && flag_openmp)
+    {
+      have_offload = initialize_offload ();
+      /* OpenMP offloading requires LTO infrastructure.  */
+      if (have_offload)
+	flag_generate_lto = 1;
+    }
+
   invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_START, NULL);
 
   if (!in_lto_p)
@@ -2041,7 +2081,20 @@ ipa_passes (void)
     targetm.asm_out.lto_start ();
 
   if (!in_lto_p)
-    ipa_write_summaries ();
+    {
+      if (have_offload)
+	{
+	  offload_lto_mode = true;
+	  section_name_prefix = OMP_SECTION_NAME_PREFIX;
+	  ipa_write_summaries ();
+	}
+      if (flag_lto)
+	{
+	  offload_lto_mode = false;
+	  section_name_prefix = LTO_SECTION_NAME_PREFIX;
+	  ipa_write_summaries ();
+	}
+    }
 
   if (flag_generate_lto)
     targetm.asm_out.lto_end ();
@@ -2122,7 +2175,7 @@ symbol_table::compile (void)
   state = IPA;
 
   /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE.  */
-  if (flag_lto)
+  if (flag_lto || flag_openmp)
     lto_streamer_hooks_init ();
 
   /* Don't run the IPA passes if there was any error or sorry messages.  */
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 38f56d2..076a1e8 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -4010,7 +4010,7 @@ inline_generate_summary (void)
 
   /* When not optimizing, do not bother to analyze.  Inlining is still done
      because edge redirection needs to happen there.  */
-  if (!optimize && !flag_lto && !flag_wpa)
+  if (!optimize && !flag_lto && !flag_wpa && !flag_openmp)
     return;
 
   function_insertion_hook_holder =
@@ -4325,11 +4325,6 @@ void
 inline_free_summary (void)
 {
   struct cgraph_node *node;
-  if (!inline_edge_summary_vec.exists ())
-    return;
-  FOR_EACH_DEFINED_FUNCTION (node)
-    if (!node->alias)
-      reset_inline_summary (node);
   if (function_insertion_hook_holder)
     symtab->remove_cgraph_insertion_hook (function_insertion_hook_holder);
   function_insertion_hook_holder = NULL;
@@ -4345,6 +4340,11 @@ inline_free_summary (void)
   if (edge_duplication_hook_holder)
     symtab->remove_edge_duplication_hook (edge_duplication_hook_holder);
   edge_duplication_hook_holder = NULL;
+  if (!inline_edge_summary_vec.exists ())
+    return;
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (!node->alias)
+      reset_inline_summary (node);
   vec_free (inline_summary_vec);
   inline_edge_summary_vec.release ();
   if (edge_predicate_pool)
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 0584946..78b7fc8 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -239,6 +239,9 @@ void
 lto_set_symtab_encoder_in_partition (lto_symtab_encoder_t encoder,
 				     symtab_node *node)
 {
+  /* Ignore not needed nodes.  */
+  if (!node->need_dump)
+    return;
   int index = lto_symtab_encoder_encode (encoder, node);
   encoder->nodes[index].in_partition = true;
 }
@@ -321,6 +324,12 @@ referenced_from_other_partition_p (symtab_node *node, lto_symtab_encoder_t encod
 
   for (i = 0; node->iterate_referring (i, ref); i++)
     {
+      /* Ignore references from non-target functions in offload lto mode.  */
+      if (offload_lto_mode
+	  && !lookup_attribute ("omp declare target",
+				DECL_ATTRIBUTES (ref->referring->decl)))
+	continue;
+
       if (ref->referring->in_other_partition
           || !lto_symtab_encoder_in_partition_p (encoder, ref->referring))
 	return true;
@@ -339,9 +348,17 @@ reachable_from_other_partition_p (struct cgraph_node *node, lto_symtab_encoder_t
   if (node->global.inlined_to)
     return false;
   for (e = node->callers; e; e = e->next_caller)
-    if (e->caller->in_other_partition
-	|| !lto_symtab_encoder_in_partition_p (encoder, e->caller))
-      return true;
+    {
+      /* Ignore references from non-target functions in offload lto mode.  */
+      if (offload_lto_mode
+	  && !lookup_attribute ("omp declare target",
+				DECL_ATTRIBUTES (e->caller->decl)))
+	continue;
+
+      if (e->caller->in_other_partition
+	  || !lto_symtab_encoder_in_partition_p (encoder, e->caller))
+	return true;
+    }
   return false;
 }
 
@@ -802,6 +819,18 @@ create_references (lto_symtab_encoder_t encoder, symtab_node *node)
       lto_symtab_encoder_encode (encoder, ref->referred);
 }
 
+/* Select what needs to be streamed out.  In regular lto mode stream everything.
+   In offload lto mode stream only stuff marked with an attribute.  */
+void
+select_what_to_dump (void)
+{
+  struct symtab_node *snode;
+  FOR_EACH_SYMBOL (snode)
+    snode->need_dump = !offload_lto_mode
+		       || lookup_attribute ("omp declare target",
+					    DECL_ATTRIBUTES (snode->decl));
+}
+
 /* Find all symbols we want to stream into given partition and insert them
    to encoders.
 
diff --git a/gcc/lto-section-names.h b/gcc/lto-section-names.h
index cb75230..06d2caf 100644
--- a/gcc/lto-section-names.h
+++ b/gcc/lto-section-names.h
@@ -25,6 +25,11 @@ along with GCC; see the file COPYING3.  If not see
    name for the functions and static_initializers.  For other types of
    sections a '.' and the section type are appended.  */
 #define LTO_SECTION_NAME_PREFIX ".gnu.lto_"
+#define OMP_SECTION_NAME_PREFIX ".gnu.target_lto_"
+
+/* Can be either OMP_SECTION_NAME_PREFIX when we stream 'pragma omp target'
+   stuff, or LTO_SECTION_NAME_PREFIX for LTO case.  */
+extern const char *section_name_prefix;
 
 /* Segment name for LTO sections.  This is only used for Mach-O.  */
 
diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c
index 3480723..95232f9 100644
--- a/gcc/lto-streamer.c
+++ b/gcc/lto-streamer.c
@@ -48,6 +48,8 @@ struct lto_stats_d lto_stats;
 static bitmap_obstack lto_obstack;
 static bool lto_obstack_initialized;
 
+bool offload_lto_mode = false;
+const char *section_name_prefix = LTO_SECTION_NAME_PREFIX;
 
 /* Return a string representing LTO tag TAG.  */
 
@@ -177,7 +179,7 @@ lto_get_section_name (int section_type, const char *name, struct lto_file_decl_d
     sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id);
   else
     sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); 
-  return concat (LTO_SECTION_NAME_PREFIX, sep, add, post, NULL);
+  return concat (section_name_prefix, sep, add, post, NULL);
 }
 
 
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 4bec969..0016eef 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -831,6 +831,7 @@ bool referenced_from_this_partition_p (symtab_node *,
 bool reachable_from_this_partition_p (struct cgraph_node *,
 				      lto_symtab_encoder_t);
 lto_symtab_encoder_t compute_ltrans_boundary (lto_symtab_encoder_t encoder);
+void select_what_to_dump (void);
 
 
 /* In lto-symtab.c.  */
@@ -846,6 +847,9 @@ extern void lto_write_options (void);
 /* Statistics gathered during LTO, WPA and LTRANS.  */
 extern struct lto_stats_d lto_stats;
 
+/* Regular or offload mode of LTO.  */
+extern bool offload_lto_mode;
+
 /* Section names corresponding to the values of enum lto_section_type.  */
 extern const char *lto_section_name[];
 
diff --git a/gcc/lto/lto-object.c b/gcc/lto/lto-object.c
index 323f7b2..4ee752f 100644
--- a/gcc/lto/lto-object.c
+++ b/gcc/lto/lto-object.c
@@ -230,8 +230,7 @@ lto_obj_add_section (void *data, const char *name, off_t offset,
   void **slot;
   struct lto_section_list *list = loasd->list;
 
-  if (strncmp (name, LTO_SECTION_NAME_PREFIX,
-	       strlen (LTO_SECTION_NAME_PREFIX)) != 0)
+  if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
     return 1;
 
   new_name = xstrdup (name);
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 0451a66..332562f 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -134,6 +134,7 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node)
   gcc_assert (c != SYMBOL_EXTERNAL
 	      && (c == SYMBOL_DUPLICATE || !symbol_partitioned_p (node)));
 
+  node->need_dump = true;
   lto_set_symtab_encoder_in_partition (part->encoder, node);
 
   if (symbol_partitioned_p (node))
@@ -920,6 +921,8 @@ lto_promote_cross_file_statics (void)
 
   gcc_assert (flag_wpa);
 
+  select_what_to_dump ();
+
   /* First compute boundaries.  */
   n_sets = ltrans_partitions.length ();
   for (i = 0; i < n_sets; i++)
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 6cbb178..f23d997 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -2125,7 +2125,7 @@ lto_section_with_id (const char *name, unsigned HOST_WIDE_INT *id)
 {
   const char *s;
 
-  if (strncmp (name, LTO_SECTION_NAME_PREFIX, strlen (LTO_SECTION_NAME_PREFIX)))
+  if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
     return 0;
   s = strrchr (name, '.');
   return s && sscanf (s, "." HOST_WIDE_INT_PRINT_HEX_PURE, id) == 1;
@@ -2899,6 +2899,10 @@ read_cgraph_and_symbols (unsigned nfiles, const char **fnames)
 
   timevar_push (TV_IPA_LTO_DECL_IN);
 
+#ifdef ACCEL_COMPILER
+    section_name_prefix = OMP_SECTION_NAME_PREFIX;
+#endif
+
   real_file_decl_data
     = decl_data = ggc_cleared_vec_alloc<lto_file_decl_data_ptr> (nfiles + 1);
   real_file_count = nfiles;
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 82651ea..7d587b3 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -257,6 +257,16 @@ is_parallel_ctx (omp_context *ctx)
 }
 
 
+/* Return true if CTX is for an omp target region.  */
+
+static inline bool
+is_targetreg_ctx (omp_context *ctx)
+{
+  return gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
+	 && gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_REGION;
+}
+
+
 /* Return true if CTX is for an omp task.  */
 
 static inline bool
@@ -1930,9 +1940,7 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
     {
       omp_context *octx;
       for (octx = ctx; octx; octx = octx->outer)
-	if (gimple_code (octx->stmt) == GIMPLE_OMP_TARGET
-	    && gimple_omp_target_kind (octx->stmt)
-	       == GF_OMP_TARGET_KIND_REGION)
+	if (is_targetreg_ctx (octx))
 	  {
 	    target_p = true;
 	    break;
@@ -2588,8 +2596,7 @@ check_omp_nesting_restrictions (gimple stmt, omp_context *ctx)
       break;
     case GIMPLE_OMP_TARGET:
       for (; ctx != NULL; ctx = ctx->outer)
-	if (gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
-	    && gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_REGION)
+	if (is_targetreg_ctx (ctx))
 	  {
 	    const char *name;
 	    switch (gimple_omp_target_kind (stmt))
@@ -8206,6 +8213,7 @@ expand_omp_target (struct omp_region *region)
   if (kind == GF_OMP_TARGET_KIND_REGION)
     {
       unsigned srcidx, dstidx, num;
+      struct cgraph_node *node;
 
       /* If the target region needs data sent from the parent
 	 function, then the very first statement (except possible
@@ -8337,6 +8345,11 @@ expand_omp_target (struct omp_region *region)
       push_cfun (child_cfun);
       cgraph_edge::rebuild_edges ();
 
+      /* Prevent IPA from removing child_fn as unreachable, since there are no
+	 refs from the parent function to the target side child_fn.  */
+      node = cgraph_node::get (child_fn);
+      node->mark_force_output ();
+
       /* Some EH regions might become dead, see PR34608.  If
 	 pass_cleanup_cfg isn't the first pass to happen with the
 	 new child, these dead EH edges might cause problems.
@@ -9207,6 +9220,19 @@ lower_omp_critical (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	  DECL_COMMON (decl) = 1;
 	  DECL_ARTIFICIAL (decl) = 1;
 	  DECL_IGNORED_P (decl) = 1;
+
+	  /* If '#pragma omp critical' is inside target region, the symbol must
+	     have an 'omp declare target' attribute.  */
+	  omp_context *octx;
+	  for (octx = ctx->outer; octx; octx = octx->outer)
+	    if (is_targetreg_ctx (octx))
+	      {
+		DECL_ATTRIBUTES (decl)
+		  = tree_cons (get_identifier ("omp declare target"),
+			       NULL_TREE, DECL_ATTRIBUTES (decl));
+		break;
+	      }
+
 	  varpool_node::finalize_decl (decl);
 
 	  splay_tree_insert (critical_name_mutexes, (splay_tree_key) name,
diff --git a/gcc/passes.c b/gcc/passes.c
index 5001c3d..d63c913 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -2308,6 +2308,8 @@ ipa_write_summaries (void)
   if (!flag_generate_lto || seen_error ())
     return;
 
+  select_what_to_dump ();
+
   encoder = lto_symtab_encoder_new (false);
 
   /* Create the callgraph set in the same order used in
-- 
1.7.1


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-09-27 18:17 [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming Ilya Verbin
@ 2014-09-29  1:10 ` Jan Hubicka
  2014-09-29 17:37   ` Ilya Verbin
From: Jan Hubicka @ 2014-09-29  1:10 UTC (permalink / raw)
  To: Ilya Verbin
  Cc: Jakub Jelinek, Richard Biener, Jan Hubicka, gcc-patches,
	Kirill Yukhin, Ilya Tocar, Andrey Turetskiy, Bernd Schmidt,
	Thomas Schwinge

> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 7481906..9ab970d 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -444,6 +444,11 @@ public:
>    /* Set when init priority is set.  */
>    unsigned in_init_priority_hash : 1;
>  
> +  /* Set when symbol needs to be dumped into LTO bytecode for LTO,
> +     or in pragma omp target case, for separate compilation targeting
> +     a different architecture.  */
> +  unsigned need_dump : 1;

To me, "dump" implies a debug dump.  LTO output is usually called streaming, so
perhaps need_lto_streaming?

> +/* Check whether there is at least one function or global variable to offload.
> +   */
> +
> +static bool
> +initialize_offload (void)

Perhaps have_offload_p? Nothing is initialized here...
> +{
> +  bool have_offload = false;
> +  struct cgraph_node *node;
> +  struct varpool_node *vnode;
> +
> +  FOR_EACH_DEFINED_FUNCTION (node)
> +    if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (node->decl)))
> +      {
> +	have_offload = true;
> +	break;
> +      }
> +
> +  FOR_EACH_DEFINED_VARIABLE (vnode)
> +    {
> +      if (!lookup_attribute ("omp declare target",
> +			     DECL_ATTRIBUTES (vnode->decl))
> +	  || TREE_CODE (vnode->decl) != VAR_DECL
> +	  || DECL_SIZE (vnode->decl) == 0)
> +	continue;
> +      have_offload = true;
> +    }
> +
> +  return have_offload;
> +}
> +
>  static void
>  ipa_passes (void)
>  {
> +  bool have_offload = false;
>    gcc::pass_manager *passes = g->get_passes ();
>  
>    set_cfun (NULL);
> @@ -2004,6 +2036,14 @@ ipa_passes (void)
>    gimple_register_cfg_hooks ();
>    bitmap_obstack_initialize (NULL);
>  
> +  if (!in_lto_p && flag_openmp)
> +    {
> +      have_offload = initialize_offload ();
> +      /* OpenMP offloading requires LTO infrastructure.  */
> +      if (have_offload)
> +	flag_generate_lto = 1;
> +    }
> +
>    invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_START, NULL);
>  
>    if (!in_lto_p)
> @@ -2041,7 +2081,20 @@ ipa_passes (void)
>      targetm.asm_out.lto_start ();
>  
>    if (!in_lto_p)
> -    ipa_write_summaries ();
> +    {
> +      if (have_offload)
> +	{
> +	  offload_lto_mode = true;
> +	  section_name_prefix = OMP_SECTION_NAME_PREFIX;
> +	  ipa_write_summaries ();
> +	}
> +      if (flag_lto)
> +	{
> +	  offload_lto_mode = false;
> +	  section_name_prefix = LTO_SECTION_NAME_PREFIX;
> +	  ipa_write_summaries ();
> +	}

How does LTO combine with offloading?
> @@ -4325,11 +4325,6 @@ void
>  inline_free_summary (void)
>  {
>    struct cgraph_node *node;
> -  if (!inline_edge_summary_vec.exists ())
> -    return;
> -  FOR_EACH_DEFINED_FUNCTION (node)
> -    if (!node->alias)
> -      reset_inline_summary (node);
>    if (function_insertion_hook_holder)
>      symtab->remove_cgraph_insertion_hook (function_insertion_hook_holder);
>    function_insertion_hook_holder = NULL;
> @@ -4345,6 +4340,11 @@ inline_free_summary (void)
>    if (edge_duplication_hook_holder)
>      symtab->remove_edge_duplication_hook (edge_duplication_hook_holder);
>    edge_duplication_hook_holder = NULL;
> +  if (!inline_edge_summary_vec.exists ())
> +    return;
> +  FOR_EACH_DEFINED_FUNCTION (node)
> +    if (!node->alias)
> +      reset_inline_summary (node);

Why is this needed?
> diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> index 0584946..78b7fc8 100644
> --- a/gcc/lto-cgraph.c
> +++ b/gcc/lto-cgraph.c
> @@ -239,6 +239,9 @@ void
>  lto_set_symtab_encoder_in_partition (lto_symtab_encoder_t encoder,
>  				     symtab_node *node)
>  {
> +  /* Ignore not needed nodes.  */
> +  if (!node->need_dump)
> +    return;

I think this should rather be done on the caller side (in the loop that sets
what to output) than in this simple data structure accessor.

>    int index = lto_symtab_encoder_encode (encoder, node);
>    encoder->nodes[index].in_partition = true;
>  }
> @@ -321,6 +324,12 @@ referenced_from_other_partition_p (symtab_node *node, lto_symtab_encoder_t encod
>  
>    for (i = 0; node->iterate_referring (i, ref); i++)
>      {
> +      /* Ignore references from non-target functions in offload lto mode.  */
> +      if (offload_lto_mode
> +	  && !lookup_attribute ("omp declare target",
> +				DECL_ATTRIBUTES (ref->referring->decl)))
> +	continue;

Those are quite busy loops; you may consider making offload a flag.  Why can't
you test need_dump here?

I think you also need to run free_lang_data when you decide to stream something.

Otherwise the cgraph bits seem reasonable.  I think Richi will want to comment
on the LTO part.
Honza


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-09-29  1:10 ` Jan Hubicka
@ 2014-09-29 17:37   ` Ilya Verbin
  2014-09-30 11:40     ` Thomas Schwinge
From: Ilya Verbin @ 2014-09-29 17:37 UTC (permalink / raw)
  To: Jan Hubicka, Richard Biener
  Cc: Jakub Jelinek, gcc-patches, Kirill Yukhin, Ilya Tocar,
	Andrey Turetskiy, Bernd Schmidt, Thomas Schwinge

On 29 Sep 03:10, Jan Hubicka wrote:
> To me, "dump" implies a debug dump.  LTO output is usually called streaming, so
> perhaps need_lto_streaming?

Fixed.

> > +initialize_offload (void)
> Perhaps have_offload_p? Nothing is initialized here...

The next patch will add some initialization to this function, and the patches
will be committed as a series.  So I'd prefer to keep this name.

> How does LTO combine with offloading?

Both .gnu.lto_ and .gnu.target_lto_ sections are created.  LTO just ignores the
target sections, and the offload compiler ignores the .gnu.lto_ sections.
Everything works fine on my testcases.

> > @@ -4325,11 +4325,6 @@ void
> >  inline_free_summary (void)
> >  {
> >    struct cgraph_node *node;
> > -  if (!inline_edge_summary_vec.exists ())
> > -    return;
> > -  FOR_EACH_DEFINED_FUNCTION (node)
> > -    if (!node->alias)
> > -      reset_inline_summary (node);
> >    if (function_insertion_hook_holder)
> >      symtab->remove_cgraph_insertion_hook (function_insertion_hook_holder);
> >    function_insertion_hook_holder = NULL;
> > @@ -4345,6 +4340,11 @@ inline_free_summary (void)
> >    if (edge_duplication_hook_holder)
> >      symtab->remove_edge_duplication_hook (edge_duplication_hook_holder);
> >    edge_duplication_hook_holder = NULL;
> > +  if (!inline_edge_summary_vec.exists ())
> > +    return;
> > +  FOR_EACH_DEFINED_FUNCTION (node)
> > +    if (!node->alias)
> > +      reset_inline_summary (node);
> 
> Why is this needed?

Without this change, gcc/testsuite/g++.dg/gomp/declare-simd-1.C fails at -O0:
inline_generate_summary adds the add_new_function hook, but at -O0
inline_edge_summary_vec is empty, so we return early and never call
remove_cgraph_insertion_hook
( https://gcc.gnu.org/ml/gcc-patches/2014-02/msg00055.html )

> >  lto_set_symtab_encoder_in_partition (lto_symtab_encoder_t encoder,
> >  				     symtab_node *node)
> >  {
> > +  /* Ignore not needed nodes.  */
> > +  if (!node->need_dump)
> > +    return;
> 
> I think this should rather be done on the caller side (in the loop that sets
> what to output) than in this simple data structure accessor.

Done.

> > +      /* Ignore references from non-target functions in offload lto mode.  */
> > +      if (offload_lto_mode
> > +	  && !lookup_attribute ("omp declare target",
> > +				DECL_ATTRIBUTES (ref->referring->decl)))
> > +	continue;
> 
> Those are quite busy loops; you may consider making offload a flag.  Why can't
> you test need_dump here?

Definitely.  I have no idea why I did not use this flag here :)  Fixed.

> I think you also need to run free_lang_data when you decide to stream something.

When I compile a file with offloading but without -flto, I see free_lang_data
executed during all_small_ipa_passes:

#0  free_lang_data () at gcc/tree.c:5655
#1  in (anonymous namespace)::pass_ipa_free_lang_data::execute (this=0x20ce470) at gcc/tree.c:5708
#2  in execute_one_pass (pass=0x20ce470) at gcc/passes.c:2151
#3  in execute_ipa_pass_list (pass=0x20ce470) at gcc/passes.c:2543
#4  in ipa_passes () at gcc/cgraphunit.c:2055
#5  in symbol_table::compile (this=0x7ffff19fd000) at gcc/cgraphunit.c:2187
#6  in symbol_table::finalize_compilation_unit (this=0x7ffff19fd000) at gcc/cgraphunit.c:2340
#7  in c_write_global_declarations () at gcc/c/c-decl.c:10431
#8  in compile_file () at gcc/toplev.c:566
#9  in do_compile () at gcc/toplev.c:1949
#10 in toplev_main (argc=17, argv=0x7fffffffe3a8) at gcc/toplev.c:2025
#11 in main (argc=17, argv=0x7fffffffe3a8) at gcc/main.c:36

> Otherwise the cgraph bits seem reasonable.  I think Richi will want to comment
> on the LTO part.

Here is the updated patch.  Bootstrapped and regtested.
OK for trunk (once all patches in the series are approved)?

Thanks,
  -- Ilya


gcc/
	* cgraph.h (symtab_node): Add need_lto_streaming flag.
	* cgraphunit.c: Include lto-section-names.h.
	(initialize_offload): New function.
	(ipa_passes): Initialize offload and call ipa_write_summaries if there
	is something to write to OMP_SECTION_NAME_PREFIX sections.
	(symbol_table::compile): Call lto_streamer_hooks_init under flag_openmp.
	* ipa-inline-analysis.c (inline_generate_summary): Do not exit under
	flag_openmp.
	(inline_free_summary): Always remove hooks.
	* lto-cgraph.c (referenced_from_other_partition_p): Ignore references
	from non-target functions to target functions if we are streaming out
	target-side bytecode (offload lto mode).
	(reachable_from_other_partition_p): Likewise.
	(select_what_to_stream): New function.
	(compute_ltrans_boundary): Do not call
	lto_set_symtab_encoder_in_partition if the node should not be streamed.
	* lto-section-names.h (OMP_SECTION_NAME_PREFIX): Define.
	(section_name_prefix): Declare.
	* lto-streamer.c (section_name_prefix): New variable.
	(lto_get_section_name): Use section_name_prefix instead of
	LTO_SECTION_NAME_PREFIX.
	* lto-streamer.h (select_what_to_stream): Declare.
	* omp-low.c (is_targetreg_ctx): New function.
	(create_omp_child_function, check_omp_nesting_restrictions): Use it.
	(expand_omp_target): Set mark_force_output for the target functions.
	(lower_omp_critical): Add target attribute for omp critical symbol.
	* passes.c (ipa_write_summaries): New argument offload_lto_mode.  Call
	select_what_to_stream.  Do not call lto_set_symtab_encoder_in_partition
	if the node should not be streamed out.
	* tree-pass.h (ipa_write_summaries): New bool argument.
gcc/lto/
	* lto-object.c (lto_obj_add_section): Use section_name_prefix instead of
	LTO_SECTION_NAME_PREFIX.
	* lto-partition.c (lto_promote_cross_file_statics): Call
	select_what_to_stream.
	* lto.c (lto_section_with_id): Use section_name_prefix instead of
	LTO_SECTION_NAME_PREFIX.
	(read_cgraph_and_symbols): Read OMP_SECTION_NAME_PREFIX sections, if
	being built as an offload compiler.

---

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 4fd58a5..df0b0e2 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -444,6 +444,10 @@ public:
   /* Set when init priority is set.  */
   unsigned in_init_priority_hash : 1;
 
+  /* Set when symbol needs to be streamed into LTO bytecode for LTO, or in case
+     of offloading, for separate compilation for a different target.  */
+  unsigned need_lto_streaming : 1;
+
 
   /* Ordering of all symtab entries.  */
   int order;
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index d463505..a6b0bac 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -211,6 +211,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-nested.h"
 #include "gimplify.h"
 #include "dbgcnt.h"
+#include "lto-section-names.h"
 
 /* Queue of cgraph nodes scheduled to be added into cgraph.  This is a
    secondary queue used during optimization to accommodate passes that
@@ -1994,9 +1995,40 @@ output_in_order (bool no_reorder)
   free (nodes);
 }
 
+/* Check whether there is at least one function or global variable to offload.
+   */
+
+static bool
+initialize_offload (void)
+{
+  bool have_offload = false;
+  struct cgraph_node *node;
+  struct varpool_node *vnode;
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (node->decl)))
+      {
+	have_offload = true;
+	break;
+      }
+
+  FOR_EACH_DEFINED_VARIABLE (vnode)
+    {
+      if (!lookup_attribute ("omp declare target",
+			     DECL_ATTRIBUTES (vnode->decl))
+	  || TREE_CODE (vnode->decl) != VAR_DECL
+	  || DECL_SIZE (vnode->decl) == 0)
+	continue;
+      have_offload = true;
+    }
+
+  return have_offload;
+}
+
 static void
 ipa_passes (void)
 {
+  bool have_offload = false;
   gcc::pass_manager *passes = g->get_passes ();
 
   set_cfun (NULL);
@@ -2004,6 +2036,14 @@ ipa_passes (void)
   gimple_register_cfg_hooks ();
   bitmap_obstack_initialize (NULL);
 
+  if (!in_lto_p && flag_openmp)
+    {
+      have_offload = initialize_offload ();
+      /* OpenMP offloading requires LTO infrastructure.  */
+      if (have_offload)
+	flag_generate_lto = 1;
+    }
+
   invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_START, NULL);
 
   if (!in_lto_p)
@@ -2041,7 +2081,18 @@ ipa_passes (void)
     targetm.asm_out.lto_start ();
 
   if (!in_lto_p)
-    ipa_write_summaries ();
+    {
+      if (have_offload)
+	{
+	  section_name_prefix = OMP_SECTION_NAME_PREFIX;
+	  ipa_write_summaries (true);
+	}
+      if (flag_lto)
+	{
+	  section_name_prefix = LTO_SECTION_NAME_PREFIX;
+	  ipa_write_summaries (false);
+	}
+    }
 
   if (flag_generate_lto)
     targetm.asm_out.lto_end ();
@@ -2122,7 +2173,7 @@ symbol_table::compile (void)
   state = IPA;
 
   /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE.  */
-  if (flag_lto)
+  if (flag_lto || flag_openmp)
     lto_streamer_hooks_init ();
 
   /* Don't run the IPA passes if there was any error or sorry messages.  */
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 38f56d2..076a1e8 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -4010,7 +4010,7 @@ inline_generate_summary (void)
 
   /* When not optimizing, do not bother to analyze.  Inlining is still done
      because edge redirection needs to happen there.  */
-  if (!optimize && !flag_lto && !flag_wpa)
+  if (!optimize && !flag_lto && !flag_wpa && !flag_openmp)
     return;
 
   function_insertion_hook_holder =
@@ -4325,11 +4325,6 @@ void
 inline_free_summary (void)
 {
   struct cgraph_node *node;
-  if (!inline_edge_summary_vec.exists ())
-    return;
-  FOR_EACH_DEFINED_FUNCTION (node)
-    if (!node->alias)
-      reset_inline_summary (node);
   if (function_insertion_hook_holder)
     symtab->remove_cgraph_insertion_hook (function_insertion_hook_holder);
   function_insertion_hook_holder = NULL;
@@ -4345,6 +4340,11 @@ inline_free_summary (void)
   if (edge_duplication_hook_holder)
     symtab->remove_edge_duplication_hook (edge_duplication_hook_holder);
   edge_duplication_hook_holder = NULL;
+  if (!inline_edge_summary_vec.exists ())
+    return;
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (!node->alias)
+      reset_inline_summary (node);
   vec_free (inline_summary_vec);
   inline_edge_summary_vec.release ();
   if (edge_predicate_pool)
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 0584946..ed22289 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -321,6 +321,11 @@ referenced_from_other_partition_p (symtab_node *node, lto_symtab_encoder_t encod
 
   for (i = 0; node->iterate_referring (i, ref); i++)
     {
+      /* Ignore references from non-target nodes while streaming NODE into
+	 offload target section.  */
+      if (!ref->referring->need_lto_streaming)
+	continue;
+
       if (ref->referring->in_other_partition
           || !lto_symtab_encoder_in_partition_p (encoder, ref->referring))
 	return true;
@@ -339,9 +344,16 @@ reachable_from_other_partition_p (struct cgraph_node *node, lto_symtab_encoder_t
   if (node->global.inlined_to)
     return false;
   for (e = node->callers; e; e = e->next_caller)
-    if (e->caller->in_other_partition
-	|| !lto_symtab_encoder_in_partition_p (encoder, e->caller))
-      return true;
+    {
+      /* Ignore references from non-target nodes while streaming NODE into
+	 offload target section.  */
+      if (!e->caller->need_lto_streaming)
+	continue;
+
+      if (e->caller->in_other_partition
+	  || !lto_symtab_encoder_in_partition_p (encoder, e->caller))
+	return true;
+    }
   return false;
 }
 
@@ -802,6 +814,18 @@ create_references (lto_symtab_encoder_t encoder, symtab_node *node)
       lto_symtab_encoder_encode (encoder, ref->referred);
 }
 
+/* Select what needs to be streamed out.  In regular lto mode stream everything.
+   In offload lto mode stream only stuff marked with an attribute.  */
+void
+select_what_to_stream (bool offload_lto_mode)
+{
+  struct symtab_node *snode;
+  FOR_EACH_SYMBOL (snode)
+    snode->need_lto_streaming
+      = !offload_lto_mode || lookup_attribute ("omp declare target",
+					       DECL_ATTRIBUTES (snode->decl));
+}
+
 /* Find all symbols we want to stream into given partition and insert them
    to encoders.
 
@@ -828,6 +852,8 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
        !lsei_end_p (lsei); lsei_next_function_in_partition (&lsei))
     {
       struct cgraph_node *node = lsei_cgraph_node (lsei);
+      if (!node->need_lto_streaming)
+	continue;
       add_node_to (encoder, node, true);
       lto_set_symtab_encoder_in_partition (encoder, node);
       create_references (encoder, node);
@@ -844,6 +870,8 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
     {
       varpool_node *vnode = lsei_varpool_node (lsei);
 
+      if (!vnode->need_lto_streaming)
+	continue;
       lto_set_symtab_encoder_in_partition (encoder, vnode);
       lto_set_symtab_encoder_encode_initializer (encoder, vnode);
       create_references (encoder, vnode);
diff --git a/gcc/lto-section-names.h b/gcc/lto-section-names.h
index cb75230..06d2caf 100644
--- a/gcc/lto-section-names.h
+++ b/gcc/lto-section-names.h
@@ -25,6 +25,11 @@ along with GCC; see the file COPYING3.  If not see
    name for the functions and static_initializers.  For other types of
    sections a '.' and the section type are appended.  */
 #define LTO_SECTION_NAME_PREFIX ".gnu.lto_"
+#define OMP_SECTION_NAME_PREFIX ".gnu.target_lto_"
+
+/* Can be either OMP_SECTION_NAME_PREFIX when we stream 'pragma omp target'
+   stuff, or LTO_SECTION_NAME_PREFIX for LTO case.  */
+extern const char *section_name_prefix;
 
 /* Segment name for LTO sections.  This is only used for Mach-O.  */
 
diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c
index 3480723..161e12d 100644
--- a/gcc/lto-streamer.c
+++ b/gcc/lto-streamer.c
@@ -48,6 +48,7 @@ struct lto_stats_d lto_stats;
 static bitmap_obstack lto_obstack;
 static bool lto_obstack_initialized;
 
+const char *section_name_prefix = LTO_SECTION_NAME_PREFIX;
 
 /* Return a string representing LTO tag TAG.  */
 
@@ -177,7 +178,7 @@ lto_get_section_name (int section_type, const char *name, struct lto_file_decl_d
     sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id);
   else
     sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); 
-  return concat (LTO_SECTION_NAME_PREFIX, sep, add, post, NULL);
+  return concat (section_name_prefix, sep, add, post, NULL);
 }
 
 
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 4bec969..ba00ab4 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -831,6 +831,7 @@ bool referenced_from_this_partition_p (symtab_node *,
 bool reachable_from_this_partition_p (struct cgraph_node *,
 				      lto_symtab_encoder_t);
 lto_symtab_encoder_t compute_ltrans_boundary (lto_symtab_encoder_t encoder);
+void select_what_to_stream (bool);
 
 
 /* In lto-symtab.c.  */
diff --git a/gcc/lto/lto-object.c b/gcc/lto/lto-object.c
index 323f7b2..4ee752f 100644
--- a/gcc/lto/lto-object.c
+++ b/gcc/lto/lto-object.c
@@ -230,8 +230,7 @@ lto_obj_add_section (void *data, const char *name, off_t offset,
   void **slot;
   struct lto_section_list *list = loasd->list;
 
-  if (strncmp (name, LTO_SECTION_NAME_PREFIX,
-	       strlen (LTO_SECTION_NAME_PREFIX)) != 0)
+  if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
     return 1;
 
   new_name = xstrdup (name);
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 0451a66..aae2be9 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -920,6 +920,8 @@ lto_promote_cross_file_statics (void)
 
   gcc_assert (flag_wpa);
 
+  select_what_to_stream (false);
+
   /* First compute boundaries.  */
   n_sets = ltrans_partitions.length ();
   for (i = 0; i < n_sets; i++)
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 6cbb178..f23d997 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -2125,7 +2125,7 @@ lto_section_with_id (const char *name, unsigned HOST_WIDE_INT *id)
 {
   const char *s;
 
-  if (strncmp (name, LTO_SECTION_NAME_PREFIX, strlen (LTO_SECTION_NAME_PREFIX)))
+  if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
     return 0;
   s = strrchr (name, '.');
   return s && sscanf (s, "." HOST_WIDE_INT_PRINT_HEX_PURE, id) == 1;
@@ -2899,6 +2899,10 @@ read_cgraph_and_symbols (unsigned nfiles, const char **fnames)
 
   timevar_push (TV_IPA_LTO_DECL_IN);
 
+#ifdef ACCEL_COMPILER
+    section_name_prefix = OMP_SECTION_NAME_PREFIX;
+#endif
+
   real_file_decl_data
     = decl_data = ggc_cleared_vec_alloc<lto_file_decl_data_ptr> (nfiles + 1);
   real_file_count = nfiles;
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index eb0a7ee..c0a6393 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -257,6 +257,16 @@ is_parallel_ctx (omp_context *ctx)
 }
 
 
+/* Return true if CTX is for an omp target region.  */
+
+static inline bool
+is_targetreg_ctx (omp_context *ctx)
+{
+  return gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
+	 && gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_REGION;
+}
+
+
 /* Return true if CTX is for an omp task.  */
 
 static inline bool
@@ -1930,9 +1940,7 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
     {
       omp_context *octx;
       for (octx = ctx; octx; octx = octx->outer)
-	if (gimple_code (octx->stmt) == GIMPLE_OMP_TARGET
-	    && gimple_omp_target_kind (octx->stmt)
-	       == GF_OMP_TARGET_KIND_REGION)
+	if (is_targetreg_ctx (octx))
 	  {
 	    target_p = true;
 	    break;
@@ -2588,8 +2596,7 @@ check_omp_nesting_restrictions (gimple stmt, omp_context *ctx)
       break;
     case GIMPLE_OMP_TARGET:
       for (; ctx != NULL; ctx = ctx->outer)
-	if (gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
-	    && gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_REGION)
+	if (is_targetreg_ctx (ctx))
 	  {
 	    const char *name;
 	    switch (gimple_omp_target_kind (stmt))
@@ -8206,6 +8213,7 @@ expand_omp_target (struct omp_region *region)
   if (kind == GF_OMP_TARGET_KIND_REGION)
     {
       unsigned srcidx, dstidx, num;
+      struct cgraph_node *node;
 
       /* If the target region needs data sent from the parent
 	 function, then the very first statement (except possible
@@ -8337,6 +8345,11 @@ expand_omp_target (struct omp_region *region)
       push_cfun (child_cfun);
       cgraph_edge::rebuild_edges ();
 
+      /* Prevent IPA from removing child_fn as unreachable, since there are no
+	 refs from the parent function to the target side child_fn.  */
+      node = cgraph_node::get (child_fn);
+      node->mark_force_output ();
+
       /* Some EH regions might become dead, see PR34608.  If
 	 pass_cleanup_cfg isn't the first pass to happen with the
 	 new child, these dead EH edges might cause problems.
@@ -9207,6 +9220,19 @@ lower_omp_critical (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	  DECL_COMMON (decl) = 1;
 	  DECL_ARTIFICIAL (decl) = 1;
 	  DECL_IGNORED_P (decl) = 1;
+
+	  /* If '#pragma omp critical' is inside target region, the symbol must
+	     have an 'omp declare target' attribute.  */
+	  omp_context *octx;
+	  for (octx = ctx->outer; octx; octx = octx->outer)
+	    if (is_targetreg_ctx (octx))
+	      {
+		DECL_ATTRIBUTES (decl)
+		  = tree_cons (get_identifier ("omp declare target"),
+			       NULL_TREE, DECL_ATTRIBUTES (decl));
+		break;
+	      }
+
 	  varpool_node::finalize_decl (decl);
 
 	  splay_tree_insert (critical_name_mutexes, (splay_tree_key) name,
diff --git a/gcc/passes.c b/gcc/passes.c
index 5001c3d..0d5667d 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -2297,7 +2297,7 @@ ipa_write_summaries_1 (lto_symtab_encoder_t encoder)
 /* Write out summaries for all the nodes in the callgraph.  */
 
 void
-ipa_write_summaries (void)
+ipa_write_summaries (bool offload_lto_mode)
 {
   lto_symtab_encoder_t encoder;
   int i, order_pos;
@@ -2308,6 +2308,8 @@ ipa_write_summaries (void)
   if (!flag_generate_lto || seen_error ())
     return;
 
+  select_what_to_stream (offload_lto_mode);
+
   encoder = lto_symtab_encoder_new (false);
 
   /* Create the callgraph set in the same order used in
@@ -2334,15 +2336,16 @@ ipa_write_summaries (void)
 	  renumber_gimple_stmt_uids ();
 	  pop_cfun ();
 	}
-      if (node->definition)
+      if (node->definition && node->need_lto_streaming)
         lto_set_symtab_encoder_in_partition (encoder, node);
     }
 
   FOR_EACH_DEFINED_FUNCTION (node)
-    if (node->alias)
+    if (node->alias && node->need_lto_streaming)
       lto_set_symtab_encoder_in_partition (encoder, node);
   FOR_EACH_DEFINED_VARIABLE (vnode)
-    lto_set_symtab_encoder_in_partition (encoder, vnode);
+    if (vnode->need_lto_streaming)
+      lto_set_symtab_encoder_in_partition (encoder, vnode);
 
   ipa_write_summaries_1 (compute_ltrans_boundary (encoder));
 
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index ed109c3..0bc5ca1 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -592,7 +592,7 @@ extern void pass_fini_dump_file (opt_pass *);
 extern const char *get_current_pass_name (void);
 extern void print_current_pass (FILE *);
 extern void debug_pass (void);
-extern void ipa_write_summaries (void);
+extern void ipa_write_summaries (bool);
 extern void ipa_write_optimization_summaries (struct lto_symtab_encoder_d *);
 extern void ipa_read_summaries (void);
 extern void ipa_read_optimization_summaries (void);
-- 
1.7.1

* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-09-29 17:37   ` Ilya Verbin
@ 2014-09-30 11:40     ` Thomas Schwinge
  2014-10-01 16:13       ` Ilya Verbin
  0 siblings, 1 reply; 62+ messages in thread
From: Thomas Schwinge @ 2014-09-30 11:40 UTC (permalink / raw)
  To: Ilya Verbin
  Cc: Jakub Jelinek, gcc-patches, Kirill Yukhin, Ilya Tocar,
	Andrey Turetskiy, Bernd Schmidt, Jan Hubicka, Richard Biener

Hi!

As just discussed for the libgcc changes in
<http://news.gmane.org/find-root.php?message_id=%3C87d2ad73ze.fsf%40schwinge.name%3E>,
just some suggestions regarding the terminology, where I think that the
term »target« might be confusing in comments or symbols' names.  That is,
in the following, »target« should possibly be replaced by »offload[ing]«
or similar:

On Mon, 29 Sep 2014 21:37:04 +0400, Ilya Verbin <iverbin@gmail.com> wrote:
> --- a/gcc/lto-cgraph.c
> +++ b/gcc/lto-cgraph.c
> @@ -321,6 +321,11 @@ referenced_from_other_partition_p (symtab_node *node, lto_symtab_encoder_t encod
>  
>    for (i = 0; node->iterate_referring (i, ref); i++)
>      {
> +      /* Ignore references from non-target nodes while streaming NODE into
> +	 offload target section.  */
> +      if (!ref->referring->need_lto_streaming)
> +	continue;
> +
>        if (ref->referring->in_other_partition
>            || !lto_symtab_encoder_in_partition_p (encoder, ref->referring))
>  	return true;
> @@ -339,9 +344,16 @@ reachable_from_other_partition_p (struct cgraph_node *node, lto_symtab_encoder_t
>    if (node->global.inlined_to)
>      return false;
>    for (e = node->callers; e; e = e->next_caller)
> -    if (e->caller->in_other_partition
> -	|| !lto_symtab_encoder_in_partition_p (encoder, e->caller))
> -      return true;
> +    {
> +      /* Ignore references from non-target nodes while streaming NODE into
> +	 offload target section.  */
> +      if (!e->caller->need_lto_streaming)
> +	continue;
> +
> +      if (e->caller->in_other_partition
> +	  || !lto_symtab_encoder_in_partition_p (encoder, e->caller))
> +	return true;
> +    }
>    return false;
>  }

> --- a/gcc/lto-section-names.h
> +++ b/gcc/lto-section-names.h
> @@ -25,6 +25,11 @@ along with GCC; see the file COPYING3.  If not see
>     name for the functions and static_initializers.  For other types of
>     sections a '.' and the section type are appended.  */
>  #define LTO_SECTION_NAME_PREFIX ".gnu.lto_"
> +#define OMP_SECTION_NAME_PREFIX ".gnu.target_lto_"

What about:

    #define OFFLOAD_SECTION_NAME_PREFIX ".gnu.offload_lto_"
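
Editorial sketch, not part of the original message: whichever spelling is chosen, the readers in the patch tell sections apart with a plain strncmp prefix test. The standalone C below reproduces that test under stated assumptions: `has_section_prefix` is a hypothetical helper name, and any section name passed to it is made up for illustration.

```c
#include <stdbool.h>
#include <string.h>

#define LTO_SECTION_NAME_PREFIX ".gnu.lto_"
#define OFFLOAD_SECTION_NAME_PREFIX ".gnu.offload_lto_"

/* Return true if NAME starts with PREFIX.  This is the same strncmp
   comparison the patch uses, wrapped in a hypothetical helper.  */
static bool
has_section_prefix (const char *name, const char *prefix)
{
  return strncmp (name, prefix, strlen (prefix)) == 0;
}
```

Neither prefix is a prefix of the other (they diverge right after ".gnu."), so a reader configured for one prefix should never accept a section written under the other.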

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -8337,6 +8345,11 @@ expand_omp_target (struct omp_region *region)
>        push_cfun (child_cfun);
>        cgraph_edge::rebuild_edges ();
>  
> +      /* Prevent IPA from removing child_fn as unreachable, since there are no
> +	 refs from the parent function to the target side child_fn.  */
> +      node = cgraph_node::get (child_fn);
> +      node->mark_force_output ();
> +
>        /* Some EH regions might become dead, see PR34608.  If
>  	 pass_cleanup_cfg isn't the first pass to happen with the
>  	 new child, these dead EH edges might cause problems.


Regards,
 Thomas

* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-09-30 11:40     ` Thomas Schwinge
@ 2014-10-01 16:13       ` Ilya Verbin
  2014-10-08  8:45         ` Jakub Jelinek
  2014-10-15 14:28         ` Richard Biener
  0 siblings, 2 replies; 62+ messages in thread
From: Ilya Verbin @ 2014-10-01 16:13 UTC (permalink / raw)
  To: Thomas Schwinge, Jan Hubicka, Richard Biener
  Cc: Jakub Jelinek, gcc-patches, Kirill Yukhin, Ilya Tocar,
	Andrey Turetskiy, Bernd Schmidt

On 30 Sep 13:40, Thomas Schwinge wrote:
> As just discussed for the libgcc changes in
> <http://news.gmane.org/find-root.php?message_id=%3C87d2ad73ze.fsf%40schwinge.name%3E>,
> just some suggestions regarding the terminology, where I think that the
> term »target« might be confusing in comments or symbols' names.  That is,
> in the following, »target« should possibly be replaced by »offload[ing]«
> or similar:
> 
> What about:
> 
>     #define OFFLOAD_SECTION_NAME_PREFIX ".gnu.offload_lto_"

Renamed, patch is updated.

Thanks,
  -- Ilya


gcc/
	* cgraph.h (symtab_node): Add need_lto_streaming flag.
	* cgraphunit.c: Include lto-section-names.h.
	(initialize_offload): New function.
	(ipa_passes): Initialize offload and call ipa_write_summaries if there
	is something to write to OFFLOAD_SECTION_NAME_PREFIX sections.
	(symbol_table::compile): Call lto_streamer_hooks_init under flag_openmp.
	* ipa-inline-analysis.c (inline_generate_summary): Do not exit under
	flag_openmp.
	(inline_free_summary): Always remove hooks.
	* lto-cgraph.c (referenced_from_other_partition_p): Ignore references
	from non-offloadable nodes while streaming a node into offload section.
	(reachable_from_other_partition_p): Likewise.
	(select_what_to_stream): New function.
	(compute_ltrans_boundary): Do not call
	lto_set_symtab_encoder_in_partition if the node should not be streamed.
	* lto-section-names.h (OFFLOAD_SECTION_NAME_PREFIX): Define.
	(section_name_prefix): Declare.
	* lto-streamer.c (section_name_prefix): New variable.
	(lto_get_section_name): Use section_name_prefix instead of
	LTO_SECTION_NAME_PREFIX.
	* lto-streamer.h (select_what_to_stream): Declare.
	* omp-low.c (is_targetreg_ctx): New function.
	(create_omp_child_function, check_omp_nesting_restrictions): Use it.
	(expand_omp_target): Set mark_force_output for the offloaded functions.
	(lower_omp_critical): Add target attribute for omp critical symbol.
	* passes.c (ipa_write_summaries): New argument offload_lto_mode.  Call
	select_what_to_stream.  Do not call lto_set_symtab_encoder_in_partition
	if the node should not be streamed out.
	* tree-pass.h (ipa_write_summaries): New bool argument.
gcc/lto/
	* lto-object.c (lto_obj_add_section): Use section_name_prefix instead of
	LTO_SECTION_NAME_PREFIX.
	* lto-partition.c (lto_promote_cross_file_statics): Call
	select_what_to_stream.
	* lto.c (lto_section_with_id): Use section_name_prefix instead of
	LTO_SECTION_NAME_PREFIX.
	(read_cgraph_and_symbols): Read OFFLOAD_SECTION_NAME_PREFIX sections, if
	being built as an offload compiler.

---
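
Editorial sketch, not part of the patch: the ipa_passes hunk below writes summaries up to twice, switching a global prefix before each call. The standalone C here models only that control flow. It is a simplification under stated assumptions: `format_section_name`, `write_all`, and the "decls" suffix are hypothetical, and real section names are built by lto_get_section_name with more components.

```c
#include <stdbool.h>
#include <stdio.h>

#define LTO_SECTION_NAME_PREFIX ".gnu.lto_"
#define OFFLOAD_SECTION_NAME_PREFIX ".gnu.offload_lto_"

/* Global prefix, defaulting to the regular LTO prefix as in the patch.  */
static const char *section_name_prefix = LTO_SECTION_NAME_PREFIX;

/* Hypothetical helper: build a section name from the current prefix.  */
static void
format_section_name (char *buf, size_t size, const char *suffix)
{
  snprintf (buf, size, "%s%s", section_name_prefix, suffix);
}

/* Model of the two writes: the offload pass first, then the regular LTO
   pass.  Stores the generated names and returns how many were written.  */
static int
write_all (bool have_offload, bool lto, char names[2][64])
{
  int n = 0;
  if (have_offload)
    {
      section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
      format_section_name (names[n++], 64, "decls");
    }
  if (lto)
    {
      section_name_prefix = LTO_SECTION_NAME_PREFIX;
      format_section_name (names[n++], 64, "decls");
    }
  return n;
}
```

One property of this model worth noting: when only the offload pass runs, the global is left at the offload prefix afterwards, so any later code that builds section names would need to set it again.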

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 4fd58a5..df0b0e2 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -444,6 +444,10 @@ public:
   /* Set when init priority is set.  */
   unsigned in_init_priority_hash : 1;
 
+  /* Set when symbol needs to be streamed into LTO bytecode for LTO, or in case
+     of offloading, for separate compilation for a different target.  */
+  unsigned need_lto_streaming : 1;
+
 
   /* Ordering of all symtab entries.  */
   int order;
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index d463505..5eb9d64 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -211,6 +211,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-nested.h"
 #include "gimplify.h"
 #include "dbgcnt.h"
+#include "lto-section-names.h"
 
 /* Queue of cgraph nodes scheduled to be added into cgraph.  This is a
    secondary queue used during optimization to accommodate passes that
@@ -1994,9 +1995,40 @@ output_in_order (bool no_reorder)
   free (nodes);
 }
 
+/* Check whether there is at least one function or global variable to offload.
+   */
+
+static bool
+initialize_offload (void)
+{
+  bool have_offload = false;
+  struct cgraph_node *node;
+  struct varpool_node *vnode;
+
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (node->decl)))
+      {
+	have_offload = true;
+	break;
+      }
+
+  FOR_EACH_DEFINED_VARIABLE (vnode)
+    {
+      if (!lookup_attribute ("omp declare target",
+			     DECL_ATTRIBUTES (vnode->decl))
+	  || TREE_CODE (vnode->decl) != VAR_DECL
+	  || DECL_SIZE (vnode->decl) == 0)
+	continue;
+      have_offload = true;
+    }
+
+  return have_offload;
+}
+
 static void
 ipa_passes (void)
 {
+  bool have_offload = false;
   gcc::pass_manager *passes = g->get_passes ();
 
   set_cfun (NULL);
@@ -2004,6 +2036,14 @@ ipa_passes (void)
   gimple_register_cfg_hooks ();
   bitmap_obstack_initialize (NULL);
 
+  if (!in_lto_p && flag_openmp)
+    {
+      have_offload = initialize_offload ();
+      /* OpenMP offloading requires LTO infrastructure.  */
+      if (have_offload)
+	flag_generate_lto = 1;
+    }
+
   invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_START, NULL);
 
   if (!in_lto_p)
@@ -2041,7 +2081,18 @@ ipa_passes (void)
     targetm.asm_out.lto_start ();
 
   if (!in_lto_p)
-    ipa_write_summaries ();
+    {
+      if (have_offload)
+	{
+	  section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
+	  ipa_write_summaries (true);
+	}
+      if (flag_lto)
+	{
+	  section_name_prefix = LTO_SECTION_NAME_PREFIX;
+	  ipa_write_summaries (false);
+	}
+    }
 
   if (flag_generate_lto)
     targetm.asm_out.lto_end ();
@@ -2122,7 +2173,7 @@ symbol_table::compile (void)
   state = IPA;
 
   /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE.  */
-  if (flag_lto)
+  if (flag_lto || flag_openmp)
     lto_streamer_hooks_init ();
 
   /* Don't run the IPA passes if there was any error or sorry messages.  */
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 38f56d2..076a1e8 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -4010,7 +4010,7 @@ inline_generate_summary (void)
 
   /* When not optimizing, do not bother to analyze.  Inlining is still done
      because edge redirection needs to happen there.  */
-  if (!optimize && !flag_lto && !flag_wpa)
+  if (!optimize && !flag_lto && !flag_wpa && !flag_openmp)
     return;
 
   function_insertion_hook_holder =
@@ -4325,11 +4325,6 @@ void
 inline_free_summary (void)
 {
   struct cgraph_node *node;
-  if (!inline_edge_summary_vec.exists ())
-    return;
-  FOR_EACH_DEFINED_FUNCTION (node)
-    if (!node->alias)
-      reset_inline_summary (node);
   if (function_insertion_hook_holder)
     symtab->remove_cgraph_insertion_hook (function_insertion_hook_holder);
   function_insertion_hook_holder = NULL;
@@ -4345,6 +4340,11 @@ inline_free_summary (void)
   if (edge_duplication_hook_holder)
     symtab->remove_edge_duplication_hook (edge_duplication_hook_holder);
   edge_duplication_hook_holder = NULL;
+  if (!inline_edge_summary_vec.exists ())
+    return;
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (!node->alias)
+      reset_inline_summary (node);
   vec_free (inline_summary_vec);
   inline_edge_summary_vec.release ();
   if (edge_predicate_pool)
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 0584946..c1fccfa 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -321,6 +321,11 @@ referenced_from_other_partition_p (symtab_node *node, lto_symtab_encoder_t encod
 
   for (i = 0; node->iterate_referring (i, ref); i++)
     {
+      /* Ignore references from non-offloadable nodes while streaming NODE into
+	 offload LTO section.  */
+      if (!ref->referring->need_lto_streaming)
+	continue;
+
       if (ref->referring->in_other_partition
           || !lto_symtab_encoder_in_partition_p (encoder, ref->referring))
 	return true;
@@ -339,9 +344,16 @@ reachable_from_other_partition_p (struct cgraph_node *node, lto_symtab_encoder_t
   if (node->global.inlined_to)
     return false;
   for (e = node->callers; e; e = e->next_caller)
-    if (e->caller->in_other_partition
-	|| !lto_symtab_encoder_in_partition_p (encoder, e->caller))
-      return true;
+    {
+      /* Ignore references from non-offloadable nodes while streaming NODE into
+	 offload LTO section.  */
+      if (!e->caller->need_lto_streaming)
+	continue;
+
+      if (e->caller->in_other_partition
+	  || !lto_symtab_encoder_in_partition_p (encoder, e->caller))
+	return true;
+    }
   return false;
 }
 
@@ -802,6 +814,18 @@ create_references (lto_symtab_encoder_t encoder, symtab_node *node)
       lto_symtab_encoder_encode (encoder, ref->referred);
 }
 
+/* Select what needs to be streamed out.  In regular lto mode stream everything.
+   In offload lto mode stream only stuff marked with an attribute.  */
+void
+select_what_to_stream (bool offload_lto_mode)
+{
+  struct symtab_node *snode;
+  FOR_EACH_SYMBOL (snode)
+    snode->need_lto_streaming
+      = !offload_lto_mode || lookup_attribute ("omp declare target",
+					       DECL_ATTRIBUTES (snode->decl));
+}
+
 /* Find all symbols we want to stream into given partition and insert them
    to encoders.
 
@@ -828,6 +852,8 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
        !lsei_end_p (lsei); lsei_next_function_in_partition (&lsei))
     {
       struct cgraph_node *node = lsei_cgraph_node (lsei);
+      if (!node->need_lto_streaming)
+	continue;
       add_node_to (encoder, node, true);
       lto_set_symtab_encoder_in_partition (encoder, node);
       create_references (encoder, node);
@@ -844,6 +870,8 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
     {
       varpool_node *vnode = lsei_varpool_node (lsei);
 
+      if (!vnode->need_lto_streaming)
+	continue;
       lto_set_symtab_encoder_in_partition (encoder, vnode);
       lto_set_symtab_encoder_encode_initializer (encoder, vnode);
       create_references (encoder, vnode);
diff --git a/gcc/lto-section-names.h b/gcc/lto-section-names.h
index cb75230..f5dbed2 100644
--- a/gcc/lto-section-names.h
+++ b/gcc/lto-section-names.h
@@ -25,6 +25,11 @@ along with GCC; see the file COPYING3.  If not see
    name for the functions and static_initializers.  For other types of
    sections a '.' and the section type are appended.  */
 #define LTO_SECTION_NAME_PREFIX ".gnu.lto_"
+#define OFFLOAD_SECTION_NAME_PREFIX ".gnu.offload_lto_"
+
+/* Can be either OFFLOAD_SECTION_NAME_PREFIX when we stream IR for offload
+   compiler, or LTO_SECTION_NAME_PREFIX for LTO case.  */
+extern const char *section_name_prefix;
 
 /* Segment name for LTO sections.  This is only used for Mach-O.  */
 
diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c
index 3480723..161e12d 100644
--- a/gcc/lto-streamer.c
+++ b/gcc/lto-streamer.c
@@ -48,6 +48,7 @@ struct lto_stats_d lto_stats;
 static bitmap_obstack lto_obstack;
 static bool lto_obstack_initialized;
 
+const char *section_name_prefix = LTO_SECTION_NAME_PREFIX;
 
 /* Return a string representing LTO tag TAG.  */
 
@@ -177,7 +178,7 @@ lto_get_section_name (int section_type, const char *name, struct lto_file_decl_d
     sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id);
   else
     sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); 
-  return concat (LTO_SECTION_NAME_PREFIX, sep, add, post, NULL);
+  return concat (section_name_prefix, sep, add, post, NULL);
 }
 
 
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 4bec969..ba00ab4 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -831,6 +831,7 @@ bool referenced_from_this_partition_p (symtab_node *,
 bool reachable_from_this_partition_p (struct cgraph_node *,
 				      lto_symtab_encoder_t);
 lto_symtab_encoder_t compute_ltrans_boundary (lto_symtab_encoder_t encoder);
+void select_what_to_stream (bool);
 
 
 /* In lto-symtab.c.  */
diff --git a/gcc/lto/lto-object.c b/gcc/lto/lto-object.c
index 323f7b2..4ee752f 100644
--- a/gcc/lto/lto-object.c
+++ b/gcc/lto/lto-object.c
@@ -230,8 +230,7 @@ lto_obj_add_section (void *data, const char *name, off_t offset,
   void **slot;
   struct lto_section_list *list = loasd->list;
 
-  if (strncmp (name, LTO_SECTION_NAME_PREFIX,
-	       strlen (LTO_SECTION_NAME_PREFIX)) != 0)
+  if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
     return 1;
 
   new_name = xstrdup (name);
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 0451a66..aae2be9 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -920,6 +920,8 @@ lto_promote_cross_file_statics (void)
 
   gcc_assert (flag_wpa);
 
+  select_what_to_stream (false);
+
   /* First compute boundaries.  */
   n_sets = ltrans_partitions.length ();
   for (i = 0; i < n_sets; i++)
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 6cbb178..0646da5 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -2125,7 +2125,7 @@ lto_section_with_id (const char *name, unsigned HOST_WIDE_INT *id)
 {
   const char *s;
 
-  if (strncmp (name, LTO_SECTION_NAME_PREFIX, strlen (LTO_SECTION_NAME_PREFIX)))
+  if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
     return 0;
   s = strrchr (name, '.');
   return s && sscanf (s, "." HOST_WIDE_INT_PRINT_HEX_PURE, id) == 1;
@@ -2899,6 +2899,10 @@ read_cgraph_and_symbols (unsigned nfiles, const char **fnames)
 
   timevar_push (TV_IPA_LTO_DECL_IN);
 
+#ifdef ACCEL_COMPILER
+    section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
+#endif
+
   real_file_decl_data
     = decl_data = ggc_cleared_vec_alloc<lto_file_decl_data_ptr> (nfiles + 1);
   real_file_count = nfiles;
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index eb0a7ee..6156e2f 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -257,6 +257,16 @@ is_parallel_ctx (omp_context *ctx)
 }
 
 
+/* Return true if CTX is for an omp target region.  */
+
+static inline bool
+is_targetreg_ctx (omp_context *ctx)
+{
+  return gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
+	 && gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_REGION;
+}
+
+
 /* Return true if CTX is for an omp task.  */
 
 static inline bool
@@ -1930,9 +1940,7 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
     {
       omp_context *octx;
       for (octx = ctx; octx; octx = octx->outer)
-	if (gimple_code (octx->stmt) == GIMPLE_OMP_TARGET
-	    && gimple_omp_target_kind (octx->stmt)
-	       == GF_OMP_TARGET_KIND_REGION)
+	if (is_targetreg_ctx (octx))
 	  {
 	    target_p = true;
 	    break;
@@ -2588,8 +2596,7 @@ check_omp_nesting_restrictions (gimple stmt, omp_context *ctx)
       break;
     case GIMPLE_OMP_TARGET:
       for (; ctx != NULL; ctx = ctx->outer)
-	if (gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
-	    && gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_REGION)
+	if (is_targetreg_ctx (ctx))
 	  {
 	    const char *name;
 	    switch (gimple_omp_target_kind (stmt))
@@ -8206,6 +8213,7 @@ expand_omp_target (struct omp_region *region)
   if (kind == GF_OMP_TARGET_KIND_REGION)
     {
       unsigned srcidx, dstidx, num;
+      struct cgraph_node *node;
 
       /* If the target region needs data sent from the parent
 	 function, then the very first statement (except possible
@@ -8337,6 +8345,11 @@ expand_omp_target (struct omp_region *region)
       push_cfun (child_cfun);
       cgraph_edge::rebuild_edges ();
 
+      /* Prevent IPA from removing child_fn as unreachable, since there are no
+	 refs from the parent function to child_fn in offload LTO mode.  */
+      node = cgraph_node::get (child_fn);
+      node->mark_force_output ();
+
       /* Some EH regions might become dead, see PR34608.  If
 	 pass_cleanup_cfg isn't the first pass to happen with the
 	 new child, these dead EH edges might cause problems.
@@ -9207,6 +9220,19 @@ lower_omp_critical (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	  DECL_COMMON (decl) = 1;
 	  DECL_ARTIFICIAL (decl) = 1;
 	  DECL_IGNORED_P (decl) = 1;
+
+	  /* If '#pragma omp critical' is inside target region, the symbol must
+	     have an 'omp declare target' attribute.  */
+	  omp_context *octx;
+	  for (octx = ctx->outer; octx; octx = octx->outer)
+	    if (is_targetreg_ctx (octx))
+	      {
+		DECL_ATTRIBUTES (decl)
+		  = tree_cons (get_identifier ("omp declare target"),
+			       NULL_TREE, DECL_ATTRIBUTES (decl));
+		break;
+	      }
+
 	  varpool_node::finalize_decl (decl);
 
 	  splay_tree_insert (critical_name_mutexes, (splay_tree_key) name,
diff --git a/gcc/passes.c b/gcc/passes.c
index 5001c3d..0d5667d 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -2297,7 +2297,7 @@ ipa_write_summaries_1 (lto_symtab_encoder_t encoder)
 /* Write out summaries for all the nodes in the callgraph.  */
 
 void
-ipa_write_summaries (void)
+ipa_write_summaries (bool offload_lto_mode)
 {
   lto_symtab_encoder_t encoder;
   int i, order_pos;
@@ -2308,6 +2308,8 @@ ipa_write_summaries (void)
   if (!flag_generate_lto || seen_error ())
     return;
 
+  select_what_to_stream (offload_lto_mode);
+
   encoder = lto_symtab_encoder_new (false);
 
   /* Create the callgraph set in the same order used in
@@ -2334,15 +2336,16 @@ ipa_write_summaries (void)
 	  renumber_gimple_stmt_uids ();
 	  pop_cfun ();
 	}
-      if (node->definition)
+      if (node->definition && node->need_lto_streaming)
         lto_set_symtab_encoder_in_partition (encoder, node);
     }
 
   FOR_EACH_DEFINED_FUNCTION (node)
-    if (node->alias)
+    if (node->alias && node->need_lto_streaming)
       lto_set_symtab_encoder_in_partition (encoder, node);
   FOR_EACH_DEFINED_VARIABLE (vnode)
-    lto_set_symtab_encoder_in_partition (encoder, vnode);
+    if (vnode->need_lto_streaming)
+      lto_set_symtab_encoder_in_partition (encoder, vnode);
 
   ipa_write_summaries_1 (compute_ltrans_boundary (encoder));
 
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index ed109c3..0bc5ca1 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -592,7 +592,7 @@ extern void pass_fini_dump_file (opt_pass *);
 extern const char *get_current_pass_name (void);
 extern void print_current_pass (FILE *);
 extern void debug_pass (void);
-extern void ipa_write_summaries (void);
+extern void ipa_write_summaries (bool);
 extern void ipa_write_optimization_summaries (struct lto_symtab_encoder_d *);
 extern void ipa_read_summaries (void);
 extern void ipa_read_optimization_summaries (void);
-- 
1.7.1


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-10-01 16:13       ` Ilya Verbin
@ 2014-10-08  8:45         ` Jakub Jelinek
  2014-10-08  9:13           ` Jakub Jelinek
  2014-10-15 14:28         ` Richard Biener
  1 sibling, 1 reply; 62+ messages in thread
From: Jakub Jelinek @ 2014-10-08  8:45 UTC (permalink / raw)
  To: Ilya Verbin
  Cc: Thomas Schwinge, Jan Hubicka, Richard Biener, gcc-patches,
	Kirill Yukhin, Ilya Tocar, Andrey Turetskiy, Bernd Schmidt

On Wed, Oct 01, 2014 at 08:13:32PM +0400, Ilya Verbin wrote:
> @@ -1994,9 +1995,40 @@ output_in_order (bool no_reorder)
>    free (nodes);
>  }
>  
> +/* Check whether there is at least one function or global variable to offload.
> +   */

The */ alone on a line is weird; put the last word on the next line too.

> +  FOR_EACH_DEFINED_VARIABLE (vnode)
> +    {
> +      if (!lookup_attribute ("omp declare target",
> +			     DECL_ATTRIBUTES (vnode->decl))
> +	  || TREE_CODE (vnode->decl) != VAR_DECL
> +	  || DECL_SIZE (vnode->decl) == 0)

While I hope the varpool code puts only decls that have DECL_ATTRIBUTES
into FOR_EACH_DEFINED_VARIABLE, it would be better to put the
less expensive tests first, i.e. the last two first, then lookup_attribute.
Also, DECL_SIZE is a tree, so == NULL_TREE?
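
A minimal, self-contained sketch of the suggested ordering follows.  The
mock_* types and functions are hypothetical stand-ins for tree,
DECL_ATTRIBUTES and lookup_attribute, not GCC's real API, and a NULL size
pointer plays the role of NULL_TREE.  It only shows the two cheap field tests
short-circuiting before the linear attribute walk:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for GCC's tree codes and attribute lists.  */
enum mock_code { MOCK_VAR_DECL, MOCK_FUNCTION_DECL };

struct mock_attr
{
  const char *name;
  const struct mock_attr *next;
};

struct mock_decl
{
  enum mock_code code;
  const void *size;               /* NULL plays the role of NULL_TREE.  */
  const struct mock_attr *attrs;
};

/* Linear walk over the attribute list; stands in for lookup_attribute.  */
static bool
mock_has_attribute (const struct mock_decl *decl, const char *name)
{
  assert (decl != NULL && name != NULL);
  for (const struct mock_attr *a = decl->attrs; a; a = a->next)
    if (strcmp (a->name, name) == 0)
      return true;
  return false;
}

/* Cheap field tests first, the attribute walk last.  */
static bool
mock_offloadable_var_p (const struct mock_decl *decl)
{
  return decl->code == MOCK_VAR_DECL
         && decl->size != NULL
         && mock_has_attribute (decl, "omp declare target");
}
```

With this ordering the attribute list is only walked for sized variable
decls; the result is the same as with the original order.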

And, once there is an option to select which offload targets to generate
code for (or none), initialize_offload () should supposedly return false
if the user requested no offloading on the command line.

The omp-low.c changes look good; for the cgraph/LTO stuff I defer to Honza
and/or Richard.  If they are fine with the changes, so am I.

	Jakub


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-10-08  8:45         ` Jakub Jelinek
@ 2014-10-08  9:13           ` Jakub Jelinek
  0 siblings, 0 replies; 62+ messages in thread
From: Jakub Jelinek @ 2014-10-08  9:13 UTC (permalink / raw)
  To: Ilya Verbin
  Cc: Thomas Schwinge, Jan Hubicka, Richard Biener, gcc-patches,
	Kirill Yukhin, Ilya Tocar, Andrey Turetskiy, Bernd Schmidt

On Wed, Oct 08, 2014 at 10:45:22AM +0200, Jakub Jelinek wrote:
> And, once there is an option to select which offload targets to generate
> code for (or none), initialize_offload () should supposedly return false
> if the user requested no offloading on the command line.

After some thought, I take this back.  We should always stream
.gnu.offload_lto_* if we have any target regions or omp declare target
functions/vars.  The decision should be made during linking: if the user
wants only host fallback, the link step should just throw away those
.gnu.offload_lto_* sections (we could also tweak the defaults for that, e.g.
in libgomp.spec).  Thinking with a distro hat on now: if gcc is configured
for offloading to, say, MIC, PTX and HSA?, it would be nice if the default
choice depended on which offloading compilers the user actually decided to
install.  So one could add a default for
%{!foffloading:-foffloading=x86_64-intelmic-linux-gnu}
if the MIC offloading compiler is installed and PTX/HSA is not (this is just
an example; I don't remember the name of the option we discussed).

	Jakub


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-10-01 16:13       ` Ilya Verbin
  2014-10-08  8:45         ` Jakub Jelinek
@ 2014-10-15 14:28         ` Richard Biener
  2014-10-20 11:21           ` Ilya Verbin
  1 sibling, 1 reply; 62+ messages in thread
From: Richard Biener @ 2014-10-15 14:28 UTC (permalink / raw)
  To: Ilya Verbin
  Cc: Thomas Schwinge, Jan Hubicka, Jakub Jelinek, gcc-patches,
	Kirill Yukhin, Ilya Tocar, Andrey Turetskiy, Bernd Schmidt


On Wed, 1 Oct 2014, Ilya Verbin wrote:

> On 30 Sep 13:40, Thomas Schwinge wrote:
> > As just discussed for the libgcc changes in
> > <http://news.gmane.org/find-root.php?message_id=%3C87d2ad73ze.fsf%40schwinge.name%3E>,
> > just some suggestions regarding the terminology, where I think that the
> > term »target« might be confusing in comments or symbols' names.  That is,
> > in the following, »target« should possibly be replaced by »offload[ing]«
> > or similar:
> > 
> > What about:
> > 
> >     #define OFFLOAD_SECTION_NAME_PREFIX ".gnu.offload_lto_"
> 
> Renamed, patch is updated.
> 
> Thanks,
>   -- Ilya
> 
> 
> gcc/
> 	* cgraph.h (symtab_node): Add need_lto_streaming flag.
> 	* cgraphunit.c: Include lto-section-names.h.
> 	(initialize_offload): New function.
> 	(ipa_passes): Initialize offload and call ipa_write_summaries if there
> 	is something to write to OFFLOAD_SECTION_NAME_PREFIX sections.
> 	(symbol_table::compile): Call lto_streamer_hooks_init under flag_openmp.
> 	* ipa-inline-analysis.c (inline_generate_summary): Do not exit under
> 	flag_openmp.
> 	(inline_free_summary): Always remove hooks.
> 	* lto-cgraph.c (referenced_from_other_partition_p): Ignore references
> 	from non-offloadable nodes while streaming a node into offload section.
> 	(reachable_from_other_partition_p): Likewise.
> 	(select_what_to_stream): New function.
> 	(compute_ltrans_boundary): Do not call
> 	lto_set_symtab_encoder_in_partition if the node should not be streamed.
> 	* lto-section-names.h (OFFLOAD_SECTION_NAME_PREFIX): Define.
> 	(section_name_prefix): Declare.
> 	* lto-streamer.c (section_name_prefix): New variable.
> 	(lto_get_section_name): Use section_name_prefix instead of
> 	LTO_SECTION_NAME_PREFIX.
> 	* lto-streamer.h (select_what_to_stream): Declare.
> 	* omp-low.c (is_targetreg_ctx): New function.
> 	(create_omp_child_function, check_omp_nesting_restrictions): Use it.
> 	(expand_omp_target): Set mark_force_output for the offloaded functions.
> 	(lower_omp_critical): Add target attribute for omp critical symbol.
> 	* passes.c (ipa_write_summaries): New argument offload_lto_mode.  Call
> 	select_what_to_stream.  Do not call lto_set_symtab_encoder_in_partition
> 	if the node should not be streamed out.
> 	* tree-pass.h (ipa_write_summaries): New bool argument.
> gcc/lto/
> 	* lto-object.c (lto_obj_add_section): Use section_name_prefix instead of
> 	LTO_SECTION_NAME_PREFIX.
> 	* lto-partition.c (lto_promote_cross_file_statics): Call
> 	select_what_to_stream.
> 	* lto.c (lto_section_with_id): Use section_name_prefix instead of
> 	LTO_SECTION_NAME_PREFIX.
> 	(read_cgraph_and_symbols): Read OFFLOAD_SECTION_NAME_PREFIX sections, if
> 	being built as an offload compiler.
> 
> ---
> 
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 4fd58a5..df0b0e2 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -444,6 +444,10 @@ public:
>    /* Set when init priority is set.  */
>    unsigned in_init_priority_hash : 1;
>  
> +  /* Set when symbol needs to be streamed into LTO bytecode for LTO, or in case
> +     of offloading, for separate compilation for a different target.  */
> +  unsigned need_lto_streaming : 1;
> +
>  
>    /* Ordering of all symtab entries.  */
>    int order;
> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> index d463505..5eb9d64 100644
> --- a/gcc/cgraphunit.c
> +++ b/gcc/cgraphunit.c
> @@ -211,6 +211,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-nested.h"
>  #include "gimplify.h"
>  #include "dbgcnt.h"
> +#include "lto-section-names.h"
>  
>  /* Queue of cgraph nodes scheduled to be added into cgraph.  This is a
>     secondary queue used during optimization to accommodate passes that
> @@ -1994,9 +1995,40 @@ output_in_order (bool no_reorder)
>    free (nodes);
>  }
>  
> +/* Check whether there is at least one function or global variable to offload.
> +   */
> +
> +static bool
> +initialize_offload (void)
> +{
> +  bool have_offload = false;
> +  struct cgraph_node *node;
> +  struct varpool_node *vnode;
> +
> +  FOR_EACH_DEFINED_FUNCTION (node)
> +    if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (node->decl)))
> +      {
> +	have_offload = true;
> +	break;
> +      }
> +
> +  FOR_EACH_DEFINED_VARIABLE (vnode)
> +    {
> +      if (!lookup_attribute ("omp declare target",
> +			     DECL_ATTRIBUTES (vnode->decl))
> +	  || TREE_CODE (vnode->decl) != VAR_DECL
> +	  || DECL_SIZE (vnode->decl) == 0)
> +	continue;
> +      have_offload = true;
> +    }
> +
> +  return have_offload;
> +}
> +

I wonder if we can avoid the above by means of a global have_offload
flag?  (or inside gcc::context)

>  static void
>  ipa_passes (void)
>  {
> +  bool have_offload = false;
>    gcc::pass_manager *passes = g->get_passes ();
>  
>    set_cfun (NULL);
> @@ -2004,6 +2036,14 @@ ipa_passes (void)
>    gimple_register_cfg_hooks ();
>    bitmap_obstack_initialize (NULL);
>  
> +  if (!in_lto_p && flag_openmp)

As -fopenmp is not generally available it's odd to test
flag_openmp (though that is available everywhere as
implementation detail).  Doesn't offloading work
without -fopenmp?

> +    {
> +      have_offload = initialize_offload ();
> +      /* OpenMP offloading requires LTO infrastructure.  */
> +      if (have_offload)
> +	flag_generate_lto = 1;
> +    }
> +
>    invoke_plugin_callbacks (PLUGIN_ALL_IPA_PASSES_START, NULL);
>  
>    if (!in_lto_p)
> @@ -2041,7 +2081,18 @@ ipa_passes (void)
>      targetm.asm_out.lto_start ();
>  
>    if (!in_lto_p)
> -    ipa_write_summaries ();
> +    {
> +      if (have_offload)
> +	{
> +	  section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
> +	  ipa_write_summaries (true);
> +	}
> +      if (flag_lto)
> +	{
> +	  section_name_prefix = LTO_SECTION_NAME_PREFIX;
> +	  ipa_write_summaries (false);
> +	}
> +    }
>  
>    if (flag_generate_lto)
>      targetm.asm_out.lto_end ();
> @@ -2122,7 +2173,7 @@ symbol_table::compile (void)
>    state = IPA;
>  
>    /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE.  */
> -  if (flag_lto)
> +  if (flag_lto || flag_openmp)

flag_generate_lto?

>      lto_streamer_hooks_init ();
>  
>    /* Don't run the IPA passes if there was any error or sorry messages.  */
> diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
> index 38f56d2..076a1e8 100644
> --- a/gcc/ipa-inline-analysis.c
> +++ b/gcc/ipa-inline-analysis.c
> @@ -4010,7 +4010,7 @@ inline_generate_summary (void)
>  
>    /* When not optimizing, do not bother to analyze.  Inlining is still done
>       because edge redirection needs to happen there.  */
> -  if (!optimize && !flag_lto && !flag_wpa)
> +  if (!optimize && !flag_lto && !flag_wpa && !flag_openmp)
>      return;

Likewise !flag_generate_lto

>    function_insertion_hook_holder =
> @@ -4325,11 +4325,6 @@ void
>  inline_free_summary (void)
>  {
>    struct cgraph_node *node;
> -  if (!inline_edge_summary_vec.exists ())
> -    return;
> -  FOR_EACH_DEFINED_FUNCTION (node)
> -    if (!node->alias)
> -      reset_inline_summary (node);
>    if (function_insertion_hook_holder)
>      symtab->remove_cgraph_insertion_hook (function_insertion_hook_holder);
>    function_insertion_hook_holder = NULL;
> @@ -4345,6 +4340,11 @@ inline_free_summary (void)
>    if (edge_duplication_hook_holder)
>      symtab->remove_edge_duplication_hook (edge_duplication_hook_holder);
>    edge_duplication_hook_holder = NULL;
> +  if (!inline_edge_summary_vec.exists ())
> +    return;
> +  FOR_EACH_DEFINED_FUNCTION (node)
> +    if (!node->alias)
> +      reset_inline_summary (node);
>    vec_free (inline_summary_vec);
>    inline_edge_summary_vec.release ();
>    if (edge_predicate_pool)
> diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> index 0584946..c1fccfa 100644
> --- a/gcc/lto-cgraph.c
> +++ b/gcc/lto-cgraph.c
> @@ -321,6 +321,11 @@ referenced_from_other_partition_p (symtab_node *node, lto_symtab_encoder_t encod
>  
>    for (i = 0; node->iterate_referring (i, ref); i++)
>      {
> +      /* Ignore references from non-offloadable nodes while streaming NODE into
> +	 offload LTO section.  */
> +      if (!ref->referring->need_lto_streaming)
> +	continue;
> +
>        if (ref->referring->in_other_partition
>            || !lto_symtab_encoder_in_partition_p (encoder, ref->referring))
>  	return true;
> @@ -339,9 +344,16 @@ reachable_from_other_partition_p (struct cgraph_node *node, lto_symtab_encoder_t
>    if (node->global.inlined_to)
>      return false;
>    for (e = node->callers; e; e = e->next_caller)
> -    if (e->caller->in_other_partition
> -	|| !lto_symtab_encoder_in_partition_p (encoder, e->caller))
> -      return true;
> +    {
> +      /* Ignore references from non-offloadable nodes while streaming NODE into
> +	 offload LTO section.  */
> +      if (!e->caller->need_lto_streaming)
> +	continue;
> +
> +      if (e->caller->in_other_partition
> +	  || !lto_symtab_encoder_in_partition_p (encoder, e->caller))
> +	return true;
> +    }
>    return false;
>  }
>  
> @@ -802,6 +814,18 @@ create_references (lto_symtab_encoder_t encoder, symtab_node *node)
>        lto_symtab_encoder_encode (encoder, ref->referred);
>  }
>  
> +/* Select what needs to be streamed out.  In regular lto mode stream everything.
> +   In offload lto mode stream only stuff marked with an attribute.  */
> +void
> +select_what_to_stream (bool offload_lto_mode)
> +{
> +  struct symtab_node *snode;
> +  FOR_EACH_SYMBOL (snode)
> +    snode->need_lto_streaming
> +      = !offload_lto_mode || lookup_attribute ("omp declare target",
> +					       DECL_ATTRIBUTES (snode->decl));

I suppose I suggested this already earlier this year.  Why keep this
artificial attribute when you have a cgraph node flag?

> +}
> +
>  /* Find all symbols we want to stream into given partition and insert them
>     to encoders.
>  
> @@ -828,6 +852,8 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
>         !lsei_end_p (lsei); lsei_next_function_in_partition (&lsei))
>      {
>        struct cgraph_node *node = lsei_cgraph_node (lsei);
> +      if (!node->need_lto_streaming)
> +	continue;
>        add_node_to (encoder, node, true);
>        lto_set_symtab_encoder_in_partition (encoder, node);
>        create_references (encoder, node);
> @@ -844,6 +870,8 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
>      {
>        varpool_node *vnode = lsei_varpool_node (lsei);
>  
> +      if (!vnode->need_lto_streaming)
> +	continue;
>        lto_set_symtab_encoder_in_partition (encoder, vnode);
>        lto_set_symtab_encoder_encode_initializer (encoder, vnode);
>        create_references (encoder, vnode);

I leave all the partitioning stuff to Honza to review.

> diff --git a/gcc/lto-section-names.h b/gcc/lto-section-names.h
> index cb75230..f5dbed2 100644
> --- a/gcc/lto-section-names.h
> +++ b/gcc/lto-section-names.h
> @@ -25,6 +25,11 @@ along with GCC; see the file COPYING3.  If not see
>     name for the functions and static_initializers.  For other types of
>     sections a '.' and the section type are appended.  */
>  #define LTO_SECTION_NAME_PREFIX ".gnu.lto_"
> +#define OFFLOAD_SECTION_NAME_PREFIX ".gnu.offload_lto_"
> +
> +/* Can be either OFFLOAD_SECTION_NAME_PREFIX when we stream IR for offload
> +   compiler, or LTO_SECTION_NAME_PREFIX for LTO case.  */
> +extern const char *section_name_prefix;
>  
>  /* Segment name for LTO sections.  This is only used for Mach-O.  */
>  
> diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c
> index 3480723..161e12d 100644
> --- a/gcc/lto-streamer.c
> +++ b/gcc/lto-streamer.c
> @@ -48,6 +48,7 @@ struct lto_stats_d lto_stats;
>  static bitmap_obstack lto_obstack;
>  static bool lto_obstack_initialized;
>  
> +const char *section_name_prefix = LTO_SECTION_NAME_PREFIX;
>  
>  /* Return a string representing LTO tag TAG.  */
>  
> @@ -177,7 +178,7 @@ lto_get_section_name (int section_type, const char *name, struct lto_file_decl_d
>      sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id);
>    else
>      sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); 
> -  return concat (LTO_SECTION_NAME_PREFIX, sep, add, post, NULL);
> +  return concat (section_name_prefix, sep, add, post, NULL);
>  }
>  
>  
> diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
> index 4bec969..ba00ab4 100644
> --- a/gcc/lto-streamer.h
> +++ b/gcc/lto-streamer.h
> @@ -831,6 +831,7 @@ bool referenced_from_this_partition_p (symtab_node *,
>  bool reachable_from_this_partition_p (struct cgraph_node *,
>  				      lto_symtab_encoder_t);
>  lto_symtab_encoder_t compute_ltrans_boundary (lto_symtab_encoder_t encoder);
> +void select_what_to_stream (bool);
>  
>  
>  /* In lto-symtab.c.  */
> diff --git a/gcc/lto/lto-object.c b/gcc/lto/lto-object.c
> index 323f7b2..4ee752f 100644
> --- a/gcc/lto/lto-object.c
> +++ b/gcc/lto/lto-object.c
> @@ -230,8 +230,7 @@ lto_obj_add_section (void *data, const char *name, off_t offset,
>    void **slot;
>    struct lto_section_list *list = loasd->list;
>  
> -  if (strncmp (name, LTO_SECTION_NAME_PREFIX,
> -	       strlen (LTO_SECTION_NAME_PREFIX)) != 0)
> +  if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
>      return 1;
>  
>    new_name = xstrdup (name);
> diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
> index 0451a66..aae2be9 100644
> --- a/gcc/lto/lto-partition.c
> +++ b/gcc/lto/lto-partition.c
> @@ -920,6 +920,8 @@ lto_promote_cross_file_statics (void)
>  
>    gcc_assert (flag_wpa);
>  
> +  select_what_to_stream (false);
> +
>    /* First compute boundaries.  */
>    n_sets = ltrans_partitions.length ();
>    for (i = 0; i < n_sets; i++)
> diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
> index 6cbb178..0646da5 100644
> --- a/gcc/lto/lto.c
> +++ b/gcc/lto/lto.c
> @@ -2125,7 +2125,7 @@ lto_section_with_id (const char *name, unsigned HOST_WIDE_INT *id)
>  {
>    const char *s;
>  
> -  if (strncmp (name, LTO_SECTION_NAME_PREFIX, strlen (LTO_SECTION_NAME_PREFIX)))
> +  if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
>      return 0;
>    s = strrchr (name, '.');
>    return s && sscanf (s, "." HOST_WIDE_INT_PRINT_HEX_PURE, id) == 1;
> @@ -2899,6 +2899,10 @@ read_cgraph_and_symbols (unsigned nfiles, const char **fnames)
>  
>    timevar_push (TV_IPA_LTO_DECL_IN);
>  
> +#ifdef ACCEL_COMPILER
> +    section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
> +#endif
> +
>    real_file_decl_data
>      = decl_data = ggc_cleared_vec_alloc<lto_file_decl_data_ptr> (nfiles + 1);
>    real_file_count = nfiles;
> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> index eb0a7ee..6156e2f 100644
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -257,6 +257,16 @@ is_parallel_ctx (omp_context *ctx)
>  }
>  
>  
> +/* Return true if CTX is for an omp target region.  */
> +
> +static inline bool
> +is_targetreg_ctx (omp_context *ctx)
> +{
> +  return gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
> +	 && gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_REGION;
> +}
> +
> +
>  /* Return true if CTX is for an omp task.  */
>  
>  static inline bool
> @@ -1930,9 +1940,7 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
>      {
>        omp_context *octx;
>        for (octx = ctx; octx; octx = octx->outer)
> -	if (gimple_code (octx->stmt) == GIMPLE_OMP_TARGET
> -	    && gimple_omp_target_kind (octx->stmt)
> -	       == GF_OMP_TARGET_KIND_REGION)
> +	if (is_targetreg_ctx (octx))
>  	  {
>  	    target_p = true;
>  	    break;
> @@ -2588,8 +2596,7 @@ check_omp_nesting_restrictions (gimple stmt, omp_context *ctx)
>        break;
>      case GIMPLE_OMP_TARGET:
>        for (; ctx != NULL; ctx = ctx->outer)
> -	if (gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
> -	    && gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_REGION)
> +	if (is_targetreg_ctx (ctx))
>  	  {
>  	    const char *name;
>  	    switch (gimple_omp_target_kind (stmt))
> @@ -8206,6 +8213,7 @@ expand_omp_target (struct omp_region *region)
>    if (kind == GF_OMP_TARGET_KIND_REGION)
>      {
>        unsigned srcidx, dstidx, num;
> +      struct cgraph_node *node;
>  
>        /* If the target region needs data sent from the parent
>  	 function, then the very first statement (except possible
> @@ -8337,6 +8345,11 @@ expand_omp_target (struct omp_region *region)
>        push_cfun (child_cfun);
>        cgraph_edge::rebuild_edges ();
>  
> +      /* Prevent IPA from removing child_fn as unreachable, since there are no
> +	 refs from the parent function to child_fn in offload LTO mode.  */
> +      node = cgraph_node::get (child_fn);
> +      node->mark_force_output ();
> +
>        /* Some EH regions might become dead, see PR34608.  If
>  	 pass_cleanup_cfg isn't the first pass to happen with the
>  	 new child, these dead EH edges might cause problems.
> @@ -9207,6 +9220,19 @@ lower_omp_critical (gimple_stmt_iterator *gsi_p, omp_context *ctx)
>  	  DECL_COMMON (decl) = 1;
>  	  DECL_ARTIFICIAL (decl) = 1;
>  	  DECL_IGNORED_P (decl) = 1;
> +
> +	  /* If '#pragma omp critical' is inside target region, the symbol must
> +	     have an 'omp declare target' attribute.  */
> +	  omp_context *octx;
> +	  for (octx = ctx->outer; octx; octx = octx->outer)
> +	    if (is_targetreg_ctx (octx))
> +	      {
> +		DECL_ATTRIBUTES (decl)
> +		  = tree_cons (get_identifier ("omp declare target"),
> +			       NULL_TREE, DECL_ATTRIBUTES (decl));

Here - why not set a flag on cgraph_get_node (decl) instead?

LTO bits are ok apart from the way you split partitioning which I
leave to Honza.  Also see my suggestion about that odd "omp declare 
target" attribute.

Thanks,
Richard.

> +		break;
> +	      }
> +
>  	  varpool_node::finalize_decl (decl);
>  
>  	  splay_tree_insert (critical_name_mutexes, (splay_tree_key) name,
> diff --git a/gcc/passes.c b/gcc/passes.c
> index 5001c3d..0d5667d 100644
> --- a/gcc/passes.c
> +++ b/gcc/passes.c
> @@ -2297,7 +2297,7 @@ ipa_write_summaries_1 (lto_symtab_encoder_t encoder)
>  /* Write out summaries for all the nodes in the callgraph.  */
>  
>  void
> -ipa_write_summaries (void)
> +ipa_write_summaries (bool offload_lto_mode)
>  {
>    lto_symtab_encoder_t encoder;
>    int i, order_pos;
> @@ -2308,6 +2308,8 @@ ipa_write_summaries (void)
>    if (!flag_generate_lto || seen_error ())
>      return;
>  
> +  select_what_to_stream (offload_lto_mode);
> +
>    encoder = lto_symtab_encoder_new (false);
>  
>    /* Create the callgraph set in the same order used in
> @@ -2334,15 +2336,16 @@ ipa_write_summaries (void)
>  	  renumber_gimple_stmt_uids ();
>  	  pop_cfun ();
>  	}
> -      if (node->definition)
> +      if (node->definition && node->need_lto_streaming)
>          lto_set_symtab_encoder_in_partition (encoder, node);
>      }
>  
>    FOR_EACH_DEFINED_FUNCTION (node)
> -    if (node->alias)
> +    if (node->alias && node->need_lto_streaming)
>        lto_set_symtab_encoder_in_partition (encoder, node);
>    FOR_EACH_DEFINED_VARIABLE (vnode)
> -    lto_set_symtab_encoder_in_partition (encoder, vnode);
> +    if (vnode->need_lto_streaming)
> +      lto_set_symtab_encoder_in_partition (encoder, vnode);
>  
>    ipa_write_summaries_1 (compute_ltrans_boundary (encoder));
>  
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index ed109c3..0bc5ca1 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -592,7 +592,7 @@ extern void pass_fini_dump_file (opt_pass *);
>  extern const char *get_current_pass_name (void);
>  extern void print_current_pass (FILE *);
>  extern void debug_pass (void);
> -extern void ipa_write_summaries (void);
> +extern void ipa_write_summaries (bool);
>  extern void ipa_write_optimization_summaries (struct lto_symtab_encoder_d *);
>  extern void ipa_read_summaries (void);
>  extern void ipa_read_optimization_summaries (void);
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-10-15 14:28         ` Richard Biener
@ 2014-10-20 11:21           ` Ilya Verbin
  2014-10-20 11:26             ` Jakub Jelinek
  2014-10-24 14:16             ` Ilya Verbin
  0 siblings, 2 replies; 62+ messages in thread
From: Ilya Verbin @ 2014-10-20 11:21 UTC (permalink / raw)
  To: Richard Biener
  Cc: Thomas Schwinge, Jan Hubicka, Jakub Jelinek, gcc-patches,
	Kirill Yukhin, Ilya Tocar, Andrey Turetskiy, Bernd Schmidt

On 15 Oct 16:23, Richard Biener wrote:
> > +static bool
> > +initialize_offload (void)
> > +{
> > +  bool have_offload = false;
> > +  struct cgraph_node *node;
> > +  struct varpool_node *vnode;
> > +
> > +  FOR_EACH_DEFINED_FUNCTION (node)
> > +    if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (node->decl)))
> > +      {
> > +	have_offload = true;
> > +	break;
> > +      }
> > +
> > +  FOR_EACH_DEFINED_VARIABLE (vnode)
> > +    {
> > +      if (!lookup_attribute ("omp declare target",
> > +			     DECL_ATTRIBUTES (vnode->decl))
> > +	  || TREE_CODE (vnode->decl) != VAR_DECL
> > +	  || DECL_SIZE (vnode->decl) == 0)
> > +	continue;
> > +      have_offload = true;
> > +    }
> > +
> > +  return have_offload;
> > +}
> > +
> 
> I wonder if we can avoid the above by means of a global have_offload
> flag?  (or inside gcc::context)

So you propose to set a global have_offload flag somewhere in
expand_omp_target, etc., where functions and global variables are created?
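
For illustration only, that idea could look roughly like the sketch below.
All mock_* names are hypothetical, and where such a flag would actually live
(a plain global or inside gcc::context) is still an open question here.  The
point is just that the flag is set where an offloadable symbol is created, so
nothing has to scan the symbol table afterwards:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global; its name and location are assumptions.  */
static bool mock_have_offload = false;

struct mock_symbol
{
  const char *name;
  bool offloadable;
};

/* Stand-in for the places where an outlined target region or a
   "declare target" variable is created.  The flag is set right here.  */
static void
mock_register_symbol (struct mock_symbol *sym, const char *name,
                      bool offloadable)
{
  assert (sym != NULL);
  sym->name = name;
  sym->offloadable = offloadable;
  if (offloadable)
    mock_have_offload = true;
}

/* What a caller would test instead of walking all functions and
   variables the way initialize_offload does.  */
static bool
mock_need_offload_sections (void)
{
  return mock_have_offload;
}
```

A flag like this is also known before the IPA passes start, which matters for
the ordering issue discussed further down.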

> >  static void
> >  ipa_passes (void)
> >  {
> > +  bool have_offload = false;
> >    gcc::pass_manager *passes = g->get_passes ();
> >  
> >    set_cfun (NULL);
> > @@ -2004,6 +2036,14 @@ ipa_passes (void)
> >    gimple_register_cfg_hooks ();
> >    bitmap_obstack_initialize (NULL);
> >  
> > +  if (!in_lto_p && flag_openmp)
> 
> As -fopenmp is not generally available it's odd to test
> flag_openmp (though that is available everywhere as
> implementation detail).  Doesn't offloading work
> without -fopenmp?

In this patch series offloading is implemented only for OpenMP.
OpenACC guys will add flag_openacc here.

> >    /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE.  */
> > -  if (flag_lto)
> > +  if (flag_lto || flag_openmp)
> 
> flag_generate_lto?
> 
> >    /* When not optimizing, do not bother to analyze.  Inlining is still done
> >       because edge redirection needs to happen there.  */
> > -  if (!optimize && !flag_lto && !flag_wpa)
> > +  if (!optimize && !flag_lto && !flag_wpa && !flag_openmp)
> >      return;
> 
> Likewise !flag_generate_lto

Currently this is not working, since symbol_table::compile is executed before
ipa_passes.  But with a global have_offload it should work.

> > +/* Select what needs to be streamed out.  In regular lto mode stream everything.
> > +   In offload lto mode stream only stuff marked with an attribute.  */
> > +void
> > +select_what_to_stream (bool offload_lto_mode)
> > +{
> > +  struct symtab_node *snode;
> > +  FOR_EACH_SYMBOL (snode)
> > +    snode->need_lto_streaming
> > +      = !offload_lto_mode || lookup_attribute ("omp declare target",
> > +					       DECL_ATTRIBUTES (snode->decl));
> 
> I suppose I suggested this already earlier this year.  Why keep this
> artificial attribute when you have a cgraph node flag?

> > +	  /* If '#pragma omp critical' is inside target region, the symbol must
> > +	     have an 'omp declare target' attribute.  */
> > +	  omp_context *octx;
> > +	  for (octx = ctx->outer; octx; octx = octx->outer)
> > +	    if (is_targetreg_ctx (octx))
> > +	      {
> > +		DECL_ATTRIBUTES (decl)
> > +		  = tree_cons (get_identifier ("omp declare target"),
> > +			       NULL_TREE, DECL_ATTRIBUTES (decl));
> 
> Here - why not set a flag on cgraph_get_node (decl) instead?

I thought that select_what_to_stream was exactly what you had suggested.
Could you please clarify this?  Are you proposing to replace the "omp declare
target" attribute with a cgraph node flag such as need_offload?  We would still
need need_lto_streaming, since for LTO it should be 1 for all nodes, but for
offloading it should be equal to need_offload.
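
As a rough illustration of the rule described above, here is a standalone
sketch that models the two flags outside of GCC.  The names toy_symbol,
toy_select_what_to_stream and toy_count_streamed are hypothetical and exist
only for this example; the real code walks symtab_node objects.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for symtab_node; only the two flags discussed here.
struct toy_symbol
{
  bool offloadable = false;        // models a per-node "need_offload" bit
  bool need_lto_streaming = false; // recomputed before each streaming pass
};

// Regular LTO streams every symbol; offload mode streams only the
// symbols that are marked offloadable.
void
toy_select_what_to_stream (std::vector<toy_symbol> &symbols,
                           bool offload_lto_mode)
{
  for (toy_symbol &s : symbols)
    s.need_lto_streaming = !offload_lto_mode || s.offloadable;
}

// Count how many symbols would be streamed in the given mode.
// Takes the vector by value so the caller's flags are left untouched.
int
toy_count_streamed (std::vector<toy_symbol> symbols, bool offload_lto_mode)
{
  toy_select_what_to_stream (symbols, offload_lto_mode);
  int n = 0;
  for (const toy_symbol &s : symbols)
    n += s.need_lto_streaming ? 1 : 0;
  assert (n <= static_cast<int> (symbols.size ()));
  return n;
}
```

With three symbols of which one is offloadable, the regular pass selects all
three and the offload pass selects one, which is why a single per-node
offload bit cannot replace need_lto_streaming.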

Thanks,
  -- Ilya


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-10-20 11:21           ` Ilya Verbin
@ 2014-10-20 11:26             ` Jakub Jelinek
  2014-10-24 14:16             ` Ilya Verbin
  1 sibling, 0 replies; 62+ messages in thread
From: Jakub Jelinek @ 2014-10-20 11:26 UTC (permalink / raw)
  To: Ilya Verbin
  Cc: Richard Biener, Thomas Schwinge, Jan Hubicka, gcc-patches,
	Kirill Yukhin, Ilya Tocar, Andrey Turetskiy, Bernd Schmidt

On Mon, Oct 20, 2014 at 03:19:35PM +0400, Ilya Verbin wrote:
> > > +	  /* If '#pragma omp critical' is inside target region, the symbol must
> > > +	     have an 'omp declare target' attribute.  */
> > > +	  omp_context *octx;
> > > +	  for (octx = ctx->outer; octx; octx = octx->outer)
> > > +	    if (is_targetreg_ctx (octx))
> > > +	      {
> > > +		DECL_ATTRIBUTES (decl)
> > > +		  = tree_cons (get_identifier ("omp declare target"),
> > > +			       NULL_TREE, DECL_ATTRIBUTES (decl));
> > 
> > Here - why not set a flag on cgraph_get_node (decl) instead?
> 
> I thought that select_what_to_stream was exactly what you had suggested.
> Could you please clarify this?  Are you proposing to replace the "omp declare
> target" attribute with a cgraph node flag such as need_offload?  We would still
> need need_lto_streaming, since for LTO it should be 1 for all nodes, but for
> offloading it should be equal to need_offload.

Note, the attribute is usually created by the FEs, at points where
cgraph/varpool nodes can't be created yet.  So it is not possible to get
rid of the artificial attribute easily, though it could of course be cached
in some cgraph/varpool bit field.
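
A minimal sketch of that caching idea, assuming a toy model rather than the
real GCC data structures: toy_decl, toy_node, toy_has_attribute and
toy_create_node are hypothetical names.  The attribute stays on the decl
(where the front end put it), and the node copies the answer into a bit when
it is created.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a tree decl: the front end attaches
// attributes long before any symbol-table node exists.
struct toy_decl
{
  std::vector<std::string> attributes;
};

// Hypothetical stand-in for a cgraph/varpool node.
struct toy_node
{
  const toy_decl *decl = nullptr;
  bool offloadable = false;  // cached result of the attribute test
};

// Linear search over the attribute list, standing in for lookup_attribute.
bool
toy_has_attribute (const toy_decl &decl, const std::string &name)
{
  for (const std::string &a : decl.attributes)
    if (a == name)
      return true;
  return false;
}

// Node creation is the earliest point where the bit can be cached.
toy_node
toy_create_node (const toy_decl &decl)
{
  toy_node node;
  node.decl = &decl;
  node.offloadable = toy_has_attribute (decl, "omp declare target");
  return node;
}
```

Later passes can then test node.offloadable instead of searching the
attribute list again.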

	Jakub


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-10-20 11:21           ` Ilya Verbin
  2014-10-20 11:26             ` Jakub Jelinek
@ 2014-10-24 14:16             ` Ilya Verbin
  2014-10-24 14:29               ` Jakub Jelinek
  1 sibling, 1 reply; 62+ messages in thread
From: Ilya Verbin @ 2014-10-24 14:16 UTC (permalink / raw)
  To: Richard Biener
  Cc: Thomas Schwinge, Jan Hubicka, Jakub Jelinek, gcc-patches,
	Kirill Yukhin, Ilya Tocar, Andrey Turetskiy, Bernd Schmidt

On 20 Oct 15:19, Ilya Verbin wrote:
> On 15 Oct 16:23, Richard Biener wrote:
> > > +static bool
> > > +initialize_offload (void)
> > > +{
> > > +  bool have_offload = false;
> > > +  struct cgraph_node *node;
> > > +  struct varpool_node *vnode;
> > > +
> > > +  FOR_EACH_DEFINED_FUNCTION (node)
> > > +    if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (node->decl)))
> > > +      {
> > > +	have_offload = true;
> > > +	break;
> > > +      }
> > > +
> > > +  FOR_EACH_DEFINED_VARIABLE (vnode)
> > > +    {
> > > +      if (!lookup_attribute ("omp declare target",
> > > +			     DECL_ATTRIBUTES (vnode->decl))
> > > +	  || TREE_CODE (vnode->decl) != VAR_DECL
> > > +	  || DECL_SIZE (vnode->decl) == 0)
> > > +	continue;
> > > +      have_offload = true;
> > > +    }
> > > +
> > > +  return have_offload;
> > > +}
> > > +
> > 
> > I wonder if we can avoid the above by means of a global have_offload
> > flag?  (or inside gcc::context)
>
> > > +/* Select what needs to be streamed out.  In regular lto mode stream everything.
> > > +   In offload lto mode stream only stuff marked with an attribute.  */
> > > +void
> > > +select_what_to_stream (bool offload_lto_mode)
> > > +{
> > > +  struct symtab_node *snode;
> > > +  FOR_EACH_SYMBOL (snode)
> > > +    snode->need_lto_streaming
> > > +      = !offload_lto_mode || lookup_attribute ("omp declare target",
> > > +					       DECL_ATTRIBUTES (snode->decl));
> > 
> > I suppose I suggested this already earlier this year.  Why keep this
> > artificial attribute when you have a cgraph node flag?
> 
> > > +	  /* If '#pragma omp critical' is inside target region, the symbol must
> > > +	     have an 'omp declare target' attribute.  */
> > > +	  omp_context *octx;
> > > +	  for (octx = ctx->outer; octx; octx = octx->outer)
> > > +	    if (is_targetreg_ctx (octx))
> > > +	      {
> > > +		DECL_ATTRIBUTES (decl)
> > > +		  = tree_cons (get_identifier ("omp declare target"),
> > > +			       NULL_TREE, DECL_ATTRIBUTES (decl));
> > 
> > Here - why not set a flag on cgraph_get_node (decl) instead?
> 
> I thought that select_what_to_stream was exactly what you had suggested.
> Could you please clarify this?  Are you proposing to replace the "omp declare
> target" attribute with a cgraph node flag such as need_offload?  We would still
> need need_lto_streaming, since for LTO it should be 1 for all nodes, but for
> offloading it should be equal to need_offload.

We would have to set the global have_offload flag in a few places in
omp-low.c and in the FEs (c/c-decl.c:c_decl_attributes,
fortran/trans-common.c:build_common_decl,
fortran/trans-decl.c:add_attributes_to_decl).
This looks a bit more complicated to me than the current approach.

Actually, we could follow Jakub's suggestion of caching the attribute in a bit
field, and set the global have_offload flag on the fly without any changes in
the FEs.  However, I don't know a suitable place for it.  If you agree with
this approach, could you please point me to the place?

Thanks,
  -- Ilya


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-10-24 14:16             ` Ilya Verbin
@ 2014-10-24 14:29               ` Jakub Jelinek
  2014-10-28 19:32                 ` Ilya Verbin
  0 siblings, 1 reply; 62+ messages in thread
From: Jakub Jelinek @ 2014-10-24 14:29 UTC (permalink / raw)
  To: Ilya Verbin
  Cc: Richard Biener, Thomas Schwinge, Jan Hubicka, gcc-patches,
	Kirill Yukhin, Ilya Tocar, Andrey Turetskiy, Bernd Schmidt

On Fri, Oct 24, 2014 at 06:16:01PM +0400, Ilya Verbin wrote:
> We would have to set the global have_offload flag in a few places in
> omp-low.c and in the FEs (c/c-decl.c:c_decl_attributes,
> fortran/trans-common.c:build_common_decl,
> fortran/trans-decl.c:add_attributes_to_decl).
> This looks a bit more complicated to me than the current approach.
> 
> Actually, we could follow Jakub's suggestion of caching the attribute in a bit
> field, and set the global have_offload flag on the fly without any changes in
> the FEs.  However, I don't know a suitable place for it.  If you agree with
> this approach, could you please point me to the place?

Can't you do that when creating the cgraph or varpool nodes?
I'd expect the attribute to be already present on the decls at those spots.

	Jakub


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-10-24 14:29               ` Jakub Jelinek
@ 2014-10-28 19:32                 ` Ilya Verbin
  2014-11-03  9:24                   ` Jakub Jelinek
  0 siblings, 1 reply; 62+ messages in thread
From: Ilya Verbin @ 2014-10-28 19:32 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener
  Cc: Thomas Schwinge, Jan Hubicka, gcc-patches, Kirill Yukhin,
	Ilya Tocar, Andrey Turetskiy, Bernd Schmidt

On 24 Oct 16:20, Jakub Jelinek wrote:
> Can't you do that when creating the cgraph or varpool nodes?
> I'd expect the attribute to be already present on the decls at those spots.

I cached the "omp declare target" attribute in a symtab node flag.  Is this
patch better?  The OpenMP tests pass; make check is in progress.

Thanks,
  -- Ilya


gcc/
	* cgraph.c: Include context.h.
	(cgraph_node::create): Set node->offloadable and g->have_offload if
	decl has "omp declare target" attribute.
	* cgraph.h (symtab_node): Add need_lto_streaming and offloadable flags.
	* cgraphunit.c: Include lto-section-names.h.
	(ipa_passes): Call ipa_write_summaries if there is something to write to
	OFFLOAD_SECTION_NAME_PREFIX sections.
	(symbol_table::compile): Set flag_generate_lto if there is something to
	offload.
	Replace flag_lto with flag_generate_lto before lto_streamer_hooks_init.
	* context.c (gcc::context::context): Initialize have_offload with false.
	* context.h (class context): Add have_offload flag.
	* ipa-inline-analysis.c (inline_generate_summary): Do not exit under
	flag_generate_lto.
	(inline_free_summary): Always remove hooks.
	* lto-cgraph.c (referenced_from_other_partition_p): Ignore references
	from non-offloadable nodes while streaming a node into offload section.
	(reachable_from_other_partition_p): Likewise.
	(select_what_to_stream): New function.
	(compute_ltrans_boundary): Do not call
	lto_set_symtab_encoder_in_partition if the node should not be streamed.
	* lto-section-names.h (OFFLOAD_SECTION_NAME_PREFIX): Define.
	(section_name_prefix): Declare.
	* lto-streamer.c (section_name_prefix): New variable.
	(lto_get_section_name): Use section_name_prefix instead of
	LTO_SECTION_NAME_PREFIX.
	* lto-streamer.h (select_what_to_stream): Declare.
	* omp-low.c: Include context.h.
	(is_targetreg_ctx): New function.
	(create_omp_child_function, check_omp_nesting_restrictions): Use it.
	Replace usage of "omp declare target" attribute with a cgraph_node flag
	offloadable.
	(expand_omp_target): Set mark_force_output for the offloaded functions.
	(lower_omp_critical): Set offloadable flag for omp critical symbol.
	* passes.c (ipa_write_summaries): New argument offload_lto_mode.  Call
	select_what_to_stream.  Do not call lto_set_symtab_encoder_in_partition
	if the node should not be streamed out.
	* tree-pass.h (ipa_write_summaries): New bool argument.
	* varpool.c: Include context.h.
	(varpool_node::get_create): Set node->offloadable and g->have_offload if
	decl has "omp declare target" attribute.
gcc/lto/
	* lto-object.c (lto_obj_add_section): Use section_name_prefix instead of
	LTO_SECTION_NAME_PREFIX.
	* lto-partition.c (lto_promote_cross_file_statics): Call
	select_what_to_stream.
	* lto.c (lto_section_with_id): Use section_name_prefix instead of
	LTO_SECTION_NAME_PREFIX.
	(read_cgraph_and_symbols): Read OFFLOAD_SECTION_NAME_PREFIX sections, if
	being built as an offload compiler.
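
The lto-cgraph.c entry for referenced_from_other_partition_p can be pictured
with the following standalone sketch.  It is a toy model, not the GCC code:
toy_ref_node and toy_referenced_from_other_partition_p are hypothetical
names, and the in_partition field is an assumption that stands in for the
encoder lookup.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a symtab node with its list of referrers.
struct toy_ref_node
{
  bool need_lto_streaming = true;  // false for host-only nodes in offload mode
  bool in_partition = true;        // stands in for the encoder membership test
  std::vector<const toy_ref_node *> referring;  // nodes that reference this one
};

// Return true if NODE is referenced by a node outside the partition,
// skipping referrers that are not streamed in the current mode.
bool
toy_referenced_from_other_partition_p (const toy_ref_node &node)
{
  for (const toy_ref_node *ref : node.referring)
    {
      // A host-only parent does not exist for the offload compiler,
      // so its reference must not count.
      if (!ref->need_lto_streaming)
        continue;
      if (!ref->in_partition)
        return true;
    }
  return false;
}
```

In offload mode the parent of an outlined target region is not streamed, so
the child is no longer reported as used from another partition.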

---

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 9a47ba2..27aad73 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -70,6 +70,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-dfa.h"
 #include "profile.h"
 #include "params.h"
+#include "context.h"
 
 /* FIXME: Only for PROP_loops, but cgraph shouldn't have to know about this.  */
 #include "tree-pass.h"
@@ -474,6 +475,13 @@ cgraph_node::create (tree decl)
   gcc_assert (TREE_CODE (decl) == FUNCTION_DECL);
 
   node->decl = decl;
+
+  if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
+    {
+      node->offloadable = 1;
+      g->have_offload = true;
+    }
+
   node->register_symbol ();
 
   if (DECL_CONTEXT (decl) && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL)
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 377adce..4988f2d 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -463,6 +463,13 @@ public:
   /* Set when init priority is set.  */
   unsigned in_init_priority_hash : 1;
 
+  /* Set when symbol needs to be streamed into LTO bytecode for LTO, or in case
+     of offloading, for separate compilation for a different target.  */
+  unsigned need_lto_streaming : 1;
+
+  /* Set when symbol can be streamed into bytecode for offloading.  */
+  unsigned offloadable : 1;
+
 
   /* Ordering of all symtab entries.  */
   int order;
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 3e76bf0..50ab2bc 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -218,6 +218,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-nested.h"
 #include "gimplify.h"
 #include "dbgcnt.h"
+#include "lto-section-names.h"
 
 /* Queue of cgraph nodes scheduled to be added into cgraph.  This is a
    secondary queue used during optimization to accommodate passes that
@@ -2049,7 +2050,18 @@ ipa_passes (void)
     targetm.asm_out.lto_start ();
 
   if (!in_lto_p)
-    ipa_write_summaries ();
+    {
+      if (g->have_offload)
+	{
+	  section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
+	  ipa_write_summaries (true);
+	}
+      if (flag_lto)
+	{
+	  section_name_prefix = LTO_SECTION_NAME_PREFIX;
+	  ipa_write_summaries (false);
+	}
+    }
 
   if (flag_generate_lto)
     targetm.asm_out.lto_end ();
@@ -2129,8 +2141,12 @@ symbol_table::compile (void)
     fprintf (stderr, "Performing interprocedural optimizations\n");
   state = IPA;
 
+  /* OpenMP offloading requires LTO infrastructure.  */
+  if (!in_lto_p && flag_openmp && g->have_offload)
+    flag_generate_lto = 1;
+
   /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE.  */
-  if (flag_lto)
+  if (flag_generate_lto)
     lto_streamer_hooks_init ();
 
   /* Don't run the IPA passes if there was any error or sorry messages.  */
diff --git a/gcc/context.c b/gcc/context.c
index 5339e28..9279be4 100644
--- a/gcc/context.c
+++ b/gcc/context.c
@@ -30,6 +30,8 @@ gcc::context *g;
 
 gcc::context::context ()
 {
+  have_offload = false;
+
   /* The pass manager's constructor uses the dump manager (to set up
      dumps for the various passes), so the dump manager must be set up
      before the pass manager.  */
diff --git a/gcc/context.h b/gcc/context.h
index b8fb439..689ae5a 100644
--- a/gcc/context.h
+++ b/gcc/context.h
@@ -33,6 +33,9 @@ class context
 public:
   context ();
 
+  /* The flag shows if there are symbols to be streamed for offloading.  */
+  bool have_offload;
+
   /* Pass-management.  */
 
   pass_manager *get_passes () { gcc_assert (m_passes); return m_passes; }
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 7da02cd..6cb2057 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -4021,7 +4021,7 @@ inline_generate_summary (void)
 
   /* When not optimizing, do not bother to analyze.  Inlining is still done
      because edge redirection needs to happen there.  */
-  if (!optimize && !flag_lto && !flag_wpa)
+  if (!optimize && !flag_generate_lto && !flag_wpa)
     return;
 
   function_insertion_hook_holder =
@@ -4336,11 +4336,6 @@ void
 inline_free_summary (void)
 {
   struct cgraph_node *node;
-  if (!inline_edge_summary_vec.exists ())
-    return;
-  FOR_EACH_DEFINED_FUNCTION (node)
-    if (!node->alias)
-      reset_inline_summary (node);
   if (function_insertion_hook_holder)
     symtab->remove_cgraph_insertion_hook (function_insertion_hook_holder);
   function_insertion_hook_holder = NULL;
@@ -4356,6 +4351,11 @@ inline_free_summary (void)
   if (edge_duplication_hook_holder)
     symtab->remove_edge_duplication_hook (edge_duplication_hook_holder);
   edge_duplication_hook_holder = NULL;
+  if (!inline_edge_summary_vec.exists ())
+    return;
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (!node->alias)
+      reset_inline_summary (node);
   vec_free (inline_summary_vec);
   inline_edge_summary_vec.release ();
   if (edge_predicate_pool)
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 3071f0c..45655ba 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -326,6 +326,11 @@ referenced_from_other_partition_p (symtab_node *node, lto_symtab_encoder_t encod
 
   for (i = 0; node->iterate_referring (i, ref); i++)
     {
+      /* Ignore references from non-offloadable nodes while streaming NODE into
+	 offload LTO section.  */
+      if (!ref->referring->need_lto_streaming)
+	continue;
+
       if (ref->referring->in_other_partition
           || !lto_symtab_encoder_in_partition_p (encoder, ref->referring))
 	return true;
@@ -344,9 +349,16 @@ reachable_from_other_partition_p (struct cgraph_node *node, lto_symtab_encoder_t
   if (node->global.inlined_to)
     return false;
   for (e = node->callers; e; e = e->next_caller)
-    if (e->caller->in_other_partition
-	|| !lto_symtab_encoder_in_partition_p (encoder, e->caller))
-      return true;
+    {
+      /* Ignore references from non-offloadable nodes while streaming NODE into
+	 offload LTO section.  */
+      if (!e->caller->need_lto_streaming)
+	continue;
+
+      if (e->caller->in_other_partition
+	  || !lto_symtab_encoder_in_partition_p (encoder, e->caller))
+	return true;
+    }
   return false;
 }
 
@@ -808,6 +820,16 @@ create_references (lto_symtab_encoder_t encoder, symtab_node *node)
       lto_symtab_encoder_encode (encoder, ref->referred);
 }
 
+/* Select what needs to be streamed out.  In regular lto mode stream everything.
+   In offload lto mode stream only nodes marked as offloadable.  */
+void
+select_what_to_stream (bool offload_lto_mode)
+{
+  struct symtab_node *snode;
+  FOR_EACH_SYMBOL (snode)
+    snode->need_lto_streaming = !offload_lto_mode || snode->offloadable;
+}
+
 /* Find all symbols we want to stream into given partition and insert them
    to encoders.
 
@@ -834,6 +856,8 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
        !lsei_end_p (lsei); lsei_next_function_in_partition (&lsei))
     {
       struct cgraph_node *node = lsei_cgraph_node (lsei);
+      if (!node->need_lto_streaming)
+	continue;
       add_node_to (encoder, node, true);
       lto_set_symtab_encoder_in_partition (encoder, node);
       create_references (encoder, node);
@@ -850,6 +874,8 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
     {
       varpool_node *vnode = lsei_varpool_node (lsei);
 
+      if (!vnode->need_lto_streaming)
+	continue;
       lto_set_symtab_encoder_in_partition (encoder, vnode);
       lto_set_symtab_encoder_encode_initializer (encoder, vnode);
       create_references (encoder, vnode);
diff --git a/gcc/lto-section-names.h b/gcc/lto-section-names.h
index cb75230..f5dbed2 100644
--- a/gcc/lto-section-names.h
+++ b/gcc/lto-section-names.h
@@ -25,6 +25,11 @@ along with GCC; see the file COPYING3.  If not see
    name for the functions and static_initializers.  For other types of
    sections a '.' and the section type are appended.  */
 #define LTO_SECTION_NAME_PREFIX ".gnu.lto_"
+#define OFFLOAD_SECTION_NAME_PREFIX ".gnu.offload_lto_"
+
+/* Can be either OFFLOAD_SECTION_NAME_PREFIX when we stream IR for offload
+   compiler, or LTO_SECTION_NAME_PREFIX for LTO case.  */
+extern const char *section_name_prefix;
 
 /* Segment name for LTO sections.  This is only used for Mach-O.  */
 
diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c
index cb647bd..79c137d 100644
--- a/gcc/lto-streamer.c
+++ b/gcc/lto-streamer.c
@@ -56,6 +56,7 @@ struct lto_stats_d lto_stats;
 static bitmap_obstack lto_obstack;
 static bool lto_obstack_initialized;
 
+const char *section_name_prefix = LTO_SECTION_NAME_PREFIX;
 
 /* Return a string representing LTO tag TAG.  */
 
@@ -185,7 +186,7 @@ lto_get_section_name (int section_type, const char *name, struct lto_file_decl_d
     sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id);
   else
     sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); 
-  return concat (LTO_SECTION_NAME_PREFIX, sep, add, post, NULL);
+  return concat (section_name_prefix, sep, add, post, NULL);
 }
 
 
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 63e4b32..0b3fb6a 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -832,6 +832,7 @@ bool referenced_from_this_partition_p (symtab_node *,
 bool reachable_from_this_partition_p (struct cgraph_node *,
 				      lto_symtab_encoder_t);
 lto_symtab_encoder_t compute_ltrans_boundary (lto_symtab_encoder_t encoder);
+void select_what_to_stream (bool);
 
 
 /* In lto-symtab.c.  */
diff --git a/gcc/lto/lto-object.c b/gcc/lto/lto-object.c
index 28b459c..637d1f2 100644
--- a/gcc/lto/lto-object.c
+++ b/gcc/lto/lto-object.c
@@ -238,8 +238,7 @@ lto_obj_add_section (void *data, const char *name, off_t offset,
   void **slot;
   struct lto_section_list *list = loasd->list;
 
-  if (strncmp (name, LTO_SECTION_NAME_PREFIX,
-	       strlen (LTO_SECTION_NAME_PREFIX)) != 0)
+  if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
     return 1;
 
   new_name = xstrdup (name);
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index b647275..6290b23 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -928,6 +928,8 @@ lto_promote_cross_file_statics (void)
 
   gcc_assert (flag_wpa);
 
+  select_what_to_stream (false);
+
   /* First compute boundaries.  */
   n_sets = ltrans_partitions.length ();
   for (i = 0; i < n_sets; i++)
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 1234cee..0451e71 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -2127,7 +2127,7 @@ lto_section_with_id (const char *name, unsigned HOST_WIDE_INT *id)
 {
   const char *s;
 
-  if (strncmp (name, LTO_SECTION_NAME_PREFIX, strlen (LTO_SECTION_NAME_PREFIX)))
+  if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
     return 0;
   s = strrchr (name, '.');
   return s && sscanf (s, "." HOST_WIDE_INT_PRINT_HEX_PURE, id) == 1;
@@ -2902,6 +2902,10 @@ read_cgraph_and_symbols (unsigned nfiles, const char **fnames)
 
   timevar_push (TV_IPA_LTO_DECL_IN);
 
+#ifdef ACCEL_COMPILER
+    section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
+#endif
+
   real_file_decl_data
     = decl_data = ggc_cleared_vec_alloc<lto_file_decl_data_ptr> (nfiles + 1);
   real_file_count = nfiles;
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index fe9bf80..1404b5e 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -81,6 +81,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-nested.h"
 #include "tree-eh.h"
 #include "cilk.h"
+#include "context.h"
 
 
 /* Lowering of OpenMP parallel and workshare constructs proceeds in two
@@ -268,6 +269,16 @@ is_parallel_ctx (omp_context *ctx)
 }
 
 
+/* Return true if CTX is for an omp target region.  */
+
+static inline bool
+is_targetreg_ctx (omp_context *ctx)
+{
+  return gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
+	 && gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_REGION;
+}
+
+
 /* Return true if CTX is for an omp task.  */
 
 static inline bool
@@ -1933,26 +1944,19 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
   DECL_EXTERNAL (decl) = 0;
   DECL_CONTEXT (decl) = NULL_TREE;
   DECL_INITIAL (decl) = make_node (BLOCK);
-  bool target_p = false;
-  if (lookup_attribute ("omp declare target",
-			DECL_ATTRIBUTES (current_function_decl)))
-    target_p = true;
+  if (cgraph_node::get (current_function_decl)->offloadable)
+    cgraph_node::get_create (decl)->offloadable = 1;
   else
     {
       omp_context *octx;
       for (octx = ctx; octx; octx = octx->outer)
-	if (gimple_code (octx->stmt) == GIMPLE_OMP_TARGET
-	    && gimple_omp_target_kind (octx->stmt)
-	       == GF_OMP_TARGET_KIND_REGION)
+	if (is_targetreg_ctx (octx))
 	  {
-	    target_p = true;
+	    cgraph_node::get_create (decl)->offloadable = 1;
+	    g->have_offload = true;
 	    break;
 	  }
     }
-  if (target_p)
-    DECL_ATTRIBUTES (decl)
-      = tree_cons (get_identifier ("omp declare target"),
-		   NULL_TREE, DECL_ATTRIBUTES (decl));
 
   t = build_decl (DECL_SOURCE_LOCATION (decl),
 		  RESULT_DECL, NULL_TREE, void_type_node);
@@ -2658,8 +2662,7 @@ check_omp_nesting_restrictions (gimple stmt, omp_context *ctx)
       break;
     case GIMPLE_OMP_TARGET:
       for (; ctx != NULL; ctx = ctx->outer)
-	if (gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
-	    && gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_REGION)
+	if (is_targetreg_ctx (ctx))
 	  {
 	    const char *name;
 	    switch (gimple_omp_target_kind (stmt))
@@ -8276,6 +8279,7 @@ expand_omp_target (struct omp_region *region)
   if (kind == GF_OMP_TARGET_KIND_REGION)
     {
       unsigned srcidx, dstidx, num;
+      struct cgraph_node *node;
 
       /* If the target region needs data sent from the parent
 	 function, then the very first statement (except possible
@@ -8407,6 +8411,11 @@ expand_omp_target (struct omp_region *region)
       push_cfun (child_cfun);
       cgraph_edge::rebuild_edges ();
 
+      /* Prevent IPA from removing child_fn as unreachable, since there are no
+	 refs from the parent function to child_fn in offload LTO mode.  */
+      node = cgraph_node::get (child_fn);
+      node->mark_force_output ();
+
       /* Some EH regions might become dead, see PR34608.  If
 	 pass_cleanup_cfg isn't the first pass to happen with the
 	 new child, these dead EH edges might cause problems.
@@ -9277,6 +9286,17 @@ lower_omp_critical (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	  DECL_COMMON (decl) = 1;
 	  DECL_ARTIFICIAL (decl) = 1;
 	  DECL_IGNORED_P (decl) = 1;
+
+	  /* If '#pragma omp critical' is inside target region, the symbol must
+	     be marked for offloading.  */
+	  omp_context *octx;
+	  for (octx = ctx->outer; octx; octx = octx->outer)
+	    if (is_targetreg_ctx (octx))
+	      {
+		varpool_node::get_create (decl)->offloadable = 1;
+		break;
+	      }
+
 	  varpool_node::finalize_decl (decl);
 
 	  splay_tree_insert (critical_name_mutexes, (splay_tree_key) name,
diff --git a/gcc/passes.c b/gcc/passes.c
index 8432de8..bd4031b 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -2303,7 +2303,7 @@ ipa_write_summaries_1 (lto_symtab_encoder_t encoder)
 /* Write out summaries for all the nodes in the callgraph.  */
 
 void
-ipa_write_summaries (void)
+ipa_write_summaries (bool offload_lto_mode)
 {
   lto_symtab_encoder_t encoder;
   int i, order_pos;
@@ -2314,6 +2314,8 @@ ipa_write_summaries (void)
   if (!flag_generate_lto || seen_error ())
     return;
 
+  select_what_to_stream (offload_lto_mode);
+
   encoder = lto_symtab_encoder_new (false);
 
   /* Create the callgraph set in the same order used in
@@ -2340,15 +2342,16 @@ ipa_write_summaries (void)
 	  renumber_gimple_stmt_uids ();
 	  pop_cfun ();
 	}
-      if (node->definition)
+      if (node->definition && node->need_lto_streaming)
         lto_set_symtab_encoder_in_partition (encoder, node);
     }
 
   FOR_EACH_DEFINED_FUNCTION (node)
-    if (node->alias)
+    if (node->alias && node->need_lto_streaming)
       lto_set_symtab_encoder_in_partition (encoder, node);
   FOR_EACH_DEFINED_VARIABLE (vnode)
-    lto_set_symtab_encoder_in_partition (encoder, vnode);
+    if (vnode->need_lto_streaming)
+      lto_set_symtab_encoder_in_partition (encoder, vnode);
 
   ipa_write_summaries_1 (compute_ltrans_boundary (encoder));
 
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 3db1a08..cbed6e7 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -594,7 +594,7 @@ extern void pass_fini_dump_file (opt_pass *);
 extern const char *get_current_pass_name (void);
 extern void print_current_pass (FILE *);
 extern void debug_pass (void);
-extern void ipa_write_summaries (void);
+extern void ipa_write_summaries (bool);
 extern void ipa_write_optimization_summaries (struct lto_symtab_encoder_d *);
 extern void ipa_read_summaries (void);
 extern void ipa_read_optimization_summaries (void);
diff --git a/gcc/varpool.c b/gcc/varpool.c
index 50b5665..483566d 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple.h"
 #include "lto-streamer.h"
 #include "hash-set.h"
+#include "context.h"
 
 const char * const tls_model_names[]={"none", "tls-emulated", "tls-real",
 				      "tls-global-dynamic", "tls-local-dynamic",
@@ -155,6 +156,13 @@ varpool_node::get_create (tree decl)
 
   node = varpool_node::create_empty ();
   node->decl = decl;
+
+  if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
+    {
+      node->offloadable = 1;
+      g->have_offload = true;
+    }
+
   node->register_symbol ();
   return node;
 }
-- 
1.7.1


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-10-28 19:32                 ` Ilya Verbin
@ 2014-11-03  9:24                   ` Jakub Jelinek
  2014-11-05 12:47                     ` Ilya Verbin
  0 siblings, 1 reply; 62+ messages in thread
From: Jakub Jelinek @ 2014-11-03  9:24 UTC (permalink / raw)
  To: Ilya Verbin
  Cc: Richard Biener, Thomas Schwinge, Jan Hubicka, gcc-patches,
	Kirill Yukhin, Ilya Tocar, Andrey Turetskiy, Bernd Schmidt

On Tue, Oct 28, 2014 at 10:30:47PM +0300, Ilya Verbin wrote:
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -70,6 +70,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-dfa.h"
>  #include "profile.h"
>  #include "params.h"
> +#include "context.h"
>  
>  /* FIXME: Only for PROP_loops, but cgraph shouldn't have to know about this.  */
>  #include "tree-pass.h"
> @@ -474,6 +475,13 @@ cgraph_node::create (tree decl)
>    gcc_assert (TREE_CODE (decl) == FUNCTION_DECL);
>  
>    node->decl = decl;
> +
> +  if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
> +    {
> +      node->offloadable = 1;
> +      g->have_offload = true;
> +    }

I wonder if we shouldn't optimize here and call lookup_attribute only
if there is a chance that the attribute might be present, so guard with
flag_openmp (and flag_openacc later on?).  During LTO the cgraph nodes
are streamed in, and presumably the offloadable flag is too.
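
To make the suggested guard concrete, here is a small standalone sketch.
Everything in it is hypothetical (toy_guard_decl, toy_lookup_attribute,
toy_mark_offloadable, and the toy_lookup_calls counter); it only shows that
short-circuit evaluation skips the attribute search when the language flag
is off.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Counts how often the (comparatively expensive) search is performed.
static int toy_lookup_calls = 0;

// Hypothetical stand-in for a decl with its attribute list.
struct toy_guard_decl
{
  std::vector<std::string> attributes;
};

// Stand-in for lookup_attribute: a linear search that records each call.
bool
toy_lookup_attribute (const std::string &name, const toy_guard_decl &decl)
{
  ++toy_lookup_calls;
  for (const std::string &a : decl.attributes)
    if (a == name)
      return true;
  return false;
}

// Returns whether a node for DECL would be marked offloadable.
// The && short-circuits, so no search happens when OPENMP_ENABLED is false.
bool
toy_mark_offloadable (bool openmp_enabled, const toy_guard_decl &decl)
{
  return openmp_enabled && toy_lookup_attribute ("omp declare target", decl);
}
```

Without the flag no lookup is performed at all, which is the saving being
proposed for compilations that do not use OpenMP.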

> @@ -2129,8 +2141,12 @@ symbol_table::compile (void)
>      fprintf (stderr, "Performing interprocedural optimizations\n");
>    state = IPA;
>  
> +  /* OpenMP offloading requires LTO infrastructure.  */
> +  if (!in_lto_p && flag_openmp && g->have_offload)
> +    flag_generate_lto = 1;

On the other hand, do you need flag_openmp here?  Presumably g->have_offload
would already have been set if needed.

	Jakub


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-11-03  9:24                   ` Jakub Jelinek
@ 2014-11-05 12:47                     ` Ilya Verbin
  2014-11-05 12:50                       ` Jakub Jelinek
                                         ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Ilya Verbin @ 2014-11-05 12:47 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener
  Cc: Thomas Schwinge, Jan Hubicka, gcc-patches, Kirill Yukhin,
	Ilya Tocar, Andrey Turetskiy, Bernd Schmidt

On 03 Nov 10:24, Jakub Jelinek wrote:
> On Tue, Oct 28, 2014 at 10:30:47PM +0300, Ilya Verbin wrote:
> > @@ -474,6 +475,13 @@ cgraph_node::create (tree decl)
> >    gcc_assert (TREE_CODE (decl) == FUNCTION_DECL);
> >  
> >    node->decl = decl;
> > +
> > +  if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
> > +    {
> > +      node->offloadable = 1;
> > +      g->have_offload = true;
> > +    }
> 
> I wonder if we shouldn't optimize here and call lookup_attribute only
> if there is a chance that the attribute might be present, so guard with
> flag_openmp (and flag_openacc later on?).  During LTO the cgraph nodes
> are streamed in, and presumably the offloadable flag is too.
> 
> > @@ -2129,8 +2141,12 @@ symbol_table::compile (void)
> >      fprintf (stderr, "Performing interprocedural optimizations\n");
> >    state = IPA;
> >  
> > +  /* OpenMP offloading requires LTO infrastructure.  */
> > +  if (!in_lto_p && flag_openmp && g->have_offload)
> > +    flag_generate_lto = 1;
> 
> On the other hand, do you need flag_openmp here?  Presumably g->have_offload
> would already have been set if needed.

Done, the flag_openmp check has moved from symbol_table::compile to
cgraph_node::create and varpool_node::get_create.  OK for trunk?

Maybe also with this change?

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 4e9ed25..beae5b5 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -1653,8 +1653,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 	      && DECL_P (decl)
 	      && is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))
-	      && lookup_attribute ("omp declare target",
-				   DECL_ATTRIBUTES (decl)))
+	      && varpool_node::get_create (decl)->offloadable)
 	    break;
 	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 	      && OMP_CLAUSE_MAP_KIND (c) == OMP_CLAUSE_MAP_POINTER)
@@ -1794,8 +1793,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	  decl = OMP_CLAUSE_DECL (c);
 	  if (DECL_P (decl)
 	      && is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))
-	      && lookup_attribute ("omp declare target",
-				   DECL_ATTRIBUTES (decl)))
+	      && varpool_node::get_create (decl)->offloadable)
 	    break;
 	  if (DECL_P (decl))
 	    {

Thanks,
  -- Ilya


---

diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 9a47ba2..a491886 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -70,6 +70,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-dfa.h"
 #include "profile.h"
 #include "params.h"
+#include "context.h"
 
 /* FIXME: Only for PROP_loops, but cgraph shouldn't have to know about this.  */
 #include "tree-pass.h"
@@ -474,6 +475,14 @@ cgraph_node::create (tree decl)
   gcc_assert (TREE_CODE (decl) == FUNCTION_DECL);
 
   node->decl = decl;
+
+  if (flag_openmp
+      && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
+    {
+      node->offloadable = 1;
+      g->have_offload = true;
+    }
+
   node->register_symbol ();
 
   if (DECL_CONTEXT (decl) && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL)
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 377adce..4988f2d 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -463,6 +463,13 @@ public:
   /* Set when init priority is set.  */
   unsigned in_init_priority_hash : 1;
 
+  /* Set when symbol needs to be streamed into LTO bytecode for LTO, or in case
+     of offloading, for separate compilation for a different target.  */
+  unsigned need_lto_streaming : 1;
+
+  /* Set when symbol can be streamed into bytecode for offloading.  */
+  unsigned offloadable : 1;
+
 
   /* Ordering of all symtab entries.  */
   int order;
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 3e76bf0..83ab419 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -218,6 +218,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-nested.h"
 #include "gimplify.h"
 #include "dbgcnt.h"
+#include "lto-section-names.h"
 
 /* Queue of cgraph nodes scheduled to be added into cgraph.  This is a
    secondary queue used during optimization to accommodate passes that
@@ -2049,7 +2050,18 @@ ipa_passes (void)
     targetm.asm_out.lto_start ();
 
   if (!in_lto_p)
-    ipa_write_summaries ();
+    {
+      if (g->have_offload)
+	{
+	  section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
+	  ipa_write_summaries (true);
+	}
+      if (flag_lto)
+	{
+	  section_name_prefix = LTO_SECTION_NAME_PREFIX;
+	  ipa_write_summaries (false);
+	}
+    }
 
   if (flag_generate_lto)
     targetm.asm_out.lto_end ();
@@ -2129,8 +2141,12 @@ symbol_table::compile (void)
     fprintf (stderr, "Performing interprocedural optimizations\n");
   state = IPA;
 
+  /* Offloading requires LTO infrastructure.  */
+  if (!in_lto_p && g->have_offload)
+    flag_generate_lto = 1;
+
   /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE.  */
-  if (flag_lto)
+  if (flag_generate_lto)
     lto_streamer_hooks_init ();
 
   /* Don't run the IPA passes if there was any error or sorry messages.  */
diff --git a/gcc/context.c b/gcc/context.c
index 5339e28..9279be4 100644
--- a/gcc/context.c
+++ b/gcc/context.c
@@ -30,6 +30,8 @@ gcc::context *g;
 
 gcc::context::context ()
 {
+  have_offload = false;
+
   /* The pass manager's constructor uses the dump manager (to set up
      dumps for the various passes), so the dump manager must be set up
      before the pass manager.  */
diff --git a/gcc/context.h b/gcc/context.h
index b8fb439..689ae5a 100644
--- a/gcc/context.h
+++ b/gcc/context.h
@@ -33,6 +33,9 @@ class context
 public:
   context ();
 
+  /* The flag shows if there are symbols to be streamed for offloading.  */
+  bool have_offload;
+
   /* Pass-management.  */
 
   pass_manager *get_passes () { gcc_assert (m_passes); return m_passes; }
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 7da02cd..6cb2057 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -4021,7 +4021,7 @@ inline_generate_summary (void)
 
   /* When not optimizing, do not bother to analyze.  Inlining is still done
      because edge redirection needs to happen there.  */
-  if (!optimize && !flag_lto && !flag_wpa)
+  if (!optimize && !flag_generate_lto && !flag_wpa)
     return;
 
   function_insertion_hook_holder =
@@ -4336,11 +4336,6 @@ void
 inline_free_summary (void)
 {
   struct cgraph_node *node;
-  if (!inline_edge_summary_vec.exists ())
-    return;
-  FOR_EACH_DEFINED_FUNCTION (node)
-    if (!node->alias)
-      reset_inline_summary (node);
   if (function_insertion_hook_holder)
     symtab->remove_cgraph_insertion_hook (function_insertion_hook_holder);
   function_insertion_hook_holder = NULL;
@@ -4356,6 +4351,11 @@ inline_free_summary (void)
   if (edge_duplication_hook_holder)
     symtab->remove_edge_duplication_hook (edge_duplication_hook_holder);
   edge_duplication_hook_holder = NULL;
+  if (!inline_edge_summary_vec.exists ())
+    return;
+  FOR_EACH_DEFINED_FUNCTION (node)
+    if (!node->alias)
+      reset_inline_summary (node);
   vec_free (inline_summary_vec);
   inline_edge_summary_vec.release ();
   if (edge_predicate_pool)
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 3071f0c..45655ba 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -326,6 +326,11 @@ referenced_from_other_partition_p (symtab_node *node, lto_symtab_encoder_t encod
 
   for (i = 0; node->iterate_referring (i, ref); i++)
     {
+      /* Ignore references from non-offloadable nodes while streaming NODE into
+	 offload LTO section.  */
+      if (!ref->referring->need_lto_streaming)
+	continue;
+
       if (ref->referring->in_other_partition
           || !lto_symtab_encoder_in_partition_p (encoder, ref->referring))
 	return true;
@@ -344,9 +349,16 @@ reachable_from_other_partition_p (struct cgraph_node *node, lto_symtab_encoder_t
   if (node->global.inlined_to)
     return false;
   for (e = node->callers; e; e = e->next_caller)
-    if (e->caller->in_other_partition
-	|| !lto_symtab_encoder_in_partition_p (encoder, e->caller))
-      return true;
+    {
+      /* Ignore references from non-offloadable nodes while streaming NODE into
+	 offload LTO section.  */
+      if (!e->caller->need_lto_streaming)
+	continue;
+
+      if (e->caller->in_other_partition
+	  || !lto_symtab_encoder_in_partition_p (encoder, e->caller))
+	return true;
+    }
   return false;
 }
 
@@ -808,6 +820,16 @@ create_references (lto_symtab_encoder_t encoder, symtab_node *node)
       lto_symtab_encoder_encode (encoder, ref->referred);
 }
 
+/* Select what needs to be streamed out.  In regular lto mode stream everything.
+   In offload lto mode stream only nodes marked as offloadable.  */
+void
+select_what_to_stream (bool offload_lto_mode)
+{
+  struct symtab_node *snode;
+  FOR_EACH_SYMBOL (snode)
+    snode->need_lto_streaming = !offload_lto_mode || snode->offloadable;
+}
+
 /* Find all symbols we want to stream into given partition and insert them
    to encoders.
 
@@ -834,6 +856,8 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
        !lsei_end_p (lsei); lsei_next_function_in_partition (&lsei))
     {
       struct cgraph_node *node = lsei_cgraph_node (lsei);
+      if (!node->need_lto_streaming)
+	continue;
       add_node_to (encoder, node, true);
       lto_set_symtab_encoder_in_partition (encoder, node);
       create_references (encoder, node);
@@ -850,6 +874,8 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
     {
       varpool_node *vnode = lsei_varpool_node (lsei);
 
+      if (!vnode->need_lto_streaming)
+	continue;
       lto_set_symtab_encoder_in_partition (encoder, vnode);
       lto_set_symtab_encoder_encode_initializer (encoder, vnode);
       create_references (encoder, vnode);
diff --git a/gcc/lto-section-names.h b/gcc/lto-section-names.h
index cb75230..f5dbed2 100644
--- a/gcc/lto-section-names.h
+++ b/gcc/lto-section-names.h
@@ -25,6 +25,11 @@ along with GCC; see the file COPYING3.  If not see
    name for the functions and static_initializers.  For other types of
    sections a '.' and the section type are appended.  */
 #define LTO_SECTION_NAME_PREFIX ".gnu.lto_"
+#define OFFLOAD_SECTION_NAME_PREFIX ".gnu.offload_lto_"
+
+/* Can be either OFFLOAD_SECTION_NAME_PREFIX when we stream IR for offload
+   compiler, or LTO_SECTION_NAME_PREFIX for LTO case.  */
+extern const char *section_name_prefix;
 
 /* Segment name for LTO sections.  This is only used for Mach-O.  */
 
diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c
index cb647bd..79c137d 100644
--- a/gcc/lto-streamer.c
+++ b/gcc/lto-streamer.c
@@ -56,6 +56,7 @@ struct lto_stats_d lto_stats;
 static bitmap_obstack lto_obstack;
 static bool lto_obstack_initialized;
 
+const char *section_name_prefix = LTO_SECTION_NAME_PREFIX;
 
 /* Return a string representing LTO tag TAG.  */
 
@@ -185,7 +186,7 @@ lto_get_section_name (int section_type, const char *name, struct lto_file_decl_d
     sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id);
   else
     sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); 
-  return concat (LTO_SECTION_NAME_PREFIX, sep, add, post, NULL);
+  return concat (section_name_prefix, sep, add, post, NULL);
 }
 
 
diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
index 63e4b32..0b3fb6a 100644
--- a/gcc/lto-streamer.h
+++ b/gcc/lto-streamer.h
@@ -832,6 +832,7 @@ bool referenced_from_this_partition_p (symtab_node *,
 bool reachable_from_this_partition_p (struct cgraph_node *,
 				      lto_symtab_encoder_t);
 lto_symtab_encoder_t compute_ltrans_boundary (lto_symtab_encoder_t encoder);
+void select_what_to_stream (bool);
 
 
 /* In lto-symtab.c.  */
diff --git a/gcc/lto/lto-object.c b/gcc/lto/lto-object.c
index 28b459c..637d1f2 100644
--- a/gcc/lto/lto-object.c
+++ b/gcc/lto/lto-object.c
@@ -238,8 +238,7 @@ lto_obj_add_section (void *data, const char *name, off_t offset,
   void **slot;
   struct lto_section_list *list = loasd->list;
 
-  if (strncmp (name, LTO_SECTION_NAME_PREFIX,
-	       strlen (LTO_SECTION_NAME_PREFIX)) != 0)
+  if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
     return 1;
 
   new_name = xstrdup (name);
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index b647275..6290b23 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -928,6 +928,8 @@ lto_promote_cross_file_statics (void)
 
   gcc_assert (flag_wpa);
 
+  select_what_to_stream (false);
+
   /* First compute boundaries.  */
   n_sets = ltrans_partitions.length ();
   for (i = 0; i < n_sets; i++)
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 1234cee..0451e71 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -2127,7 +2127,7 @@ lto_section_with_id (const char *name, unsigned HOST_WIDE_INT *id)
 {
   const char *s;
 
-  if (strncmp (name, LTO_SECTION_NAME_PREFIX, strlen (LTO_SECTION_NAME_PREFIX)))
+  if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
     return 0;
   s = strrchr (name, '.');
   return s && sscanf (s, "." HOST_WIDE_INT_PRINT_HEX_PURE, id) == 1;
@@ -2902,6 +2902,10 @@ read_cgraph_and_symbols (unsigned nfiles, const char **fnames)
 
   timevar_push (TV_IPA_LTO_DECL_IN);
 
+#ifdef ACCEL_COMPILER
+    section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
+#endif
+
   real_file_decl_data
     = decl_data = ggc_cleared_vec_alloc<lto_file_decl_data_ptr> (nfiles + 1);
   real_file_count = nfiles;
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index fe9bf80..1404b5e 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -81,6 +81,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-nested.h"
 #include "tree-eh.h"
 #include "cilk.h"
+#include "context.h"
 
 
 /* Lowering of OpenMP parallel and workshare constructs proceeds in two
@@ -268,6 +269,16 @@ is_parallel_ctx (omp_context *ctx)
 }
 
 
+/* Return true if CTX is for an omp target region.  */
+
+static inline bool
+is_targetreg_ctx (omp_context *ctx)
+{
+  return gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
+	 && gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_REGION;
+}
+
+
 /* Return true if CTX is for an omp task.  */
 
 static inline bool
@@ -1933,26 +1944,19 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
   DECL_EXTERNAL (decl) = 0;
   DECL_CONTEXT (decl) = NULL_TREE;
   DECL_INITIAL (decl) = make_node (BLOCK);
-  bool target_p = false;
-  if (lookup_attribute ("omp declare target",
-			DECL_ATTRIBUTES (current_function_decl)))
-    target_p = true;
+  if (cgraph_node::get (current_function_decl)->offloadable)
+    cgraph_node::get_create (decl)->offloadable = 1;
   else
     {
       omp_context *octx;
       for (octx = ctx; octx; octx = octx->outer)
-	if (gimple_code (octx->stmt) == GIMPLE_OMP_TARGET
-	    && gimple_omp_target_kind (octx->stmt)
-	       == GF_OMP_TARGET_KIND_REGION)
+	if (is_targetreg_ctx (octx))
 	  {
-	    target_p = true;
+	    cgraph_node::get_create (decl)->offloadable = 1;
+	    g->have_offload = true;
 	    break;
 	  }
     }
-  if (target_p)
-    DECL_ATTRIBUTES (decl)
-      = tree_cons (get_identifier ("omp declare target"),
-		   NULL_TREE, DECL_ATTRIBUTES (decl));
 
   t = build_decl (DECL_SOURCE_LOCATION (decl),
 		  RESULT_DECL, NULL_TREE, void_type_node);
@@ -2658,8 +2662,7 @@ check_omp_nesting_restrictions (gimple stmt, omp_context *ctx)
       break;
     case GIMPLE_OMP_TARGET:
       for (; ctx != NULL; ctx = ctx->outer)
-	if (gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
-	    && gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_REGION)
+	if (is_targetreg_ctx (ctx))
 	  {
 	    const char *name;
 	    switch (gimple_omp_target_kind (stmt))
@@ -8276,6 +8279,7 @@ expand_omp_target (struct omp_region *region)
   if (kind == GF_OMP_TARGET_KIND_REGION)
     {
       unsigned srcidx, dstidx, num;
+      struct cgraph_node *node;
 
       /* If the target region needs data sent from the parent
 	 function, then the very first statement (except possible
@@ -8407,6 +8411,11 @@ expand_omp_target (struct omp_region *region)
       push_cfun (child_cfun);
       cgraph_edge::rebuild_edges ();
 
+      /* Prevent IPA from removing child_fn as unreachable, since there are no
+	 refs from the parent function to child_fn in offload LTO mode.  */
+      node = cgraph_node::get (child_fn);
+      node->mark_force_output ();
+
       /* Some EH regions might become dead, see PR34608.  If
 	 pass_cleanup_cfg isn't the first pass to happen with the
 	 new child, these dead EH edges might cause problems.
@@ -9277,6 +9286,17 @@ lower_omp_critical (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	  DECL_COMMON (decl) = 1;
 	  DECL_ARTIFICIAL (decl) = 1;
 	  DECL_IGNORED_P (decl) = 1;
+
+	  /* If '#pragma omp critical' is inside target region, the symbol must
+	     be marked for offloading.  */
+	  omp_context *octx;
+	  for (octx = ctx->outer; octx; octx = octx->outer)
+	    if (is_targetreg_ctx (octx))
+	      {
+		varpool_node::get_create (decl)->offloadable = 1;
+		break;
+	      }
+
 	  varpool_node::finalize_decl (decl);
 
 	  splay_tree_insert (critical_name_mutexes, (splay_tree_key) name,
diff --git a/gcc/passes.c b/gcc/passes.c
index 8432de8..bd4031b 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -2303,7 +2303,7 @@ ipa_write_summaries_1 (lto_symtab_encoder_t encoder)
 /* Write out summaries for all the nodes in the callgraph.  */
 
 void
-ipa_write_summaries (void)
+ipa_write_summaries (bool offload_lto_mode)
 {
   lto_symtab_encoder_t encoder;
   int i, order_pos;
@@ -2314,6 +2314,8 @@ ipa_write_summaries (void)
   if (!flag_generate_lto || seen_error ())
     return;
 
+  select_what_to_stream (offload_lto_mode);
+
   encoder = lto_symtab_encoder_new (false);
 
   /* Create the callgraph set in the same order used in
@@ -2340,15 +2342,16 @@ ipa_write_summaries (void)
 	  renumber_gimple_stmt_uids ();
 	  pop_cfun ();
 	}
-      if (node->definition)
+      if (node->definition && node->need_lto_streaming)
         lto_set_symtab_encoder_in_partition (encoder, node);
     }
 
   FOR_EACH_DEFINED_FUNCTION (node)
-    if (node->alias)
+    if (node->alias && node->need_lto_streaming)
       lto_set_symtab_encoder_in_partition (encoder, node);
   FOR_EACH_DEFINED_VARIABLE (vnode)
-    lto_set_symtab_encoder_in_partition (encoder, vnode);
+    if (vnode->need_lto_streaming)
+      lto_set_symtab_encoder_in_partition (encoder, vnode);
 
   ipa_write_summaries_1 (compute_ltrans_boundary (encoder));
 
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 3db1a08..cbed6e7 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -594,7 +594,7 @@ extern void pass_fini_dump_file (opt_pass *);
 extern const char *get_current_pass_name (void);
 extern void print_current_pass (FILE *);
 extern void debug_pass (void);
-extern void ipa_write_summaries (void);
+extern void ipa_write_summaries (bool);
 extern void ipa_write_optimization_summaries (struct lto_symtab_encoder_d *);
 extern void ipa_read_summaries (void);
 extern void ipa_read_optimization_summaries (void);
diff --git a/gcc/varpool.c b/gcc/varpool.c
index 50b5665..c508bf9 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple.h"
 #include "lto-streamer.h"
 #include "hash-set.h"
+#include "context.h"
 
 const char * const tls_model_names[]={"none", "tls-emulated", "tls-real",
 				      "tls-global-dynamic", "tls-local-dynamic",
@@ -155,6 +156,14 @@ varpool_node::get_create (tree decl)
 
   node = varpool_node::create_empty ();
   node->decl = decl;
+
+  if (flag_openmp
+      && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
+    {
+      node->offloadable = 1;
+      g->have_offload = true;
+    }
+
   node->register_symbol ();
   return node;
 }
-- 
1.7.1


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-11-05 12:47                     ` Ilya Verbin
@ 2014-11-05 12:50                       ` Jakub Jelinek
  2014-11-07 14:41                         ` Kirill Yukhin
  2014-11-12  9:32                       ` Richard Biener
  2015-07-31 15:37                       ` Thomas Schwinge
  2 siblings, 1 reply; 62+ messages in thread
From: Jakub Jelinek @ 2014-11-05 12:50 UTC (permalink / raw)
  To: Ilya Verbin
  Cc: Richard Biener, Thomas Schwinge, Jan Hubicka, gcc-patches,
	Kirill Yukhin, Ilya Tocar, Andrey Turetskiy, Bernd Schmidt

On Wed, Nov 05, 2014 at 03:46:55PM +0300, Ilya Verbin wrote:
> Maybe also with this change?
> 
> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> index 4e9ed25..beae5b5 100644
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -1653,8 +1653,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
>  	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
>  	      && DECL_P (decl)
>  	      && is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))
> -	      && lookup_attribute ("omp declare target",
> -				   DECL_ATTRIBUTES (decl)))
> +	      && varpool_node::get_create (decl)->offloadable)
>  	    break;
>  	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
>  	      && OMP_CLAUSE_MAP_KIND (c) == OMP_CLAUSE_MAP_POINTER)
> @@ -1794,8 +1793,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
>  	  decl = OMP_CLAUSE_DECL (c);
>  	  if (DECL_P (decl)
>  	      && is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))
> -	      && lookup_attribute ("omp declare target",
> -				   DECL_ATTRIBUTES (decl)))
> +	      && varpool_node::get_create (decl)->offloadable)
>  	    break;
>  	  if (DECL_P (decl))
>  	    {

That looks reasonable (of course if the other patch is committed).

> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -70,6 +70,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-dfa.h"
>  #include "profile.h"
>  #include "params.h"
> +#include "context.h"
>  
>  /* FIXME: Only for PROP_loops, but cgraph shouldn't have to know about this.  */
>  #include "tree-pass.h"
> @@ -474,6 +475,14 @@ cgraph_node::create (tree decl)
>    gcc_assert (TREE_CODE (decl) == FUNCTION_DECL);
>  
>    node->decl = decl;
> +
> +  if (flag_openmp
> +      && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
> +    {
> +      node->offloadable = 1;
> +      g->have_offload = true;
> +    }
> +
>    node->register_symbol ();

LGTM.

	Jakub


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-11-05 12:50                       ` Jakub Jelinek
@ 2014-11-07 14:41                         ` Kirill Yukhin
  0 siblings, 0 replies; 62+ messages in thread
From: Kirill Yukhin @ 2014-11-07 14:41 UTC (permalink / raw)
  To: Richard Biener
  Cc: Ilya Verbin, Thomas Schwinge, Jan Hubicka, gcc-patches,
	Ilya Tocar, Andrey Turetskiy, Bernd Schmidt, Jakub Jelinek,
	Jeff Law

Hello Richard,
On 05 Nov 13:50, Jakub Jelinek wrote:
> On Wed, Nov 05, 2014 at 03:46:55PM +0300, Ilya Verbin wrote:
> > +
> >    node->register_symbol ();
> 
> LGTM.
Are you ok with the patch?
> 
> 	Jakub

--
Thanks, K


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-11-05 12:47                     ` Ilya Verbin
  2014-11-05 12:50                       ` Jakub Jelinek
@ 2014-11-12  9:32                       ` Richard Biener
  2014-11-12 14:11                         ` Kirill Yukhin
  2015-07-31 15:37                       ` Thomas Schwinge
  2 siblings, 1 reply; 62+ messages in thread
From: Richard Biener @ 2014-11-12  9:32 UTC (permalink / raw)
  To: Ilya Verbin
  Cc: Jakub Jelinek, Thomas Schwinge, Jan Hubicka, gcc-patches,
	Kirill Yukhin, Ilya Tocar, Andrey Turetskiy, Bernd Schmidt

On Wed, 5 Nov 2014, Ilya Verbin wrote:

> On 03 Nov 10:24, Jakub Jelinek wrote:
> > On Tue, Oct 28, 2014 at 10:30:47PM +0300, Ilya Verbin wrote:
> > > @@ -474,6 +475,13 @@ cgraph_node::create (tree decl)
> > >    gcc_assert (TREE_CODE (decl) == FUNCTION_DECL);
> > >  
> > >    node->decl = decl;
> > > +
> > > +  if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
> > > +    {
> > > +      node->offloadable = 1;
> > > +      g->have_offload = true;
> > > +    }
> > 
> > I wonder if we shouldn't optimize here and call lookup_attribute only
> > if there is a chance that the attribute might be present, so guard with
> > flag_openmp (and flag_openacc later on?).  During LTO the cgraph nodes
> > are streamed in and supposedly the flag offloadable too.
> > 
> > > @@ -2129,8 +2141,12 @@ symbol_table::compile (void)
> > >      fprintf (stderr, "Performing interprocedural optimizations\n");
> > >    state = IPA;
> > >  
> > > +  /* OpenMP offloading requires LTO infrastructure.  */
> > > +  if (!in_lto_p && flag_openmp && g->have_offload)
> > > +    flag_generate_lto = 1;
> > 
> > On the other hand, do you need flag_openmp here?  Supposedly g->have_offload
> > would already have been set if needed.
> 
> Done, flag_openmp moved from symbol_table::compile to cgraph_node::create and
> varpool_node::get_create.  OK for trunk?

Yes.

> Maybe also with this change?
> 
> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> index 4e9ed25..beae5b5 100644
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -1653,8 +1653,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
>  	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
>  	      && DECL_P (decl)
>  	      && is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))
> -	      && lookup_attribute ("omp declare target",
> -				   DECL_ATTRIBUTES (decl)))
> +	      && varpool_node::get_create (decl)->offloadable)
>  	    break;
>  	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
>  	      && OMP_CLAUSE_MAP_KIND (c) == OMP_CLAUSE_MAP_POINTER)
> @@ -1794,8 +1793,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
>  	  decl = OMP_CLAUSE_DECL (c);
>  	  if (DECL_P (decl)
>  	      && is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))
> -	      && lookup_attribute ("omp declare target",
> -				   DECL_ATTRIBUTES (decl)))
> +	      && varpool_node::get_create (decl)->offloadable)
>  	    break;
>  	  if (DECL_P (decl))
>  	    {

Yes please.

Please make sure that regular LTO bootstrap still works - LTO is
only tested lightly in the testsuite.
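
As a rough illustration of why regular LTO should be unaffected, the sketch
below models the selection rule from select_what_to_stream in the patch.
ToySymbol and the toy_* helpers are hypothetical names; only the boolean
expression is taken from the patch.  In regular mode every symbol is marked,
so the new need_lto_streaming filters become no-ops.  This is a toy model and
no substitute for an actual LTO bootstrap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a symtab node (an assumption, not the real type).
struct ToySymbol {
  bool offloadable = false;
  bool need_lto_streaming = false;
};

// Same rule as in the patch: regular LTO mode streams everything,
// offload mode streams only symbols marked offloadable.
void toy_select_what_to_stream(std::vector<ToySymbol> &symbols,
                               bool offload_lto_mode) {
  for (ToySymbol &sym : symbols)
    sym.need_lto_streaming = !offload_lto_mode || sym.offloadable;
}

// Counts the symbols that would pass the need_lto_streaming filters.
std::size_t toy_count_streamed(const std::vector<ToySymbol> &symbols) {
  std::size_t count = 0;
  for (const ToySymbol &sym : symbols)
    if (sym.need_lto_streaming)
      ++count;
  return count;
}
```

With three symbols of which one is offloadable, the offload pass selects one
symbol and the regular pass selects all three, regardless of the order in
which the two passes run.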

Thanks,
Richard.

> Thanks,
>   -- Ilya
> 
> 
> ---
> 
> diff --git a/gcc/cgraph.c b/gcc/cgraph.c
> index 9a47ba2..a491886 100644
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -70,6 +70,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-dfa.h"
>  #include "profile.h"
>  #include "params.h"
> +#include "context.h"
>  
>  /* FIXME: Only for PROP_loops, but cgraph shouldn't have to know about this.  */
>  #include "tree-pass.h"
> @@ -474,6 +475,14 @@ cgraph_node::create (tree decl)
>    gcc_assert (TREE_CODE (decl) == FUNCTION_DECL);
>  
>    node->decl = decl;
> +
> +  if (flag_openmp
> +      && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
> +    {
> +      node->offloadable = 1;
> +      g->have_offload = true;
> +    }
> +
>    node->register_symbol ();
>  
>    if (DECL_CONTEXT (decl) && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL)
> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
> index 377adce..4988f2d 100644
> --- a/gcc/cgraph.h
> +++ b/gcc/cgraph.h
> @@ -463,6 +463,13 @@ public:
>    /* Set when init priority is set.  */
>    unsigned in_init_priority_hash : 1;
>  
> +  /* Set when symbol needs to be streamed into LTO bytecode for LTO, or in case
> +     of offloading, for separate compilation for a different target.  */
> +  unsigned need_lto_streaming : 1;
> +
> +  /* Set when symbol can be streamed into bytecode for offloading.  */
> +  unsigned offloadable : 1;
> +
>  
>    /* Ordering of all symtab entries.  */
>    int order;
> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> index 3e76bf0..83ab419 100644
> --- a/gcc/cgraphunit.c
> +++ b/gcc/cgraphunit.c
> @@ -218,6 +218,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-nested.h"
>  #include "gimplify.h"
>  #include "dbgcnt.h"
> +#include "lto-section-names.h"
>  
>  /* Queue of cgraph nodes scheduled to be added into cgraph.  This is a
>     secondary queue used during optimization to accommodate passes that
> @@ -2049,7 +2050,18 @@ ipa_passes (void)
>      targetm.asm_out.lto_start ();
>  
>    if (!in_lto_p)
> -    ipa_write_summaries ();
> +    {
> +      if (g->have_offload)
> +	{
> +	  section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
> +	  ipa_write_summaries (true);
> +	}
> +      if (flag_lto)
> +	{
> +	  section_name_prefix = LTO_SECTION_NAME_PREFIX;
> +	  ipa_write_summaries (false);
> +	}
> +    }
>  
>    if (flag_generate_lto)
>      targetm.asm_out.lto_end ();
> @@ -2129,8 +2141,12 @@ symbol_table::compile (void)
>      fprintf (stderr, "Performing interprocedural optimizations\n");
>    state = IPA;
>  
> +  /* Offloading requires LTO infrastructure.  */
> +  if (!in_lto_p && g->have_offload)
> +    flag_generate_lto = 1;
> +
>    /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE.  */
> -  if (flag_lto)
> +  if (flag_generate_lto)
>      lto_streamer_hooks_init ();
>  
>    /* Don't run the IPA passes if there was any error or sorry messages.  */
> diff --git a/gcc/context.c b/gcc/context.c
> index 5339e28..9279be4 100644
> --- a/gcc/context.c
> +++ b/gcc/context.c
> @@ -30,6 +30,8 @@ gcc::context *g;
>  
>  gcc::context::context ()
>  {
> +  have_offload = false;
> +
>    /* The pass manager's constructor uses the dump manager (to set up
>       dumps for the various passes), so the dump manager must be set up
>       before the pass manager.  */
> diff --git a/gcc/context.h b/gcc/context.h
> index b8fb439..689ae5a 100644
> --- a/gcc/context.h
> +++ b/gcc/context.h
> @@ -33,6 +33,9 @@ class context
>  public:
>    context ();
>  
> +  /* The flag shows if there are symbols to be streamed for offloading.  */
> +  bool have_offload;
> +
>    /* Pass-management.  */
>  
>    pass_manager *get_passes () { gcc_assert (m_passes); return m_passes; }
> diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
> index 7da02cd..6cb2057 100644
> --- a/gcc/ipa-inline-analysis.c
> +++ b/gcc/ipa-inline-analysis.c
> @@ -4021,7 +4021,7 @@ inline_generate_summary (void)
>  
>    /* When not optimizing, do not bother to analyze.  Inlining is still done
>       because edge redirection needs to happen there.  */
> -  if (!optimize && !flag_lto && !flag_wpa)
> +  if (!optimize && !flag_generate_lto && !flag_wpa)
>      return;
>  
>    function_insertion_hook_holder =
> @@ -4336,11 +4336,6 @@ void
>  inline_free_summary (void)
>  {
>    struct cgraph_node *node;
> -  if (!inline_edge_summary_vec.exists ())
> -    return;
> -  FOR_EACH_DEFINED_FUNCTION (node)
> -    if (!node->alias)
> -      reset_inline_summary (node);
>    if (function_insertion_hook_holder)
>      symtab->remove_cgraph_insertion_hook (function_insertion_hook_holder);
>    function_insertion_hook_holder = NULL;
> @@ -4356,6 +4351,11 @@ inline_free_summary (void)
>    if (edge_duplication_hook_holder)
>      symtab->remove_edge_duplication_hook (edge_duplication_hook_holder);
>    edge_duplication_hook_holder = NULL;
> +  if (!inline_edge_summary_vec.exists ())
> +    return;
> +  FOR_EACH_DEFINED_FUNCTION (node)
> +    if (!node->alias)
> +      reset_inline_summary (node);
>    vec_free (inline_summary_vec);
>    inline_edge_summary_vec.release ();
>    if (edge_predicate_pool)
> diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
> index 3071f0c..45655ba 100644
> --- a/gcc/lto-cgraph.c
> +++ b/gcc/lto-cgraph.c
> @@ -326,6 +326,11 @@ referenced_from_other_partition_p (symtab_node *node, lto_symtab_encoder_t encod
>  
>    for (i = 0; node->iterate_referring (i, ref); i++)
>      {
> +      /* Ignore references from non-offloadable nodes while streaming NODE into
> +	 offload LTO section.  */
> +      if (!ref->referring->need_lto_streaming)
> +	continue;
> +
>        if (ref->referring->in_other_partition
>            || !lto_symtab_encoder_in_partition_p (encoder, ref->referring))
>  	return true;
> @@ -344,9 +349,16 @@ reachable_from_other_partition_p (struct cgraph_node *node, lto_symtab_encoder_t
>    if (node->global.inlined_to)
>      return false;
>    for (e = node->callers; e; e = e->next_caller)
> -    if (e->caller->in_other_partition
> -	|| !lto_symtab_encoder_in_partition_p (encoder, e->caller))
> -      return true;
> +    {
> +      /* Ignore references from non-offloadable nodes while streaming NODE into
> +	 offload LTO section.  */
> +      if (!e->caller->need_lto_streaming)
> +	continue;
> +
> +      if (e->caller->in_other_partition
> +	  || !lto_symtab_encoder_in_partition_p (encoder, e->caller))
> +	return true;
> +    }
>    return false;
>  }
>  
> @@ -808,6 +820,16 @@ create_references (lto_symtab_encoder_t encoder, symtab_node *node)
>        lto_symtab_encoder_encode (encoder, ref->referred);
>  }
>  
> +/* Select what needs to be streamed out.  In regular lto mode stream everything.
> +   In offload lto mode stream only nodes marked as offloadable.  */
> +void
> +select_what_to_stream (bool offload_lto_mode)
> +{
> +  struct symtab_node *snode;
> +  FOR_EACH_SYMBOL (snode)
> +    snode->need_lto_streaming = !offload_lto_mode || snode->offloadable;
> +}
> +
>  /* Find all symbols we want to stream into given partition and insert them
>     to encoders.
>  
> @@ -834,6 +856,8 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
>         !lsei_end_p (lsei); lsei_next_function_in_partition (&lsei))
>      {
>        struct cgraph_node *node = lsei_cgraph_node (lsei);
> +      if (!node->need_lto_streaming)
> +	continue;
>        add_node_to (encoder, node, true);
>        lto_set_symtab_encoder_in_partition (encoder, node);
>        create_references (encoder, node);
> @@ -850,6 +874,8 @@ compute_ltrans_boundary (lto_symtab_encoder_t in_encoder)
>      {
>        varpool_node *vnode = lsei_varpool_node (lsei);
>  
> +      if (!vnode->need_lto_streaming)
> +	continue;
>        lto_set_symtab_encoder_in_partition (encoder, vnode);
>        lto_set_symtab_encoder_encode_initializer (encoder, vnode);
>        create_references (encoder, vnode);
> diff --git a/gcc/lto-section-names.h b/gcc/lto-section-names.h
> index cb75230..f5dbed2 100644
> --- a/gcc/lto-section-names.h
> +++ b/gcc/lto-section-names.h
> @@ -25,6 +25,11 @@ along with GCC; see the file COPYING3.  If not see
>     name for the functions and static_initializers.  For other types of
>     sections a '.' and the section type are appended.  */
>  #define LTO_SECTION_NAME_PREFIX ".gnu.lto_"
> +#define OFFLOAD_SECTION_NAME_PREFIX ".gnu.offload_lto_"
> +
> +/* Can be either OFFLOAD_SECTION_NAME_PREFIX when we stream IR for offload
> +   compiler, or LTO_SECTION_NAME_PREFIX for LTO case.  */
> +extern const char *section_name_prefix;
>  
>  /* Segment name for LTO sections.  This is only used for Mach-O.  */
>  
> diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c
> index cb647bd..79c137d 100644
> --- a/gcc/lto-streamer.c
> +++ b/gcc/lto-streamer.c
> @@ -56,6 +56,7 @@ struct lto_stats_d lto_stats;
>  static bitmap_obstack lto_obstack;
>  static bool lto_obstack_initialized;
>  
> +const char *section_name_prefix = LTO_SECTION_NAME_PREFIX;
>  
>  /* Return a string representing LTO tag TAG.  */
>  
> @@ -185,7 +186,7 @@ lto_get_section_name (int section_type, const char *name, struct lto_file_decl_d
>      sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id);
>    else
>      sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); 
> -  return concat (LTO_SECTION_NAME_PREFIX, sep, add, post, NULL);
> +  return concat (section_name_prefix, sep, add, post, NULL);
>  }
>  
>  
> diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h
> index 63e4b32..0b3fb6a 100644
> --- a/gcc/lto-streamer.h
> +++ b/gcc/lto-streamer.h
> @@ -832,6 +832,7 @@ bool referenced_from_this_partition_p (symtab_node *,
>  bool reachable_from_this_partition_p (struct cgraph_node *,
>  				      lto_symtab_encoder_t);
>  lto_symtab_encoder_t compute_ltrans_boundary (lto_symtab_encoder_t encoder);
> +void select_what_to_stream (bool);
>  
>  
>  /* In lto-symtab.c.  */
> diff --git a/gcc/lto/lto-object.c b/gcc/lto/lto-object.c
> index 28b459c..637d1f2 100644
> --- a/gcc/lto/lto-object.c
> +++ b/gcc/lto/lto-object.c
> @@ -238,8 +238,7 @@ lto_obj_add_section (void *data, const char *name, off_t offset,
>    void **slot;
>    struct lto_section_list *list = loasd->list;
>  
> -  if (strncmp (name, LTO_SECTION_NAME_PREFIX,
> -	       strlen (LTO_SECTION_NAME_PREFIX)) != 0)
> +  if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
>      return 1;
>  
>    new_name = xstrdup (name);
> diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
> index b647275..6290b23 100644
> --- a/gcc/lto/lto-partition.c
> +++ b/gcc/lto/lto-partition.c
> @@ -928,6 +928,8 @@ lto_promote_cross_file_statics (void)
>  
>    gcc_assert (flag_wpa);
>  
> +  select_what_to_stream (false);
> +
>    /* First compute boundaries.  */
>    n_sets = ltrans_partitions.length ();
>    for (i = 0; i < n_sets; i++)
> diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
> index 1234cee..0451e71 100644
> --- a/gcc/lto/lto.c
> +++ b/gcc/lto/lto.c
> @@ -2127,7 +2127,7 @@ lto_section_with_id (const char *name, unsigned HOST_WIDE_INT *id)
>  {
>    const char *s;
>  
> -  if (strncmp (name, LTO_SECTION_NAME_PREFIX, strlen (LTO_SECTION_NAME_PREFIX)))
> +  if (strncmp (name, section_name_prefix, strlen (section_name_prefix)))
>      return 0;
>    s = strrchr (name, '.');
>    return s && sscanf (s, "." HOST_WIDE_INT_PRINT_HEX_PURE, id) == 1;
> @@ -2902,6 +2902,10 @@ read_cgraph_and_symbols (unsigned nfiles, const char **fnames)
>  
>    timevar_push (TV_IPA_LTO_DECL_IN);
>  
> +#ifdef ACCEL_COMPILER
> +    section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
> +#endif
> +
>    real_file_decl_data
>      = decl_data = ggc_cleared_vec_alloc<lto_file_decl_data_ptr> (nfiles + 1);
>    real_file_count = nfiles;
> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> index fe9bf80..1404b5e 100644
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -81,6 +81,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-nested.h"
>  #include "tree-eh.h"
>  #include "cilk.h"
> +#include "context.h"
>  
>  
>  /* Lowering of OpenMP parallel and workshare constructs proceeds in two
> @@ -268,6 +269,16 @@ is_parallel_ctx (omp_context *ctx)
>  }
>  
>  
> +/* Return true if CTX is for an omp target region.  */
> +
> +static inline bool
> +is_targetreg_ctx (omp_context *ctx)
> +{
> +  return gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
> +	 && gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_REGION;
> +}
> +
> +
>  /* Return true if CTX is for an omp task.  */
>  
>  static inline bool
> @@ -1933,26 +1944,19 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
>    DECL_EXTERNAL (decl) = 0;
>    DECL_CONTEXT (decl) = NULL_TREE;
>    DECL_INITIAL (decl) = make_node (BLOCK);
> -  bool target_p = false;
> -  if (lookup_attribute ("omp declare target",
> -			DECL_ATTRIBUTES (current_function_decl)))
> -    target_p = true;
> +  if (cgraph_node::get (current_function_decl)->offloadable)
> +    cgraph_node::get_create (decl)->offloadable = 1;
>    else
>      {
>        omp_context *octx;
>        for (octx = ctx; octx; octx = octx->outer)
> -	if (gimple_code (octx->stmt) == GIMPLE_OMP_TARGET
> -	    && gimple_omp_target_kind (octx->stmt)
> -	       == GF_OMP_TARGET_KIND_REGION)
> +	if (is_targetreg_ctx (octx))
>  	  {
> -	    target_p = true;
> +	    cgraph_node::get_create (decl)->offloadable = 1;
> +	    g->have_offload = true;
>  	    break;
>  	  }
>      }
> -  if (target_p)
> -    DECL_ATTRIBUTES (decl)
> -      = tree_cons (get_identifier ("omp declare target"),
> -		   NULL_TREE, DECL_ATTRIBUTES (decl));
>  
>    t = build_decl (DECL_SOURCE_LOCATION (decl),
>  		  RESULT_DECL, NULL_TREE, void_type_node);
> @@ -2658,8 +2662,7 @@ check_omp_nesting_restrictions (gimple stmt, omp_context *ctx)
>        break;
>      case GIMPLE_OMP_TARGET:
>        for (; ctx != NULL; ctx = ctx->outer)
> -	if (gimple_code (ctx->stmt) == GIMPLE_OMP_TARGET
> -	    && gimple_omp_target_kind (ctx->stmt) == GF_OMP_TARGET_KIND_REGION)
> +	if (is_targetreg_ctx (ctx))
>  	  {
>  	    const char *name;
>  	    switch (gimple_omp_target_kind (stmt))
> @@ -8276,6 +8279,7 @@ expand_omp_target (struct omp_region *region)
>    if (kind == GF_OMP_TARGET_KIND_REGION)
>      {
>        unsigned srcidx, dstidx, num;
> +      struct cgraph_node *node;
>  
>        /* If the target region needs data sent from the parent
>  	 function, then the very first statement (except possible
> @@ -8407,6 +8411,11 @@ expand_omp_target (struct omp_region *region)
>        push_cfun (child_cfun);
>        cgraph_edge::rebuild_edges ();
>  
> +      /* Prevent IPA from removing child_fn as unreachable, since there are no
> +	 refs from the parent function to child_fn in offload LTO mode.  */
> +      node = cgraph_node::get (child_fn);
> +      node->mark_force_output ();
> +
>        /* Some EH regions might become dead, see PR34608.  If
>  	 pass_cleanup_cfg isn't the first pass to happen with the
>  	 new child, these dead EH edges might cause problems.
> @@ -9277,6 +9286,17 @@ lower_omp_critical (gimple_stmt_iterator *gsi_p, omp_context *ctx)
>  	  DECL_COMMON (decl) = 1;
>  	  DECL_ARTIFICIAL (decl) = 1;
>  	  DECL_IGNORED_P (decl) = 1;
> +
> +	  /* If '#pragma omp critical' is inside target region, the symbol must
> +	     be marked for offloading.  */
> +	  omp_context *octx;
> +	  for (octx = ctx->outer; octx; octx = octx->outer)
> +	    if (is_targetreg_ctx (octx))
> +	      {
> +		varpool_node::get_create (decl)->offloadable = 1;
> +		break;
> +	      }
> +
>  	  varpool_node::finalize_decl (decl);
>  
>  	  splay_tree_insert (critical_name_mutexes, (splay_tree_key) name,
> diff --git a/gcc/passes.c b/gcc/passes.c
> index 8432de8..bd4031b 100644
> --- a/gcc/passes.c
> +++ b/gcc/passes.c
> @@ -2303,7 +2303,7 @@ ipa_write_summaries_1 (lto_symtab_encoder_t encoder)
>  /* Write out summaries for all the nodes in the callgraph.  */
>  
>  void
> -ipa_write_summaries (void)
> +ipa_write_summaries (bool offload_lto_mode)
>  {
>    lto_symtab_encoder_t encoder;
>    int i, order_pos;
> @@ -2314,6 +2314,8 @@ ipa_write_summaries (void)
>    if (!flag_generate_lto || seen_error ())
>      return;
>  
> +  select_what_to_stream (offload_lto_mode);
> +
>    encoder = lto_symtab_encoder_new (false);
>  
>    /* Create the callgraph set in the same order used in
> @@ -2340,15 +2342,16 @@ ipa_write_summaries (void)
>  	  renumber_gimple_stmt_uids ();
>  	  pop_cfun ();
>  	}
> -      if (node->definition)
> +      if (node->definition && node->need_lto_streaming)
>          lto_set_symtab_encoder_in_partition (encoder, node);
>      }
>  
>    FOR_EACH_DEFINED_FUNCTION (node)
> -    if (node->alias)
> +    if (node->alias && node->need_lto_streaming)
>        lto_set_symtab_encoder_in_partition (encoder, node);
>    FOR_EACH_DEFINED_VARIABLE (vnode)
> -    lto_set_symtab_encoder_in_partition (encoder, vnode);
> +    if (vnode->need_lto_streaming)
> +      lto_set_symtab_encoder_in_partition (encoder, vnode);
>  
>    ipa_write_summaries_1 (compute_ltrans_boundary (encoder));
>  
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 3db1a08..cbed6e7 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -594,7 +594,7 @@ extern void pass_fini_dump_file (opt_pass *);
>  extern const char *get_current_pass_name (void);
>  extern void print_current_pass (FILE *);
>  extern void debug_pass (void);
> -extern void ipa_write_summaries (void);
> +extern void ipa_write_summaries (bool);
>  extern void ipa_write_optimization_summaries (struct lto_symtab_encoder_d *);
>  extern void ipa_read_summaries (void);
>  extern void ipa_read_optimization_summaries (void);
> diff --git a/gcc/varpool.c b/gcc/varpool.c
> index 50b5665..c508bf9 100644
> --- a/gcc/varpool.c
> +++ b/gcc/varpool.c
> @@ -40,6 +40,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimple.h"
>  #include "lto-streamer.h"
>  #include "hash-set.h"
> +#include "context.h"
>  
>  const char * const tls_model_names[]={"none", "tls-emulated", "tls-real",
>  				      "tls-global-dynamic", "tls-local-dynamic",
> @@ -155,6 +156,14 @@ varpool_node::get_create (tree decl)
>  
>    node = varpool_node::create_empty ();
>    node->decl = decl;
> +
> +  if (flag_openmp
> +      && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
> +    {
> +      node->offloadable = 1;
> +      g->have_offload = true;
> +    }
> +
>    node->register_symbol ();
>    return node;
>  }
> 
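
A minimal standalone sketch of the selection logic in the quoted patch, under
heavy simplification. `Node`, its `callers` vector and the `in_partition` bool
are hypothetical stand-ins for symtab_node/cgraph_node, the caller edge list
and lto_symtab_encoder_in_partition_p; they are not GCC's actual API. Only the
flag names (offloadable, need_lto_streaming, in_other_partition) and the two
function names come from the patch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for GCC's symtab/cgraph node.
struct Node {
  bool offloadable = false;         // marked for the offload target
  bool need_lto_streaming = false;  // computed by select_what_to_stream
  bool in_partition = true;         // stand-in for the encoder membership check
  bool in_other_partition = false;
  std::vector<Node *> callers;      // stand-in for the caller edge list
};

// Regular LTO mode streams everything; offload mode streams only
// nodes marked as offloadable.
void select_what_to_stream(std::vector<Node *> &symbols, bool offload_lto_mode) {
  for (Node *n : symbols)
    n->need_lto_streaming = !offload_lto_mode || n->offloadable;
}

// True if a caller that is itself being streamed lives outside the partition.
bool reachable_from_other_partition_p(const Node &node) {
  for (const Node *caller : node.callers) {
    if (!caller->need_lto_streaming)
      continue;  // ignore callers that are not streamed in this mode
    if (caller->in_other_partition || !caller->in_partition)
      return true;
  }
  return false;
}

// Usage: an outlined target region whose only caller is a host-only parent.
void demo() {
  Node parent;                  // host-side parent, not offloadable
  parent.in_partition = false;
  Node child;                   // outlined target region
  child.offloadable = true;
  child.callers.push_back(&parent);
  std::vector<Node *> symbols{&parent, &child};

  select_what_to_stream(symbols, /*offload_lto_mode=*/true);
  assert(!reachable_from_other_partition_p(child));  // host parent ignored

  select_what_to_stream(symbols, /*offload_lto_mode=*/false);
  assert(reachable_from_other_partition_p(child));   // regular LTO sees it
}
```

In this model the host-only parent no longer makes the target region look
"used from another partition" in offload mode, which is the situation the
original submission describes as crashing the offload compiler.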

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284
(AG Nuernberg)
Maxfeldstrasse 5, 90409 Nuernberg, Germany


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-11-12  9:32                       ` Richard Biener
@ 2014-11-12 14:11                         ` Kirill Yukhin
  2014-11-12 14:23                           ` Richard Biener
  0 siblings, 1 reply; 62+ messages in thread
From: Kirill Yukhin @ 2014-11-12 14:11 UTC (permalink / raw)
  To: Richard Biener
  Cc: Ilya Verbin, Jakub Jelinek, Thomas Schwinge, Jan Hubicka,
	gcc-patches, Ilya Tocar, Andrey Turetskiy, Bernd Schmidt

Hello Richard,
On 12 Nov 10:23, Richard Biener wrote:
> On Wed, 5 Nov 2014, Ilya Verbin wrote:
> Yes please.
> 
> Please make sure that regular LTO bootstrap still works - LTO is
> only tested lightly in the testsuite.

Current trunk fails to bootstrap with `bootstrap-lto':
git/gcc/configure --enable-languages=c,c++ --with-build-config=bootstrap-lto --with-fpmath=sse

/export/users/kyukhin/gcc/build/build-x86_64-linux/./prev-gcc/xg++ -B/export/users/kyukhin/gcc/build/build-x86_64-linux/./prev-gcc/ -B/usr/local/x86_64-unknown-linux-gnu/bin/ -nostdinc++ -B/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs -B/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs  -I/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu  -I/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/include  -I/export/users/kyukhin/gcc/git/gcc/libstdc++-v3/libsupc++ -L/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs -L/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs   -g -O2 -flto=jobserver -frandom-seed=1 -DIN_GCC    -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -static-libstdc++ -static-libgcc  -o cc1 c/c-lang.o c-family/stub-objc.o attribs.o c/c-errors.o c/c-decl.o c/c-typeck.o c/c-convert.o c/c-aux-info.o c/c-objc-common.o c/c-parser.o c/c-array-notation.o c-family/c-common.o c-family/c-cppbuiltin.o c-family/c-dump.o c-family/c-format.o c-family/c-gimplify.o c-family/c-lex.o c-family/c-omp.o c-family/c-opts.o c-family/c-pch.o c-family/c-ppoutput.o c-family/c-pragma.o c-family/c-pretty-print.o c-family/c-semantics.o c-family/c-ada-spec.o c-family/c-cilkplus.o c-family/array-notation-common.o c-family/cilk.o c-family/c-ubsan.o i386-c.o glibc-c.o \
  cc1-checksum.o libbackend.a main.o tree-browser.o libcommon-target.a libcommon.a ../libcpp/libcpp.a ../libdecnumber/libdecnumber.a libcommon.a ../libcpp/libcpp.a   ../libbacktrace/.libs/libbacktrace.a ../libiberty/libiberty.a ../libdecnumber/libdecnumber.a    -lmpc -lmpfr -lgmp -rdynamic -ldl  -L../zlib -lz
/export/users/kyukhin/gcc/git/gcc/gcc/gcse.c:447:0: error: type ‘struct bb_data’ violates one definition rule [-Werror=odr]
 struct bb_data
 ^
/export/users/kyukhin/gcc/git/gcc/gcc/lra-lives.c:395:0: note: a different type is defined in another translation unit
 struct bb_data
 ^
/export/users/kyukhin/gcc/git/gcc/gcc/gcse.c:451:0: note: the first difference of corresponding definitions is field ‘max_reg_pressure’
   int max_reg_pressure[N_REG_CLASSES];
 ^
/export/users/kyukhin/gcc/git/gcc/gcc/lra-lives.c:398:0: note: a field with different name is defined in another translation unit
   basic_block bb;
 ^
lto1: all warnings being treated as errors
lto-wrapper: fatal error: /export/users/kyukhin/gcc/build/build-x86_64-linux/./prev-gcc/xg++ returned 1 exit status
compilation terminated.
/usr/bin/ld: lto-wrapper failed
collect2: error: ld returned 1 exit status
make[3]: *** [cc1] Error 1

Is this a known issue?
(Or are we doing something wrong?)

--
Thanks, K


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-11-12 14:11                         ` Kirill Yukhin
@ 2014-11-12 14:23                           ` Richard Biener
  2014-11-12 14:35                             ` Kirill Yukhin
  0 siblings, 1 reply; 62+ messages in thread
From: Richard Biener @ 2014-11-12 14:23 UTC (permalink / raw)
  To: Kirill Yukhin
  Cc: Ilya Verbin, Jakub Jelinek, Thomas Schwinge, Jan Hubicka,
	gcc-patches, Ilya Tocar, Andrey Turetskiy, Bernd Schmidt,
	vmakarov


On Wed, 12 Nov 2014, Kirill Yukhin wrote:

> Hello Richard,
> On 12 Nov 10:23, Richard Biener wrote:
> > On Wed, 5 Nov 2014, Ilya Verbin wrote:
> > Yes please.
> > 
> > Please make sure that regular LTO bootstrap still works - LTO is
> > only tested lightly in the testsuite.
> 
> Current main trunk fails to bootstrap w/ `bootstrap-lto':
> git/gcc/configure --enable-languages=c,c++ --with-build-config=bootstrap-lto --with-fpmath=sse
> 
> /export/users/kyukhin/gcc/build/build-x86_64-linux/./prev-gcc/xg++ -B/export/users/kyukhin/gcc/build/build-x86_64-linux/./prev-gcc/ -B/usr/local/x86_64-unknown-linux-gnu/bin/ -nostdinc++ -B/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs -B/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs  -I/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu  -I/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/include  -I/export/users/kyukhin/gcc/git/gcc/libstdc++-v3/libsupc++ -L/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs -L/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs   -g -O2 -flto=jobserver -frandom-seed=1 -DIN_GCC    -fno-exceptions -fno-rtti
  -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -static-libstdc++ -static-libgcc  -o cc1 c/c-lang.o c-family/stub-objc.o attribs.o c/c-errors.o c/c-decl.o c/c-typeck.o c/c-convert.o c/c-aux-info.o c/c-objc-common.o c/c-parser.o c/c-array-notation.o c-family/c-common.o c-family/c-cppbuiltin.o c-family/c-dump.o c-family/c-format.o c-family/c-gimplify.o c-family/c-lex.o c-family/c-omp.o c-family/c-opts.o c-family/c-pch.o c-family/c-ppoutput.o c-family/c-pragma.o c-family/c-pretty-print.o c-family/c-semantics.o c-family/c-ada-spec.o c-family/c-cilkplus.o c-family/array-notation-common.o c-family/cilk.o c-family/c-ubsan.o i386-c.o glibc-c.o \
>   cc1-checksum.o libbackend.a main.o tree-browser.o libcommon-target.a libcommon.a ../libcpp/libcpp.a ../libdecnumber/libdecnumber.a libcommon.a ../libcpp/libcpp.a   ../libbacktrace/.libs/libbacktrace.a ../libiberty/libiberty.a ../libdecnumber/libdecnumber.a    -lmpc -lmpfr -lgmp -rdynamic -ldl  -L../zlib -lz
> /export/users/kyukhin/gcc/git/gcc/gcc/gcse.c:447:0: error: type ‘struct bb_data’ violates one definition rule [-Werror=odr]
>  struct bb_data
>  ^
> /export/users/kyukhin/gcc/git/gcc/gcc/lra-lives.c:395:0: note: a different type is defined in another translation unit
>  struct bb_data
>  ^
> /export/users/kyukhin/gcc/git/gcc/gcc/gcse.c:451:0: note: the first difference of corresponding definitions is field ‘max_reg_pressure’
>    int max_reg_pressure[N_REG_CLASSES];
>  ^
> /export/users/kyukhin/gcc/git/gcc/gcc/lra-lives.c:398:0: note: a field with different name is defined in another translation unit
>    basic_block bb;
>  ^
> lto1: all warnings being treated as errors
> lto-wrapper: fatal error: /export/users/kyukhin/gcc/build/build-x86_64-linux/./prev-gcc/xg++ returned 1 exit status
> compilation terminated.
> /usr/bin/ld: lto-wrapper failed
> collect2: error: ld returned 1 exit status
> make[3]: *** [cc1] Error 1
> 
> Is it known issue?
> (or we are doing something wrong...)

Seems like Vlad introduced the conflicting type with

2014-11-09  Vladimir Makarov  <vmakarov@redhat.com>

        PR rtl-optimization/63620
        * lra-constraints.c (substitute_pseudo): Add prefix lra_ to the
        name.  Move to lra.c.  Make it external.
        (substitute_pseudo_within_insn): Ditto.
        (inherit_reload_reg, split_reg, remove_inheritance_pseudos): Use
        the new names.
        (undo_optional_reloads): Ditto.
        * lra-int.h (lra_dump_bitmap_with_title, lra_substitute_pseudo):
        New prototypes.
        (lra_substitute_pseudo_within_insn): Ditto.
        * lra-lives.c (bb_killed_pseudos, bb_gen_pseudos): New.
        (mark_regno_live): Add parameter.  Update bb_gen_pseudos.
        (mark_regno_dead): Add parameter.  Update bb_gen_pseudos and
        bb_killed_pseudos.
        (struct bb_data, bb_data_t, bb_data): New.
...

Richard.


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-11-12 14:23                           ` Richard Biener
@ 2014-11-12 14:35                             ` Kirill Yukhin
  2014-11-12 14:41                               ` Richard Biener
  0 siblings, 1 reply; 62+ messages in thread
From: Kirill Yukhin @ 2014-11-12 14:35 UTC (permalink / raw)
  To: Richard Biener
  Cc: Ilya Verbin, Jakub Jelinek, Thomas Schwinge, Jan Hubicka,
	gcc-patches, Ilya Tocar, Andrey Turetskiy, Bernd Schmidt,
	vmakarov

On 12 Nov 15:09, Richard Biener wrote:
> On Wed, 12 Nov 2014, Kirill Yukhin wrote:
> 
> > Hello Richard,
> > On 12 Nov 10:23, Richard Biener wrote:
> > > On Wed, 5 Nov 2014, Ilya Verbin wrote:
> > > Yes please.
> > > 
> > > Please make sure that regular LTO bootstrap still works - LTO is
> > > only tested lightly in the testsuite.
> > 
> > Current main trunk fails to bootstrap w/ `bootstrap-lto':
> > git/gcc/configure --enable-languages=c,c++ --with-build-config=bootstrap-lto --with-fpmath=sse
> > 
> > /export/users/kyukhin/gcc/build/build-x86_64-linux/./prev-gcc/xg++ -B/export/users/kyukhin/gcc/build/build-x86_64-linux/./prev-gcc/ -B/usr/local/x86_64-unknown-linux-gnu/bin/ -nostdinc++ -B/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs -B/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs  -I/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu  -I/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/include  -I/export/users/kyukhin/gcc/git/gcc/libstdc++-v3/libsupc++ -L/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs -L/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs   -g -O2 -flto=jobserver -frandom-seed=1 -DIN_GCC    -fno-exceptions -fno-rtti
>   -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -static-libstdc++ -static-libgcc  -o cc1 c/c-lang.o c-family/stub-objc.o attribs.o c/c-errors.o c/c-decl.o c/c-typeck.o c/c-convert.o c/c-aux-info.o c/c-objc-common.o c/c-parser.o c/c-array-notation.o c-family/c-common.o c-family/c-cppbuiltin.o c-family/c-dump.o c-family/c-format.o c-family/c-gimplify.o c-family/c-lex.o c-family/c-omp.o c-family/c-opts.o c-family/c-pch.o c-family/c-ppoutput.o c-family/c-pragma.o c-family/c-pretty-print.o c-family/c-semantics.o c-family/c-ada-spec.o c-family/c-cilkplus.o c-family/array-notation-common.o c-family/cilk.o c-family/c-ubsan.o i386-c.o glibc-c.o \
> >   cc1-checksum.o libbackend.a main.o tree-browser.o libcommon-target.a libcommon.a ../libcpp/libcpp.a ../libdecnumber/libdecnumber.a libcommon.a ../libcpp/libcpp.a   ../libbacktrace/.libs/libbacktrace.a ../libiberty/libiberty.a ../libdecnumber/libdecnumber.a    -lmpc -lmpfr -lgmp -rdynamic -ldl  -L../zlib -lz
> > /export/users/kyukhin/gcc/git/gcc/gcc/gcse.c:447:0: error: type ‘struct bb_data’ violates one definition rule [-Werror=odr]
> >  struct bb_data
> >  ^
> > /export/users/kyukhin/gcc/git/gcc/gcc/lra-lives.c:395:0: note: a different type is defined in another translation unit
> >  struct bb_data
> >  ^
> > /export/users/kyukhin/gcc/git/gcc/gcc/gcse.c:451:0: note: the first difference of corresponding definitions is field ‘max_reg_pressure’
> >    int max_reg_pressure[N_REG_CLASSES];
> >  ^
> > /export/users/kyukhin/gcc/git/gcc/gcc/lra-lives.c:398:0: note: a field with different name is defined in another translation unit
> >    basic_block bb;
> >  ^
> > lto1: all warnings being treated as errors
> > lto-wrapper: fatal error: /export/users/kyukhin/gcc/build/build-x86_64-linux/./prev-gcc/xg++ returned 1 exit status
> > compilation terminated.
> > /usr/bin/ld: lto-wrapper failed
> > collect2: error: ld returned 1 exit status
> > make[3]: *** [cc1] Error 1
> > 
> > Is it known issue?
> > (or we are doing something wrong...)
> 
> Seems like Vlad introduced the conflicting type with
Okay, we're going to test our changes with the patch at the bottom applied
to both kyukhin/gomp4-offload and trunk.

     * gcc/lra-lives.c (struct bb_data): Rename to ...
     (struct bb_data_pseudos): ... this.
     (initiate_live_solver): Update struct name.

Is it OK if lto-bootstrap passes?
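
To illustrate the clash being fixed: gcse.c and lra-lives.c each define their
own `struct bb_data` with a different layout, and under LTO both definitions
are seen together at link time, which -Wodr reports. A rough sketch follows,
using two namespaces as hypothetical stand-ins for the two translation units.
The field lists are abbreviated, `bb_index` replaces the real `basic_block bb`
field, and the N_REG_CLASSES value is made up.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in value; the real N_REG_CLASSES depends on the target.
constexpr int N_REG_CLASSES = 4;

namespace gcse_tu {  // models the definition in gcse.c
struct bb_data {
  int max_reg_pressure[N_REG_CLASSES];
};
}  // namespace gcse_tu

namespace lra_lives_tu {  // models lra-lives.c after the rename
struct bb_data_pseudos {
  int bb_index;  // hypothetical; the real field is `basic_block bb`
};
typedef struct bb_data_pseudos *bb_data_t;
}  // namespace lra_lives_tu

// After the rename the two types no longer share a name, so there is
// nothing for a whole-program ODR check to match up.
static_assert(!std::is_same<gcse_tu::bb_data,
                            lra_lives_tu::bb_data_pseudos>::value,
              "the two structs are distinct types");

// The layouts really do differ, which is why sharing one name was a problem.
constexpr bool layouts_differ() {
  return sizeof(gcse_tu::bb_data) != sizeof(lra_lives_tu::bb_data_pseudos);
}

void demo() {
  assert(layouts_differ());
}
```

Another common way to keep a file-local type out of such checks in C++ is an
unnamed namespace; the patch in this message uses a plain rename instead.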

--
Thanks, K

diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
index 03def82..2c54ca70 100644
--- a/gcc/lra-lives.c
+++ b/gcc/lra-lives.c
@@ -392,7 +392,7 @@ mark_regno_dead (int regno, machine_mode mode, int point, bool local_sets_p)
 
 /* Structure describing local BB data used for pseudo
    live-analysis.  */
-struct bb_data
+struct bb_data_pseudos
 {
   /* Basic block about which the below data are.  */
   basic_block bb;
@@ -401,7 +401,7 @@ struct bb_data
 };
 
 /* Array for all BB data.  Indexed by the corresponding BB index.  */
-typedef struct bb_data *bb_data_t;
+typedef struct bb_data_pseudos *bb_data_t;
 
 /* All basic block data are referred through the following array.  */
 static bb_data_t bb_data;
@@ -481,7 +481,7 @@ initiate_live_solver (void)
   bitmap_initialize (&temp_bitmap, &reg_obstack);
   bitmap_initialize (&all_hard_regs_bitmap, &reg_obstack);
   bitmap_set_range (&all_hard_regs_bitmap, 0, FIRST_PSEUDO_REGISTER);
-  bb_data = XNEWVEC (struct bb_data, last_basic_block_for_fn (cfun));
+  bb_data = XNEWVEC (struct bb_data_pseudos, last_basic_block_for_fn (cfun));
   bitmap_initialize (&all_blocks, &reg_obstack);
 
   basic_block bb;


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-11-12 14:35                             ` Kirill Yukhin
@ 2014-11-12 14:41                               ` Richard Biener
  2014-11-12 17:38                                 ` Ilya Verbin
  0 siblings, 1 reply; 62+ messages in thread
From: Richard Biener @ 2014-11-12 14:41 UTC (permalink / raw)
  To: Kirill Yukhin
  Cc: Ilya Verbin, Jakub Jelinek, Thomas Schwinge, Jan Hubicka,
	gcc-patches, Ilya Tocar, Andrey Turetskiy, Bernd Schmidt,
	vmakarov


On Wed, 12 Nov 2014, Kirill Yukhin wrote:

> On 12 Nov 15:09, Richard Biener wrote:
> > On Wed, 12 Nov 2014, Kirill Yukhin wrote:
> > 
> > > Hello Richard,
> > > On 12 Nov 10:23, Richard Biener wrote:
> > > > On Wed, 5 Nov 2014, Ilya Verbin wrote:
> > > > Yes please.
> > > > 
> > > > Please make sure that regular LTO bootstrap still works - LTO is
> > > > only tested lightly in the testsuite.
> > > 
> > > Current main trunk fails to bootstrap w/ `bootstrap-lto':
> > > git/gcc/configure --enable-languages=c,c++ --with-build-config=bootstrap-lto --with-fpmath=sse
> > > 
> > > /export/users/kyukhin/gcc/build/build-x86_64-linux/./prev-gcc/xg++ -B/export/users/kyukhin/gcc/build/build-x86_64-linux/./prev-gcc/ -B/usr/local/x86_64-unknown-linux-gnu/bin/ -nostdinc++ -B/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs -B/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs  -I/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu  -I/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/include  -I/export/users/kyukhin/gcc/git/gcc/libstdc++-v3/libsupc++ -L/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/src/.libs -L/export/users/kyukhin/gcc/build/build-x86_64-linux/prev-x86_64-unknown-linux-gnu/libstdc++-v3/libsupc++/.libs   -g -O2 -flto=jobserver -frandom-seed=1 -DIN_GCC    -fno-exceptions -fno-rtti
> >   -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -static-libstdc++ -static-libgcc  -o cc1 c/c-lang.o c-family/stub-objc.o attribs.o c/c-errors.o c/c-decl.o c/c-typeck.o c/c-convert.o c/c-aux-info.o c/c-objc-common.o c/c-parser.o c/c-array-notation.o c-family/c-common.o c-family/c-cppbuiltin.o c-family/c-dump.o c-family/c-format.o c-family/c-gimplify.o c-family/c-lex.o c-family/c-omp.o c-family/c-opts.o c-family/c-pch.o c-family/c-ppoutput.o c-family/c-pragma.o c-family/c-pretty-print.o c-family/c-semantics.o c-family/c-ada-spec.o c-family/c-cilkplus.o c-family/array-notation-common.o c-family/cilk.o c-family/c-ubsan.o i386-c.o glibc-c.o \
> > >   cc1-checksum.o libbackend.a main.o tree-browser.o libcommon-target.a libcommon.a ../libcpp/libcpp.a ../libdecnumber/libdecnumber.a libcommon.a ../libcpp/libcpp.a   ../libbacktrace/.libs/libbacktrace.a ../libiberty/libiberty.a ../libdecnumber/libdecnumber.a    -lmpc -lmpfr -lgmp -rdynamic -ldl  -L../zlib -lz
> > > /export/users/kyukhin/gcc/git/gcc/gcc/gcse.c:447:0: error: type ‘struct bb_data’ violates one definition rule [-Werror=odr]
> > >  struct bb_data
> > >  ^
> > > /export/users/kyukhin/gcc/git/gcc/gcc/lra-lives.c:395:0: note: a different type is defined in another translation unit
> > >  struct bb_data
> > >  ^
> > > /export/users/kyukhin/gcc/git/gcc/gcc/gcse.c:451:0: note: the first difference of corresponding definitions is field ‘max_reg_pressure’
> > >    int max_reg_pressure[N_REG_CLASSES];
> > >  ^
> > > /export/users/kyukhin/gcc/git/gcc/gcc/lra-lives.c:398:0: note: a field with different name is defined in another translation unit
> > >    basic_block bb;
> > >  ^
> > > lto1: all warnings being treated as errors
> > > lto-wrapper: fatal error: /export/users/kyukhin/gcc/build/build-x86_64-linux/./prev-gcc/xg++ returned 1 exit status
> > > compilation terminated.
> > > /usr/bin/ld: lto-wrapper failed
> > > collect2: error: ld returned 1 exit status
> > > make[3]: *** [cc1] Error 1
> > > 
> > > Is it known issue?
> > > (or we are doing something wrong...)
> > 
> > Seems like Vlad introduced the conflicting type with
> Okay, we're going to test our changes with patch in the bottom applied
> both to kyukhin/gomp4-offload and trunk.
> 
>      * gcc/lra-lives.c (struct bb_data): Rename to ...
>      (struct bb_data_pseudos): ... this.
>      (initiate_live_solver): Update struct name.
> 
> Is it ok if lto-bootstrap pass?

Ok.

Thanks,
Richard.

> --
> Thanks, K
> 
> diff --git a/gcc/lra-lives.c b/gcc/lra-lives.c
> index 03def82..2c54ca70 100644
> --- a/gcc/lra-lives.c
> +++ b/gcc/lra-lives.c
> @@ -392,7 +392,7 @@ mark_regno_dead (int regno, machine_mode mode, int point, bool local_sets_p)
>  
>  /* Structure describing local BB data used for pseudo
>     live-analysis.  */
> -struct bb_data
> +struct bb_data_pseudos
>  {
>    /* Basic block about which the below data are.  */
>    basic_block bb;
> @@ -401,7 +401,7 @@ struct bb_data
>  };
>  
>  /* Array for all BB data.  Indexed by the corresponding BB index.  */
> -typedef struct bb_data *bb_data_t;
> +typedef struct bb_data_pseudos *bb_data_t;
>  
>  /* All basic block data are referred through the following array.  */
>  static bb_data_t bb_data;
> @@ -481,7 +481,7 @@ initiate_live_solver (void)
>    bitmap_initialize (&temp_bitmap, &reg_obstack);
>    bitmap_initialize (&all_hard_regs_bitmap, &reg_obstack);
>    bitmap_set_range (&all_hard_regs_bitmap, 0, FIRST_PSEUDO_REGISTER);
> -  bb_data = XNEWVEC (struct bb_data, last_basic_block_for_fn (cfun));
> +  bb_data = XNEWVEC (struct bb_data_pseudos, last_basic_block_for_fn (cfun));
>    bitmap_initialize (&all_blocks, &reg_obstack);
>  
>    basic_block bb;
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284
(AG Nuernberg)
Maxfeldstrasse 5, 90409 Nuernberg, Germany


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-11-12 14:41                               ` Richard Biener
@ 2014-11-12 17:38                                 ` Ilya Verbin
  2014-11-13  8:51                                   ` Richard Biener
  0 siblings, 1 reply; 62+ messages in thread
From: Ilya Verbin @ 2014-11-12 17:38 UTC (permalink / raw)
  To: Richard Biener
  Cc: Kirill Yukhin, Jakub Jelinek, Thomas Schwinge, Jan Hubicka,
	gcc-patches, Ilya Tocar, Andrey Turetskiy, Bernd Schmidt,
	vmakarov

On 12 Nov 15:34, Richard Biener wrote:
> > > Seems like Vlad introduced the conflicting type with
> > Okay, we're going to test our changes with patch in the bottom applied
> > both to kyukhin/gomp4-offload and trunk.
> > 
> >      * gcc/lra-lives.c (struct bb_data): Rename to ...
> >      (struct bb_data_pseudos): ... this.
> >      (initiate_live_solver): Update struct name.
> > 
> > Is it ok if lto-bootstrap pass?
> 
> Ok.

With this patch, lto-bootstrap reached the comparison stage and failed:

Comparing stages 2 and 3
Bootstrap comparison failure!
gcc/tree-sra.o differs
make[3]: *** [compare] Error 1

In objdump I see a difference only in the .gnu.lto_.decls.1 section.
This error occurs both on trunk and with our patches applied, so it looks like
everything is OK?

Thanks,
  -- Ilya


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-11-12 17:38                                 ` Ilya Verbin
@ 2014-11-13  8:51                                   ` Richard Biener
  0 siblings, 0 replies; 62+ messages in thread
From: Richard Biener @ 2014-11-13  8:51 UTC (permalink / raw)
  To: Ilya Verbin
  Cc: Kirill Yukhin, Jakub Jelinek, Thomas Schwinge, Jan Hubicka,
	gcc-patches, Ilya Tocar, Andrey Turetskiy, Bernd Schmidt,
	vmakarov

On Wed, 12 Nov 2014, Ilya Verbin wrote:

> On 12 Nov 15:34, Richard Biener wrote:
> > > > Seems like Vlad introduced the conflicting type with
> > > Okay, we're going to test our changes with patch in the bottom applied
> > > both to kyukhin/gomp4-offload and trunk.
> > > 
> > >      * gcc/lra-lives.c (struct bb_data): Rename to ...
> > >      (struct bb_data_pseudos): ... this.
> > >      (initiate_live_solver): Update struct name.
> > > 
> > > Is it ok if lto-bootstrap pass?
> > 
> > Ok.
> 
> With this patch lto-bootstrap reached comparison stage and failed:
> 
> Comparing stages 2 and 3
> Bootstrap comparison failure!
> gcc/tree-sra.o differs
> make[3]: *** [compare] Error 1
> 
> In objdump I see the difference only in .gnu.lto_.decls.1 section.
> And this error occurs both on trunk and with our patches applied, so looks like
> everything is ok?

Yeah, I think the above is already reported as a -fcompare-debug failure.

Thanks,
Richard.


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2014-11-05 12:47                     ` Ilya Verbin
  2014-11-05 12:50                       ` Jakub Jelinek
  2014-11-12  9:32                       ` Richard Biener
@ 2015-07-31 15:37                       ` Thomas Schwinge
  2015-07-31 15:43                         ` Ilya Verbin
  2 siblings, 1 reply; 62+ messages in thread
From: Thomas Schwinge @ 2015-07-31 15:37 UTC (permalink / raw)
  To: Ilya Verbin, Jakub Jelinek, Richard Biener
  Cc: Jan Hubicka, gcc-patches, Kirill Yukhin, Ilya Tocar, Andrey Turetskiy


Hi!

We had established the use of a boolean flag have_offload in gcc::context
to indicate whether during compilation, we've actually seen any code to
be offloaded (see cited below the relevant parts of the patch by Ilya et
al.).  This means that currently, the whole offload machinery will not be
run unless we actually have any offloaded data.  This means that the
configured mkoffload programs (-foffload=[...], defaulting to
configure-time --enable-offload-targets=[...]) will not be invoked unless
we actually have any offloaded data.  This means that we will not
actually generate constructor code to call libgomp's
GOMP_offload_register unless we actually have any offloaded data.  At
runtime, in libgomp, we then cannot reliably tell which -foffload=[...]
targets have been specified during compilation.

But: at runtime, I'd like to know which -foffload=[...] targets have been
specified during compilation, so that we can, for example, reliably
resort to host fallback execution for -foffload=disable instead of
getting an error message that an offloaded function is missing.  On the
other hand, for example, for -foffload=nvptx-none, even if user program
code doesn't contain any offloaded data (and thus the offload machinery
has not been run), the user program might still contain any executable
directives or OpenACC runtime library calls, so we'd still like to use
the libgomp nvptx plugin.  However, we currently cannot detect this
situation.

I see two ways to resolve this: a) embed the compile-time -foffload=[...]
configuration in the executable (as a string, for example) for libgomp to
look that up, or b) make it a requirement that (if configured via
-foffload=[...]), the offload machinery is run even if there is not
actually any data to be offloaded, so we then reliably get the respective
constructor call to libgomp's GOMP_offload_register.  I once began to
implement a), but this began to get a bit ugly, so I then looked into b) instead.
Compared to the status quo, always running the whole offloading machinery
for the configured -foffload=[...] targets whenever -fopenacc/-fopenmp
are active, certainly does introduce some overhead when there isn't
actually any code to be offloaded, so I'm not sure whether that is
acceptable?

Anyway, please comment on the prototype patch for b) that I'm posting
below, after citing the patch that added boolean flag have_offload in
gcc::context:

On Wed, 5 Nov 2014 15:46:55 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> --- a/gcc/cgraph.c
> +++ b/gcc/cgraph.c
> @@ -474,6 +475,14 @@ cgraph_node::create (tree decl)
>    gcc_assert (TREE_CODE (decl) == FUNCTION_DECL);
>  
>    node->decl = decl;
> +
> +  if (flag_openmp
> +      && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
> +    {
> +      node->offloadable = 1;
> +      g->have_offload = true;
> +    }
> +

> --- a/gcc/cgraphunit.c
> +++ b/gcc/cgraphunit.c
> @@ -2049,7 +2050,18 @@ ipa_passes (void)
>      targetm.asm_out.lto_start ();
>  
>    if (!in_lto_p)
> -    ipa_write_summaries ();
> +    {
> +      if (g->have_offload)
> +	{
> +	  section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
> +	  ipa_write_summaries (true);
> +	}
> +      if (flag_lto)
> +	{
> +	  section_name_prefix = LTO_SECTION_NAME_PREFIX;
> +	  ipa_write_summaries (false);
> +	}
> +    }
>  
>    if (flag_generate_lto)
>      targetm.asm_out.lto_end ();
> @@ -2129,8 +2141,12 @@ symbol_table::compile (void)
>      fprintf (stderr, "Performing interprocedural optimizations\n");
>    state = IPA;
>  
> +  /* Offloading requires LTO infrastructure.  */
> +  if (!in_lto_p && g->have_offload)
> +    flag_generate_lto = 1;
> +
>    /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE.  */
> -  if (flag_lto)
> +  if (flag_generate_lto)
>      lto_streamer_hooks_init ();
>  
>    /* Don't run the IPA passes if there was any error or sorry messages.  */

> --- a/gcc/context.c
> +++ b/gcc/context.c
> @@ -30,6 +30,8 @@ gcc::context *g;
>  
>  gcc::context::context ()
>  {
> +  have_offload = false;
> +
>    /* The pass manager's constructor uses the dump manager (to set up
>       dumps for the various passes), so the dump manager must be set up
>       before the pass manager.  */

> --- a/gcc/context.h
> +++ b/gcc/context.h
> @@ -33,6 +33,9 @@ class context
>  public:
>    context ();
>  
> +  /* The flag shows if there are symbols to be streamed for offloading.  */
> +  bool have_offload;
> +
>    /* Pass-management.  */
>  
>    pass_manager *get_passes () { gcc_assert (m_passes); return m_passes; }

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -1933,26 +1944,19 @@ create_omp_child_function (omp_context *ctx, bool task_copy)

> +	if (is_targetreg_ctx (octx))
>  	  {
> -	    target_p = true;
> +	    cgraph_node::get_create (decl)->offloadable = 1;
> +	    g->have_offload = true;
>  	    break;
>  	  }
>      }

> --- a/gcc/varpool.c
> +++ b/gcc/varpool.c
> @@ -155,6 +156,14 @@ varpool_node::get_create (tree decl)
>  
>    node = varpool_node::create_empty ();
>    node->decl = decl;
> +
> +  if (flag_openmp
> +      && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
> +    {
> +      node->offloadable = 1;
> +      g->have_offload = true;
> +    }
> +
>    node->register_symbol ();
>    return node;
>  }

Prototype patch for b):

--- gcc/cgraph.c
+++ gcc/cgraph.c
@@ -513,12 +512,7 @@ cgraph_node::create (tree decl)
 
   if ((flag_openacc || flag_openmp)
       && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
-    {
-      node->offloadable = 1;
-#ifdef ENABLE_OFFLOADING
-      g->have_offload = true;
-#endif
-    }
+    node->offloadable = 1;
 
   node->register_symbol ();
 
--- gcc/cgraphunit.c
+++ gcc/cgraphunit.c
@@ -2226,13 +2226,15 @@ ipa_passes (void)
 
   if (!in_lto_p)
     {
-      if (g->have_offload)
+#ifdef ENABLE_OFFLOADING
+      if (flag_openacc || flag_openmp)
 	{
 	  section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
 	  lto_stream_offload_p = true;
 	  ipa_write_summaries ();
 	  lto_stream_offload_p = false;
 	}
+#endif
       if (flag_lto)
 	{
 	  section_name_prefix = LTO_SECTION_NAME_PREFIX;
@@ -2323,9 +2325,11 @@ symbol_table::compile (void)
     fprintf (stderr, "Performing interprocedural optimizations\n");
   state = IPA;
 
+#ifdef ENABLE_OFFLOADING
   /* Offloading requires LTO infrastructure.  */
-  if (!in_lto_p && g->have_offload)
+  if (!in_lto_p && (flag_openacc || flag_openmp))
     flag_generate_offload = 1;
+#endif
 
   /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE.  */
   if (flag_generate_lto || flag_generate_offload)
--- gcc/context.c
+++ gcc/context.c
@@ -29,8 +29,6 @@ gcc::context *g;
 
 gcc::context::context ()
 {
-  have_offload = false;
-
   /* The pass manager's constructor uses the dump manager (to set up
      dumps for the various passes), so the dump manager must be set up
      before the pass manager.  */
--- gcc/context.h
+++ gcc/context.h
@@ -34,9 +34,6 @@ public:
   context ();
   ~context ();
 
-  /* The flag shows if there are symbols to be streamed for offloading.  */
-  bool have_offload;
-
   /* Pass-management.  */
 
   pass_manager *get_passes () { gcc_assert (m_passes); return m_passes; }
--- gcc/lto-cgraph.c
+++ gcc/lto-cgraph.c
@@ -1122,8 +1122,10 @@ read_string (struct lto_input_block *ib)
 void
 output_offload_tables (void)
 {
+#if 0
   if (vec_safe_is_empty (offload_funcs) && vec_safe_is_empty (offload_vars))
     return;
+#endif
 
   struct lto_simple_output_block *ob
     = lto_create_simple_output_block (LTO_section_offload_table);
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -2288,9 +2287,6 @@ create_omp_child_function (omp_context *ctx, bool task_copy)
 	if (is_gimple_omp_offloaded (octx->stmt))
 	  {
 	    cgraph_node::get_create (decl)->offloadable = 1;
-#ifdef ENABLE_OFFLOADING
-	    g->have_offload = true;
-#endif
 	    break;
 	  }
     }
--- gcc/varpool.c
+++ gcc/varpool.c
@@ -149,7 +148,6 @@ make_offloadable_1 (varpool_node *node, tree decl ATTRIBUTE_UNUSED)
 {
   node->offloadable = 1;
 #ifdef ENABLE_OFFLOADING
-  g->have_offload = true;
   if (!in_lto_p)
     vec_safe_push (offload_vars, decl);
   node->force_output = 1;


Grüße,
 Thomas



* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2015-07-31 15:37                       ` Thomas Schwinge
@ 2015-07-31 15:43                         ` Ilya Verbin
  2015-08-05  8:40                           ` Richard Biener
  0 siblings, 1 reply; 62+ messages in thread
From: Ilya Verbin @ 2015-07-31 15:43 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Jakub Jelinek, Richard Biener, Jan Hubicka, gcc-patches, Kirill Yukhin

On Fri, Jul 31, 2015 at 16:08:27 +0200, Thomas Schwinge wrote:
> We had established the use of a boolean flag have_offload in gcc::context
> to indicate whether during compilation, we've actually seen any code to
> be offloaded (see cited below the relevant parts of the patch by Ilya et
> al.).  This means that currently, the whole offload machinery will not be
> run unless we actually have any offloaded data.  This means that the
> configured mkoffload programs (-foffload=[...], defaulting to
> configure-time --enable-offload-targets=[...]) will not be invoked unless
> we actually have any offloaded data.  This means that we will not
> actually generate constructor code to call libgomp's
> GOMP_offload_register unless we actually have any offloaded data.

Yes, that was the plan.

> At runtime, in libgomp, we then cannot reliably tell which -foffload=[...]
> targets have been specified during compilation.
> 
> But: at runtime, I'd like to know which -foffload=[...] targets have been
> specified during compilation, so that we can, for example, reliably
> resort to host fallback execution for -foffload=disable instead of
> getting error message that an offloaded function is missing.

It's easy to fix:

diff --git a/libgomp/target.c b/libgomp/target.c
index a5fb164..f81d570 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1066,9 +1066,6 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
       k.host_end = k.host_start + 1;
       splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
       gomp_mutex_unlock (&devicep->lock);
-      if (tgt_fn == NULL)
-	gomp_fatal ("Target function wasn't mapped");
-
       return (void *) tgt_fn->tgt_offset;
     }
 }
@@ -1095,6 +1092,8 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
     return gomp_target_fallback (fn, hostaddrs);
 
   void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
+  if (fn_addr == NULL)
+    return gomp_target_fallback (fn, hostaddrs);
 
   struct target_mem_desc *tgt_vars
     = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
@@ -1155,6 +1154,8 @@ GOMP_target_41 (int device, void (*fn) (void *), size_t mapnum,
     }
 
   void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
+  if (fn_addr == NULL)
+    return gomp_target_fallback (fn, hostaddrs);
 
   struct target_mem_desc *tgt_vars
     = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,


> On the other hand, for example, for -foffload=nvptx-none, even if user program
> code doesn't contain any offloaded data (and thus the offload machinery
> has not been run), the user program might still contain any executable
> directives or OpenACC runtime library calls, so we'd still like to use
> the libgomp nvptx plugin.  However, we currently cannot detect this
> situation.
> 
> I see two ways to resolve this: a) embed the compile-time -foffload=[...]
> configuration in the executable (as a string, for example) for libgomp to
> look that up, or b) make it a requirement that (if configured via
> -foffload=[...]), the offload machinery is run even if there is not
> actually any data to be offloaded, so we then reliably get the respective
> constructor call to libgomp's GOMP_offload_register.  I once began to
> implement a), but this began to get a bit ugly, so I then looked into b) instead.
> Compared to the status quo, always running the whole offloading machinery
> for the configured -foffload=[...] targets whenever -fopenacc/-fopenmp
> are active, certainly does introduce some overhead when there isn't
> actually any code to be offloaded, so I'm not sure whether that is
> acceptable?

I vote for (a).

  -- Ilya


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2015-07-31 15:43                         ` Ilya Verbin
@ 2015-08-05  8:40                           ` Richard Biener
  2015-08-05 15:09                             ` Ilya Verbin
  0 siblings, 1 reply; 62+ messages in thread
From: Richard Biener @ 2015-08-05  8:40 UTC (permalink / raw)
  To: Ilya Verbin
  Cc: Thomas Schwinge, Jakub Jelinek, Richard Biener, Jan Hubicka,
	GCC Patches, Kirill Yukhin

On Fri, Jul 31, 2015 at 4:20 PM, Ilya Verbin <iverbin@gmail.com> wrote:
> On Fri, Jul 31, 2015 at 16:08:27 +0200, Thomas Schwinge wrote:
>> We had established the use of a boolean flag have_offload in gcc::context
>> to indicate whether during compilation, we've actually seen any code to
>> be offloaded (see cited below the relevant parts of the patch by Ilya et
>> al.).  This means that currently, the whole offload machinery will not be
>> run unless we actually have any offloaded data.  This means that the
>> configured mkoffload programs (-foffload=[...], defaulting to
>> configure-time --enable-offload-targets=[...]) will not be invoked unless
>> we actually have any offloaded data.  This means that we will not
>> actually generate constructor code to call libgomp's
>> GOMP_offload_register unless we actually have any offloaded data.
>
> Yes, that was the plan.
>
>> runtime, in libgomp, we then cannot reliably tell which -foffload=[...]
>> targets have been specified during compilation.
>>
>> But: at runtime, I'd like to know which -foffload=[...] targets have been
>> specified during compilation, so that we can, for example, reliably
>> resort to host fallback execution for -foffload=disable instead of
>> getting error message that an offloaded function is missing.
>
> It's easy to fix:
>
> diff --git a/libgomp/target.c b/libgomp/target.c
> index a5fb164..f81d570 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -1066,9 +1066,6 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
>        k.host_end = k.host_start + 1;
>        splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
>        gomp_mutex_unlock (&devicep->lock);
> -      if (tgt_fn == NULL)
> -       gomp_fatal ("Target function wasn't mapped");
> -
>        return (void *) tgt_fn->tgt_offset;
>      }
>  }
> @@ -1095,6 +1092,8 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
>      return gomp_target_fallback (fn, hostaddrs);
>
>    void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> +  if (fn_addr == NULL)
> +    return gomp_target_fallback (fn, hostaddrs);
>
>    struct target_mem_desc *tgt_vars
>      = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
> @@ -1155,6 +1154,8 @@ GOMP_target_41 (int device, void (*fn) (void *), size_t mapnum,
>      }
>
>    void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> +  if (fn_addr == NULL)
> +    return gomp_target_fallback (fn, hostaddrs);
>
>    struct target_mem_desc *tgt_vars
>      = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,
>
>
>> other hand, for example, for -foffload=nvptx-none, even if user program
>> code doesn't contain any offloaded data (and thus the offload machinery
>> has not been run), the user program might still contain any executable
>> directives or OpenACC runtime library calls, so we'd still like to use
>> the libgomp nvptx plugin.  However, we currently cannot detect this
>> situation.
>>
>> I see two ways to resolve this: a) embed the compile-time -foffload=[...]
>> configuration in the executable (as a string, for example) for libgomp to
>> look that up, or b) make it a requirement that (if configured via
>> -foffload=[...]), the offload machinery is run even if there is not
>> actually any data to be offloaded, so we then reliably get the respective
>> constructor call to libgomp's GOMP_offload_register.  I once began to
>> implement a), but this began to get a bit ugly, so I then looked into b) instead.
>> Compared to the status quo, always running the whole offloading machinery
>> for the configured -foffload=[...] targets whenever -fopenacc/-fopenmp
>> are active, certainly does introduce some overhead when there isn't
>> actually any code to be offloaded, so I'm not sure whether that is
>> acceptable?
>
> I vote for (a).

What happens for conflicting -foffload=[...] options in different TUs?

Richard.

>   -- Ilya


* Re: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming
  2015-08-05  8:40                           ` Richard Biener
@ 2015-08-05 15:09                             ` Ilya Verbin
  2015-08-14  9:49                               ` Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) (was: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming) Thomas Schwinge
  0 siblings, 1 reply; 62+ messages in thread
From: Ilya Verbin @ 2015-08-05 15:09 UTC (permalink / raw)
  To: Richard Biener, Thomas Schwinge
  Cc: Jakub Jelinek, Richard Biener, Jan Hubicka, GCC Patches, Kirill Yukhin

On Wed, Aug 05, 2015 at 10:40:44 +0200, Richard Biener wrote:
> On Fri, Jul 31, 2015 at 4:20 PM, Ilya Verbin <iverbin@gmail.com> wrote:
> > On Fri, Jul 31, 2015 at 16:08:27 +0200, Thomas Schwinge wrote:
> >> We had established the use of a boolean flag have_offload in gcc::context
> >> to indicate whether during compilation, we've actually seen any code to
> >> be offloaded (see cited below the relevant parts of the patch by Ilya et
> >> al.).  This means that currently, the whole offload machinery will not be
> >> run unless we actually have any offloaded data.  This means that the
> >> configured mkoffload programs (-foffload=[...], defaulting to
> >> configure-time --enable-offload-targets=[...]) will not be invoked unless
> >> we actually have any offloaded data.  This means that we will not
> >> actually generate constructor code to call libgomp's
> >> GOMP_offload_register unless we actually have any offloaded data.
> >
> > Yes, that was the plan.
> >
> >> runtime, in libgomp, we then cannot reliably tell which -foffload=[...]
> >> targets have been specified during compilation.
> >>
> >> But: at runtime, I'd like to know which -foffload=[...] targets have been
> >> specified during compilation, so that we can, for example, reliably
> >> resort to host fallback execution for -foffload=disable instead of
> >> getting error message that an offloaded function is missing.
> >
> > It's easy to fix:
> >
> > diff --git a/libgomp/target.c b/libgomp/target.c
> > index a5fb164..f81d570 100644
> > --- a/libgomp/target.c
> > +++ b/libgomp/target.c
> > @@ -1066,9 +1066,6 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
> >        k.host_end = k.host_start + 1;
> >        splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
> >        gomp_mutex_unlock (&devicep->lock);
> > -      if (tgt_fn == NULL)
> > -       gomp_fatal ("Target function wasn't mapped");
> > -
> >        return (void *) tgt_fn->tgt_offset;
> >      }
> >  }
> > @@ -1095,6 +1092,8 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
> >      return gomp_target_fallback (fn, hostaddrs);
> >
> >    void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> > +  if (fn_addr == NULL)
> > +    return gomp_target_fallback (fn, hostaddrs);
> >
> >    struct target_mem_desc *tgt_vars
> >      = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
> > @@ -1155,6 +1154,8 @@ GOMP_target_41 (int device, void (*fn) (void *), size_t mapnum,
> >      }
> >
> >    void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> > +  if (fn_addr == NULL)
> > +    return gomp_target_fallback (fn, hostaddrs);
> >
> >    struct target_mem_desc *tgt_vars
> >      = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,
> >
> >
> >> other hand, for example, for -foffload=nvptx-none, even if user program
> >> code doesn't contain any offloaded data (and thus the offload machinery
> >> has not been run), the user program might still contain any executable
> >> directives or OpenACC runtime library calls, so we'd still like to use
> >> the libgomp nvptx plugin.  However, we currently cannot detect this
> >> situation.
> >>
> >> I see two ways to resolve this: a) embed the compile-time -foffload=[...]
> >> configuration in the executable (as a string, for example) for libgomp to
> >> look that up, or b) make it a requirement that (if configured via
> >> -foffload=[...]), the offload machinery is run even if there is not
> >> actually any data to be offloaded, so we then reliably get the respective
> >> constructor call to libgomp's GOMP_offload_register.  I once began to
> >> implement a), but this began to get a bit ugly, so I then looked into b) instead.
> >> Compared to the status quo, always running the whole offloading machinery
> >> for the configured -foffload=[...] targets whenever -fopenacc/-fopenmp
> >> are active, certainly does introduce some overhead when there isn't
> >> actually any code to be offloaded, so I'm not sure whether that is
> >> acceptable?
> >
> > I vote for (a).
> 
> What happens for conflicting -foffload=[...] options in different TUs?

If you're asking about what happens now: only the list of offload targets from
the link-time -foffload=tgt1,tgt2 option matters.

I don't like plan (b) because it calls ipa_write_summaries unconditionally for
all OpenMP programs, which creates IR sections, which increases file size and may
cause other problems, e.g. <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63868>.
Compile time also increases because of the LTO machinery, mkoffloads, etc.

If OpenACC requires some registration in libgomp even without offload, maybe you
can run this machinery only under flag_openacc?

  -- Ilya


* Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) (was: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming)
  2015-08-05 15:09                             ` Ilya Verbin
@ 2015-08-14  9:49                               ` Thomas Schwinge
  2015-08-14 13:29                                 ` Ilya Verbin
  2015-08-14 17:08                                 ` Joseph Myers
  0 siblings, 2 replies; 62+ messages in thread
From: Thomas Schwinge @ 2015-08-14  9:49 UTC (permalink / raw)
  To: Jakub Jelinek, Ilya Verbin, Richard Biener, Joseph Myers
  Cc: Richard Biener, Jan Hubicka, GCC Patches, Kirill Yukhin


Hi!

Assuming that the overall approach (my option a) is fine, this is now
primarily a question about how to teach the driver to do "the right thing".
(Joseph CCed as driver reviewer.)

On Wed, 5 Aug 2015 18:09:04 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> On Wed, Aug 05, 2015 at 10:40:44 +0200, Richard Biener wrote:
> > On Fri, Jul 31, 2015 at 4:20 PM, Ilya Verbin <iverbin@gmail.com> wrote:
> > > On Fri, Jul 31, 2015 at 16:08:27 +0200, Thomas Schwinge wrote:
> > >> We had established the use of a boolean flag have_offload in gcc::context
> > >> to indicate whether during compilation, we've actually seen any code to
> > >> be offloaded (see cited below the relevant parts of the patch by Ilya et
> > >> al.).  This means that currently, the whole offload machinery will not be
> > >> run unless we actually have any offloaded data.  This means that the
> > >> configured mkoffload programs (-foffload=[...], defaulting to
> > >> configure-time --enable-offload-targets=[...]) will not be invoked unless
> > >> we actually have any offloaded data.  This means that we will not
> > >> actually generate constructor code to call libgomp's
> > >> GOMP_offload_register unless we actually have any offloaded data.
> > >
> > > Yes, that was the plan.
> > >
> > >> runtime, in libgomp, we then cannot reliably tell which -foffload=[...]
> > >> targets have been specified during compilation.
> > >>
> > >> But: at runtime, I'd like to know which -foffload=[...] targets have been
> > >> specified during compilation, so that we can, for example, reliably
> > >> resort to host fallback execution for -foffload=disable instead of
> > >> getting error message that an offloaded function is missing.
> > >
> > > It's easy to fix:
> > >
> > > diff --git a/libgomp/target.c b/libgomp/target.c
> > > index a5fb164..f81d570 100644
> > > --- a/libgomp/target.c
> > > +++ b/libgomp/target.c
> > > @@ -1066,9 +1066,6 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
> > >        k.host_end = k.host_start + 1;
> > >        splay_tree_key tgt_fn = splay_tree_lookup (&devicep->mem_map, &k);
> > >        gomp_mutex_unlock (&devicep->lock);
> > > -      if (tgt_fn == NULL)
> > > -       gomp_fatal ("Target function wasn't mapped");
> > > -
> > >        return (void *) tgt_fn->tgt_offset;
> > >      }

Won't that possibly result in a NULL pointer dereference (tgt_fn)?  I think
this should return NULL instead.
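
A minimal sketch of the NULL-returning lookup suggested here.  It does not
use libgomp's data structures: the flat table, struct fn_map_entry,
find_entry, get_target_fn_addr and the two sample functions are all
hypothetical stand-ins for the splay-tree lookup in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one host-to-target function mapping.  */
struct fn_map_entry
{
  void (*host_fn) (void *);
  void *tgt_addr;
};

/* Return the entry for HOST_FN, or NULL if it was never mapped.  */
static const struct fn_map_entry *
find_entry (const struct fn_map_entry *map, size_t n,
	    void (*host_fn) (void *))
{
  assert (map != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (map[i].host_fn == host_fn)
      return &map[i];
  return NULL;
}

/* Check the lookup result before reading any of its fields, and report
   "not mapped" as NULL so that the caller can choose host fallback.  */
void *
get_target_fn_addr (const struct fn_map_entry *map, size_t n,
		    void (*host_fn) (void *))
{
  const struct fn_map_entry *entry = find_entry (map, n, host_fn);
  if (entry == NULL)
    return NULL;
  return entry->tgt_addr;
}

/* Two sample host functions, used only to exercise the lookup.  */
void sample_mapped_fn (void *data) { (void) data; }
void sample_unmapped_fn (void *data) { (void) data; }
```

With this shape, the "fn_addr == NULL" checks added to GOMP_target and
GOMP_target_41 in the quoted patch would see NULL instead of the lookup
crashing first.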

> > > @@ -1095,6 +1092,8 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
> > >      return gomp_target_fallback (fn, hostaddrs);
> > >
> > >    void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> > > +  if (fn_addr == NULL)
> > > +    return gomp_target_fallback (fn, hostaddrs);

Is that reliable?  Consider the following scenario, with f1 and f2
implemented in separate TUs:

    #pragma omp target data [map clauses]
    {
      f1([...]);
      f2([...]);
    }

Consider that in f1 we have an OpenMP target region with offloading data
available, and in f2 we have an OpenMP target region without offloading
data available.  In this case, the GOMP_target in f1 will execute on the
offloading target, but the GOMP_target in f2 will resort to host fallback
-- and we then likely have data inconsistencies, as the data specified by
the map clauses is not synchronized between host and device.

Admittedly, this is user error (inconsistent set of offloading functions
available -- need either all, or none), but in such a scenario probably
we should be doing a better job (at detecting this).  (Note, I'm not sure
whether my current patch actually does any better.)  ;-)
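
The inconsistency can be illustrated with a toy model, assuming a single
mapped integer that has separate host and device copies and a "tofrom"
mapping on the enclosing data region.  struct mapped_int, run_region and
run_data_region are hypothetical names; nothing here is libgomp code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: one mapped integer with separate host and device copies.  */
struct mapped_int
{
  int host;
  int dev;
};

/* A target region that increments the variable.  When offloaded it
   updates the device copy; on host fallback it updates the host copy.  */
void
run_region (struct mapped_int *v, bool offloaded)
{
  assert (v != NULL);
  if (offloaded)
    v->dev++;
  else
    v->host++;
}

/* Model a "target data" region with a device present, enclosing two
   target regions (f1 and f2): copy to the device on entry, run both
   regions, copy back to the host on exit.  */
int
run_data_region (int initial, bool f1_offloaded, bool f2_offloaded)
{
  struct mapped_int v = { initial, 0 };
  v.dev = v.host;		/* "to" part of the mapping, on entry.  */
  run_region (&v, f1_offloaded);
  run_region (&v, f2_offloaded);
  v.host = v.dev;		/* "from" part of the mapping, on exit.  */
  return v.host;
}
```

In this model, run_data_region (0, true, true) yields 2, whereas
run_data_region (0, true, false) yields 1: the increment done by the host
fallback in f2 is overwritten when the device copy is copied back.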

> > >> other hand, for example, for -foffload=nvptx-none, even if user program
> > >> code doesn't contain any offloaded data (and thus the offload machinery
> > >> has not been run), the user program might still contain any executable
> > >> directives or OpenACC runtime library calls, so we'd still like to use
> > >> the libgomp nvptx plugin.  However, we currently cannot detect this
> > >> situation.
> > >>
> > >> I see two ways to resolve this: a) embed the compile-time -foffload=[...]
> > >> configuration in the executable (as a string, for example) for libgomp to
> > >> look that up, or b) make it a requirement that (if configured via
> > >> -foffload=[...]), the offload machinery is run even if there is not
> > >> actually any data to be offloaded, so we then reliably get the respective
> > >> constructor call to libgomp's GOMP_offload_register.  I once began to
> > >> implement a), but this began to get a bit ugly, so I then looked into b) instead.
> > >> Compared to the status quo, always running the whole offloading machinery
> > >> for the configured -foffload=[...] targets whenever -fopenacc/-fopenmp
> > >> are active, certainly does introduce some overhead when there isn't
> > >> actually any code to be offloaded, so I'm not sure whether that is
> > >> acceptable?
> > >
> > > I vote for (a).

OK.  Any other opinions?

> > What happens for conflicting -foffload=[...] options in different TUs?
> 
> If you're asking about what happens now, only the list of offload targets from
> link-time -foffload=tgt1,tgt2 option matters.

I'm fine with that -- require the user to specify a consistent set of
-foffload options.  (Consistent in the sense such that all offload data
must be available that can possibly be required at run-time.)

> I don't like plan (b) because it calls ipa_write_summaries unconditionally for
> all OpenMP programs, which creates IR sections, which increases filesize and may
> cause other problems, e.g. <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63868>.
> Also compile-time is increased because of LTO machinery, mkoffloads, etc.

I think the compile-time effort is the only real argument against this --
everything else is "just" bugs that need to be (or have been) addressed.  But
OK, assuming it is doable, I have a preference for option a), too.
(Which is why it's option a).)  ;-)

> If OpenACC requires some registration in libgomp even without offload, maybe you
> can run this machinery only under flag_openacc?

That seems too much special-casing for my taste.

Here is an attempt at option a).  The idea is to have "the compilation
process" create a constructor call to a (new) libgomp
GOMP_set_offload_targets function, and at run-time, the information
passed with that call will be used to determine the set of requested
offload targets (if such a GOMP_set_offload_targets call is not made,
defaulting to all of them, just like now).
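
As a rough, self-contained sketch of that mechanism: the names
set_offload_targets, requested_offload_targets and CONFIGURED_OFFLOAD_TARGETS
below are stand-ins made up for illustration (the real entry point in the
patch is GOMP_set_offload_targets in libgomp), and the target names are
assumed example values.

```c
/* Stand-in for libgomp's configure-time OFFLOAD_TARGETS list.  The value is
   a hypothetical example, colon-separated as in the patch.  */
#define CONFIGURED_OFFLOAD_TARGETS "x86_64-intelmicemul-linux-gnu:nvptx-none"

/* Stand-in for the libgomp side: if nobody overrides it, the list defaults
   to all configured offload targets.  */
static const char *offload_targets = CONFIGURED_OFFLOAD_TARGETS;

/* Stand-in for GOMP_set_offload_targets: override the list of requested
   offload targets.  */
static void
set_offload_targets (const char *targets)
{
  offload_targets = targets;
}

/* Stand-in for what the driver-generated file would contribute: a
   constructor that passes on the -foffload=[...] setting (assumed here to
   be "nvptx-none") before main runs.  */
static __attribute__ ((constructor)) void
register_requested_offload_targets (void)
{
  set_offload_targets ("nvptx-none");
}

/* What the run-time would consult when deciding which plugins to load.  */
static const char *
requested_offload_targets (void)
{
  return offload_targets;
}
```

An executable without such a constructor never calls the setter, so it keeps
the configured default; that is what keeps older executables working
unchanged.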

I've settled on adding a function combined with a constructor call, as
that seems the easiest solution without adding a lot of complexity, to
not break the ABI for older (pre-GCC 6) executables dynamically linking
against the new (GCC 6) libgomp.  (If that weren't a concern, we could
have embedded a string GOMP_offload_targets in the final executable, and
have libgomp look up the information from there, but that string won't be
available in pre-GCC 6 executables, which could possibly be dealt with by
relying on symbol lookup ordering (provide a default for
GOMP_offload_targets in libgomp itself -- fragile), or using weak
symbols, but those "are difficult"/not generally supported...)

A lot of the libgomp changes (which apply to gomp-4_0-branch) are just
cleanup, so don't be scared by that.  (That is, the trunk patch will be
simpler in that regard.)

The interesting part, which I'd like comments on, is the driver changes.

I'm adding a new add-omp-infile spec function (add_omp_infile_spec_func),
which is "invoked" whenever we're linking against libgomp (-fopenacc or
-fopenmp or -ftree-parallelize-loops).  (Should probably generally clean
that up a little.)

This function »generate[s] a C source file containing a constructor call
to GOMP_set_offload_targets [...], and adds that as an infile«.  This
"basically" works ;-) -- but really only for C source code, and for C++
and Fortran it fails if there are command-line options used that conflict
with the C compilation of add-omp-infile, such as (from a libgomp
testsuite run): for C++: -std=c++11, -fno-extern-tls-init, or for
Fortran: -fcray-pointer, -fintrinsic-modules-path.  Any suggestion about
how to overcome that?

That is, in the driver, how can I compile this C file with just the
options that are applicable to the C compiler, and then just link in the
object file?

 gcc/gcc.c                                          | 104 +++++++++++--
 libgomp/config.h.in                                |   8 +-
 libgomp/configure                                  |  33 +++-
 libgomp/env.c                                      |   6 +-
 libgomp/libgomp.h                                  |   1 +
 libgomp/libgomp.map                                |   7 +-
 libgomp/libgomp_g.h                                |   1 +
 libgomp/oacc-init.c                                |  18 ++-
 libgomp/plugin/configfrag.ac                       |  10 +-
 libgomp/target.c                                   | 172 ++++++++++++++++-----
 libgomp/testsuite/lib/libgomp.exp                  |  75 ++-------
 libgomp/testsuite/libgomp.c++/c++.exp              |  13 --
 libgomp/testsuite/libgomp.c/c.exp                  |   2 -
 libgomp/testsuite/libgomp.fortran/fortran.exp      |   5 -
 libgomp/testsuite/libgomp.graphite/graphite.exp    |   2 -
 libgomp/testsuite/libgomp.oacc-c++/c++.exp         |  33 ++--
 libgomp/testsuite/libgomp.oacc-c/c.exp             |  17 +-
 libgomp/testsuite/libgomp.oacc-fortran/fortran.exp |  23 +--
 18 files changed, 322 insertions(+), 208 deletions(-)

diff --git gcc/gcc.c gcc/gcc.c
index 0642be1..bcb32f5 100644
--- gcc/gcc.c
+++ gcc/gcc.c
@@ -1,3 +1,5 @@
+#define FPRINTF if (getenv("DEBUG")) fprintf
+
 /* Compiler driver program that can handle many languages.
    Copyright (C) 1987-2015 Free Software Foundation, Inc.
 
@@ -158,7 +160,7 @@ static const char *const spec_version = DEFAULT_TARGET_VERSION;
 static const char *spec_machine = DEFAULT_TARGET_MACHINE;
 static const char *spec_host_machine = DEFAULT_REAL_TARGET_MACHINE;
 
-/* List of offload targets.  */
+/* List of offload targets.  Empty string for -foffload=disable.  */
 
 static char *offload_targets = NULL;
 
@@ -275,6 +277,8 @@ static const char *compare_debug_auxbase_opt_spec_function (int, const char **);
 static const char *pass_through_libs_spec_func (int, const char **);
 static const char *replace_extension_spec_func (int, const char **);
 static const char *greater_than_spec_func (int, const char **);
+static const char *add_omp_infile_spec_func (int, const char **);
+
 static char *convert_white_space (char *);
 \f
 /* The Specs Language
@@ -1061,6 +1065,14 @@ static const char *const multilib_defaults_raw[] = MULTILIB_DEFAULTS;
 
 static const char *const driver_self_specs[] = {
   "%{fdump-final-insns:-fdump-final-insns=.} %<fdump-final-insns",
+#ifdef ENABLE_OFFLOADING
+  /* If the user didn't specify any, default to all configured offload
+     targets.  */
+  "%{!foffload=*:-foffload=" OFFLOAD_TARGETS "}",
+  /* If linking against libgomp, add a setup file.  */
+  "%{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*} 1):" \
+  "%:add-omp-infile()}",
+#endif /* ENABLE_OFFLOADING */
   DRIVER_SELF_SPECS, CONFIGURE_SPECS, GOMP_SELF_SPECS, GTM_SELF_SPECS,
   CILK_SELF_SPECS
 };
@@ -1491,6 +1503,7 @@ static const struct spec_function static_spec_functions[] =
   { "pass-through-libs",	pass_through_libs_spec_func },
   { "replace-extension",	replace_extension_spec_func },
   { "gt",			greater_than_spec_func },
+  { "add-omp-infile",		add_omp_infile_spec_func },
 #ifdef EXTRA_SPEC_FUNCTIONS
   EXTRA_SPEC_FUNCTIONS
 #endif
@@ -3450,6 +3463,8 @@ static int last_language_n_infiles;
 static void
 handle_foffload_option (const char *arg)
 {
+  FPRINTF(stderr, "%s (\"%s\")\n", __FUNCTION__, arg);
+
   const char *c, *cur, *n, *next, *end;
   char *target;
 
@@ -7162,6 +7177,8 @@ driver::build_multilib_strings () const
 void
 driver::set_up_specs () const
 {
+  FPRINTF(stderr, "%s\n", __FUNCTION__);
+
   const char *spec_machine_suffix;
   char *specs_file;
   size_t i;
@@ -7448,22 +7465,16 @@ driver::maybe_putenv_COLLECT_LTO_WRAPPER () const
 void
 driver::maybe_putenv_OFFLOAD_TARGETS () const
 {
-  const char *targets = offload_targets;
-
-  /* If no targets specified by -foffload, use all available targets.  */
-  if (!targets)
-    targets = OFFLOAD_TARGETS;
+  FPRINTF(stderr, "OFFLOAD_TARGETS = %s\n", offload_targets);
 
-  if (strlen (targets) > 0)
+  if (offload_targets && offload_targets[0] != '\0')
     {
       obstack_grow (&collect_obstack, "OFFLOAD_TARGET_NAMES=",
 		    sizeof ("OFFLOAD_TARGET_NAMES=") - 1);
-      obstack_grow (&collect_obstack, targets,
-		    strlen (targets) + 1);
+      obstack_grow (&collect_obstack, offload_targets,
+		    strlen (offload_targets) + 1);
       xputenv (XOBFINISH (&collect_obstack, char *));
     }
-
-  free (offload_targets);
 }
 
 /* Reject switches that no pass was interested in.  */
@@ -9507,6 +9518,77 @@ greater_than_spec_func (int argc, const char **argv)
   return NULL;
 }
 
+/* If applicable, generate a C source file containing a constructor call to
+   GOMP_set_offload_targets, to inform libgomp which offload targets have
+   actually been requested (-foffload=[...]), and adds that as an infile.  */
+
+static const char *
+add_omp_infile_spec_func (int argc, const char **)
+{
+  gcc_assert (argc == 0);
+  gcc_assert (offload_targets != NULL);
+
+  /* Nothing to do if we're not actually linking.  */
+  if (have_c)
+    return NULL;
+
+  int err;
+  const char *tmp_filename;
+  tmp_filename = make_temp_file (".c");
+  record_temp_file (tmp_filename, !save_temps_flag, !save_temps_flag);
+  FILE *f = fopen (tmp_filename, "w");
+  if (f == NULL)
+    fatal_error (input_location,
+		 "could not open temporary file %s", tmp_filename);
+  /* As libgomp uses constructors internally, and this code is only added when
+     linking against libgomp, it is fine to use a constructor here.  */
+  err = fprintf (f,
+		 "extern void GOMP_set_offload_targets (const char *);\n"
+		 "static __attribute__ ((constructor)) void\n"
+		 "init (void)\n"
+		 "{\n"
+		 "  GOMP_set_offload_targets (\"%s\");\n"
+		 "}\n",
+		 offload_targets);
+  if (err < 0)
+    fatal_error (input_location,
+		 "could not write to temporary file %s", tmp_filename);
+  err = fclose (f);
+  if (err == EOF)
+    fatal_error (input_location,
+		 "could not close temporary file %s", tmp_filename);
+
+  //TODO: correct thing to do?
+#if 1
+  add_infile (tmp_filename, "cpp-output");
+  return NULL;
+#elif 0
+  return tmp_filename;
+#elif 0
+  store_arg ("-x", 0, 0);
+  store_arg ("cpp-output", 0, 0);
+  store_arg (tmp_filename, 0, 0);
+  return NULL;
+#else
+  //add_infile (tmp_filename, /* TODO */ "cpp-output");
+  //int i = n_infiles - 1;
+  //input_file_number = i;
+  set_input (tmp_filename);
+  //outfiles[i] = gcc_input_filename;
+  input_file_compiler
+    = lookup_compiler (gcc_input_filename, input_filename_length,
+		       "cpp-output");
+  err = do_spec (input_file_compiler->spec);
+  //infiles[i].compiled = true;
+  if (err < 0)
+    {
+      delete_failure_queue ();
+      errorcount++;
+    }
+  clear_failure_queue ();
+#endif
+}
+
 /* Insert backslash before spaces in ORIG (usually a file path), to 
    avoid being broken by spec parser.
 
diff --git libgomp/config.h.in libgomp/config.h.in
index 8533f03..d9d5914 100644
--- libgomp/config.h.in
+++ libgomp/config.h.in
@@ -24,6 +24,12 @@
 /* Define to 1 if you have the <dlfcn.h> header file. */
 #undef HAVE_DLFCN_H
 
+/* Define to 1 if you have the `fnmatch' function. */
+#undef HAVE_FNMATCH
+
+/* Define to 1 if you have the <fnmatch.h> header file. */
+#undef HAVE_FNMATCH_H
+
 /* Define to 1 if you have the `getloadavg' function. */
 #undef HAVE_GETLOADAVG
 
@@ -95,7 +101,7 @@
    */
 #undef LT_OBJDIR
 
-/* Define to hold the list of target names suitable for offloading. */
+/* Define to hold the list of offload targets, separated by colons. */
 #undef OFFLOAD_TARGETS
 
 /* Name of package */
diff --git libgomp/configure libgomp/configure
index c93e877..3d990bb 100755
--- libgomp/configure
+++ libgomp/configure
@@ -15119,6 +15119,33 @@ esac
 offload_targets=
 
 plugin_support=yes
+for ac_header in fnmatch.h
+do :
+  ac_fn_c_check_header_mongrel "$LINENO" "fnmatch.h" "ac_cv_header_fnmatch_h" "$ac_includes_default"
+if test "x$ac_cv_header_fnmatch_h" = x""yes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_FNMATCH_H 1
+_ACEOF
+
+else
+  plugin_support=no
+fi
+
+done
+
+for ac_func in fnmatch
+do :
+  ac_fn_c_check_func "$LINENO" "fnmatch" "ac_cv_func_fnmatch"
+if test "x$ac_cv_func_fnmatch" = x""yes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_FNMATCH 1
+_ACEOF
+
+else
+  plugin_support=no
+fi
+done
+
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for dlsym in -ldl" >&5
 $as_echo_n "checking for dlsym in -ldl... " >&6; }
 if test "${ac_cv_lib_dl_dlsym+set}" = set; then :
@@ -15236,10 +15263,8 @@ if test x"$enable_offload_targets" != x; then
     tgt=`echo $tgt | sed 's/=.*//'`
     case $tgt in
       *-intelmic-* | *-intelmicemul-*)
-	tgt_name=intelmic
 	;;
       nvptx*)
-        tgt_name=nvptx
 	PLUGIN_NVPTX=$tgt
 	PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
 	PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
@@ -15282,9 +15307,9 @@ rm -f core conftest.err conftest.$ac_objext \
 	;;
     esac
     if test x"$offload_targets" = x; then
-      offload_targets=$tgt_name
+      offload_targets=$tgt
     else
-      offload_targets=$offload_targets,$tgt_name
+      offload_targets=$offload_targets:$tgt
     fi
     if test x"$tgt_dir" != x; then
       offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
diff --git libgomp/env.c libgomp/env.c
index 1811bf5..6b5e963 100644
--- libgomp/env.c
+++ libgomp/env.c
@@ -1175,11 +1175,7 @@ handle_omp_display_env (unsigned long stacksize, int wait_policy)
 }
 
 
-/* TODO.  See testsuite/lib/libgomp.exp:libgomp_init.  */
-#if 0
-static
-#endif
-void __attribute__((constructor))
+static void __attribute__((constructor))
 initialize_env (void)
 {
   unsigned long thread_limit_var, stacksize;
diff --git libgomp/libgomp.h libgomp/libgomp.h
index 420a525..1f95906 100644
--- libgomp/libgomp.h
+++ libgomp/libgomp.h
@@ -785,6 +785,7 @@ extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 extern void gomp_init_device (struct gomp_device_descr *);
 extern void gomp_fini_device (struct gomp_device_descr *);
 extern void gomp_unload_device (struct gomp_device_descr *);
+extern bool gomp_offload_target_available_p (int);
 
 /* work.c */
 
diff --git libgomp/libgomp.map libgomp/libgomp.map
index 36932d3..05e75fb0 100644
--- libgomp/libgomp.map
+++ libgomp/libgomp.map
@@ -339,6 +339,7 @@ GOACC_2.0.GOMP_4_BRANCH {
 	GOACC_get_ganglocal_ptr;
 	GOACC_parallel_keyed;
 	GOACC_register_static;
+	GOMP_set_offload_targets;
 } GOACC_2.0;
 
 GOMP_PLUGIN_1.0 {
@@ -352,9 +353,3 @@ GOMP_PLUGIN_1.0 {
 	GOMP_PLUGIN_async_unmap_vars;
 	GOMP_PLUGIN_acc_thread;
 };
-
-# TODO.  See testsuite/lib/libgomp.exp:libgomp_init.
-INTERNAL {
-  global:
-	initialize_env;
-};
diff --git libgomp/libgomp_g.h libgomp/libgomp_g.h
index e65b888..e67fc86 100644
--- libgomp/libgomp_g.h
+++ libgomp/libgomp_g.h
@@ -206,6 +206,7 @@ extern void GOMP_single_copy_end (void *);
 
 /* target.c */
 
+extern void GOMP_set_offload_targets (const char *);
 extern void GOMP_target (int, void (*) (void *), const void *,
 			 size_t, void **, size_t *, unsigned char *);
 extern void GOMP_target_data (int, const void *,
diff --git libgomp/oacc-init.c libgomp/oacc-init.c
index 8dfe4a7..78f130c 100644
--- libgomp/oacc-init.c
+++ libgomp/oacc-init.c
@@ -122,7 +122,9 @@ resolve_device (acc_device_t d, bool fail_is_error)
       {
 	if (goacc_device_type)
 	  {
-	    /* Lookup the named device.  */
+	    /* Lookup the device that has been explicitly named, so do not pay
+	       attention to gomp_offload_target_available_p.  (That is, hard
+	       error if not actually available.)  */
 	    while (++d != _ACC_device_hwm)
 	      if (dispatchers[d]
 		  && !strcasecmp (goacc_device_type,
@@ -148,8 +150,14 @@ resolve_device (acc_device_t d, bool fail_is_error)
     case acc_device_not_host:
       /* Find the first available device after acc_device_not_host.  */
       while (++d != _ACC_device_hwm)
-	if (dispatchers[d] && dispatchers[d]->get_num_devices_func () > 0)
+	if (dispatchers[d]
+	    && dispatchers[d]->get_num_devices_func () > 0
+	    /* No device has been explicitly named, so pay attention to
+	       gomp_offload_target_available_p, to not decide on an offload
+	       target that we don't have offload data available for.  */
+	    && gomp_offload_target_available_p (dispatchers[d]->type))
 	  goto found;
+      /* No non-host device found.  */
       if (d_arg == acc_device_default)
 	{
 	  d = acc_device_host;
@@ -164,9 +172,6 @@ resolve_device (acc_device_t d, bool fail_is_error)
         return NULL;
       break;
 
-    case acc_device_host:
-      break;
-
     default:
       if (d > _ACC_device_hwm)
 	{
@@ -181,7 +186,8 @@ resolve_device (acc_device_t d, bool fail_is_error)
 
   assert (d != acc_device_none
 	  && d != acc_device_default
-	  && d != acc_device_not_host);
+	  && d != acc_device_not_host
+	  && d < _ACC_device_hwm);
 
   if (dispatchers[d] == NULL && fail_is_error)
     {
diff --git libgomp/plugin/configfrag.ac libgomp/plugin/configfrag.ac
index 8c2a420..e2392e1 100644
--- libgomp/plugin/configfrag.ac
+++ libgomp/plugin/configfrag.ac
@@ -29,6 +29,8 @@
 offload_targets=
 AC_SUBST(offload_targets)
 plugin_support=yes
+AC_CHECK_HEADERS([fnmatch.h], , [plugin_support=no])
+AC_CHECK_FUNCS([fnmatch], , [plugin_support=no])
 AC_CHECK_LIB(dl, dlsym, , [plugin_support=no])
 if test x"$plugin_support" = xyes; then
   AC_DEFINE(PLUGIN_SUPPORT, 1,
@@ -92,10 +94,8 @@ if test x"$enable_offload_targets" != x; then
     tgt=`echo $tgt | sed 's/=.*//'`
     case $tgt in
       *-intelmic-* | *-intelmicemul-*)
-	tgt_name=intelmic
 	;;
       nvptx*)
-        tgt_name=nvptx
 	PLUGIN_NVPTX=$tgt
 	PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
 	PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
@@ -127,9 +127,9 @@ if test x"$enable_offload_targets" != x; then
 	;;
     esac
     if test x"$offload_targets" = x; then
-      offload_targets=$tgt_name
+      offload_targets=$tgt
     else
-      offload_targets=$offload_targets,$tgt_name
+      offload_targets=$offload_targets:$tgt
     fi
     if test x"$tgt_dir" != x; then
       offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
@@ -141,7 +141,7 @@ if test x"$enable_offload_targets" != x; then
   done
 fi
 AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$offload_targets",
-  [Define to hold the list of target names suitable for offloading.])
+  [Define to hold the list of offload targets, separated by colons.])
 AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
   [Define to 1 if the NVIDIA plugin is built, 0 if not.])
diff --git libgomp/target.c libgomp/target.c
index 6426254..9cf5251 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -41,6 +41,7 @@
 
 #ifdef PLUGIN_SUPPORT
 #include <dlfcn.h>
+#include <fnmatch.h>
 #include "plugin-suffix.h"
 #endif
 
@@ -122,17 +123,26 @@ gomp_get_num_devices (void)
 }
 
 static struct gomp_device_descr *
-resolve_device (int device_id)
+resolve_device (int device)
 {
-  if (device_id == GOMP_DEVICE_ICV)
+  int device_id;
+  if (device == GOMP_DEVICE_ICV)
     {
       struct gomp_task_icv *icv = gomp_icv (false);
       device_id = icv->default_device_var;
     }
+  else
+    device_id = device;
 
   if (device_id < 0 || device_id >= gomp_get_num_devices ())
     return NULL;
 
+  /* If the device-var ICV does not actually have offload data available, don't
+     try use it (which will fail), and use host fallback instead.  */
+  if (device == GOMP_DEVICE_ICV
+      && !gomp_offload_target_available_p (devices[device_id].type))
+    return NULL;
+
   return &devices[device_id];
 }
 
@@ -947,6 +957,49 @@ gomp_fini_device (struct gomp_device_descr *devicep)
   devicep->is_initialized = false;
 }
 
+/* Do we have offload data available for the given offload target type?
+   Instead of verifying that *all* offload data is available that could
+   possibly be required, we instead just look for *any*.  If we later find any
+   offload data missing, that's user error.  */
+
+attribute_hidden bool
+gomp_offload_target_available_p (int type)
+{
+  bool available = false;
+
+  /* Has the offload target already been initialized?  */
+  for (int i = 0; !available && i < num_devices; i++)
+    {
+      struct gomp_device_descr *devicep = &devices[i];
+      gomp_mutex_lock (&devicep->lock);
+      if (devicep->type == type && devicep->is_initialized)
+	available = true;
+      gomp_mutex_unlock (&devicep->lock);
+    }
+
+  if (!available)
+    {
+      // TODO: locking correct?
+      gomp_mutex_lock (&register_lock);
+
+      /* If there is no offload data available at all, we cannot later fail to
+	 find any of it for a specific offload target.  This is the case where
+	 there are no offloaded code regions in user code, but there can still
+	 be executable directives used, or runtime library calls made.  */
+      if (num_offload_images == 0)
+	available = true;
+
+      /* Can the offload target be initialized?  */
+      for (int i = 0; !available && i < num_offload_images; i++)
+	if (offload_images[i].type == type)
+	  available = true;
+
+      gomp_mutex_unlock (&register_lock);
+    }
+
+  return available;
+}
+
 /* Called when encountering a target directive.  If DEVICE
    is GOMP_DEVICE_ICV, it means use device-var ICV.  If it is
    GOMP_DEVICE_HOST_FALLBACK (or any value
@@ -1116,6 +1169,8 @@ static bool
 gomp_load_plugin_for_device (struct gomp_device_descr *device,
 			     const char *plugin_name)
 {
+  gomp_debug (0, "%s (\"%s\")\n", __FUNCTION__, plugin_name);
+
   const char *err = NULL, *last_missing = NULL;
 
   void *plugin_handle = dlopen (plugin_name, RTLD_LAZY);
@@ -1212,6 +1267,38 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   return 0;
 }
 
+/* Helper, to translate from an offload target to the corresponding plugin name.  */
+
+static const char *
+offload_target_to_plugin_name (const char *offload_target)
+{
+  if (fnmatch ("*-intelmic*", offload_target, 0) == 0)
+    return "intelmic";
+  if (fnmatch ("nvptx*", offload_target, 0) == 0)
+    return "nvptx";
+  gomp_fatal ("Unknown offload target: %s", offload_target);
+}
+
+/* List of offload targets, separated by colon.  Defaults to the list
+   determined when configuring libgomp.  */
+static const char *gomp_offload_targets = OFFLOAD_TARGETS;
+
+/* Override the list of offload targets.  This must be called early, and only
+   once.  */
+
+void
+GOMP_set_offload_targets (const char *offload_targets)
+{
+  gomp_debug (0, "%s (\"%s\")\n", __FUNCTION__, offload_targets);
+
+  //TODO: any locking?
+  /* Make sure this gets called early.  */
+  assert (gomp_is_initialized == PTHREAD_ONCE_INIT);
+  ///* Make sure this only gets called once.  */
+  //assert (gomp_offload_targets == OFFLOAD_TARGETS);
+  gomp_offload_targets = offload_targets;
+}
+
 /* This function initializes the runtime needed for offloading.
    It parses the list of offload targets and tries to load the plugins for
    these targets.  On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
@@ -1219,11 +1306,12 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
    corresponding devices, first the GOMP_OFFLOAD_CAP_OPENMP_400 ones, follows
    by the others.  */
 
+static const char *gomp_plugin_prefix ="libgomp-plugin-";
+static const char *gomp_plugin_suffix = SONAME_SUFFIX (1);
+
 static void
 gomp_target_init (void)
 {
-  const char *prefix ="libgomp-plugin-";
-  const char *suffix = SONAME_SUFFIX (1);
   const char *cur, *next;
   char *plugin_name;
   int i, new_num_devices;
@@ -1231,48 +1319,60 @@ gomp_target_init (void)
   num_devices = 0;
   devices = NULL;
 
-  cur = OFFLOAD_TARGETS;
+  cur = gomp_offload_targets;
   if (*cur)
     do
       {
-	struct gomp_device_descr current_device;
-
-	next = strchr (cur, ',');
-
-	plugin_name = (char *) malloc (1 + (next ? next - cur : strlen (cur))
-				       + strlen (prefix) + strlen (suffix));
-	if (!plugin_name)
-	  {
-	    num_devices = 0;
-	    break;
-	  }
-
-	strcpy (plugin_name, prefix);
-	strncat (plugin_name, cur, next ? next - cur : strlen (cur));
-	strcat (plugin_name, suffix);
+	next = strchr (cur, ':');
+	/* If no other offload target following...  */
+	if (next == NULL)
+	  /* ..., point to the terminating NUL character.  */
+	  next = cur + strlen (cur);
+
+	size_t gomp_plugin_prefix_len = strlen (gomp_plugin_prefix);
+	size_t cur_len = next - cur;
+	size_t gomp_plugin_suffix_len = strlen (gomp_plugin_suffix);
+	plugin_name = gomp_malloc (gomp_plugin_prefix_len
+				   + cur_len
+				   + gomp_plugin_suffix_len
+				   + 1);
+	memcpy (plugin_name, gomp_plugin_prefix, gomp_plugin_prefix_len);
+	memcpy (plugin_name + gomp_plugin_prefix_len, cur, cur_len);
+	/* NUL-terminate the string here...  */
+	plugin_name[gomp_plugin_prefix_len + cur_len] = '\0';
+	/* ..., so that we can then use it to translate the offload target to
+	   the plugin name...  */
+	const char *cur_plugin_name
+	  = offload_target_to_plugin_name (plugin_name
+					   + gomp_plugin_prefix_len);
+	size_t cur_plugin_name_len = strlen (cur_plugin_name);
+	assert (cur_plugin_name_len <= cur_len);
+	/* ..., and then rewrite it.  */
+	memcpy (plugin_name + gomp_plugin_prefix_len,
+		cur_plugin_name, cur_plugin_name_len);
+	memcpy (plugin_name + gomp_plugin_prefix_len + cur_plugin_name_len,
+		gomp_plugin_suffix, gomp_plugin_suffix_len);
+	plugin_name[gomp_plugin_prefix_len
+		    + cur_plugin_name_len
+		    + gomp_plugin_suffix_len] = '\0';
 
+	struct gomp_device_descr current_device;
 	if (gomp_load_plugin_for_device (&current_device, plugin_name))
 	  {
 	    new_num_devices = current_device.get_num_devices_func ();
 	    if (new_num_devices >= 1)
 	      {
-		/* Augment DEVICES and NUM_DEVICES.  */
-
-		devices = realloc (devices, (num_devices + new_num_devices)
-				   * sizeof (struct gomp_device_descr));
-		if (!devices)
-		  {
-		    num_devices = 0;
-		    free (plugin_name);
-		    break;
-		  }
-
 		current_device.name = current_device.get_name_func ();
 		/* current_device.capabilities has already been set.  */
 		current_device.type = current_device.get_type_func ();
 		current_device.mem_map.root = NULL;
 		current_device.is_initialized = false;
 		current_device.openacc.data_environ = NULL;
+
+		/* Augment DEVICES and NUM_DEVICES.  */
+		devices = gomp_realloc (devices,
+					((num_devices + new_num_devices)
+					 * sizeof (struct gomp_device_descr)));
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.target_id = i;
@@ -1286,18 +1386,12 @@ gomp_target_init (void)
 	free (plugin_name);
 	cur = next + 1;
       }
-    while (next);
+    while (*next);
 
   /* In DEVICES, sort the GOMP_OFFLOAD_CAP_OPENMP_400 ones first, and set
      NUM_DEVICES_OPENMP.  */
   struct gomp_device_descr *devices_s
-    = malloc (num_devices * sizeof (struct gomp_device_descr));
-  if (!devices_s)
-    {
-      num_devices = 0;
-      free (devices);
-      devices = NULL;
-    }
+    = gomp_malloc (num_devices * sizeof (struct gomp_device_descr));
   num_devices_openmp = 0;
   for (i = 0; i < num_devices; i++)
     if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
diff --git libgomp/testsuite/lib/libgomp.exp libgomp/testsuite/lib/libgomp.exp
index 33d1a54..898dfc3 100644
--- libgomp/testsuite/lib/libgomp.exp
+++ libgomp/testsuite/lib/libgomp.exp
@@ -36,24 +36,21 @@ load_gcc_lib fortran-modules.exp
 load_file libgomp-test-support.exp
 
 # Populate offload_targets_s (offloading targets separated by a space), and
-# offload_targets_s_openacc (the same, but with OpenACC names; OpenACC spells
-# some of them a little differently).
-set offload_targets_s [split $offload_targets ","]
+# offload_targets_s_openacc (those suitable for OpenACC).
+set offload_targets_s [split $offload_targets ":"]
 set offload_targets_s_openacc {}
 foreach offload_target_openacc $offload_targets_s {
-    switch $offload_target_openacc {
-	intelmic {
+    switch -glob $offload_target_openacc {
+	*-intelmic* {
 	    # TODO.  Skip; will all FAIL because of missing
 	    # GOMP_OFFLOAD_CAP_OPENACC_200.
 	    continue
 	}
-	nvptx {
-	    set offload_target_openacc "nvidia"
-	}
     }
     lappend offload_targets_s_openacc "$offload_target_openacc"
 }
-lappend offload_targets_s_openacc "host"
+# Host fallback.
+lappend offload_targets_s_openacc "disable"
 
 set dg-do-what-default run
 
@@ -134,7 +131,7 @@ proc libgomp_init { args } {
     # Add liboffloadmic build directory in LD_LIBRARY_PATH to support
     # non-fallback testing for Intel MIC targets
     global offload_targets
-    if { [string match "*,intelmic,*" ",$offload_targets,"] } {
+    if { [string match "*:*-intelmic*:*" ":$offload_targets:"] } {
 	append always_ld_library_path ":${blddir}/../liboffloadmic/.libs"
 	append always_ld_library_path ":${blddir}/../liboffloadmic/plugin/.libs"
 	# libstdc++ is required by liboffloadmic
@@ -235,56 +232,6 @@ proc libgomp_init { args } {
     if { $offload_additional_options != "" } {
 	lappend ALWAYS_CFLAGS "additional_flags=${offload_additional_options}"
     }
-
-    # TODO.  Evil hack.  DejaGnu doesn't have a mechanism for setting
-    # environment variables on remote boards.  Thus, we have to fake it, using
-    # GCC's constructor attributes to create object files that install the
-    # desired environment variables.
-    set e_list [list \
-		    [list defaults DUMMY=dummy ] \
-		    [list ACC_DEVICE_TYPE-host ACC_DEVICE_TYPE=host ] \
-		    [list ACC_DEVICE_TYPE-nvidia ACC_DEVICE_TYPE=nvidia ] ]
-    foreach e $e_list {
-	set v [lindex $e 0]
-	set s [lindex $e 1]
-	verbose "creating constructor-setenv: $v: $s"
-	set src constructor-setenv-$v.c
-	set obj constructor-setenv-$v.o
-	set f_src [open $src "w"]
-	puts $f_src "static void __attribute__((constructor(1000)))"
-	puts $f_src "init_env(void) {"
-	puts $f_src "  int putenv(char *);"
-	puts $f_src "  putenv(\"$s\");"
-	puts $f_src "}"
-	if { $v == "defaults" } {
-	    # TODO.  We want libgomp to initialize after the putenv calls.
-	    # But: shared libraries' constructors (and thus
-	    # env.c:initialize_env) will be called before the executable's
-	    # (init_env functions created above), so it will already have been
-	    # initialized (and has to be, in case we're not linking in this
-	    # gunk).  Assuming no execution of other libgomp functionality in
-	    # between (which we're not doing during initialization),
-	    # initialize_env's effects are idempotent when calling it again, so
-	    # we'll do that now, after the putenv calls have been executed.
-	    puts $f_src "static void __attribute__((constructor(1001)))"
-	    puts $f_src "init_libgomp(void) {"
-	    # Some test cases specify -fno-openmp, so libgomp isn't linked in.
-	    puts $f_src "  void initialize_env(void) __attribute__((weak));"
-	    puts $f_src "  if (initialize_env)"
-	    puts $f_src "    initialize_env();"
-	    puts $f_src "}"
-	}
-	close $f_src
-	# TODO.  Using whichever compiler is currently configured...  At least
-	# switch it into C mode.
-	set lines [libgomp_target_compile $src $obj object "additional_flags=-xc"]
-	# TODO.  Error checking.
-	file delete $src
-    }
-    # When adding constructor-setenv-*.o files, make sure to cancel any -x flag
-    # that may have been set before.
-    lappend ALWAYS_CFLAGS "ldflags=-x none"
-    lappend ALWAYS_CFLAGS "ldflags=constructor-setenv-defaults.o"
 }
 
 #
@@ -296,6 +243,7 @@ proc libgomp_target_compile { source dest type options } {
     global libgomp_compile_options
     global gluefile wrap_flags
     global ALWAYS_CFLAGS
+    global GCC_UNDER_TEST
     global lang_test_file
     global lang_library_path
     global lang_link_flags
@@ -323,6 +271,7 @@ proc libgomp_target_compile { source dest type options } {
 
     lappend options "additional_flags=[libio_include_flags]"
     lappend options "timeout=[timeout_value]"
+    lappend options "compiler=$GCC_UNDER_TEST"
 
     set options [concat $libgomp_compile_options $options]
 
@@ -370,7 +319,7 @@ proc check_effective_target_offload_device { } {
 
 proc check_effective_target_openacc_nvidia_accel_supported { } {
     global offload_targets_s_openacc
-    set res [lsearch $offload_targets_s_openacc "nvidia" ]
+    set res [lsearch -glob $offload_targets_s_openacc "nvptx*" ]
     if { $res != -1 } {
 	return 1;
     }
@@ -396,7 +345,7 @@ proc check_effective_target_openacc_nvidia_accel_selected { } {
 	return 0;
     }
     global offload_target_openacc
-    if { $offload_target_openacc == "nvidia" } {
+    if { [string match "nvptx*" $offload_target_openacc] } {
         return 1;
     }
     return 0;
@@ -406,7 +355,7 @@ proc check_effective_target_openacc_nvidia_accel_selected { } {
 
 proc check_effective_target_openacc_host_selected { } {
     global offload_target_openacc
-    if { $offload_target_openacc == "host" } {
+    if { $offload_target_openacc == "disable" } {
         return 1;
     }
     return 0;
diff --git libgomp/testsuite/libgomp.c++/c++.exp libgomp/testsuite/libgomp.c++/c++.exp
index d6d525a..0454f95 100644
--- libgomp/testsuite/libgomp.c++/c++.exp
+++ libgomp/testsuite/libgomp.c++/c++.exp
@@ -4,7 +4,6 @@ load_gcc_lib gcc-dg.exp
 global shlib_ext
 
 set shlib_ext [get_shlib_extension]
-#TODO
 set lang_link_flags "-lstdc++"
 set lang_test_file_found 0
 set lang_library_path "../libstdc++-v3/src/.libs"
@@ -47,13 +46,6 @@ if { $blddir != "" } {
 }
 
 if { $lang_test_file_found } {
-    if ![info exists GXX_UNDER_TEST] then {
-	# TODO.  See libgomp.oacc-c++/c++.exp.
-	set HAVE_SET_GXX_UNDER_TEST ""
-	set GXX_UNDER_TEST "$GCC_UNDER_TEST"
-    }
-    lappend libgomp_compile_options "compiler=$GXX_UNDER_TEST"
-
     # Gather a list of all tests.
     set tests [lsort [find $srcdir/$subdir *.C]]
 
@@ -76,10 +68,5 @@ if { $lang_test_file_found } {
     dg-runtest $tests "" "$libstdcxx_includes $DEFAULT_CFLAGS"
 }
 
-# TODO.  See above.
-if { [info exists HAVE_SET_GXX_UNDER_TEST] } {
-    unset GXX_UNDER_TEST
-}
-
 # All done.
 dg-finish
diff --git libgomp/testsuite/libgomp.c/c.exp libgomp/testsuite/libgomp.c/c.exp
index 25f347b..300b921 100644
--- libgomp/testsuite/libgomp.c/c.exp
+++ libgomp/testsuite/libgomp.c/c.exp
@@ -23,8 +23,6 @@ dg-init
 # Turn on OpenMP.
 lappend ALWAYS_CFLAGS "additional_flags=-fopenmp"
 
-lappend libgomp_compile_options "compiler=$GCC_UNDER_TEST"
-
 # Gather a list of all tests.
 set tests [lsort [find $srcdir/$subdir *.c]]
 
diff --git libgomp/testsuite/libgomp.fortran/fortran.exp libgomp/testsuite/libgomp.fortran/fortran.exp
index 883c416..f684abc 100644
--- libgomp/testsuite/libgomp.fortran/fortran.exp
+++ libgomp/testsuite/libgomp.fortran/fortran.exp
@@ -47,11 +47,6 @@ if { $blddir != "" } {
 }
 
 if { $lang_test_file_found } {
-    if ![info exists GFORTRAN_UNDER_TEST] then {
-	set GFORTRAN_UNDER_TEST $GCC_UNDER_TEST
-    }
-    lappend libgomp_compile_options "compiler=$GFORTRAN_UNDER_TEST"
-
     # Gather a list of all tests.
     set tests [lsort [find $srcdir/$subdir *.\[fF\]{,90,95,03,08}]]
 
diff --git libgomp/testsuite/libgomp.graphite/graphite.exp libgomp/testsuite/libgomp.graphite/graphite.exp
index 716cdc3..d737c85 100644
--- libgomp/testsuite/libgomp.graphite/graphite.exp
+++ libgomp/testsuite/libgomp.graphite/graphite.exp
@@ -48,8 +48,6 @@ dg-init
 # Turn on OpenMP.
 lappend ALWAYS_CFLAGS "additional_flags=-fopenmp"
 
-lappend libgomp_compile_options "compiler=$GCC_UNDER_TEST"
-
 # Gather a list of all tests.
 set tests [lsort [find $srcdir/$subdir *.c]]
 
diff --git libgomp/testsuite/libgomp.oacc-c++/c++.exp libgomp/testsuite/libgomp.oacc-c++/c++.exp
index e5c875c..f513d87 100644
--- libgomp/testsuite/libgomp.oacc-c++/c++.exp
+++ libgomp/testsuite/libgomp.oacc-c++/c++.exp
@@ -13,7 +13,6 @@ load_gcc_lib gcc-dg.exp
 global shlib_ext
 
 set shlib_ext [get_shlib_extension]
-#TODO
 set lang_link_flags "-lstdc++"
 set lang_test_file_found 0
 set lang_library_path "../libstdc++-v3/src/.libs"
@@ -32,6 +31,11 @@ dg-init
 # Turn on OpenACC.
 lappend ALWAYS_CFLAGS "additional_flags=-fopenacc"
 
+# Switch into C++ mode.  Otherwise, the libgomp.oacc-c-c++-common/*.c
+# files would be compiled as C files.
+set SAVE_GCC_UNDER_TEST "$GCC_UNDER_TEST"
+set GCC_UNDER_TEST "$GCC_UNDER_TEST -x c++"
+
 set blddir [lookfor_file [get_multilibs] libgomp]
 
 
@@ -56,14 +60,6 @@ if { $blddir != "" } {
 }
 
 if { $lang_test_file_found } {
-    if ![info exists GXX_UNDER_TEST] then {
-	# Use GCC_UNDER_TEST, but switch into C++ mode, as otherwise the
-	# libgomp.oacc-c-c++-common/*.c files would be compiled as C files.
-	set HAVE_SET_GXX_UNDER_TEST ""
-	set GXX_UNDER_TEST "$GCC_UNDER_TEST -x c++"
-    }
-    lappend libgomp_compile_options "compiler=$GXX_UNDER_TEST"
-
     # Gather a list of all tests.
     set tests [lsort [concat \
 			  [find $srcdir/$subdir *.C] \
@@ -104,17 +100,14 @@ if { $lang_test_file_found } {
     set SAVE_ALWAYS_CFLAGS "$ALWAYS_CFLAGS"
     foreach offload_target_openacc $offload_targets_s_openacc {
 	set ALWAYS_CFLAGS "$SAVE_ALWAYS_CFLAGS"
-	set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
-	# Set $ACC_DEVICE_TYPE.  See the comments in
-	# ../lib/libgomp.exp:libgomp_init.
-	lappend ALWAYS_CFLAGS "ldflags=constructor-setenv-ACC_DEVICE_TYPE-$offload_target_openacc.o"
 
 	# Todo: Determine shared memory or not using run-time test.
-	switch $offload_target_openacc {
-	    host {
+	switch -glob $offload_target_openacc {
+	    disable {
 		set acc_mem_shared 1
+		set tagopt "-DACC_DEVICE_TYPE_host=1"
 	    }
-	    nvidia {
+	    nvptx* {
 		if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		    # Don't bother; execution testing is going to FAIL.
 		    untested "$subdir $offload_target_openacc offloading"
@@ -128,12 +121,14 @@ if { $lang_test_file_found } {
 		lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/libgomp.oacc-c-c++-common"
 
 		set acc_mem_shared 0
+		set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	    }
 	    default {
 		set acc_mem_shared 0
+		#TODO error
 	    }
 	}
-	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
+	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
 	dg-runtest $tests "$tagopt" "$libstdcxx_includes $DEFAULT_CFLAGS"
 	gcc-dg-runtest $ttests "$tagopt" "$libstdcxx_includes"
@@ -141,9 +136,7 @@ if { $lang_test_file_found } {
 }
 
 # See above.
-if { [info exists HAVE_SET_GXX_UNDER_TEST] } {
-    unset GXX_UNDER_TEST
-}
+set GCC_UNDER_TEST "$SAVE_GCC_UNDER_TEST"
 
 unset TORTURE_OPTIONS
 
diff --git libgomp/testsuite/libgomp.oacc-c/c.exp libgomp/testsuite/libgomp.oacc-c/c.exp
index c91a41b..03fe3a4 100644
--- libgomp/testsuite/libgomp.oacc-c/c.exp
+++ libgomp/testsuite/libgomp.oacc-c/c.exp
@@ -32,8 +32,6 @@ dg-init
 # Turn on OpenACC.
 lappend ALWAYS_CFLAGS "additional_flags=-fopenacc"
 
-lappend libgomp_compile_options "compiler=$GCC_UNDER_TEST"
-
 # Gather a list of all tests.
 set tests [lsort [concat \
 		      [find $srcdir/$subdir *.c] \
@@ -62,17 +60,14 @@ set_ld_library_path_env_vars
 set SAVE_ALWAYS_CFLAGS "$ALWAYS_CFLAGS"
 foreach offload_target_openacc $offload_targets_s_openacc {
     set ALWAYS_CFLAGS "$SAVE_ALWAYS_CFLAGS"
-    set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
-    # Set $ACC_DEVICE_TYPE.  See the comments in
-    # ../lib/libgomp.exp:libgomp_init.
-    lappend ALWAYS_CFLAGS "ldflags=constructor-setenv-ACC_DEVICE_TYPE-$offload_target_openacc.o"
 
     # Todo: Determine shared memory or not using run-time test.
-    switch $offload_target_openacc {
-	host {
+    switch -glob $offload_target_openacc {
+	disable {
 	    set acc_mem_shared 1
+	    set tagopt "-DACC_DEVICE_TYPE_host=1"
 	}
-	nvidia {
+	nvptx* {
 	    if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		# Don't bother; execution testing is going to FAIL.
 		untested "$subdir $offload_target_openacc offloading"
@@ -86,12 +81,14 @@ foreach offload_target_openacc $offload_targets_s_openacc {
 	    lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/libgomp.oacc-c-c++-common"
 
 	    set acc_mem_shared 0
+	    set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	}
 	default {
 	    set acc_mem_shared 0
+	    #TODO error
 	}
     }
-    set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
+    set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
     dg-runtest $tests "$tagopt" $DEFAULT_CFLAGS
     gcc-dg-runtest $ttests "$tagopt" ""
diff --git libgomp/testsuite/libgomp.oacc-fortran/fortran.exp libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
index df46004..b9ad6dc 100644
--- libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
+++ libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
@@ -49,11 +49,6 @@ if { $blddir != "" } {
 }
 
 if { $lang_test_file_found } {
-    if ![info exists GFORTRAN_UNDER_TEST] then {
-	set GFORTRAN_UNDER_TEST $GCC_UNDER_TEST
-    }
-    lappend libgomp_compile_options "compiler=$GFORTRAN_UNDER_TEST"
-
     # Gather a list of all tests.
     set tests [lsort [find $srcdir/$subdir *.\[fF\]{,90,95,03,08}]]
 
@@ -74,20 +69,14 @@ if { $lang_test_file_found } {
     set_ld_library_path_env_vars
 
     # Test OpenACC with available accelerators.
-    set SAVE_ALWAYS_CFLAGS "$ALWAYS_CFLAGS"
     foreach offload_target_openacc $offload_targets_s_openacc {
-	set ALWAYS_CFLAGS "$SAVE_ALWAYS_CFLAGS"
-	set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
-	# Set $ACC_DEVICE_TYPE.  See the comments in
-	# ../lib/libgomp.exp:libgomp_init.
-	lappend ALWAYS_CFLAGS "ldflags=constructor-setenv-ACC_DEVICE_TYPE-$offload_target_openacc.o"
-
 	# Todo: Determine shared memory or not using run-time test.
-	switch $offload_target_openacc {
-	    host {
+	switch -glob $offload_target_openacc {
+	    disable {
 		set acc_mem_shared 1
+		set tagopt "-DACC_DEVICE_TYPE_host=1"
 	    }
-	    nvidia {
+	    nvptx* {
 		if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		    # Don't bother; execution testing is going to FAIL.
 		    untested "$subdir $offload_target_openacc offloading"
@@ -95,12 +84,14 @@ if { $lang_test_file_found } {
 		}
 
 		set acc_mem_shared 0
+		set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	    }
 	    default {
 		set acc_mem_shared 0
+		#TODO error
 	    }
 	}
-	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
+	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
 	# For Fortran we're doing torture testing, as Fortran has far more tests
 	# with arrays etc. that testing just -O0 or -O2 is insufficient, that is


Regards,
 Thomas

* Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) (was: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming)
  2015-08-14  9:49                               ` Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) (was: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming) Thomas Schwinge
@ 2015-08-14 13:29                                 ` Ilya Verbin
  2015-08-17 13:57                                   ` Martin Jambor
  2015-08-14 17:08                                 ` Joseph Myers
  1 sibling, 1 reply; 62+ messages in thread
From: Ilya Verbin @ 2015-08-14 13:29 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Jakub Jelinek, Richard Biener, Joseph Myers, Richard Biener,
	Jan Hubicka, GCC Patches, Kirill Yukhin, mjambor

2015-08-14 11:47 GMT+02:00 Thomas Schwinge <thomas@codesourcery.com>:
> On Wed, 5 Aug 2015 18:09:04 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
>> > > @@ -1095,6 +1092,8 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
>> > >      return gomp_target_fallback (fn, hostaddrs);
>> > >
>> > >    void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
>> > > +  if (fn_addr == NULL)
>> > > +    return gomp_target_fallback (fn, hostaddrs);
>
> Is that reliable?  Consider the following scenario, with f1 and f2
> implemented in separate TUs:
>
>     #pragma omp target data [map clauses]
>     {
>       f1([...]);
>       f2([...]);
>     }
>
> Consider that in f1 we have an OpenMP target region with offloading data
> available, and in f2 we have an OpenMP target region without offloading
> data available.  In this case, the GOMP_target in f1 will execute on the
> offloading target, but the GOMP_target in f2 will resort to host fallback
> -- and we then likely have data inconsistencies, as the data specified by
> the map clauses is not synchronized between host and device.
>
> Admittedly, this is user error (inconsistent set of offloading functions
> available -- need either all, or none), but in such a scenario probably
> we should be doing a better job (at detecting this).  (Note, I'm not sure
> whether my current patch actually does any better.)  ;-)

You're right. That's why I haven't sent this patch for review yet.
My current plan is as follows:
* Use this approach for architectures with shared memory, since it
allows mixing host and target functions.
* For non-shared memory, at the first splay tree lookup:
** If target fn is not found, run the whole program in host-fallback mode.
** If it's found, then all target fns must exist. I.e. if some
tgt_addr (not first) is NULL, then libgomp will issue an error as it
does now.
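
A minimal sketch of that plan, for illustration only.  This is not libgomp
code: the names (device_state, decide_dispatch, the enums) are hypothetical,
and tgt_addr simply stands for the result of the splay tree lookup.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state; none of these names exist in libgomp.  */
enum offload_mode { MODE_UNDECIDED, MODE_OFFLOAD, MODE_HOST_FALLBACK };

enum dispatch { RUN_ON_DEVICE, RUN_ON_HOST, DISPATCH_ERROR };

struct device_state
{
  bool shared_memory;      /* Host and device share one address space.  */
  enum offload_mode mode;  /* Decided at the first lookup (non-shared only).  */
};

/* TGT_ADDR is the result of looking up one target function; NULL means
   that no device version was found.  */
static enum dispatch
decide_dispatch (struct device_state *dev, const void *tgt_addr)
{
  if (dev->shared_memory)
    /* Mixing host and device functions is harmless here.  */
    return tgt_addr ? RUN_ON_DEVICE : RUN_ON_HOST;

  if (dev->mode == MODE_UNDECIDED)
    /* The first lookup fixes the mode for the rest of the program.  */
    dev->mode = tgt_addr ? MODE_OFFLOAD : MODE_HOST_FALLBACK;

  if (dev->mode == MODE_HOST_FALLBACK)
    return RUN_ON_HOST;

  /* Offload mode: every later function must have a device version.  */
  return tgt_addr ? RUN_ON_DEVICE : DISPATCH_ERROR;
}
```

With non-shared memory, the first call decides; a later NULL lookup in
offload mode is then reported as an error rather than silently falling back.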

  -- Ilya

* Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) (was: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming)
  2015-08-14  9:49                               ` Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) (was: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming) Thomas Schwinge
  2015-08-14 13:29                                 ` Ilya Verbin
@ 2015-08-14 17:08                                 ` Joseph Myers
  2015-08-14 21:48                                   ` Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) Thomas Schwinge
  1 sibling, 1 reply; 62+ messages in thread
From: Joseph Myers @ 2015-08-14 17:08 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Jakub Jelinek, Ilya Verbin, Richard Biener, Richard Biener,
	Jan Hubicka, GCC Patches, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 793 bytes --]

On Fri, 14 Aug 2015, Thomas Schwinge wrote:

> This function »generate[s] a C source file containing a constructor call
> to GOMP_set_offload_targets [...], and adds that as an infile«.  This
> "basically" works ;-) -- but really only for C source code, and for C++
> and Fortran it fails if there are command-line options used that conflict
> with the C compilation of add-omp-infile, such as (from a libgomp
> testsuite run): for C++: -std=c++11, -fno-extern-tls-init, or for
> Fortran: -fcray-pointer, -fintrinsic-modules-path.  Any suggestion about
> how to overcome that?

I suppose you need to use the option-handling information about which 
options are for which languages to filter out any options that aren't 
valid for C or Common.
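
A rough sketch of such filtering, under stated assumptions: the LANG_* bits
below are simplified stand-ins for GCC's CL_* language masks (the values are
made up), and struct opt and filter_options are hypothetical names, not part
of the driver.

```c
#include <stddef.h>

/* Simplified stand-ins for GCC's CL_* language bits; values are made up.  */
#define LANG_C        (1u << 0)
#define LANG_CXX      (1u << 1)
#define LANG_FORTRAN  (1u << 2)
#define LANG_COMMON   (1u << 3)

/* Hypothetical record: one command-line option plus the languages it is
   valid for.  */
struct opt
{
  const char *text;
  unsigned int lang_mask;
};

/* Copy to OUT the options from IN that are valid for at least one language
   in KEEP; return how many were kept.  OUT must have room for N entries.  */
static size_t
filter_options (const struct opt *in, size_t n, unsigned int keep,
                const struct opt **out)
{
  size_t kept = 0;
  for (size_t i = 0; i < n; i++)
    if (in[i].lang_mask & keep)
      out[kept++] = &in[i];
  return kept;
}
```

Calling it with keep = LANG_C | LANG_COMMON would drop an option tagged only
LANG_CXX or only LANG_FORTRAN, which is the effect wanted for the C
compilation of the generated file.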

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
  2015-08-14 17:08                                 ` Joseph Myers
@ 2015-08-14 21:48                                   ` Thomas Schwinge
  2015-08-15  4:03                                     ` Joseph Myers
  0 siblings, 1 reply; 62+ messages in thread
From: Thomas Schwinge @ 2015-08-14 21:48 UTC (permalink / raw)
  To: Joseph Myers
  Cc: Jakub Jelinek, Ilya Verbin, Richard Biener, Richard Biener,
	Jan Hubicka, GCC Patches, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 2312 bytes --]

Hi!

On Fri, 14 Aug 2015 16:56:25 +0000, Joseph Myers <joseph@codesourcery.com> wrote:
> On Fri, 14 Aug 2015, Thomas Schwinge wrote:
> 
> > This function »generate[s] a C source file containing a constructor call
> > to GOMP_set_offload_targets [...], and adds that as an infile«.  This
> > "basically" works ;-) -- but really only for C source code, and for C++
> > and Fortran it fails if there are command-line options used that conflict
> > with the C compilation of add-omp-infile, such as (from a libgomp
> > testsuite run): for C++: -std=c++11, -fno-extern-tls-init, or for
> > Fortran: -fcray-pointer, -fintrinsic-modules-path.  Any suggestion about
> > how to overcome that?

The "problem", as (I hope) I understand it, is that gcc/gcc.c:cc1_options
includes %{std*[...]} and %{f*}, which will match/accept the
C++/Fortran-specific command-line arguments (as cited above) even if
actually operating in C language mode for the add-omp-infile compilation.

> I suppose you need to use the option-handling information about which 
> options are for which languages to filter out any options that aren't 
> valid for C or Common.

OK, that sounds simple enough, conceptually.  So, you are invalidating my
worry that the driver might in fact not be able to do this kind of thing
(mixed language compilation).

I'm currently trying to understand how all that command-line option
parsing code works, including the handoff from the driver to the front
ends and the processing of the specs language.

Can you suggest off-hand where you'd expect this option filtering to
happen?  Should this be during specs parsing in the driver; something
like adding a lang_mask to gcc/gcc.c:struct switchstr, and then in
gcc/gcc.c:give_switch ignore any switches that don't match the expected
CL_*?  I'm having difficulty properly populating/deducing that
lang_mask at the call sites of gcc/gcc.c:save_switch.  Or, did you
imagine that to be done differently?

Alternatively, what about changing gcc/opts-global.c:complain_wrong_lang
to silently ignore options that don't apply instead of emitting a »is
valid for [...] but not for [...]« diagnostic, if a (new) flag
(-f[something]?) has been set, which would be active only during the
add-omp-infile compilation?


Regards,
 Thomas

* Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
  2015-08-14 21:48                                   ` Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) Thomas Schwinge
@ 2015-08-15  4:03                                     ` Joseph Myers
  2015-08-18 16:55                                       ` Thomas Schwinge
  0 siblings, 1 reply; 62+ messages in thread
From: Joseph Myers @ 2015-08-15  4:03 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Jakub Jelinek, Ilya Verbin, Richard Biener, Richard Biener,
	Jan Hubicka, GCC Patches, Kirill Yukhin

[-- Attachment #1: Type: text/plain, Size: 1064 bytes --]

On Fri, 14 Aug 2015, Thomas Schwinge wrote:

> Can you suggest off-hand where you'd expect this option filtering to
> happen?  Should this be during specs parsing in the driver; something
> like adding a lang_mask to gcc/gcc.c:struct switchstr, and then in
> gcc/gcc.c:give_switch ignore any switches that don't match the expected
> CL_*?  I'm having difficulty properly populating/deducing that
> lang_mask at the call sites of gcc/gcc.c:save_switch.  Or, did you
> imagine that to be done differently?

I don't have a particular design in mind; I was simply noting that the 
relevant information is available to the driver through the option 
handling data.

> Alternatively, what about changing gcc/opts-global.c:complain_wrong_lang
> to silently ignore options that don't apply instead of emitting a »is
> valid for [...] but not for [...]« diagnostic, if a (new) flag
> (-f[something]?) has been set, which would be active only during the
> add-omp-infile compilation?

That would be a possibility, yes.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) (was: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming)
  2015-08-14 13:29                                 ` Ilya Verbin
@ 2015-08-17 13:57                                   ` Martin Jambor
  0 siblings, 0 replies; 62+ messages in thread
From: Martin Jambor @ 2015-08-17 13:57 UTC (permalink / raw)
  To: Ilya Verbin
  Cc: Thomas Schwinge, Jakub Jelinek, Richard Biener, Joseph Myers,
	Richard Biener, Jan Hubicka, GCC Patches, Kirill Yukhin

Hi,

On Fri, Aug 14, 2015 at 03:19:26PM +0200, Ilya Verbin wrote:
> 2015-08-14 11:47 GMT+02:00 Thomas Schwinge <thomas@codesourcery.com>:
> > On Wed, 5 Aug 2015 18:09:04 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> >> > > @@ -1095,6 +1092,8 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
> >> > >      return gomp_target_fallback (fn, hostaddrs);
> >> > >
> >> > >    void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
> >> > > +  if (fn_addr == NULL)
> >> > > +    return gomp_target_fallback (fn, hostaddrs);
> >
> > Is that reliable?  Consider the following scenario, with f1 and f2
> > implemented in separate TUs:
> >
> >     #pragma omp target data [map clauses]
> >     {
> >       f1([...]);
> >       f2([...]);
> >     }
> >
> > Consider that in f1 we have an OpenMP target region with offloading data
> > available, and in f2 we have an OpenMP target region without offloading
> > data available.  In this case, the GOMP_target in f1 will execute on the
> > offloading target, but the GOMP_target in f2 will resort to host fallback
> > -- and we then likely have data inconsistencies, as the data specified by
> > the map clauses is not synchronized between host and device.
> >
> > Admittedly, this is user error (inconsistent set of offloading functions
> > available -- need either all, or none), but in such a scenario probably
> > we should be doing a better job (at detecting this).  (Note, I'm not sure
> > whether my current patch actually does any better.)  ;-)
> 
> You're right. That's why I haven't sent this patch for review yet.
> My current plan is as follows:
> * Use this approach for architectures with shared memory, since it
> allows mixing host and target functions.

Great, please keep me posted on these changes.

Thanks!

Martin

> * For non-shared memory, at the first splay tree lookup:
> ** If target fn is not found, run the whole program in host-fallback mode.
> ** If it's found, then all target fns must exist. I.e. if some
> tgt_addr (not first) is NULL, then libgomp will issue an error as it
> does now.
> 
>   -- Ilya

* Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
  2015-08-15  4:03                                     ` Joseph Myers
@ 2015-08-18 16:55                                       ` Thomas Schwinge
  2015-08-20 23:38                                         ` Joseph Myers
  0 siblings, 1 reply; 62+ messages in thread
From: Thomas Schwinge @ 2015-08-18 16:55 UTC (permalink / raw)
  To: Joseph Myers; +Cc: GCC Patches, Nathan Sidwell


[-- Attachment #1.1: Type: text/plain, Size: 53660 bytes --]

Hi!

On Fri, 14 Aug 2015 22:56:30 +0000, Joseph Myers <joseph@codesourcery.com> wrote:
> On Fri, 14 Aug 2015, Thomas Schwinge wrote:
> 
> > Can you suggest off-hand where you'd expect this option filtering to
> > happen?  Should this be during specs parsing in the driver; something
> > like adding a lang_mask to gcc/gcc.c:struct switchstr, and then in
> > gcc/gcc.c:give_switch ignore any switches that don't match the expected
> > CL_*?  I'm having difficulty properly populating/deducing that
> > lang_mask at the call sites of gcc/gcc.c:save_switch.

(I figured that out.)

> > Alternatively, what about changing gcc/opts-global.c:complain_wrong_lang
> > to silently ignore options that don't apply instead of emitting a »is
> > valid for [...] but not for [...]« diagnostic, if a (new) flag
> > (-f[something]?) has been set, which would be active only during the
> > add-omp-infile compilation?
> 
> That would be a possibility, yes.

..., and that even looked like a sensible thing to do, also given that I
found where you added the internal -lang-asm flag five years ago,
gcc/c-family/c-opts.c:accept_all_c_family_options »Whether options from
all C-family languages should be accepted quietly«, which does a rather
similar thing.

Unfortunately, going that route turned out to not work correctly:
consider the Fortran -ffixed-form option, and likewise the
-ffixed-line-length-[...] options.  If not compiling for Fortran, these
will be passed to C-family front ends, and be recognized there by means
of the Common option -ffixed-[...], resulting in »cc1: warning: unknown
register name: form«, for example.  (Yay!)  ;-)
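
To illustrate the collision, here is a deliberately simplified sketch; GCC's
real option lookup is more involved, and joined_option_arg is a hypothetical
helper.  It treats -ffixed- as a "joined" option whose argument is whatever
follows the prefix:

```c
#include <string.h>

/* Return the argument of a joined option, i.e. the part of ARG following
   PREFIX, or NULL if ARG does not start with PREFIX.  */
static const char *
joined_option_arg (const char *arg, const char *prefix)
{
  size_t len = strlen (prefix);
  return strncmp (arg, prefix, len) == 0 ? arg + len : NULL;
}
```

Once the Fortran-specific entries are out of the picture, matching
-ffixed-form against the -ffixed- prefix leaves "form" as the register name,
and -ffixed-line-length-none leaves "line-length-none".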

So, back to modifying the driver; here is my current messy WIP patch with
still a lot of TODOs in it -- but it appears to work at last.  :-)

Maybe somebody else is able to continue with that task while I'm out of
office.  This has been developed on top of gomp-4_0-branch r226832.  I'm
also attaching a tarball of the even messier individual patches,
foffload.tar.bz2, in case there's anything to salvage in there, or if
that helps in understanding the development options/history.  Earlier
messages in this thread should give enough context for what this is about,
<http://news.gmane.org/find-root.php?message_id=%3C87egjopgh0.fsf%40kepler.schwinge.homeip.net%3E>.

 gcc/doc/invoke.texi                                |   4 +
 gcc/gcc.c                                          | 200 ++++++++++++++++++---
 libgomp/config.h.in                                |   8 +-
 libgomp/configure                                  |  33 +++-
 libgomp/env.c                                      |   6 +-
 libgomp/libgomp.h                                  |   1 +
 libgomp/libgomp.map                                |   7 +-
 libgomp/libgomp_g.h                                |   1 +
 libgomp/oacc-init.c                                |  18 +-
 libgomp/plugin/configfrag.ac                       |  10 +-
 libgomp/target.c                                   | 172 ++++++++++++++----
 libgomp/testsuite/lib/libgomp.exp                  |  75 ++------
 libgomp/testsuite/libgomp.c++/c++.exp              |  13 --
 libgomp/testsuite/libgomp.c/c.exp                  |   2 -
 libgomp/testsuite/libgomp.fortran/fortran.exp      |   5 -
 libgomp/testsuite/libgomp.graphite/graphite.exp    |   2 -
 libgomp/testsuite/libgomp.oacc-c++/c++.exp         |  33 ++--
 libgomp/testsuite/libgomp.oacc-c/c.exp             |  17 +-
 libgomp/testsuite/libgomp.oacc-fortran/fortran.exp |  23 +--
 19 files changed, 408 insertions(+), 222 deletions(-)

diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index 8c96ca5..80bc639 100644
--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -24036,6 +24036,10 @@ macro in the machine description macro file.
 This flag does not have a negative form, because it specifies a
 three-way choice.
 
+Note that this flag may conflict with the @option{-ffixed-form} as
+well as @option{-ffixed-line-length-none} and
+@option{-ffixed-line-length-<n>} options of the Fortran front end.
+
 @item -fcall-used-@var{reg}
 @opindex fcall-used
 Treat the register named @var{reg} as an allocable register that is
diff --git gcc/gcc.c gcc/gcc.c
index 0642be1..5c7c462 100644
--- gcc/gcc.c
+++ gcc/gcc.c
@@ -1,3 +1,5 @@
+#define FPRINTF if (getenv("DEBUG")) fprintf
+
 /* Compiler driver program that can handle many languages.
    Copyright (C) 1987-2015 Free Software Foundation, Inc.
 
@@ -158,7 +160,7 @@ static const char *const spec_version = DEFAULT_TARGET_VERSION;
 static const char *spec_machine = DEFAULT_TARGET_MACHINE;
 static const char *spec_host_machine = DEFAULT_REAL_TARGET_MACHINE;
 
-/* List of offload targets.  */
+/* List of offload targets.  Empty string for -foffload=disable.  */
 
 static char *offload_targets = NULL;
 
@@ -275,6 +277,8 @@ static const char *compare_debug_auxbase_opt_spec_function (int, const char **);
 static const char *pass_through_libs_spec_func (int, const char **);
 static const char *replace_extension_spec_func (int, const char **);
 static const char *greater_than_spec_func (int, const char **);
+static const char *add_omp_infile_spec_func (int, const char **);
+
 static char *convert_white_space (char *);
 \f
 /* The Specs Language
@@ -1061,6 +1065,14 @@ static const char *const multilib_defaults_raw[] = MULTILIB_DEFAULTS;
 
 static const char *const driver_self_specs[] = {
   "%{fdump-final-insns:-fdump-final-insns=.} %<fdump-final-insns",
+#ifdef ENABLE_OFFLOADING
+  /* If the user didn't specify any, default to all configured offload
+     targets.  */
+  "%{!foffload=*:-foffload=" OFFLOAD_TARGETS "}",
+  /* If linking against libgomp, add a setup file.  */
+  "%{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*} 1):" \
+  "%:add-omp-infile()}",
+#endif /* ENABLE_OFFLOADING */
   DRIVER_SELF_SPECS, CONFIGURE_SPECS, GOMP_SELF_SPECS, GTM_SELF_SPECS,
   CILK_SELF_SPECS
 };
@@ -1189,6 +1201,12 @@ static const struct compiler default_compilers[] =
   {".i", "@cpp-output", 0, 0, 0},
   {"@cpp-output",
    "%{!M:%{!MM:%{!E:cc1 -fpreprocessed %i %(cc1_options) %{!fsyntax-only:%(invoke_as)}}}}", 0, 0, 0},
+  /* TODO: "@cpp-output-with-lang-complain-none" is a duplicate of
+     "@cpp-output" just for the purpose of using it in
+     add_omp_infile_spec_func, and detecting that in give_switch.  This should
+     be done differently.  */
+  {"@cpp-output-with-lang-complain-none",
+   "%{!M:%{!MM:%{!E:cc1 -fpreprocessed %i %(cc1_options) %{!fsyntax-only:%(invoke_as)}}}}", 0, 0, 0},
   {".s", "@assembler", 0, 0, 0},
   {"@assembler",
    "%{!M:%{!MM:%{!E:%{!S:as %(asm_debug) %(asm_options) %i %A }}}}", 0, 0, 0},
@@ -1491,6 +1509,7 @@ static const struct spec_function static_spec_functions[] =
   { "pass-through-libs",	pass_through_libs_spec_func },
   { "replace-extension",	replace_extension_spec_func },
   { "gt",			greater_than_spec_func },
+  { "add-omp-infile",		add_omp_infile_spec_func },
 #ifdef EXTRA_SPEC_FUNCTIONS
   EXTRA_SPEC_FUNCTIONS
 #endif
@@ -3073,10 +3092,14 @@ execute (void)
    SWITCH_LIVE to indicate this switch is true in a conditional spec.
    SWITCH_FALSE to indicate this switch is overridden by a later switch.
    SWITCH_IGNORE to indicate this switch should be ignored (used in %<S).
-   SWITCH_IGNORE_PERMANENTLY to indicate this switch should be ignored
+   SWITCH_IGNORE_PERMANENTLY to indicate this switch should be ignored.
+   SWITCH_KEEP_FOR_GCC TODO.
    in all do_spec calls afterwards.  Used for %<S from self specs.
-   The `validated' field is nonzero if any spec has looked at this switch;
-   if it remains zero at the end of the run, it must be meaningless.  */
+   The `known' field describes whether this is an internal switch.
+   The `validated' field describes whether any spec has looked at this switch;
+   if it remains false at the end of the run, the switch must be meaningless.
+   The `ordering' field is used to temporarily mark switches that have to be
+   kept in a specific order.  */
 
 #define SWITCH_LIVE    			(1 << 0)
 #define SWITCH_FALSE   			(1 << 1)
@@ -3092,6 +3115,7 @@ struct switchstr
   bool known;
   bool validated;
   bool ordering;
+  unsigned int lang_mask;
 };
 
 static struct switchstr *switches;
@@ -3370,8 +3394,9 @@ alloc_switch (void)
 
 static void
 save_switch (const char *opt, size_t n_args, const char *const *args,
-	     bool validated, bool known)
+	     bool validated, bool known, unsigned int lang_mask)
 {
+  FPRINTF(stderr, "%s (%s, 0x%x)\n", __FUNCTION__, opt, lang_mask);
   alloc_switch ();
   switches[n_switches].part1 = opt + 1;
   if (n_args == 0)
@@ -3387,6 +3412,7 @@ save_switch (const char *opt, size_t n_args, const char *const *args,
   switches[n_switches].validated = validated;
   switches[n_switches].known = known;
   switches[n_switches].ordering = 0;
+  switches[n_switches].lang_mask = lang_mask;
   n_switches++;
 }
 
@@ -3397,6 +3423,7 @@ static bool
 driver_unknown_option_callback (const struct cl_decoded_option *decoded)
 {
   const char *opt = decoded->arg;
+  FPRINTF(stderr, "%s: %s\n", __FUNCTION__, opt);
   if (opt[1] == 'W' && opt[2] == 'n' && opt[3] == 'o' && opt[4] == '-'
       && !(decoded->errors & CL_ERR_NEGATIVE))
     {
@@ -3404,7 +3431,8 @@ driver_unknown_option_callback (const struct cl_decoded_option *decoded)
 	 diagnosed only if there are warnings.  */
       save_switch (decoded->canonical_option[0],
 		   decoded->canonical_option_num_elements - 1,
-		   &decoded->canonical_option[1], false, true);
+		   &decoded->canonical_option[1], false, true,
+		   /* TODO */ cl_options[decoded->opt_index].flags);
       return false;
     }
   if (decoded->opt_index == OPT_SPECIAL_unknown)
@@ -3412,7 +3440,8 @@ driver_unknown_option_callback (const struct cl_decoded_option *decoded)
       /* Give it a chance to define it a spec file.  */
       save_switch (decoded->canonical_option[0],
 		   decoded->canonical_option_num_elements - 1,
-		   &decoded->canonical_option[1], false, false);
+		   &decoded->canonical_option[1], false, false,
+		   /* TODO */ cl_options[decoded->opt_index].flags);
       return false;
     }
   else
@@ -3433,13 +3462,20 @@ driver_wrong_lang_callback (const struct cl_decoded_option *decoded,
      options.  */
   const struct cl_option *option = &cl_options[decoded->opt_index];
 
+  FPRINTF(stderr, "%s: %s\n", __FUNCTION__, decoded->orig_option_with_args_text);
+  FPRINTF(stderr, "  D->opt_index: %d\n", decoded->opt_index);
+  FPRINTF(stderr, "  O->back_chain: %d\n", option->back_chain);
+  if (option->back_chain != N_OPTS)
+    FPRINTF(stderr, "    %s\n", cl_options[option->back_chain].opt_text);
+  FPRINTF(stderr, "  O->flags: 0x%x\n", option->flags);
   if (option->cl_reject_driver)
     error ("unrecognized command line option %qs",
 	   decoded->orig_option_with_args_text);
   else
     save_switch (decoded->canonical_option[0],
 		 decoded->canonical_option_num_elements - 1,
-		 &decoded->canonical_option[1], false, true);
+		 &decoded->canonical_option[1], false, true,
+		 /* TODO */ /* lang_mask */ option->flags);
 }
 
 static const char *spec_lang = 0;
@@ -3450,6 +3486,8 @@ static int last_language_n_infiles;
 static void
 handle_foffload_option (const char *arg)
 {
+  FPRINTF(stderr, "%s (\"%s\")\n", __FUNCTION__, arg);
+
   const char *c, *cur, *n, *next, *end;
   char *target;
 
@@ -3689,7 +3727,8 @@ driver_handle_option (struct gcc_options *opts,
 	compare_debug_opt = NULL;
       else
 	compare_debug_opt = arg;
-      save_switch (compare_debug_replacement_opt, 0, NULL, validated, true);
+      save_switch (compare_debug_replacement_opt, 0, NULL, validated, true,
+		   /* TODO */ /* lang_mask */ cl_options[opt_index].flags);
       return true;
 
     case OPT_fdiagnostics_color_:
@@ -3783,12 +3822,14 @@ driver_handle_option (struct gcc_options *opts,
     case OPT_L:
       /* Similarly, canonicalize -L for linkers that may not accept
 	 separate arguments.  */
-      save_switch (concat ("-L", arg, NULL), 0, NULL, validated, true);
+      save_switch (concat ("-L", arg, NULL), 0, NULL, validated, true,
+		   /* TODO */ /* lang_mask */ cl_options[opt_index].flags);
       return true;
 
     case OPT_F:
       /* Likewise -F.  */
-      save_switch (concat ("-F", arg, NULL), 0, NULL, validated, true);
+      save_switch (concat ("-F", arg, NULL), 0, NULL, validated, true,
+		   /* TODO */ /* lang_mask */ cl_options[opt_index].flags);
       return true;
 
     case OPT_save_temps:
@@ -3911,7 +3952,8 @@ driver_handle_option (struct gcc_options *opts,
       save_temps_prefix = xstrdup (arg);
       /* On some systems, ld cannot handle "-o" without a space.  So
 	 split the option from its argument.  */
-      save_switch ("-o", 1, &arg, validated, true);
+      save_switch ("-o", 1, &arg, validated, true,
+		   /* TODO */ /* lang_mask */ cl_options[opt_index].flags);
       return true;
 
 #ifdef ENABLE_DEFAULT_PIE
@@ -3945,9 +3987,13 @@ driver_handle_option (struct gcc_options *opts,
     }
 
   if (do_save)
+    {
+      FPRINTF(stderr, "%s: %s\n", __FUNCTION__, decoded->orig_option_with_args_text);
     save_switch (decoded->canonical_option[0],
 		 decoded->canonical_option_num_elements - 1,
-		 &decoded->canonical_option[1], validated, true);
+		 &decoded->canonical_option[1], validated, true,
+		 /* TODO */ /* lang_mask */ cl_options[opt_index].flags);
+    }
   return true;
 }
 
@@ -4386,7 +4432,8 @@ process_command (unsigned int decoded_options_count,
   if (compare_debug == 2 || compare_debug == 3)
     {
       const char *opt = concat ("-fcompare-debug=", compare_debug_opt, NULL);
-      save_switch (opt, 0, NULL, false, true);
+      save_switch (opt, 0, NULL, false, true,
+		   /* TODO */ cl_options[OPT_fcompare_debug_].flags);
       compare_debug = 1;
     }
 
@@ -4779,7 +4826,8 @@ do_self_spec (const char *spec)
 	      save_switch (decoded_options[j].canonical_option[0],
 			   (decoded_options[j].canonical_option_num_elements
 			    - 1),
-			   &decoded_options[j].canonical_option[1], false, true);
+			   &decoded_options[j].canonical_option[1], false, true,
+			   /* TODO */ cl_options[decoded_options[j].opt_index].flags);
 	      break;
 
 	    default:
@@ -6010,6 +6058,7 @@ mark_matching_switches (const char *atom, const char *end_atom, int starred)
 static inline void
 process_marked_switches (void)
 {
+  //FPRINTF(stderr, "%s\n", __FUNCTION__);
   int i;
 
   for (i = 0; i < n_switches; i++)
@@ -6204,6 +6253,7 @@ static const char *
 process_brace_body (const char *p, const char *atom, const char *end_atom,
 		    int starred, int matched)
 {
+  //FPRINTF(stderr, "%s(\"%s\")\n", __FUNCTION__, p);
   const char *body, *end_body;
   unsigned int nesting_level;
   bool have_subst     = false;
@@ -6269,6 +6319,7 @@ process_brace_body (const char *p, const char *atom, const char *end_atom,
 	      }
 	}
     }
+  //FPRINTF(stderr, "%s\n", __FUNCTION__);
 
   return p;
 
@@ -6368,6 +6419,35 @@ check_live_switch (int switchnum, int prefix_length)
 static void
 give_switch (int switchnum, int omit_first_word)
 {
+  FPRINTF(stderr, "%s (%d, %d)\n", __FUNCTION__, switchnum, omit_first_word);
+  int lang_mask = switches[switchnum].lang_mask & ((1U << cl_lang_count) - 1);
+  FPRINTF(stderr, "  -%s 0x%x (0x%x)\n",
+	  switches[switchnum].part1, switches[switchnum].lang_mask, lang_mask);
+  unsigned int lang_mask_accept = (1U << cl_lang_count) - 1;
+  //gcc_assert (input_file_compiler);
+  /* TODO: It seems to work, but I'm not convinced that looking at
+     infiles[input_file_number] here is actually correct; that it will always
+     be correctly set up here, that is, will always point to the infile we're
+     currently evaluating specs for.  So we need to have some way of knowing
+     what our current "specs processing context" is.  */
+  /* TODO: hard-coding "cpp-output-with-lang-complain-none" and CL_C here is
+     ugly.  We should instead have a new field in struct infile, or something
+     like that.  */
+  if (infiles[input_file_number].language
+      && strcmp (infiles[input_file_number].language,
+		 "cpp-output-with-lang-complain-none") == 0)
+    lang_mask_accept = CL_C;
+  FPRINTF(stderr, "  %s: lang_mask_accept=0x%x\n", infiles[input_file_number].language, lang_mask_accept);
+  /* TODO: now we know whether a switch is not specific to a language (keep),
+     specific to languages but including the one we're interested in (keep), or
+     not (drop).  */
+  if (lang_mask != 0
+      && !(lang_mask & lang_mask_accept))
+    {
+      FPRINTF(stderr, "  dropped\n");
+      return;
+    }
+
   if ((switches[switchnum].live_cond & SWITCH_IGNORE) != 0)
     return;
 
@@ -7162,6 +7242,8 @@ driver::build_multilib_strings () const
 void
 driver::set_up_specs () const
 {
+  FPRINTF(stderr, "%s\n", __FUNCTION__);
+
   const char *spec_machine_suffix;
   char *specs_file;
   size_t i;
@@ -7448,22 +7530,16 @@ driver::maybe_putenv_COLLECT_LTO_WRAPPER () const
 void
 driver::maybe_putenv_OFFLOAD_TARGETS () const
 {
-  const char *targets = offload_targets;
+  FPRINTF(stderr, "OFFLOAD_TARGETS = %s\n", offload_targets);
 
-  /* If no targets specified by -foffload, use all available targets.  */
-  if (!targets)
-    targets = OFFLOAD_TARGETS;
-
-  if (strlen (targets) > 0)
+  if (offload_targets && offload_targets[0] != '\0')
     {
       obstack_grow (&collect_obstack, "OFFLOAD_TARGET_NAMES=",
 		    sizeof ("OFFLOAD_TARGET_NAMES=") - 1);
-      obstack_grow (&collect_obstack, targets,
-		    strlen (targets) + 1);
+      obstack_grow (&collect_obstack, offload_targets,
+		    strlen (offload_targets) + 1);
       xputenv (XOBFINISH (&collect_obstack, char *));
     }
-
-  free (offload_targets);
 }
 
 /* Reject switches that no pass was interested in.  */
@@ -7767,6 +7843,8 @@ driver::do_spec_on_infiles () const
 		  debug_check_temp_file[1] = NULL;
 		}
 
+	      FPRINTF(stderr, "%s: %s\n", __FUNCTION__, gcc_input_filename);
+	      FPRINTF(stderr, "  %s\n", input_file_compiler->spec);
 	      value = do_spec (input_file_compiler->spec);
 	      infiles[i].compiled = true;
 	      if (value < 0)
@@ -9507,6 +9585,78 @@ greater_than_spec_func (int argc, const char **argv)
   return NULL;
 }
 
+/* If applicable, generate a C source file containing a constructor call to
+   GOMP_set_offload_targets, to inform libgomp which offload targets have
+   actually been requested (-foffload=[...]), and add that as an infile.  */
+
+static const char *
+add_omp_infile_spec_func (int argc, const char **)
+{
+  gcc_assert (argc == 0);
+  gcc_assert (offload_targets != NULL);
+
+  /* Nothing to do if we're not actually linking.  */
+  if (have_c)
+    return NULL;
+
+  int err;
+  const char *tmp_filename;
+  tmp_filename = make_temp_file (".c");
+  FPRINTF(stderr, "%s: %s\n", __FUNCTION__, tmp_filename);
+  record_temp_file (tmp_filename, !save_temps_flag, !save_temps_flag);
+  FILE *f = fopen (tmp_filename, "w");
+  if (f == NULL)
+    fatal_error (input_location,
+		 "could not open temporary file %s", tmp_filename);
+  /* As libgomp uses constructors internally, and this code is only added when
+     linking against libgomp, it is fine to use a constructor here.  */
+  err = fprintf (f,
+		 "extern void GOMP_set_offload_targets (const char *);\n"
+		 "static __attribute__ ((constructor)) void\n"
+		 "init (void)\n"
+		 "{\n"
+		 "  GOMP_set_offload_targets (\"%s\");\n"
+		 "}\n",
+		 offload_targets);
+  if (err < 0)
+    fatal_error (input_location,
+		 "could not write to temporary file %s", tmp_filename);
+  err = fclose (f);
+  if (err == EOF)
+    fatal_error (input_location,
+		 "could not close temporary file %s", tmp_filename);
+
+  //TODO: correct thing to do?
+#if 1
+  add_infile (tmp_filename, "cpp-output-with-lang-complain-none");
+  return NULL;
+#elif 0
+  return tmp_filename;
+#elif 0
+  store_arg ("-x", 0, 0);
+  store_arg ("cpp-output", 0, 0);
+  store_arg (tmp_filename, 0, 0);
+  return NULL;
+#else
+  //add_infile (tmp_filename, /* TODO */ "cpp-output");
+  //int i = n_infiles - 1;
+  //input_file_number = i;
+  set_input (tmp_filename);
+  //outfiles[i] = gcc_input_filename;
+  input_file_compiler
+    = lookup_compiler (gcc_input_filename, input_filename_length,
+		       "cpp-output");
+  err = do_spec (input_file_compiler->spec);
+  //infiles[i].compiled = true;
+  if (err < 0)
+    {
+      delete_failure_queue ();
+      errorcount++;
+    }
+  clear_failure_queue ();
+#endif
+}
+
 /* Insert backslash before spaces in ORIG (usually a file path), to 
    avoid being broken by spec parser.
 
diff --git libgomp/config.h.in libgomp/config.h.in
index 8533f03..d9d5914 100644
--- libgomp/config.h.in
+++ libgomp/config.h.in
@@ -24,6 +24,12 @@
 /* Define to 1 if you have the <dlfcn.h> header file. */
 #undef HAVE_DLFCN_H
 
+/* Define to 1 if you have the `fnmatch' function. */
+#undef HAVE_FNMATCH
+
+/* Define to 1 if you have the <fnmatch.h> header file. */
+#undef HAVE_FNMATCH_H
+
 /* Define to 1 if you have the `getloadavg' function. */
 #undef HAVE_GETLOADAVG
 
@@ -95,7 +101,7 @@
    */
 #undef LT_OBJDIR
 
-/* Define to hold the list of target names suitable for offloading. */
+/* Define to hold the list of offload targets, separated by colons. */
 #undef OFFLOAD_TARGETS
 
 /* Name of package */
diff --git libgomp/configure libgomp/configure
index c93e877..3d990bb 100755
--- libgomp/configure
+++ libgomp/configure
@@ -15119,6 +15119,33 @@ esac
 offload_targets=
 
 plugin_support=yes
+for ac_header in fnmatch.h
+do :
+  ac_fn_c_check_header_mongrel "$LINENO" "fnmatch.h" "ac_cv_header_fnmatch_h" "$ac_includes_default"
+if test "x$ac_cv_header_fnmatch_h" = x""yes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_FNMATCH_H 1
+_ACEOF
+
+else
+  plugin_support=no
+fi
+
+done
+
+for ac_func in fnmatch
+do :
+  ac_fn_c_check_func "$LINENO" "fnmatch" "ac_cv_func_fnmatch"
+if test "x$ac_cv_func_fnmatch" = x""yes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_FNMATCH 1
+_ACEOF
+
+else
+  plugin_support=no
+fi
+done
+
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for dlsym in -ldl" >&5
 $as_echo_n "checking for dlsym in -ldl... " >&6; }
 if test "${ac_cv_lib_dl_dlsym+set}" = set; then :
@@ -15236,10 +15263,8 @@ if test x"$enable_offload_targets" != x; then
     tgt=`echo $tgt | sed 's/=.*//'`
     case $tgt in
       *-intelmic-* | *-intelmicemul-*)
-	tgt_name=intelmic
 	;;
       nvptx*)
-        tgt_name=nvptx
 	PLUGIN_NVPTX=$tgt
 	PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
 	PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
@@ -15282,9 +15307,9 @@ rm -f core conftest.err conftest.$ac_objext \
 	;;
     esac
     if test x"$offload_targets" = x; then
-      offload_targets=$tgt_name
+      offload_targets=$tgt
     else
-      offload_targets=$offload_targets,$tgt_name
+      offload_targets=$offload_targets:$tgt
     fi
     if test x"$tgt_dir" != x; then
       offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
diff --git libgomp/env.c libgomp/env.c
index 1811bf5..6b5e963 100644
--- libgomp/env.c
+++ libgomp/env.c
@@ -1175,11 +1175,7 @@ handle_omp_display_env (unsigned long stacksize, int wait_policy)
 }
 
 
-/* TODO.  See testsuite/lib/libgomp.exp:libgomp_init.  */
-#if 0
-static
-#endif
-void __attribute__((constructor))
+static void __attribute__((constructor))
 initialize_env (void)
 {
   unsigned long thread_limit_var, stacksize;
diff --git libgomp/libgomp.h libgomp/libgomp.h
index 420a525..1f95906 100644
--- libgomp/libgomp.h
+++ libgomp/libgomp.h
@@ -785,6 +785,7 @@ extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 extern void gomp_init_device (struct gomp_device_descr *);
 extern void gomp_fini_device (struct gomp_device_descr *);
 extern void gomp_unload_device (struct gomp_device_descr *);
+extern bool gomp_offload_target_available_p (int);
 
 /* work.c */
 
diff --git libgomp/libgomp.map libgomp/libgomp.map
index 36932d3..05e75fb0 100644
--- libgomp/libgomp.map
+++ libgomp/libgomp.map
@@ -339,6 +339,7 @@ GOACC_2.0.GOMP_4_BRANCH {
 	GOACC_get_ganglocal_ptr;
 	GOACC_parallel_keyed;
 	GOACC_register_static;
+	GOMP_set_offload_targets;
 } GOACC_2.0;
 
 GOMP_PLUGIN_1.0 {
@@ -352,9 +353,3 @@ GOMP_PLUGIN_1.0 {
 	GOMP_PLUGIN_async_unmap_vars;
 	GOMP_PLUGIN_acc_thread;
 };
-
-# TODO.  See testsuite/lib/libgomp.exp:libgomp_init.
-INTERNAL {
-  global:
-	initialize_env;
-};
diff --git libgomp/libgomp_g.h libgomp/libgomp_g.h
index e65b888..e67fc86 100644
--- libgomp/libgomp_g.h
+++ libgomp/libgomp_g.h
@@ -206,6 +206,7 @@ extern void GOMP_single_copy_end (void *);
 
 /* target.c */
 
+extern void GOMP_set_offload_targets (const char *);
 extern void GOMP_target (int, void (*) (void *), const void *,
 			 size_t, void **, size_t *, unsigned char *);
 extern void GOMP_target_data (int, const void *,
diff --git libgomp/oacc-init.c libgomp/oacc-init.c
index 8dfe4a7..78f130c 100644
--- libgomp/oacc-init.c
+++ libgomp/oacc-init.c
@@ -122,7 +122,9 @@ resolve_device (acc_device_t d, bool fail_is_error)
       {
 	if (goacc_device_type)
 	  {
-	    /* Lookup the named device.  */
+	    /* Look up the device that has been explicitly named, so do not pay
+	       attention to gomp_offload_target_available_p.  (That is, hard
+	       error if not actually available.)  */
 	    while (++d != _ACC_device_hwm)
 	      if (dispatchers[d]
 		  && !strcasecmp (goacc_device_type,
@@ -148,8 +150,14 @@ resolve_device (acc_device_t d, bool fail_is_error)
     case acc_device_not_host:
       /* Find the first available device after acc_device_not_host.  */
       while (++d != _ACC_device_hwm)
-	if (dispatchers[d] && dispatchers[d]->get_num_devices_func () > 0)
+	if (dispatchers[d]
+	    && dispatchers[d]->get_num_devices_func () > 0
+	    /* No device has been explicitly named, so pay attention to
+	       gomp_offload_target_available_p, to not decide on an offload
+	       target that we don't have offload data available for.  */
+	    && gomp_offload_target_available_p (dispatchers[d]->type))
 	  goto found;
+      /* No non-host device found.  */
       if (d_arg == acc_device_default)
 	{
 	  d = acc_device_host;
@@ -164,9 +172,6 @@ resolve_device (acc_device_t d, bool fail_is_error)
         return NULL;
       break;
 
-    case acc_device_host:
-      break;
-
     default:
       if (d > _ACC_device_hwm)
 	{
@@ -181,7 +186,8 @@ resolve_device (acc_device_t d, bool fail_is_error)
 
   assert (d != acc_device_none
 	  && d != acc_device_default
-	  && d != acc_device_not_host);
+	  && d != acc_device_not_host
+	  && d < _ACC_device_hwm);
 
   if (dispatchers[d] == NULL && fail_is_error)
     {
diff --git libgomp/plugin/configfrag.ac libgomp/plugin/configfrag.ac
index 8c2a420..e2392e1 100644
--- libgomp/plugin/configfrag.ac
+++ libgomp/plugin/configfrag.ac
@@ -29,6 +29,8 @@
 offload_targets=
 AC_SUBST(offload_targets)
 plugin_support=yes
+AC_CHECK_HEADERS([fnmatch.h], , [plugin_support=no])
+AC_CHECK_FUNCS([fnmatch], , [plugin_support=no])
 AC_CHECK_LIB(dl, dlsym, , [plugin_support=no])
 if test x"$plugin_support" = xyes; then
   AC_DEFINE(PLUGIN_SUPPORT, 1,
@@ -92,10 +94,8 @@ if test x"$enable_offload_targets" != x; then
     tgt=`echo $tgt | sed 's/=.*//'`
     case $tgt in
       *-intelmic-* | *-intelmicemul-*)
-	tgt_name=intelmic
 	;;
       nvptx*)
-        tgt_name=nvptx
 	PLUGIN_NVPTX=$tgt
 	PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
 	PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
@@ -127,9 +127,9 @@ if test x"$enable_offload_targets" != x; then
 	;;
     esac
     if test x"$offload_targets" = x; then
-      offload_targets=$tgt_name
+      offload_targets=$tgt
     else
-      offload_targets=$offload_targets,$tgt_name
+      offload_targets=$offload_targets:$tgt
     fi
     if test x"$tgt_dir" != x; then
       offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
@@ -141,7 +141,7 @@ if test x"$enable_offload_targets" != x; then
   done
 fi
 AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$offload_targets",
-  [Define to hold the list of target names suitable for offloading.])
+  [Define to hold the list of offload targets, separated by colons.])
 AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
   [Define to 1 if the NVIDIA plugin is built, 0 if not.])
diff --git libgomp/target.c libgomp/target.c
index 6426254..9cf5251 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -41,6 +41,7 @@
 
 #ifdef PLUGIN_SUPPORT
 #include <dlfcn.h>
+#include <fnmatch.h>
 #include "plugin-suffix.h"
 #endif
 
@@ -122,17 +123,26 @@ gomp_get_num_devices (void)
 }
 
 static struct gomp_device_descr *
-resolve_device (int device_id)
+resolve_device (int device)
 {
-  if (device_id == GOMP_DEVICE_ICV)
+  int device_id;
+  if (device == GOMP_DEVICE_ICV)
     {
       struct gomp_task_icv *icv = gomp_icv (false);
       device_id = icv->default_device_var;
     }
+  else
+    device_id = device;
 
   if (device_id < 0 || device_id >= gomp_get_num_devices ())
     return NULL;
 
+  /* If the device-var ICV does not actually have offload data available, don't
+     try to use it (which will fail), and use host fallback instead.  */
+  if (device == GOMP_DEVICE_ICV
+      && !gomp_offload_target_available_p (devices[device_id].type))
+    return NULL;
+
   return &devices[device_id];
 }
 
@@ -947,6 +957,49 @@ gomp_fini_device (struct gomp_device_descr *devicep)
   devicep->is_initialized = false;
 }
 
+/* Do we have offload data available for the given offload target type?
+   Rather than verifying that *all* offload data is available that could
+   possibly be required, we just look for *any*.  If we later find any
+   offload data missing, that's user error.  */
+
+attribute_hidden bool
+gomp_offload_target_available_p (int type)
+{
+  bool available = false;
+
+  /* Has the offload target already been initialized?  */
+  for (int i = 0; !available && i < num_devices; i++)
+    {
+      struct gomp_device_descr *devicep = &devices[i];
+      gomp_mutex_lock (&devicep->lock);
+      if (devicep->type == type && devicep->is_initialized)
+	available = true;
+      gomp_mutex_unlock (&devicep->lock);
+    }
+
+  if (!available)
+    {
+      // TODO: locking correct?
+      gomp_mutex_lock (&register_lock);
+
+      /* If there is no offload data available at all, we cannot later fail to
+	 find any of it for a specific offload target.  This is the case where
+	 there are no offloaded code regions in user code, but there can still
+	 be executable directives used, or runtime library calls made.  */
+      if (num_offload_images == 0)
+	available = true;
+
+      /* Can the offload target be initialized?  */
+      for (int i = 0; !available && i < num_offload_images; i++)
+	if (offload_images[i].type == type)
+	  available = true;
+
+      gomp_mutex_unlock (&register_lock);
+    }
+
+  return available;
+}
+
 /* Called when encountering a target directive.  If DEVICE
    is GOMP_DEVICE_ICV, it means use device-var ICV.  If it is
    GOMP_DEVICE_HOST_FALLBACK (or any value
@@ -1116,6 +1169,8 @@ static bool
 gomp_load_plugin_for_device (struct gomp_device_descr *device,
 			     const char *plugin_name)
 {
+  gomp_debug (0, "%s (\"%s\")\n", __FUNCTION__, plugin_name);
+
   const char *err = NULL, *last_missing = NULL;
 
   void *plugin_handle = dlopen (plugin_name, RTLD_LAZY);
@@ -1212,6 +1267,38 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   return 0;
 }
 
+/* Helper, to translate from an offload target to the corresponding plugin name.  */
+
+static const char *
+offload_target_to_plugin_name (const char *offload_target)
+{
+  if (fnmatch ("*-intelmic*", offload_target, 0) == 0)
+    return "intelmic";
+  if (fnmatch ("nvptx*", offload_target, 0) == 0)
+    return "nvptx";
+  gomp_fatal ("Unknown offload target: %s", offload_target);
+}
+
+/* List of offload targets, separated by colons.  Defaults to the list
+   determined when configuring libgomp.  */
+static const char *gomp_offload_targets = OFFLOAD_TARGETS;
+
+/* Override the list of offload targets.  This must be called early, and only
+   once.  */
+
+void
+GOMP_set_offload_targets (const char *offload_targets)
+{
+  gomp_debug (0, "%s (\"%s\")\n", __FUNCTION__, offload_targets);
+
+  //TODO: any locking?
+  /* Make sure this gets called early.  */
+  assert (gomp_is_initialized == PTHREAD_ONCE_INIT);
+  ///* Make sure this only gets called once.  */
+  //assert (gomp_offload_targets == OFFLOAD_TARGETS);
+  gomp_offload_targets = offload_targets;
+}
+
 /* This function initializes the runtime needed for offloading.
    It parses the list of offload targets and tries to load the plugins for
    these targets.  On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
@@ -1219,11 +1306,12 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
    corresponding devices, first the GOMP_OFFLOAD_CAP_OPENMP_400 ones, follows
    by the others.  */
 
+static const char *gomp_plugin_prefix = "libgomp-plugin-";
+static const char *gomp_plugin_suffix = SONAME_SUFFIX (1);
+
 static void
 gomp_target_init (void)
 {
-  const char *prefix ="libgomp-plugin-";
-  const char *suffix = SONAME_SUFFIX (1);
   const char *cur, *next;
   char *plugin_name;
   int i, new_num_devices;
@@ -1231,48 +1319,60 @@ gomp_target_init (void)
   num_devices = 0;
   devices = NULL;
 
-  cur = OFFLOAD_TARGETS;
+  cur = gomp_offload_targets;
   if (*cur)
     do
       {
+	next = strchr (cur, ':');
+	/* If no other offload target following...  */
+	if (next == NULL)
+	  /* ..., point to the terminating NUL character.  */
+	  next = cur + strlen (cur);
+
+	size_t gomp_plugin_prefix_len = strlen (gomp_plugin_prefix);
+	size_t cur_len = next - cur;
+	size_t gomp_plugin_suffix_len = strlen (gomp_plugin_suffix);
+	plugin_name = gomp_malloc (gomp_plugin_prefix_len
+				   + cur_len
+				   + gomp_plugin_suffix_len
+				   + 1);
+	memcpy (plugin_name, gomp_plugin_prefix, gomp_plugin_prefix_len);
+	memcpy (plugin_name + gomp_plugin_prefix_len, cur, cur_len);
+	/* NUL-terminate the string here...  */
+	plugin_name[gomp_plugin_prefix_len + cur_len] = '\0';
+	/* ..., so that we can then use it to translate the offload target to
+	   the plugin name...  */
+	const char *cur_plugin_name
+	  = offload_target_to_plugin_name (plugin_name
+					   + gomp_plugin_prefix_len);
+	size_t cur_plugin_name_len = strlen (cur_plugin_name);
+	assert (cur_plugin_name_len <= cur_len);
+	/* ..., and then rewrite it.  */
+	memcpy (plugin_name + gomp_plugin_prefix_len,
+		cur_plugin_name, cur_plugin_name_len);
+	memcpy (plugin_name + gomp_plugin_prefix_len + cur_plugin_name_len,
+		gomp_plugin_suffix, gomp_plugin_suffix_len);
+	plugin_name[gomp_plugin_prefix_len
+		    + cur_plugin_name_len
+		    + gomp_plugin_suffix_len] = '\0';
+
 	struct gomp_device_descr current_device;
-
-	next = strchr (cur, ',');
-
-	plugin_name = (char *) malloc (1 + (next ? next - cur : strlen (cur))
-				       + strlen (prefix) + strlen (suffix));
-	if (!plugin_name)
-	  {
-	    num_devices = 0;
-	    break;
-	  }
-
-	strcpy (plugin_name, prefix);
-	strncat (plugin_name, cur, next ? next - cur : strlen (cur));
-	strcat (plugin_name, suffix);
-
 	if (gomp_load_plugin_for_device (&current_device, plugin_name))
 	  {
 	    new_num_devices = current_device.get_num_devices_func ();
 	    if (new_num_devices >= 1)
 	      {
-		/* Augment DEVICES and NUM_DEVICES.  */
-
-		devices = realloc (devices, (num_devices + new_num_devices)
-				   * sizeof (struct gomp_device_descr));
-		if (!devices)
-		  {
-		    num_devices = 0;
-		    free (plugin_name);
-		    break;
-		  }
-
 		current_device.name = current_device.get_name_func ();
 		/* current_device.capabilities has already been set.  */
 		current_device.type = current_device.get_type_func ();
 		current_device.mem_map.root = NULL;
 		current_device.is_initialized = false;
 		current_device.openacc.data_environ = NULL;
+
+		/* Augment DEVICES and NUM_DEVICES.  */
+		devices = gomp_realloc (devices,
+					((num_devices + new_num_devices)
+					 * sizeof (struct gomp_device_descr)));
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.target_id = i;
@@ -1286,18 +1386,12 @@ gomp_target_init (void)
 	free (plugin_name);
 	cur = next + 1;
       }
-    while (next);
+    while (*next);
 
   /* In DEVICES, sort the GOMP_OFFLOAD_CAP_OPENMP_400 ones first, and set
      NUM_DEVICES_OPENMP.  */
   struct gomp_device_descr *devices_s
-    = malloc (num_devices * sizeof (struct gomp_device_descr));
-  if (!devices_s)
-    {
-      num_devices = 0;
-      free (devices);
-      devices = NULL;
-    }
+    = gomp_malloc (num_devices * sizeof (struct gomp_device_descr));
   num_devices_openmp = 0;
   for (i = 0; i < num_devices; i++)
     if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
diff --git libgomp/testsuite/lib/libgomp.exp libgomp/testsuite/lib/libgomp.exp
index 33d1a54..898dfc3 100644
--- libgomp/testsuite/lib/libgomp.exp
+++ libgomp/testsuite/lib/libgomp.exp
@@ -36,24 +36,21 @@ load_gcc_lib fortran-modules.exp
 load_file libgomp-test-support.exp
 
 # Populate offload_targets_s (offloading targets separated by a space), and
-# offload_targets_s_openacc (the same, but with OpenACC names; OpenACC spells
-# some of them a little differently).
-set offload_targets_s [split $offload_targets ","]
+# offload_targets_s_openacc (those suitable for OpenACC).
+set offload_targets_s [split $offload_targets ":"]
 set offload_targets_s_openacc {}
 foreach offload_target_openacc $offload_targets_s {
-    switch $offload_target_openacc {
-	intelmic {
+    switch -glob $offload_target_openacc {
+	*-intelmic* {
 	    # TODO.  Skip; will all FAIL because of missing
 	    # GOMP_OFFLOAD_CAP_OPENACC_200.
 	    continue
 	}
-	nvptx {
-	    set offload_target_openacc "nvidia"
-	}
     }
     lappend offload_targets_s_openacc "$offload_target_openacc"
 }
-lappend offload_targets_s_openacc "host"
+# Host fallback.
+lappend offload_targets_s_openacc "disable"
 
 set dg-do-what-default run
 
@@ -134,7 +131,7 @@ proc libgomp_init { args } {
     # Add liboffloadmic build directory in LD_LIBRARY_PATH to support
     # non-fallback testing for Intel MIC targets
     global offload_targets
-    if { [string match "*,intelmic,*" ",$offload_targets,"] } {
+    if { [string match "*:*-intelmic*:*" ":$offload_targets:"] } {
 	append always_ld_library_path ":${blddir}/../liboffloadmic/.libs"
 	append always_ld_library_path ":${blddir}/../liboffloadmic/plugin/.libs"
 	# libstdc++ is required by liboffloadmic
@@ -235,56 +232,6 @@ proc libgomp_init { args } {
     if { $offload_additional_options != "" } {
 	lappend ALWAYS_CFLAGS "additional_flags=${offload_additional_options}"
     }
-
-    # TODO.  Evil hack.  DejaGnu doesn't have a mechanism for setting
-    # environment variables on remote boards.  Thus, we have to fake it, using
-    # GCC's constructor attributes to create object files that install the
-    # desired environment variables.
-    set e_list [list \
-		    [list defaults DUMMY=dummy ] \
-		    [list ACC_DEVICE_TYPE-host ACC_DEVICE_TYPE=host ] \
-		    [list ACC_DEVICE_TYPE-nvidia ACC_DEVICE_TYPE=nvidia ] ]
-    foreach e $e_list {
-	set v [lindex $e 0]
-	set s [lindex $e 1]
-	verbose "creating constructor-setenv: $v: $s"
-	set src constructor-setenv-$v.c
-	set obj constructor-setenv-$v.o
-	set f_src [open $src "w"]
-	puts $f_src "static void __attribute__((constructor(1000)))"
-	puts $f_src "init_env(void) {"
-	puts $f_src "  int putenv(char *);"
-	puts $f_src "  putenv(\"$s\");"
-	puts $f_src "}"
-	if { $v == "defaults" } {
-	    # TODO.  We want libgomp to initialize after the putenv calls.
-	    # But: shared libraries' constructors (and thus
-	    # env.c:initialize_env) will be called before the executable's
-	    # (init_env functions created above), so it will already have been
-	    # initialized (and has to be, in case we're not linking in this
-	    # gunk).  Assuming no execution of other libgomp functionality in
-	    # between (which we're not doing during initialization),
-	    # initialize_env's effects are idempotent when calling it again, so
-	    # we'll do that now, after the putenv calls have been executed.
-	    puts $f_src "static void __attribute__((constructor(1001)))"
-	    puts $f_src "init_libgomp(void) {"
-	    # Some test cases specify -fno-openmp, so libgomp isn't linked in.
-	    puts $f_src "  void initialize_env(void) __attribute__((weak));"
-	    puts $f_src "  if (initialize_env)"
-	    puts $f_src "    initialize_env();"
-	    puts $f_src "}"
-	}
-	close $f_src
-	# TODO.  Using whichever compiler is currently configured...  At least
-	# switch it into C mode.
-	set lines [libgomp_target_compile $src $obj object "additional_flags=-xc"]
-	# TODO.  Error checking.
-	file delete $src
-    }
-    # When adding constructor-setenv-*.o files, make sure to cancel any -x flag
-    # that may have been set before.
-    lappend ALWAYS_CFLAGS "ldflags=-x none"
-    lappend ALWAYS_CFLAGS "ldflags=constructor-setenv-defaults.o"
 }
 
 #
@@ -296,6 +243,7 @@ proc libgomp_target_compile { source dest type options } {
     global libgomp_compile_options
     global gluefile wrap_flags
     global ALWAYS_CFLAGS
+    global GCC_UNDER_TEST
     global lang_test_file
     global lang_library_path
     global lang_link_flags
@@ -323,6 +271,7 @@ proc libgomp_target_compile { source dest type options } {
 
     lappend options "additional_flags=[libio_include_flags]"
     lappend options "timeout=[timeout_value]"
+    lappend options "compiler=$GCC_UNDER_TEST"
 
     set options [concat $libgomp_compile_options $options]
 
@@ -370,7 +319,7 @@ proc check_effective_target_offload_device { } {
 
 proc check_effective_target_openacc_nvidia_accel_supported { } {
     global offload_targets_s_openacc
-    set res [lsearch $offload_targets_s_openacc "nvidia" ]
+    set res [lsearch -glob $offload_targets_s_openacc "nvptx*" ]
     if { $res != -1 } {
 	return 1;
     }
@@ -396,7 +345,7 @@ proc check_effective_target_openacc_nvidia_accel_selected { } {
 	return 0;
     }
     global offload_target_openacc
-    if { $offload_target_openacc == "nvidia" } {
+    if { [string match "nvptx*" $offload_target_openacc] } {
         return 1;
     }
     return 0;
@@ -406,7 +355,7 @@ proc check_effective_target_openacc_nvidia_accel_selected { } {
 
 proc check_effective_target_openacc_host_selected { } {
     global offload_target_openacc
-    if { $offload_target_openacc == "host" } {
+    if { $offload_target_openacc == "disable" } {
         return 1;
     }
     return 0;
diff --git libgomp/testsuite/libgomp.c++/c++.exp libgomp/testsuite/libgomp.c++/c++.exp
index d6d525a..0454f95 100644
--- libgomp/testsuite/libgomp.c++/c++.exp
+++ libgomp/testsuite/libgomp.c++/c++.exp
@@ -4,7 +4,6 @@ load_gcc_lib gcc-dg.exp
 global shlib_ext
 
 set shlib_ext [get_shlib_extension]
-#TODO
 set lang_link_flags "-lstdc++"
 set lang_test_file_found 0
 set lang_library_path "../libstdc++-v3/src/.libs"
@@ -47,13 +46,6 @@ if { $blddir != "" } {
 }
 
 if { $lang_test_file_found } {
-    if ![info exists GXX_UNDER_TEST] then {
-	# TODO.  See libgomp.oacc-c++/c++.exp.
-	set HAVE_SET_GXX_UNDER_TEST ""
-	set GXX_UNDER_TEST "$GCC_UNDER_TEST"
-    }
-    lappend libgomp_compile_options "compiler=$GXX_UNDER_TEST"
-
     # Gather a list of all tests.
     set tests [lsort [find $srcdir/$subdir *.C]]
 
@@ -76,10 +68,5 @@ if { $lang_test_file_found } {
     dg-runtest $tests "" "$libstdcxx_includes $DEFAULT_CFLAGS"
 }
 
-# TODO.  See above.
-if { [info exists HAVE_SET_GXX_UNDER_TEST] } {
-    unset GXX_UNDER_TEST
-}
-
 # All done.
 dg-finish
diff --git libgomp/testsuite/libgomp.c/c.exp libgomp/testsuite/libgomp.c/c.exp
index 25f347b..300b921 100644
--- libgomp/testsuite/libgomp.c/c.exp
+++ libgomp/testsuite/libgomp.c/c.exp
@@ -23,8 +23,6 @@ dg-init
 # Turn on OpenMP.
 lappend ALWAYS_CFLAGS "additional_flags=-fopenmp"
 
-lappend libgomp_compile_options "compiler=$GCC_UNDER_TEST"
-
 # Gather a list of all tests.
 set tests [lsort [find $srcdir/$subdir *.c]]
 
diff --git libgomp/testsuite/libgomp.fortran/fortran.exp libgomp/testsuite/libgomp.fortran/fortran.exp
index 883c416..f684abc 100644
--- libgomp/testsuite/libgomp.fortran/fortran.exp
+++ libgomp/testsuite/libgomp.fortran/fortran.exp
@@ -47,11 +47,6 @@ if { $blddir != "" } {
 }
 
 if { $lang_test_file_found } {
-    if ![info exists GFORTRAN_UNDER_TEST] then {
-	set GFORTRAN_UNDER_TEST $GCC_UNDER_TEST
-    }
-    lappend libgomp_compile_options "compiler=$GFORTRAN_UNDER_TEST"
-
     # Gather a list of all tests.
     set tests [lsort [find $srcdir/$subdir *.\[fF\]{,90,95,03,08}]]
 
diff --git libgomp/testsuite/libgomp.graphite/graphite.exp libgomp/testsuite/libgomp.graphite/graphite.exp
index 716cdc3..d737c85 100644
--- libgomp/testsuite/libgomp.graphite/graphite.exp
+++ libgomp/testsuite/libgomp.graphite/graphite.exp
@@ -48,8 +48,6 @@ dg-init
 # Turn on OpenMP.
 lappend ALWAYS_CFLAGS "additional_flags=-fopenmp"
 
-lappend libgomp_compile_options "compiler=$GCC_UNDER_TEST"
-
 # Gather a list of all tests.
 set tests [lsort [find $srcdir/$subdir *.c]]
 
diff --git libgomp/testsuite/libgomp.oacc-c++/c++.exp libgomp/testsuite/libgomp.oacc-c++/c++.exp
index e5c875c..f513d87 100644
--- libgomp/testsuite/libgomp.oacc-c++/c++.exp
+++ libgomp/testsuite/libgomp.oacc-c++/c++.exp
@@ -13,7 +13,6 @@ load_gcc_lib gcc-dg.exp
 global shlib_ext
 
 set shlib_ext [get_shlib_extension]
-#TODO
 set lang_link_flags "-lstdc++"
 set lang_test_file_found 0
 set lang_library_path "../libstdc++-v3/src/.libs"
@@ -32,6 +31,11 @@ dg-init
 # Turn on OpenACC.
 lappend ALWAYS_CFLAGS "additional_flags=-fopenacc"
 
+# Switch into C++ mode.  Otherwise, the libgomp.oacc-c-c++-common/*.c
+# files would be compiled as C files.
+set SAVE_GCC_UNDER_TEST "$GCC_UNDER_TEST"
+set GCC_UNDER_TEST "$GCC_UNDER_TEST -x c++"
+
 set blddir [lookfor_file [get_multilibs] libgomp]
 
 
@@ -56,14 +60,6 @@ if { $blddir != "" } {
 }
 
 if { $lang_test_file_found } {
-    if ![info exists GXX_UNDER_TEST] then {
-	# Use GCC_UNDER_TEST, but switch into C++ mode, as otherwise the
-	# libgomp.oacc-c-c++-common/*.c files would be compiled as C files.
-	set HAVE_SET_GXX_UNDER_TEST ""
-	set GXX_UNDER_TEST "$GCC_UNDER_TEST -x c++"
-    }
-    lappend libgomp_compile_options "compiler=$GXX_UNDER_TEST"
-
     # Gather a list of all tests.
     set tests [lsort [concat \
 			  [find $srcdir/$subdir *.C] \
@@ -104,17 +100,14 @@ if { $lang_test_file_found } {
     set SAVE_ALWAYS_CFLAGS "$ALWAYS_CFLAGS"
     foreach offload_target_openacc $offload_targets_s_openacc {
 	set ALWAYS_CFLAGS "$SAVE_ALWAYS_CFLAGS"
-	set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
-	# Set $ACC_DEVICE_TYPE.  See the comments in
-	# ../lib/libgomp.exp:libgomp_init.
-	lappend ALWAYS_CFLAGS "ldflags=constructor-setenv-ACC_DEVICE_TYPE-$offload_target_openacc.o"
 
 	# Todo: Determine shared memory or not using run-time test.
-	switch $offload_target_openacc {
-	    host {
+	switch -glob $offload_target_openacc {
+	    disable {
 		set acc_mem_shared 1
+		set tagopt "-DACC_DEVICE_TYPE_host=1"
 	    }
-	    nvidia {
+	    nvptx* {
 		if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		    # Don't bother; execution testing is going to FAIL.
 		    untested "$subdir $offload_target_openacc offloading"
@@ -128,12 +121,14 @@ if { $lang_test_file_found } {
 		lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/libgomp.oacc-c-c++-common"
 
 		set acc_mem_shared 0
+		set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	    }
 	    default {
 		set acc_mem_shared 0
+		#TODO error
 	    }
 	}
-	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
+	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
 	dg-runtest $tests "$tagopt" "$libstdcxx_includes $DEFAULT_CFLAGS"
 	gcc-dg-runtest $ttests "$tagopt" "$libstdcxx_includes"
@@ -141,9 +136,7 @@ if { $lang_test_file_found } {
 }
 
 # See above.
-if { [info exists HAVE_SET_GXX_UNDER_TEST] } {
-    unset GXX_UNDER_TEST
-}
+set GCC_UNDER_TEST "$SAVE_GCC_UNDER_TEST"
 
 unset TORTURE_OPTIONS
 
diff --git libgomp/testsuite/libgomp.oacc-c/c.exp libgomp/testsuite/libgomp.oacc-c/c.exp
index c91a41b..03fe3a4 100644
--- libgomp/testsuite/libgomp.oacc-c/c.exp
+++ libgomp/testsuite/libgomp.oacc-c/c.exp
@@ -32,8 +32,6 @@ dg-init
 # Turn on OpenACC.
 lappend ALWAYS_CFLAGS "additional_flags=-fopenacc"
 
-lappend libgomp_compile_options "compiler=$GCC_UNDER_TEST"
-
 # Gather a list of all tests.
 set tests [lsort [concat \
 		      [find $srcdir/$subdir *.c] \
@@ -62,17 +60,14 @@ set_ld_library_path_env_vars
 set SAVE_ALWAYS_CFLAGS "$ALWAYS_CFLAGS"
 foreach offload_target_openacc $offload_targets_s_openacc {
     set ALWAYS_CFLAGS "$SAVE_ALWAYS_CFLAGS"
-    set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
-    # Set $ACC_DEVICE_TYPE.  See the comments in
-    # ../lib/libgomp.exp:libgomp_init.
-    lappend ALWAYS_CFLAGS "ldflags=constructor-setenv-ACC_DEVICE_TYPE-$offload_target_openacc.o"
 
     # Todo: Determine shared memory or not using run-time test.
-    switch $offload_target_openacc {
-	host {
+    switch -glob $offload_target_openacc {
+	disable {
 	    set acc_mem_shared 1
+	    set tagopt "-DACC_DEVICE_TYPE_host=1"
 	}
-	nvidia {
+	nvptx* {
 	    if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		# Don't bother; execution testing is going to FAIL.
 		untested "$subdir $offload_target_openacc offloading"
@@ -86,12 +81,14 @@ foreach offload_target_openacc $offload_targets_s_openacc {
 	    lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/libgomp.oacc-c-c++-common"
 
 	    set acc_mem_shared 0
+	    set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	}
 	default {
 	    set acc_mem_shared 0
+	    #TODO error
 	}
     }
-    set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
+    set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
     dg-runtest $tests "$tagopt" $DEFAULT_CFLAGS
     gcc-dg-runtest $ttests "$tagopt" ""
diff --git libgomp/testsuite/libgomp.oacc-fortran/fortran.exp libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
index df46004..b9ad6dc 100644
--- libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
+++ libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
@@ -49,11 +49,6 @@ if { $blddir != "" } {
 }
 
 if { $lang_test_file_found } {
-    if ![info exists GFORTRAN_UNDER_TEST] then {
-	set GFORTRAN_UNDER_TEST $GCC_UNDER_TEST
-    }
-    lappend libgomp_compile_options "compiler=$GFORTRAN_UNDER_TEST"
-
     # Gather a list of all tests.
     set tests [lsort [find $srcdir/$subdir *.\[fF\]{,90,95,03,08}]]
 
@@ -74,20 +69,14 @@ if { $lang_test_file_found } {
     set_ld_library_path_env_vars
 
     # Test OpenACC with available accelerators.
-    set SAVE_ALWAYS_CFLAGS "$ALWAYS_CFLAGS"
     foreach offload_target_openacc $offload_targets_s_openacc {
-	set ALWAYS_CFLAGS "$SAVE_ALWAYS_CFLAGS"
-	set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
-	# Set $ACC_DEVICE_TYPE.  See the comments in
-	# ../lib/libgomp.exp:libgomp_init.
-	lappend ALWAYS_CFLAGS "ldflags=constructor-setenv-ACC_DEVICE_TYPE-$offload_target_openacc.o"
-
 	# Todo: Determine shared memory or not using run-time test.
-	switch $offload_target_openacc {
-	    host {
+	switch -glob $offload_target_openacc {
+	    disable {
 		set acc_mem_shared 1
+		set tagopt "-DACC_DEVICE_TYPE_host=1"
 	    }
-	    nvidia {
+	    nvptx* {
 		if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		    # Don't bother; execution testing is going to FAIL.
 		    untested "$subdir $offload_target_openacc offloading"
@@ -95,12 +84,14 @@ if { $lang_test_file_found } {
 		}
 
 		set acc_mem_shared 0
+		set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	    }
 	    default {
 		set acc_mem_shared 0
+		#TODO error
 	    }
 	}
-	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
+	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
 	# For Fortran we're doing torture testing, as Fortran has far more tests
 	# with arrays etc. that testing just -O0 or -O2 is insufficient, that is


Regards,
 Thomas



[-- Attachment #1.2: foffload.tar.bz2 --]
[-- Type: application/x-bzip-compressed-tar, Size: 23396 bytes --]

[-- Attachment #2: Type: application/pgp-signature, Size: 472 bytes --]


* Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
  2015-08-18 16:55                                       ` Thomas Schwinge
@ 2015-08-20 23:38                                         ` Joseph Myers
  2015-08-21 16:13                                           ` Nathan Sidwell
                                                             ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Joseph Myers @ 2015-08-20 23:38 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: GCC Patches, Nathan Sidwell

On Tue, 18 Aug 2015, Thomas Schwinge wrote:

> So, back to modifying the driver; here is my current messy WIP patch with
> still a lot of TODOs in it -- but it appears to work at last.  :-)
> 
> Maybe somebody else is able to continue with that task while I'm out of
> office.  This has been developed on top of gomp-4_0-branch r226832.  I'm
> also attaching a tarball of the even more messy individual patches,
> foffload.tar.bz2, in case there's anything to salvage in there, or if
> that helps to understand the development options/history.  Earlier
> messages in this thread should give enough context what this is about,
> <http://news.gmane.org/find-root.php?message_id=%3C87egjopgh0.fsf%40kepler.schwinge.homeip.net%3E>.

This is what I've committed to gomp-4_0-branch, with the driver changes 
substantially cleaned up and smaller changes to the other bits of the 
patch.

gcc:
2015-08-20  Thomas Schwinge  <thomas@codesourcery.com>
	    Joseph Myers  <joseph@codesourcery.com>

	* doc/invoke.texi (-ffixed-@var{reg}): Document conflict with
	Fortran options.
	* gcc.c (offload_targets): Update comment.
	(add_omp_infile_spec_func, spec_lang_mask_accept): New.
	(driver_self_specs) [ENABLE_OFFLOADING]: Add spec to use
	%:add-omp-infile().
	(static_spec_functions): Add add-omp-infile.
	(struct switchstr): Add lang_mask field.  Expand comment.
	(struct infile): Add lang_mask field.
	(add_infile, save_switch, do_spec): Add lang_mask argument.
	(driver_unknown_option_callback, driver_wrong_lang_callback)
	(driver_handle_option, process_command, do_self_spec)
	(driver::do_spec_on_infiles): All callers changed.
	(give_switch): Check languages of switch against
	spec_lang_mask_accept.
	(driver::maybe_putenv_OFFLOAD_TARGETS): Do not use intermediate
	targets variable.
	* gcc.h (do_spec): Update prototype.

fortran:
2015-08-20  Joseph Myers  <joseph@codesourcery.com>

	* gfortranspec.c (lang_specific_pre_link): Update call to do_spec.

java:
2015-08-20  Joseph Myers  <joseph@codesourcery.com>

	* jvspec.c (lang_specific_pre_link): Update call to do_spec.

libgomp:
2015-08-20  Thomas Schwinge  <thomas@codesourcery.com>
	    Joseph Myers  <joseph@codesourcery.com>

	* plugin/configfrag.ac (fnmatch.h): Check for header.
	(fnmatch): Check for function.
	(tgt_name): Do not set.
	(offload_targets): Separate with colons not commas.
	* config.h.in, configure: Regenerate.
	* env.c (initialize_env): Make static.  Remove TODO.
	* libgomp.h (gomp_offload_target_available_p): New prototype.
	* libgomp.map (GOACC_2.0.GOMP_4_BRANCH): Add
	GOMP_set_offload_targets.
	(INTERNAL): Remove.
	* libgomp_g.h (GOMP_set_offload_targets): New prototype.
	* oacc-init.c (resolve_device): Do not handle acc_device_host.
	Add comments.
	* target.c: Include <fnmatch.h>.
	(resolve_device): Use host fallback when offload data not
	available.
	(gomp_offload_target_available_p, offload_target_to_plugin_name)
	(gomp_offload_targets, gomp_offload_targets_init)
	(GOMP_set_offload_targets, gomp_plugin_prefix)
	(gomp_plugin_suffix): New.
	(gomp_load_plugin_for_device): Add gomp_debug call.
	(gomp_target_init): Use gomp_offload_targets instead of
	OFFLOAD_TARGETS.  Handle and rewrite colon-separated string.
	* testsuite/lib/libgomp.exp: Expect offload targets to be
	colon-separated.  Adjust matching of offload targets.  Don't
	generate constructor here.
	(libgomp_target_compile): Use GCC_UNDER_TEST.
	(check_effective_target_openacc_nvidia_accel_supported)
	(check_effective_target_openacc_host_selected): Adjust checks of
	offload target names.
	* testsuite/libgomp.c++/c++.exp: Do not set
	HAVE_SET_GXX_UNDER_TEST or GXX_UNDER_TEST.
	* testsuite/libgomp.c/c.exp: Do not append to
	libgomp_compile_options.
	* testsuite/libgomp.fortran/fortran.exp: Do not set
	GFORTRAN_UNDER_TEST or libgomp_compile_options.
	* testsuite/libgomp.graphite/graphite.exp: Do not append to
	libgomp_compile_options.
	* testsuite/libgomp.oacc-c++/c++.exp: Set SAVE_GCC_UNDER_TEST and
	GCC_UNDER_TEST.  Do not set HAVE_SET_GXX_UNDER_TEST and
	GXX_UNDER_TEST.  Do not append to ALWAYS_CFLAGS.  Adjust set of
	offload targets.  Use -foffload=.
	* testsuite/libgomp.oacc-c/c.exp: Do not append to
	libgomp_compile_options or ALWAYS_CFLAGS.  Adjust set of offload
	targets.  Use -foffload=.
	* testsuite/libgomp.oacc-fortran/fortran.exp: Do not set
	GFORTRAN_UNDER_TEST or append to libgomp_compile_options.  Do not
	append to ALWAYS_CFLAGS.  Adjust set of offload targets.  Use
	-foffload=.
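
For readers who want the gist of the libgomp/target.c entries above before
reading the diff, here is a minimal standalone sketch of the colon-separated
list handling and the fnmatch-based mapping from offload target to plugin
name.  It is an illustration under assumptions, not the committed code: the
names sketch_target_to_plugin and sketch_plugin_file_name are hypothetical,
the ".so.1" suffix assumes an ELF-style SONAME_SUFFIX (1), and an unknown
target returns NULL here where the patch calls gomp_fatal.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: map an offload target (e.g. "nvptx-none") to a
   plugin base name.  Returns NULL for an unknown target; the patch itself
   treats that case as fatal.  */
const char *
sketch_target_to_plugin (const char *target)
{
  assert (target != NULL);
  if (fnmatch ("*-intelmic*", target, 0) == 0)
    return "intelmic";
  if (fnmatch ("nvptx*", target, 0) == 0)
    return "nvptx";
  return NULL;
}

/* Hypothetical helper: find the INDEX-th entry of the colon-separated list
   TARGETS and write the corresponding plugin file name into BUF.
   Returns 0 on success, -1 if there is no such entry, the target is
   unknown, or BUF is too small.  */
int
sketch_plugin_file_name (const char *targets, size_t index,
			 char *buf, size_t buf_size)
{
  const char *cur = targets;
  while (*cur)
    {
      /* An entry ends at the next colon or at the terminating NUL.  */
      const char *next = strchr (cur, ':');
      if (next == NULL)
	next = cur + strlen (cur);

      if (index == 0)
	{
	  char target[128];
	  size_t len = (size_t) (next - cur);
	  if (len >= sizeof target)
	    return -1;
	  memcpy (target, cur, len);
	  target[len] = '\0';

	  const char *name = sketch_target_to_plugin (target);
	  if (name == NULL)
	    return -1;
	  /* ".so.1" is an assumption standing in for SONAME_SUFFIX (1).  */
	  int n = snprintf (buf, buf_size, "libgomp-plugin-%s.so.1", name);
	  return (n > 0 && (size_t) n < buf_size) ? 0 : -1;
	}

      index--;
      if (*next == '\0')
	break;
      cur = next + 1;
    }
  return -1;
}
```

Unlike gomp_target_init in the patch, which allocates one buffer and rewrites
the target name in place, the sketch copies each entry into a fixed-size
scratch buffer; that keeps the example short at the cost of a length limit.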

Index: libgomp/plugin/configfrag.ac
===================================================================
--- libgomp/plugin/configfrag.ac	(revision 226979)
+++ libgomp/plugin/configfrag.ac	(working copy)
@@ -29,6 +29,8 @@
 offload_targets=
 AC_SUBST(offload_targets)
 plugin_support=yes
+AC_CHECK_HEADERS([fnmatch.h], , [plugin_support=no])
+AC_CHECK_FUNCS([fnmatch], , [plugin_support=no])
 AC_CHECK_LIB(dl, dlsym, , [plugin_support=no])
 if test x"$plugin_support" = xyes; then
   AC_DEFINE(PLUGIN_SUPPORT, 1,
@@ -92,10 +94,8 @@ if test x"$enable_offload_targets" != x; then
     tgt=`echo $tgt | sed 's/=.*//'`
     case $tgt in
       *-intelmic-* | *-intelmicemul-*)
-	tgt_name=intelmic
 	;;
       nvptx*)
-        tgt_name=nvptx
 	PLUGIN_NVPTX=$tgt
 	PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
 	PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
@@ -127,9 +127,9 @@ if test x"$enable_offload_targets" != x; then
 	;;
     esac
     if test x"$offload_targets" = x; then
-      offload_targets=$tgt_name
+      offload_targets=$tgt
     else
-      offload_targets=$offload_targets,$tgt_name
+      offload_targets=$offload_targets:$tgt
     fi
     if test x"$tgt_dir" != x; then
       offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
@@ -141,7 +141,7 @@ if test x"$enable_offload_targets" != x; then
   done
 fi
 AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$offload_targets",
-  [Define to hold the list of target names suitable for offloading.])
+  [Define to hold the list of offload targets, separated by colons.])
 AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
   [Define to 1 if the NVIDIA plugin is built, 0 if not.])
Index: libgomp/config.h.in
===================================================================
--- libgomp/config.h.in	(revision 226979)
+++ libgomp/config.h.in	(working copy)
@@ -24,6 +24,12 @@
 /* Define to 1 if you have the <dlfcn.h> header file. */
 #undef HAVE_DLFCN_H
 
+/* Define to 1 if you have the `fnmatch' function. */
+#undef HAVE_FNMATCH
+
+/* Define to 1 if you have the <fnmatch.h> header file. */
+#undef HAVE_FNMATCH_H
+
 /* Define to 1 if you have the `getloadavg' function. */
 #undef HAVE_GETLOADAVG
 
@@ -95,7 +101,7 @@
    */
 #undef LT_OBJDIR
 
-/* Define to hold the list of target names suitable for offloading. */
+/* Define to hold the list of offload targets, separated by colons. */
 #undef OFFLOAD_TARGETS
 
 /* Name of package */
Index: libgomp/configure
===================================================================
--- libgomp/configure	(revision 226979)
+++ libgomp/configure	(working copy)
@@ -15119,6 +15119,33 @@ esac
 offload_targets=
 
 plugin_support=yes
+for ac_header in fnmatch.h
+do :
+  ac_fn_c_check_header_mongrel "$LINENO" "fnmatch.h" "ac_cv_header_fnmatch_h" "$ac_includes_default"
+if test "x$ac_cv_header_fnmatch_h" = x""yes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_FNMATCH_H 1
+_ACEOF
+
+else
+  plugin_support=no
+fi
+
+done
+
+for ac_func in fnmatch
+do :
+  ac_fn_c_check_func "$LINENO" "fnmatch" "ac_cv_func_fnmatch"
+if test "x$ac_cv_func_fnmatch" = x""yes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_FNMATCH 1
+_ACEOF
+
+else
+  plugin_support=no
+fi
+done
+
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for dlsym in -ldl" >&5
 $as_echo_n "checking for dlsym in -ldl... " >&6; }
 if test "${ac_cv_lib_dl_dlsym+set}" = set; then :
@@ -15236,10 +15263,8 @@ if test x"$enable_offload_targets" != x; then
     tgt=`echo $tgt | sed 's/=.*//'`
     case $tgt in
       *-intelmic-* | *-intelmicemul-*)
-	tgt_name=intelmic
 	;;
       nvptx*)
-        tgt_name=nvptx
 	PLUGIN_NVPTX=$tgt
 	PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
 	PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
@@ -15282,9 +15307,9 @@ rm -f core conftest.err conftest.$ac_objext \
 	;;
     esac
     if test x"$offload_targets" = x; then
-      offload_targets=$tgt_name
+      offload_targets=$tgt
     else
-      offload_targets=$offload_targets,$tgt_name
+      offload_targets=$offload_targets:$tgt
     fi
     if test x"$tgt_dir" != x; then
       offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
Index: libgomp/libgomp_g.h
===================================================================
--- libgomp/libgomp_g.h	(revision 226979)
+++ libgomp/libgomp_g.h	(working copy)
@@ -206,6 +206,7 @@ extern void GOMP_single_copy_end (void *);
 
 /* target.c */
 
+extern void GOMP_set_offload_targets (const char *);
 extern void GOMP_target (int, void (*) (void *), const void *,
 			 size_t, void **, size_t *, unsigned char *);
 extern void GOMP_target_data (int, const void *,
Index: libgomp/oacc-init.c
===================================================================
--- libgomp/oacc-init.c	(revision 226979)
+++ libgomp/oacc-init.c	(working copy)
@@ -122,7 +122,9 @@ resolve_device (acc_device_t d, bool fail_is_error
       {
 	if (goacc_device_type)
 	  {
-	    /* Lookup the named device.  */
+	    /* Lookup the device that has been explicitly named, so do not pay
+	       attention to gomp_offload_target_available_p.  (That is, hard
+	       error if not actually available.)  */
 	    while (++d != _ACC_device_hwm)
 	      if (dispatchers[d]
 		  && !strcasecmp (goacc_device_type,
@@ -148,8 +150,14 @@ resolve_device (acc_device_t d, bool fail_is_error
     case acc_device_not_host:
       /* Find the first available device after acc_device_not_host.  */
       while (++d != _ACC_device_hwm)
-	if (dispatchers[d] && dispatchers[d]->get_num_devices_func () > 0)
+	if (dispatchers[d]
+	    && dispatchers[d]->get_num_devices_func () > 0
+	    /* No device has been explicitly named, so pay attention to
+	       gomp_offload_target_available_p, to not decide on an offload
+	       target that we don't have offload data available for.  */
+	    && gomp_offload_target_available_p (dispatchers[d]->type))
 	  goto found;
+      /* No non-host device found.  */
       if (d_arg == acc_device_default)
 	{
 	  d = acc_device_host;
@@ -164,9 +172,6 @@ resolve_device (acc_device_t d, bool fail_is_error
         return NULL;
       break;
 
-    case acc_device_host:
-      break;
-
     default:
       if (d > _ACC_device_hwm)
 	{
@@ -181,7 +186,8 @@ resolve_device (acc_device_t d, bool fail_is_error
 
   assert (d != acc_device_none
 	  && d != acc_device_default
-	  && d != acc_device_not_host);
+	  && d != acc_device_not_host
+	  && d < _ACC_device_hwm);
 
   if (dispatchers[d] == NULL && fail_is_error)
     {
Index: libgomp/libgomp.map
===================================================================
--- libgomp/libgomp.map	(revision 226979)
+++ libgomp/libgomp.map	(working copy)
@@ -339,6 +339,7 @@ GOACC_2.0.GOMP_4_BRANCH {
 	GOACC_get_ganglocal_ptr;
 	GOACC_parallel_keyed;
 	GOACC_register_static;
+	GOMP_set_offload_targets;
 } GOACC_2.0;
 
 GOMP_PLUGIN_1.0 {
@@ -352,9 +353,3 @@ GOMP_PLUGIN_1.0 {
 	GOMP_PLUGIN_async_unmap_vars;
 	GOMP_PLUGIN_acc_thread;
 };
-
-# TODO.  See testsuite/lib/libgomp.exp:libgomp_init.
-INTERNAL {
-  global:
-	initialize_env;
-};
Index: libgomp/target.c
===================================================================
--- libgomp/target.c	(revision 226979)
+++ libgomp/target.c	(working copy)
@@ -41,6 +41,7 @@
 
 #ifdef PLUGIN_SUPPORT
 #include <dlfcn.h>
+#include <fnmatch.h>
 #include "plugin-suffix.h"
 #endif
 
@@ -122,17 +123,26 @@ gomp_get_num_devices (void)
 }
 
 static struct gomp_device_descr *
-resolve_device (int device_id)
+resolve_device (int device)
 {
-  if (device_id == GOMP_DEVICE_ICV)
+  int device_id;
+  if (device == GOMP_DEVICE_ICV)
     {
       struct gomp_task_icv *icv = gomp_icv (false);
       device_id = icv->default_device_var;
     }
+  else
+    device_id = device;
 
   if (device_id < 0 || device_id >= gomp_get_num_devices ())
     return NULL;
 
+  /* If the device-var ICV does not actually have offload data available, don't
+     try use it (which will fail), and use host fallback instead.  */
+  if (device == GOMP_DEVICE_ICV
+      && !gomp_offload_target_available_p (devices[device_id].type))
+    return NULL;
+
   return &devices[device_id];
 }
 
@@ -947,6 +957,48 @@ gomp_fini_device (struct gomp_device_descr *device
   devicep->is_initialized = false;
 }
 
+/* Do we have offload data available for the given offload target type?
+   Instead of verifying that *all* offload data is available that could
+   possibly be required, we instead just look for *any*.  If we later find any
+   offload data missing, that's user error.  */
+
+attribute_hidden bool
+gomp_offload_target_available_p (int type)
+{
+  bool available = false;
+
+  /* Has the offload target already been initialized?  */
+  for (int i = 0; !available && i < num_devices; i++)
+    {
+      struct gomp_device_descr *devicep = &devices[i];
+      gomp_mutex_lock (&devicep->lock);
+      if (devicep->type == type && devicep->is_initialized)
+	available = true;
+      gomp_mutex_unlock (&devicep->lock);
+    }
+
+  if (!available)
+    {
+      gomp_mutex_lock (&register_lock);
+
+      /* If there is no offload data available at all, we cannot later fail to
+	 find any of it for a specific offload target.  This is the case where
+	 there are no offloaded code regions in user code, but there can still
+	 be executable directives used, or runtime library calls made.  */
+      if (num_offload_images == 0)
+	available = true;
+
+      /* Can the offload target be initialized?  */
+      for (int i = 0; !available && i < num_offload_images; i++)
+	if (offload_images[i].type == type)
+	  available = true;
+
+      gomp_mutex_unlock (&register_lock);
+    }
+
+  return available;
+}
+
 /* Called when encountering a target directive.  If DEVICE
    is GOMP_DEVICE_ICV, it means use device-var ICV.  If it is
    GOMP_DEVICE_HOST_FALLBACK (or any value
@@ -1116,6 +1168,8 @@ static bool
 gomp_load_plugin_for_device (struct gomp_device_descr *device,
 			     const char *plugin_name)
 {
+  gomp_debug (0, "%s (\"%s\")\n", __FUNCTION__, plugin_name);
+
   const char *err = NULL, *last_missing = NULL;
 
   void *plugin_handle = dlopen (plugin_name, RTLD_LAZY);
@@ -1212,6 +1266,39 @@ gomp_load_plugin_for_device (struct gomp_device_de
   return 0;
 }
 
+/* Helper, to translate from an offload target to the corresponding plugin name.  */
+
+static const char *
+offload_target_to_plugin_name (const char *offload_target)
+{
+  if (fnmatch ("*-intelmic*", offload_target, 0) == 0)
+    return "intelmic";
+  if (fnmatch ("nvptx*", offload_target, 0) == 0)
+    return "nvptx";
+  gomp_fatal ("Unknown offload target: %s", offload_target);
+}
+
+/* List of offload targets, separated by colon.  Defaults to the list
+   determined when configuring libgomp.  */
+static const char *gomp_offload_targets = OFFLOAD_TARGETS;
+static bool gomp_offload_targets_init = false;
+
+/* Override the list of offload targets.  This must be called early, and only
+   once.  */
+
+void
+GOMP_set_offload_targets (const char *offload_targets)
+{
+  gomp_debug (0, "%s (\"%s\")\n", __FUNCTION__, offload_targets);
+
+  /* Make sure this gets called early.  */
+  assert (gomp_is_initialized == PTHREAD_ONCE_INIT);
+  /* Make sure this only gets called once.  */
+  assert (!gomp_offload_targets_init);
+  gomp_offload_targets_init = true;
+  gomp_offload_targets = offload_targets;
+}
+
 /* This function initializes the runtime needed for offloading.
    It parses the list of offload targets and tries to load the plugins for
    these targets.  On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
@@ -1219,11 +1306,12 @@ gomp_load_plugin_for_device (struct gomp_device_de
    corresponding devices, first the GOMP_OFFLOAD_CAP_OPENMP_400 ones, follows
    by the others.  */
 
+static const char *gomp_plugin_prefix ="libgomp-plugin-";
+static const char *gomp_plugin_suffix = SONAME_SUFFIX (1);
+
 static void
 gomp_target_init (void)
 {
-  const char *prefix ="libgomp-plugin-";
-  const char *suffix = SONAME_SUFFIX (1);
   const char *cur, *next;
   char *plugin_name;
   int i, new_num_devices;
@@ -1231,48 +1319,60 @@ gomp_target_init (void)
   num_devices = 0;
   devices = NULL;
 
-  cur = OFFLOAD_TARGETS;
+  cur = gomp_offload_targets;
   if (*cur)
     do
       {
-	struct gomp_device_descr current_device;
+	next = strchr (cur, ':');
+	/* If no other offload target following...  */
+	if (next == NULL)
+	  /* ..., point to the terminating NUL character.  */
+	  next = cur + strlen (cur);
 
-	next = strchr (cur, ',');
+	size_t gomp_plugin_prefix_len = strlen (gomp_plugin_prefix);
+	size_t cur_len = next - cur;
+	size_t gomp_plugin_suffix_len = strlen (gomp_plugin_suffix);
+	plugin_name = gomp_malloc (gomp_plugin_prefix_len
+				   + cur_len
+				   + gomp_plugin_suffix_len
+				   + 1);
+	memcpy (plugin_name, gomp_plugin_prefix, gomp_plugin_prefix_len);
+	memcpy (plugin_name + gomp_plugin_prefix_len, cur, cur_len);
+	/* NUL-terminate the string here...  */
+	plugin_name[gomp_plugin_prefix_len + cur_len] = '\0';
+	/* ..., so that we can then use it to translate the offload target to
+	   the plugin name...  */
+	const char *cur_plugin_name
+	  = offload_target_to_plugin_name (plugin_name
+					   + gomp_plugin_prefix_len);
+	size_t cur_plugin_name_len = strlen (cur_plugin_name);
+	assert (cur_plugin_name_len <= cur_len);
+	/* ..., and then rewrite it.  */
+	memcpy (plugin_name + gomp_plugin_prefix_len,
+		cur_plugin_name, cur_plugin_name_len);
+	memcpy (plugin_name + gomp_plugin_prefix_len + cur_plugin_name_len,
+		gomp_plugin_suffix, gomp_plugin_suffix_len);
+	plugin_name[gomp_plugin_prefix_len
+		    + cur_plugin_name_len
+		    + gomp_plugin_suffix_len] = '\0';
 
-	plugin_name = (char *) malloc (1 + (next ? next - cur : strlen (cur))
-				       + strlen (prefix) + strlen (suffix));
-	if (!plugin_name)
-	  {
-	    num_devices = 0;
-	    break;
-	  }
-
-	strcpy (plugin_name, prefix);
-	strncat (plugin_name, cur, next ? next - cur : strlen (cur));
-	strcat (plugin_name, suffix);
-
+	struct gomp_device_descr current_device;
 	if (gomp_load_plugin_for_device (&current_device, plugin_name))
 	  {
 	    new_num_devices = current_device.get_num_devices_func ();
 	    if (new_num_devices >= 1)
 	      {
-		/* Augment DEVICES and NUM_DEVICES.  */
-
-		devices = realloc (devices, (num_devices + new_num_devices)
-				   * sizeof (struct gomp_device_descr));
-		if (!devices)
-		  {
-		    num_devices = 0;
-		    free (plugin_name);
-		    break;
-		  }
-
 		current_device.name = current_device.get_name_func ();
 		/* current_device.capabilities has already been set.  */
 		current_device.type = current_device.get_type_func ();
 		current_device.mem_map.root = NULL;
 		current_device.is_initialized = false;
 		current_device.openacc.data_environ = NULL;
+
+		/* Augment DEVICES and NUM_DEVICES.  */
+		devices = gomp_realloc (devices,
+					((num_devices + new_num_devices)
+					 * sizeof (struct gomp_device_descr)));
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.target_id = i;
@@ -1286,18 +1386,12 @@ gomp_target_init (void)
 	free (plugin_name);
 	cur = next + 1;
       }
-    while (next);
+    while (*next);
 
   /* In DEVICES, sort the GOMP_OFFLOAD_CAP_OPENMP_400 ones first, and set
      NUM_DEVICES_OPENMP.  */
   struct gomp_device_descr *devices_s
-    = malloc (num_devices * sizeof (struct gomp_device_descr));
-  if (!devices_s)
-    {
-      num_devices = 0;
-      free (devices);
-      devices = NULL;
-    }
+    = gomp_malloc (num_devices * sizeof (struct gomp_device_descr));
   num_devices_openmp = 0;
   for (i = 0; i < num_devices; i++)
     if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
Index: libgomp/env.c
===================================================================
--- libgomp/env.c	(revision 226979)
+++ libgomp/env.c	(working copy)
@@ -1175,11 +1175,7 @@ handle_omp_display_env (unsigned long stacksize, i
 }
 
 
-/* TODO.  See testsuite/lib/libgomp.exp:libgomp_init.  */
-#if 0
-static
-#endif
-void __attribute__((constructor))
+static void __attribute__((constructor))
 initialize_env (void)
 {
   unsigned long thread_limit_var, stacksize;
Index: libgomp/libgomp.h
===================================================================
--- libgomp/libgomp.h	(revision 226979)
+++ libgomp/libgomp.h	(working copy)
@@ -785,6 +785,7 @@ extern void gomp_unmap_vars (struct target_mem_des
 extern void gomp_init_device (struct gomp_device_descr *);
 extern void gomp_fini_device (struct gomp_device_descr *);
 extern void gomp_unload_device (struct gomp_device_descr *);
+extern bool gomp_offload_target_available_p (int);
 
 /* work.c */
 
Index: libgomp/testsuite/libgomp.oacc-c/c.exp
===================================================================
--- libgomp/testsuite/libgomp.oacc-c/c.exp	(revision 226979)
+++ libgomp/testsuite/libgomp.oacc-c/c.exp	(working copy)
@@ -32,8 +32,6 @@ dg-init
 # Turn on OpenACC.
 lappend ALWAYS_CFLAGS "additional_flags=-fopenacc"
 
-lappend libgomp_compile_options "compiler=$GCC_UNDER_TEST"
-
 # Gather a list of all tests.
 set tests [lsort [concat \
 		      [find $srcdir/$subdir *.c] \
@@ -62,17 +60,14 @@ set_ld_library_path_env_vars
 set SAVE_ALWAYS_CFLAGS "$ALWAYS_CFLAGS"
 foreach offload_target_openacc $offload_targets_s_openacc {
     set ALWAYS_CFLAGS "$SAVE_ALWAYS_CFLAGS"
-    set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
-    # Set $ACC_DEVICE_TYPE.  See the comments in
-    # ../lib/libgomp.exp:libgomp_init.
-    lappend ALWAYS_CFLAGS "ldflags=constructor-setenv-ACC_DEVICE_TYPE-$offload_target_openacc.o"
 
     # Todo: Determine shared memory or not using run-time test.
-    switch $offload_target_openacc {
-	host {
+    switch -glob $offload_target_openacc {
+	disable {
 	    set acc_mem_shared 1
+	    set tagopt "-DACC_DEVICE_TYPE_host=1"
 	}
-	nvidia {
+	nvptx* {
 	    if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		# Don't bother; execution testing is going to FAIL.
 		untested "$subdir $offload_target_openacc offloading"
@@ -86,12 +81,14 @@ foreach offload_target_openacc $offload_targets_s_
 	    lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/libgomp.oacc-c-c++-common"
 
 	    set acc_mem_shared 0
+	    set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	}
 	default {
 	    set acc_mem_shared 0
+	    #TODO error
 	}
     }
-    set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
+    set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
     dg-runtest $tests "$tagopt" $DEFAULT_CFLAGS
     gcc-dg-runtest $ttests "$tagopt" ""
Index: libgomp/testsuite/libgomp.c++/c++.exp
===================================================================
--- libgomp/testsuite/libgomp.c++/c++.exp	(revision 226979)
+++ libgomp/testsuite/libgomp.c++/c++.exp	(working copy)
@@ -4,7 +4,6 @@ load_gcc_lib gcc-dg.exp
 global shlib_ext
 
 set shlib_ext [get_shlib_extension]
-#TODO
 set lang_link_flags "-lstdc++"
 set lang_test_file_found 0
 set lang_library_path "../libstdc++-v3/src/.libs"
@@ -47,13 +46,6 @@ if { $blddir != "" } {
 }
 
 if { $lang_test_file_found } {
-    if ![info exists GXX_UNDER_TEST] then {
-	# TODO.  See libgomp.oacc-c++/c++.exp.
-	set HAVE_SET_GXX_UNDER_TEST ""
-	set GXX_UNDER_TEST "$GCC_UNDER_TEST"
-    }
-    lappend libgomp_compile_options "compiler=$GXX_UNDER_TEST"
-
     # Gather a list of all tests.
     set tests [lsort [find $srcdir/$subdir *.C]]
 
@@ -76,10 +68,5 @@ if { $lang_test_file_found } {
     dg-runtest $tests "" "$libstdcxx_includes $DEFAULT_CFLAGS"
 }
 
-# TODO.  See above.
-if { [info exists HAVE_SET_GXX_UNDER_TEST] } {
-    unset GXX_UNDER_TEST
-}
-
 # All done.
 dg-finish
Index: libgomp/testsuite/libgomp.fortran/fortran.exp
===================================================================
--- libgomp/testsuite/libgomp.fortran/fortran.exp	(revision 226979)
+++ libgomp/testsuite/libgomp.fortran/fortran.exp	(working copy)
@@ -47,11 +47,6 @@ if { $blddir != "" } {
 }
 
 if { $lang_test_file_found } {
-    if ![info exists GFORTRAN_UNDER_TEST] then {
-	set GFORTRAN_UNDER_TEST $GCC_UNDER_TEST
-    }
-    lappend libgomp_compile_options "compiler=$GFORTRAN_UNDER_TEST"
-
     # Gather a list of all tests.
     set tests [lsort [find $srcdir/$subdir *.\[fF\]{,90,95,03,08}]]
 
Index: libgomp/testsuite/libgomp.oacc-c++/c++.exp
===================================================================
--- libgomp/testsuite/libgomp.oacc-c++/c++.exp	(revision 226979)
+++ libgomp/testsuite/libgomp.oacc-c++/c++.exp	(working copy)
@@ -13,7 +13,6 @@ load_gcc_lib gcc-dg.exp
 global shlib_ext
 
 set shlib_ext [get_shlib_extension]
-#TODO
 set lang_link_flags "-lstdc++"
 set lang_test_file_found 0
 set lang_library_path "../libstdc++-v3/src/.libs"
@@ -32,6 +31,11 @@ dg-init
 # Turn on OpenACC.
 lappend ALWAYS_CFLAGS "additional_flags=-fopenacc"
 
+# Switch into C++ mode.  Otherwise, the libgomp.oacc-c-c++-common/*.c
+# files would be compiled as C files.
+set SAVE_GCC_UNDER_TEST "$GCC_UNDER_TEST"
+set GCC_UNDER_TEST "$GCC_UNDER_TEST -x c++"
+
 set blddir [lookfor_file [get_multilibs] libgomp]
 
 
@@ -56,14 +60,6 @@ if { $blddir != "" } {
 }
 
 if { $lang_test_file_found } {
-    if ![info exists GXX_UNDER_TEST] then {
-	# Use GCC_UNDER_TEST, but switch into C++ mode, as otherwise the
-	# libgomp.oacc-c-c++-common/*.c files would be compiled as C files.
-	set HAVE_SET_GXX_UNDER_TEST ""
-	set GXX_UNDER_TEST "$GCC_UNDER_TEST -x c++"
-    }
-    lappend libgomp_compile_options "compiler=$GXX_UNDER_TEST"
-
     # Gather a list of all tests.
     set tests [lsort [concat \
 			  [find $srcdir/$subdir *.C] \
@@ -104,17 +100,14 @@ if { $lang_test_file_found } {
     set SAVE_ALWAYS_CFLAGS "$ALWAYS_CFLAGS"
     foreach offload_target_openacc $offload_targets_s_openacc {
 	set ALWAYS_CFLAGS "$SAVE_ALWAYS_CFLAGS"
-	set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
-	# Set $ACC_DEVICE_TYPE.  See the comments in
-	# ../lib/libgomp.exp:libgomp_init.
-	lappend ALWAYS_CFLAGS "ldflags=constructor-setenv-ACC_DEVICE_TYPE-$offload_target_openacc.o"
 
 	# Todo: Determine shared memory or not using run-time test.
-	switch $offload_target_openacc {
-	    host {
+	switch -glob $offload_target_openacc {
+	    disable {
 		set acc_mem_shared 1
+		set tagopt "-DACC_DEVICE_TYPE_host=1"
 	    }
-	    nvidia {
+	    nvptx* {
 		if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		    # Don't bother; execution testing is going to FAIL.
 		    untested "$subdir $offload_target_openacc offloading"
@@ -128,12 +121,14 @@ if { $lang_test_file_found } {
 		lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/libgomp.oacc-c-c++-common"
 
 		set acc_mem_shared 0
+		set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	    }
 	    default {
 		set acc_mem_shared 0
+		#TODO error
 	    }
 	}
-	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
+	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
 	dg-runtest $tests "$tagopt" "$libstdcxx_includes $DEFAULT_CFLAGS"
 	gcc-dg-runtest $ttests "$tagopt" "$libstdcxx_includes"
@@ -141,9 +136,7 @@ if { $lang_test_file_found } {
 }
 
 # See above.
-if { [info exists HAVE_SET_GXX_UNDER_TEST] } {
-    unset GXX_UNDER_TEST
-}
+set GCC_UNDER_TEST "$SAVE_GCC_UNDER_TEST"
 
 unset TORTURE_OPTIONS
 
Index: libgomp/testsuite/lib/libgomp.exp
===================================================================
--- libgomp/testsuite/lib/libgomp.exp	(revision 226979)
+++ libgomp/testsuite/lib/libgomp.exp	(working copy)
@@ -36,24 +36,21 @@ load_gcc_lib fortran-modules.exp
 load_file libgomp-test-support.exp
 
 # Populate offload_targets_s (offloading targets separated by a space), and
-# offload_targets_s_openacc (the same, but with OpenACC names; OpenACC spells
-# some of them a little differently).
-set offload_targets_s [split $offload_targets ","]
+# offload_targets_s_openacc (those suitable for OpenACC).
+set offload_targets_s [split $offload_targets ":"]
 set offload_targets_s_openacc {}
 foreach offload_target_openacc $offload_targets_s {
-    switch $offload_target_openacc {
-	intelmic {
+    switch -glob $offload_target_openacc {
+	*-intelmic* {
 	    # TODO.  Skip; will all FAIL because of missing
 	    # GOMP_OFFLOAD_CAP_OPENACC_200.
 	    continue
 	}
-	nvptx {
-	    set offload_target_openacc "nvidia"
-	}
     }
     lappend offload_targets_s_openacc "$offload_target_openacc"
 }
-lappend offload_targets_s_openacc "host"
+# Host fallback.
+lappend offload_targets_s_openacc "disable"
 
 set dg-do-what-default run
 
@@ -134,7 +131,7 @@ proc libgomp_init { args } {
     # Add liboffloadmic build directory in LD_LIBRARY_PATH to support
     # non-fallback testing for Intel MIC targets
     global offload_targets
-    if { [string match "*,intelmic,*" ",$offload_targets,"] } {
+    if { [string match "*:*-intelmic*:*" ":$offload_targets:"] } {
 	append always_ld_library_path ":${blddir}/../liboffloadmic/.libs"
 	append always_ld_library_path ":${blddir}/../liboffloadmic/plugin/.libs"
 	# libstdc++ is required by liboffloadmic
@@ -235,56 +232,6 @@ proc libgomp_init { args } {
     if { $offload_additional_options != "" } {
 	lappend ALWAYS_CFLAGS "additional_flags=${offload_additional_options}"
     }
-
-    # TODO.  Evil hack.  DejaGnu doesn't have a mechanism for setting
-    # environment variables on remote boards.  Thus, we have to fake it, using
-    # GCC's constructor attributes to create object files that install the
-    # desired environment variables.
-    set e_list [list \
-		    [list defaults DUMMY=dummy ] \
-		    [list ACC_DEVICE_TYPE-host ACC_DEVICE_TYPE=host ] \
-		    [list ACC_DEVICE_TYPE-nvidia ACC_DEVICE_TYPE=nvidia ] ]
-    foreach e $e_list {
-	set v [lindex $e 0]
-	set s [lindex $e 1]
-	verbose "creating constructor-setenv: $v: $s"
-	set src constructor-setenv-$v.c
-	set obj constructor-setenv-$v.o
-	set f_src [open $src "w"]
-	puts $f_src "static void __attribute__((constructor(1000)))"
-	puts $f_src "init_env(void) {"
-	puts $f_src "  int putenv(char *);"
-	puts $f_src "  putenv(\"$s\");"
-	puts $f_src "}"
-	if { $v == "defaults" } {
-	    # TODO.  We want libgomp to initialize after the putenv calls.
-	    # But: shared libraries' constructors (and thus
-	    # env.c:initialize_env) will be called before the executable's
-	    # (init_env functions created above), so it will already have been
-	    # initialized (and has to be, in case we're not linking in this
-	    # gunk).  Assuming no execution of other libgomp functionality in
-	    # between (which we're not doing during initialization),
-	    # initialize_env's effects are idempotent when calling it again, so
-	    # we'll do that now, after the putenv calls have been executed.
-	    puts $f_src "static void __attribute__((constructor(1001)))"
-	    puts $f_src "init_libgomp(void) {"
-	    # Some test cases specify -fno-openmp, so libgomp isn't linked in.
-	    puts $f_src "  void initialize_env(void) __attribute__((weak));"
-	    puts $f_src "  if (initialize_env)"
-	    puts $f_src "    initialize_env();"
-	    puts $f_src "}"
-	}
-	close $f_src
-	# TODO.  Using whichever compiler is currently configured...  At least
-	# switch it into C mode.
-	set lines [libgomp_target_compile $src $obj object "additional_flags=-xc"]
-	# TODO.  Error checking.
-	file delete $src
-    }
-    # When adding constructor-setenv-*.o files, make sure to cancel any -x flag
-    # that may have been set before.
-    lappend ALWAYS_CFLAGS "ldflags=-x none"
-    lappend ALWAYS_CFLAGS "ldflags=constructor-setenv-defaults.o"
 }
 
 #
@@ -296,6 +243,7 @@ proc libgomp_target_compile { source dest type opt
     global libgomp_compile_options
     global gluefile wrap_flags
     global ALWAYS_CFLAGS
+    global GCC_UNDER_TEST
     global lang_test_file
     global lang_library_path
     global lang_link_flags
@@ -323,6 +271,7 @@ proc libgomp_target_compile { source dest type opt
 
     lappend options "additional_flags=[libio_include_flags]"
     lappend options "timeout=[timeout_value]"
+    lappend options "compiler=$GCC_UNDER_TEST"
 
     set options [concat $libgomp_compile_options $options]
 
@@ -370,7 +319,7 @@ proc check_effective_target_offload_device { } {
 
 proc check_effective_target_openacc_nvidia_accel_supported { } {
     global offload_targets_s_openacc
-    set res [lsearch $offload_targets_s_openacc "nvidia" ]
+    set res [lsearch -glob $offload_targets_s_openacc "nvptx*" ]
     if { $res != -1 } {
 	return 1;
     }
@@ -396,7 +345,7 @@ proc check_effective_target_openacc_nvidia_accel_s
 	return 0;
     }
     global offload_target_openacc
-    if { $offload_target_openacc == "nvidia" } {
+    if { [string match "nvptx*" $offload_target_openacc] } {
         return 1;
     }
     return 0;
@@ -406,7 +355,7 @@ proc check_effective_target_openacc_nvidia_accel_s
 
 proc check_effective_target_openacc_host_selected { } {
     global offload_target_openacc
-    if { $offload_target_openacc == "host" } {
+    if { $offload_target_openacc == "disable" } {
         return 1;
     }
     return 0;
Index: libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
===================================================================
--- libgomp/testsuite/libgomp.oacc-fortran/fortran.exp	(revision 226979)
+++ libgomp/testsuite/libgomp.oacc-fortran/fortran.exp	(working copy)
@@ -49,11 +49,6 @@ if { $blddir != "" } {
 }
 
 if { $lang_test_file_found } {
-    if ![info exists GFORTRAN_UNDER_TEST] then {
-	set GFORTRAN_UNDER_TEST $GCC_UNDER_TEST
-    }
-    lappend libgomp_compile_options "compiler=$GFORTRAN_UNDER_TEST"
-
     # Gather a list of all tests.
     set tests [lsort [find $srcdir/$subdir *.\[fF\]{,90,95,03,08}]]
 
@@ -74,20 +69,14 @@ if { $lang_test_file_found } {
     set_ld_library_path_env_vars
 
     # Test OpenACC with available accelerators.
-    set SAVE_ALWAYS_CFLAGS "$ALWAYS_CFLAGS"
     foreach offload_target_openacc $offload_targets_s_openacc {
-	set ALWAYS_CFLAGS "$SAVE_ALWAYS_CFLAGS"
-	set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
-	# Set $ACC_DEVICE_TYPE.  See the comments in
-	# ../lib/libgomp.exp:libgomp_init.
-	lappend ALWAYS_CFLAGS "ldflags=constructor-setenv-ACC_DEVICE_TYPE-$offload_target_openacc.o"
-
 	# Todo: Determine shared memory or not using run-time test.
-	switch $offload_target_openacc {
-	    host {
+	switch -glob $offload_target_openacc {
+	    disable {
 		set acc_mem_shared 1
+		set tagopt "-DACC_DEVICE_TYPE_host=1"
 	    }
-	    nvidia {
+	    nvptx* {
 		if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		    # Don't bother; execution testing is going to FAIL.
 		    untested "$subdir $offload_target_openacc offloading"
@@ -95,12 +84,14 @@ if { $lang_test_file_found } {
 		}
 
 		set acc_mem_shared 0
+		set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	    }
 	    default {
 		set acc_mem_shared 0
+		#TODO error
 	    }
 	}
-	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
+	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
 	# For Fortran we're doing torture testing, as Fortran has far more tests
 	# with arrays etc. that testing just -O0 or -O2 is insufficient, that is
Index: libgomp/testsuite/libgomp.c/c.exp
===================================================================
--- libgomp/testsuite/libgomp.c/c.exp	(revision 226979)
+++ libgomp/testsuite/libgomp.c/c.exp	(working copy)
@@ -23,8 +23,6 @@ dg-init
 # Turn on OpenMP.
 lappend ALWAYS_CFLAGS "additional_flags=-fopenmp"
 
-lappend libgomp_compile_options "compiler=$GCC_UNDER_TEST"
-
 # Gather a list of all tests.
 set tests [lsort [find $srcdir/$subdir *.c]]
 
Index: libgomp/testsuite/libgomp.graphite/graphite.exp
===================================================================
--- libgomp/testsuite/libgomp.graphite/graphite.exp	(revision 226979)
+++ libgomp/testsuite/libgomp.graphite/graphite.exp	(working copy)
@@ -48,8 +48,6 @@ dg-init
 # Turn on OpenMP.
 lappend ALWAYS_CFLAGS "additional_flags=-fopenmp"
 
-lappend libgomp_compile_options "compiler=$GCC_UNDER_TEST"
-
 # Gather a list of all tests.
 set tests [lsort [find $srcdir/$subdir *.c]]
 
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 226979)
+++ gcc/doc/invoke.texi	(working copy)
@@ -24036,6 +24036,10 @@ macro in the machine description macro file.
 This flag does not have a negative form, because it specifies a
 three-way choice.
 
+Note that this flag may conflict with the @option{-ffixed-form} as
+well as @option{-ffixed-line-length-none} and
+@option{-ffixed-line-length-<n>} options of the Fortran front end.
+
 @item -fcall-used-@var{reg}
 @opindex fcall-used
 Treat the register named @var{reg} as an allocable register that is
Index: gcc/gcc.c
===================================================================
--- gcc/gcc.c	(revision 226979)
+++ gcc/gcc.c	(working copy)
@@ -158,7 +158,7 @@ static const char *const spec_version = DEFAULT_TA
 static const char *spec_machine = DEFAULT_TARGET_MACHINE;
 static const char *spec_host_machine = DEFAULT_REAL_TARGET_MACHINE;
 
-/* List of offload targets.  */
+/* List of offload targets.  Empty string for -foffload=disable.  */
 
 static char *offload_targets = NULL;
 
@@ -275,6 +275,8 @@ static const char *compare_debug_auxbase_opt_spec_
 static const char *pass_through_libs_spec_func (int, const char **);
 static const char *replace_extension_spec_func (int, const char **);
 static const char *greater_than_spec_func (int, const char **);
+static const char *add_omp_infile_spec_func (int, const char **);
+
 static char *convert_white_space (char *);
 \f
 /* The Specs Language
@@ -1061,6 +1063,14 @@ static const char *const multilib_defaults_raw[] =
 
 static const char *const driver_self_specs[] = {
   "%{fdump-final-insns:-fdump-final-insns=.} %<fdump-final-insns",
+#ifdef ENABLE_OFFLOADING
+  /* If the user didn't specify any, default to all configured offload
+     targets.  */
+  "%{!foffload=*:-foffload=" OFFLOAD_TARGETS "}",
+  /* If linking against libgomp, add a setup file.  */
+  "%{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*} 1):" \
+  "%:add-omp-infile()}",
+#endif /* ENABLE_OFFLOADING */
   DRIVER_SELF_SPECS, CONFIGURE_SPECS, GOMP_SELF_SPECS, GTM_SELF_SPECS,
   CILK_SELF_SPECS
 };
@@ -1491,6 +1501,7 @@ static const struct spec_function static_spec_func
   { "pass-through-libs",	pass_through_libs_spec_func },
   { "replace-extension",	replace_extension_spec_func },
   { "gt",			greater_than_spec_func },
+  { "add-omp-infile",		add_omp_infile_spec_func },
 #ifdef EXTRA_SPEC_FUNCTIONS
   EXTRA_SPEC_FUNCTIONS
 #endif
@@ -3073,10 +3084,16 @@ execute (void)
    SWITCH_LIVE to indicate this switch is true in a conditional spec.
    SWITCH_FALSE to indicate this switch is overridden by a later switch.
    SWITCH_IGNORE to indicate this switch should be ignored (used in %<S).
    SWITCH_IGNORE_PERMANENTLY to indicate this switch should be ignored
    in all do_spec calls afterwards.  Used for %<S from self specs.
+   SWITCH_KEEP_FOR_GCC to indicate that this switch, otherwise ignored,
+   should be included in COLLECT_GCC_OPTIONS.
-   The `validated' field is nonzero if any spec has looked at this switch;
-   if it remains zero at the end of the run, it must be meaningless.  */
+   The `known' field describes whether this is an internal switch.
+   The `validated' field describes whether any spec has looked at this switch;
+   if it remains false at the end of the run, the switch must be meaningless.
+   The `ordering' field is used to temporarily mark switches that have to be
+   kept in a specific order.
+   The `lang_mask' field stores the flags associated with this option.  */
 
 #define SWITCH_LIVE    			(1 << 0)
 #define SWITCH_FALSE   			(1 << 1)
@@ -3092,6 +3109,7 @@ struct switchstr
   bool known;
   bool validated;
   bool ordering;
+  unsigned int lang_mask;
 };
 
 static struct switchstr *switches;
@@ -3100,6 +3118,10 @@ static int n_switches;
 
 static int n_switches_alloc;
 
+/* If nonzero, do not pass through switches for languages not matching
+   this mask.  */
+static unsigned int spec_lang_mask_accept;
+
 /* Set to zero if -fcompare-debug is disabled, positive if it's
    enabled and we're running the first compilation, negative if it's
    enabled and we're running the second compilation.  For most of the
@@ -3137,6 +3159,7 @@ struct infile
   const char *name;
   const char *language;
   struct compiler *incompiler;
+  unsigned int lang_mask;
   bool compiled;
   bool preprocessed;
 };
@@ -3337,15 +3360,16 @@ alloc_infile (void)
     }
 }
 
-/* Store an input file with the given NAME and LANGUAGE in
+/* Store an input file with the given NAME, LANGUAGE and LANG_MASK in
    infiles.  */
 
 static void
-add_infile (const char *name, const char *language)
+add_infile (const char *name, const char *language, unsigned int lang_mask)
 {
   alloc_infile ();
   infiles[n_infiles].name = name;
-  infiles[n_infiles++].language = language;
+  infiles[n_infiles].language = language;
+  infiles[n_infiles++].lang_mask = lang_mask;
 }
 
 /* Allocate space for a switch in switches.  */
@@ -3366,11 +3390,12 @@ alloc_switch (void)
 }
 
 /* Save an option OPT with N_ARGS arguments in array ARGS, marking it
-   as validated if VALIDATED and KNOWN if it is an internal switch.  */
+   as validated if VALIDATED and KNOWN if it is an internal switch.
+   LANG_MASK holds the flags associated with this option.  */
 
 static void
 save_switch (const char *opt, size_t n_args, const char *const *args,
-	     bool validated, bool known)
+	     bool validated, bool known, unsigned int lang_mask)
 {
   alloc_switch ();
   switches[n_switches].part1 = opt + 1;
@@ -3387,6 +3412,7 @@ save_switch (const char *opt, size_t n_args, const
   switches[n_switches].validated = validated;
   switches[n_switches].known = known;
   switches[n_switches].ordering = 0;
+  switches[n_switches].lang_mask = lang_mask;
   n_switches++;
 }
 
@@ -3404,7 +3430,8 @@ driver_unknown_option_callback (const struct cl_de
 	 diagnosed only if there are warnings.  */
       save_switch (decoded->canonical_option[0],
 		   decoded->canonical_option_num_elements - 1,
-		   &decoded->canonical_option[1], false, true);
+		   &decoded->canonical_option[1], false, true,
+		   cl_options[decoded->opt_index].flags);
       return false;
     }
   if (decoded->opt_index == OPT_SPECIAL_unknown)
@@ -3412,7 +3439,8 @@ driver_unknown_option_callback (const struct cl_de
       /* Give it a chance to define it a spec file.  */
       save_switch (decoded->canonical_option[0],
 		   decoded->canonical_option_num_elements - 1,
-		   &decoded->canonical_option[1], false, false);
+		   &decoded->canonical_option[1], false, false,
+		   cl_options[decoded->opt_index].flags);
       return false;
     }
   else
@@ -3439,7 +3467,8 @@ driver_wrong_lang_callback (const struct cl_decode
   else
     save_switch (decoded->canonical_option[0],
 		 decoded->canonical_option_num_elements - 1,
-		 &decoded->canonical_option[1], false, true);
+		 &decoded->canonical_option[1], false, true,
+		 option->flags);
 }
 
 static const char *spec_lang = 0;
@@ -3689,7 +3718,8 @@ driver_handle_option (struct gcc_options *opts,
 	compare_debug_opt = NULL;
       else
 	compare_debug_opt = arg;
-      save_switch (compare_debug_replacement_opt, 0, NULL, validated, true);
+      save_switch (compare_debug_replacement_opt, 0, NULL, validated, true,
+		   cl_options[opt_index].flags);
       return true;
 
     case OPT_fdiagnostics_color_:
@@ -3744,17 +3774,17 @@ driver_handle_option (struct gcc_options *opts,
 	for (j = 0; arg[j]; j++)
 	  if (arg[j] == ',')
 	    {
-	      add_infile (save_string (arg + prev, j - prev), "*");
+	      add_infile (save_string (arg + prev, j - prev), "*", 0);
 	      prev = j + 1;
 	    }
 	/* Record the part after the last comma.  */
-	add_infile (arg + prev, "*");
+	add_infile (arg + prev, "*", 0);
       }
       do_save = false;
       break;
 
     case OPT_Xlinker:
-      add_infile (arg, "*");
+      add_infile (arg, "*", 0);
       do_save = false;
       break;
 
@@ -3776,19 +3806,21 @@ driver_handle_option (struct gcc_options *opts,
     case OPT_l:
       /* POSIX allows separation of -l and the lib arg; canonicalize
 	 by concatenating -l with its arg */
-      add_infile (concat ("-l", arg, NULL), "*");
+      add_infile (concat ("-l", arg, NULL), "*", 0);
       do_save = false;
       break;
 
     case OPT_L:
       /* Similarly, canonicalize -L for linkers that may not accept
 	 separate arguments.  */
-      save_switch (concat ("-L", arg, NULL), 0, NULL, validated, true);
+      save_switch (concat ("-L", arg, NULL), 0, NULL, validated, true,
+		   cl_options[opt_index].flags);
       return true;
 
     case OPT_F:
       /* Likewise -F.  */
-      save_switch (concat ("-F", arg, NULL), 0, NULL, validated, true);
+      save_switch (concat ("-F", arg, NULL), 0, NULL, validated, true,
+		   cl_options[opt_index].flags);
       return true;
 
     case OPT_save_temps:
@@ -3911,7 +3943,8 @@ driver_handle_option (struct gcc_options *opts,
       save_temps_prefix = xstrdup (arg);
       /* On some systems, ld cannot handle "-o" without a space.  So
 	 split the option from its argument.  */
-      save_switch ("-o", 1, &arg, validated, true);
+      save_switch ("-o", 1, &arg, validated, true,
+		   cl_options[opt_index].flags);
       return true;
 
 #ifdef ENABLE_DEFAULT_PIE
@@ -3945,9 +3978,12 @@ driver_handle_option (struct gcc_options *opts,
     }
 
   if (do_save)
+    {
     save_switch (decoded->canonical_option[0],
 		 decoded->canonical_option_num_elements - 1,
-		 &decoded->canonical_option[1], validated, true);
+		 &decoded->canonical_option[1], validated, true,
+		 cl_options[opt_index].flags);
+    }
   return true;
 }
 
@@ -4244,7 +4280,7 @@ process_command (unsigned int decoded_options_coun
           if (strcmp (fname, "-") != 0 && access (fname, F_OK) < 0)
 	    perror_with_name (fname);
           else
-	    add_infile (arg, spec_lang);
+	    add_infile (arg, spec_lang, 0);
 
           free (fname);
 	  continue;
@@ -4386,7 +4422,8 @@ process_command (unsigned int decoded_options_coun
   if (compare_debug == 2 || compare_debug == 3)
     {
       const char *opt = concat ("-fcompare-debug=", compare_debug_opt, NULL);
-      save_switch (opt, 0, NULL, false, true);
+      save_switch (opt, 0, NULL, false, true,
+		   cl_options[OPT_fcompare_debug_].flags);
       compare_debug = 1;
     }
 
@@ -4397,7 +4434,7 @@ process_command (unsigned int decoded_options_coun
 
       /* Create a dummy input file, so that we can pass
 	 the help option on to the various sub-processes.  */
-      add_infile ("help-dummy", "c");
+      add_infile ("help-dummy", "c", 0);
     }
 
   alloc_switch ();
@@ -4615,13 +4652,15 @@ insert_wrapper (const char *wrapper)
 }
 
 /* Process the spec SPEC and run the commands specified therein.
+   If LANG_MASK is nonzero, switches for other languages are discarded.
    Returns 0 if the spec is successfully processed; -1 if failed.  */
 
 int
-do_spec (const char *spec)
+do_spec (const char *spec, unsigned int lang_mask)
 {
   int value;
 
+  spec_lang_mask_accept = lang_mask;
   value = do_spec_2 (spec);
 
   /* Force out any unfinished command.
@@ -4779,7 +4818,8 @@ do_self_spec (const char *spec)
 	      save_switch (decoded_options[j].canonical_option[0],
 			   (decoded_options[j].canonical_option_num_elements
 			    - 1),
-			   &decoded_options[j].canonical_option[1], false, true);
+			   &decoded_options[j].canonical_option[1], false, true,
+			   cl_options[decoded_options[j].opt_index].flags);
 	      break;
 
 	    default:
@@ -6368,6 +6408,14 @@ check_live_switch (int switchnum, int prefix_lengt
 static void
 give_switch (int switchnum, int omit_first_word)
 {
+  int lang_mask = switches[switchnum].lang_mask & ((1U << cl_lang_count) - 1);
+  unsigned int lang_mask_accept = (1U << cl_lang_count) - 1;
+  if (spec_lang_mask_accept != 0)
+    lang_mask_accept = spec_lang_mask_accept;
+  /* Drop switches specific to a language not in the given mask.  */
+  if (lang_mask != 0 && !(lang_mask & lang_mask_accept))
+    return;
+
   if ((switches[switchnum].live_cond & SWITCH_IGNORE) != 0)
     return;
 
@@ -7448,22 +7496,14 @@ driver::maybe_putenv_COLLECT_LTO_WRAPPER () const
 void
 driver::maybe_putenv_OFFLOAD_TARGETS () const
 {
-  const char *targets = offload_targets;
-
-  /* If no targets specified by -foffload, use all available targets.  */
-  if (!targets)
-    targets = OFFLOAD_TARGETS;
-
-  if (strlen (targets) > 0)
+  if (offload_targets && offload_targets[0] != '\0')
     {
       obstack_grow (&collect_obstack, "OFFLOAD_TARGET_NAMES=",
 		    sizeof ("OFFLOAD_TARGET_NAMES=") - 1);
-      obstack_grow (&collect_obstack, targets,
-		    strlen (targets) + 1);
+      obstack_grow (&collect_obstack, offload_targets,
+		    strlen (offload_targets) + 1);
       xputenv (XOBFINISH (&collect_obstack, char *));
     }
-
-  free (offload_targets);
 }
 
 /* Reject switches that no pass was interested in.  */
@@ -7767,7 +7807,8 @@ driver::do_spec_on_infiles () const
 		  debug_check_temp_file[1] = NULL;
 		}
 
-	      value = do_spec (input_file_compiler->spec);
+	      value = do_spec (input_file_compiler->spec,
+			       infiles[i].lang_mask);
 	      infiles[i].compiled = true;
 	      if (value < 0)
 		this_file_error = 1;
@@ -7781,7 +7822,8 @@ driver::do_spec_on_infiles () const
 		  n_switches_alloc = n_switches_alloc_debug_check[1];
 		  switches = switches_debug_check[1];
 
-		  value = do_spec (input_file_compiler->spec);
+		  value = do_spec (input_file_compiler->spec,
+				   infiles[i].lang_mask);
 
 		  compare_debug = -compare_debug;
 		  n_switches = n_switches_debug_check[0];
@@ -7936,7 +7978,7 @@ driver::maybe_run_linker (const char *argv0) const
 		    " to the linker.\n\n"));
 	  fflush (stdout);
 	}
-      int value = do_spec (link_command_spec);
+      int value = do_spec (link_command_spec, 0);
       if (value < 0)
 	errorcount = 1;
       linker_was_run = (tmp != execution_count);
@@ -9507,6 +9549,50 @@ greater_than_spec_func (int argc, const char **arg
   return NULL;
 }
 
+/* If applicable, generate a C source file containing a constructor call to
+   GOMP_set_offload_targets, to inform libgomp which offload targets have
+   actually been requested (-foffload=[...]), and add that as an infile.  */
+
+static const char *
+add_omp_infile_spec_func (int argc, const char **)
+{
+  gcc_assert (argc == 0);
+  gcc_assert (offload_targets != NULL);
+
+  /* Nothing to do if we're not actually linking.  */
+  if (have_c)
+    return NULL;
+
+  int err;
+  const char *tmp_filename;
+  tmp_filename = make_temp_file (".c");
+  record_temp_file (tmp_filename, !save_temps_flag, 0);
+  FILE *f = fopen (tmp_filename, "w");
+  if (f == NULL)
+    fatal_error (input_location,
+		 "could not open temporary file %s", tmp_filename);
+  /* As libgomp uses constructors internally, and this code is only added when
+     linking against libgomp, it is fine to use a constructor here.  */
+  err = fprintf (f,
+		 "extern void GOMP_set_offload_targets (const char *);\n"
+		 "static __attribute__ ((constructor)) void\n"
+		 "init (void)\n"
+		 "{\n"
+		 "  GOMP_set_offload_targets (\"%s\");\n"
+		 "}\n",
+		 offload_targets);
+  if (err < 0)
+    fatal_error (input_location,
+		 "could not write to temporary file %s", tmp_filename);
+  err = fclose (f);
+  if (err == EOF)
+    fatal_error (input_location,
+		 "could not close temporary file %s", tmp_filename);
+
+  add_infile (tmp_filename, "cpp-output", CL_C);
+  return NULL;
+}
+
 /* Insert backslash before spaces in ORIG (usually a file path), to 
    avoid being broken by spec parser.
 
Index: gcc/gcc.h
===================================================================
--- gcc/gcc.h	(revision 226979)
+++ gcc/gcc.h	(working copy)
@@ -65,7 +65,7 @@ struct spec_function
 };
 
 /* These are exported by gcc.c.  */
-extern int do_spec (const char *);
+extern int do_spec (const char *, unsigned int);
 extern void record_temp_file (const char *, int, int);
 extern void pfatal_with_name (const char *) ATTRIBUTE_NORETURN;
 extern void set_input (const char *);
Index: gcc/fortran/gfortranspec.c
===================================================================
--- gcc/fortran/gfortranspec.c	(revision 226979)
+++ gcc/fortran/gfortranspec.c	(working copy)
@@ -441,7 +441,7 @@ int
 lang_specific_pre_link (void)
 {
   if (library)
-    do_spec ("%:include(libgfortran.spec)");
+    do_spec ("%:include(libgfortran.spec)", 0);
 
   return 0;
 }
Index: gcc/java/jvspec.c
===================================================================
--- gcc/java/jvspec.c	(revision 226979)
+++ gcc/java/jvspec.c	(working copy)
@@ -629,7 +629,7 @@ lang_specific_pre_link (void)
      class name.  Append dummy `.c' that can be stripped by set_input so %b
      is correct.  */ 
   set_input (concat (main_class_name, "main.c", NULL));
-  err = do_spec (jvgenmain_spec);
+  err = do_spec (jvgenmain_spec, 0);
   if (err == 0)
     {
       /* Shift the outfiles array so the generated main comes first.


-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
  2015-08-20 23:38                                         ` Joseph Myers
@ 2015-08-21 16:13                                           ` Nathan Sidwell
  2015-08-21 16:21                                             ` Joseph Myers
  2015-08-25 15:04                                           ` Joseph Myers
  2018-05-20 20:30                                           ` [og7] " Thomas Schwinge
  2 siblings, 1 reply; 62+ messages in thread
From: Nathan Sidwell @ 2015-08-21 16:13 UTC (permalink / raw)
  To: Joseph Myers, Thomas Schwinge; +Cc: GCC Patches

On 08/20/15 18:52, Joseph Myers wrote:
> On Tue, 18 Aug 2015, Thomas Schwinge wrote:

> This is what I've committed to gomp-4_0-branch, with the driver changes
> substantially cleaned up and smaller changes to the other bits of the
> patch.
>
> gcc:
> 2015-08-20  Thomas Schwinge  <thomas@codesourcery.com>
> 	    Joseph Myers  <joseph@codesourcery.com>
>
> 	* doc/invoke.texi (-ffixed-@var{reg}): Document conflict with
> 	Fortran options.
> 	* gcc.c (offload_targets): Update comment.
> 	(add_omp_infile_spec_func, spec_lang_mask_accept): New.
> 	(driver_self_specs) [ENABLE_OFFLOADING]: Add spec to use
> 	%:add-omp-infile().
> 	(static_spec_functions): Add add-omp-infile.
> 	(struct switchstr): Add lang_mask field.  Expand comment.
> 	(struct infile): Add lang_mask field.
> 	(add_infile, save_switch, do_spec): Add lang_mask argument.
> 	(driver_unknown_option_callback, driver_wrong_lang_callback)
> 	(driver_handle_option, process_command, do_self_spec)
> 	(driver::do_spec_on_infiles): All callers changed.
> 	(give_switch): Check languages of switch against
> 	spec_lang_mask_accept.
> 	(driver::maybe_putenv_OFFLOAD_TARGETS): Do not use intermediate
> 	targets variable.
> 	* gcc.h (do_spec): Update prototype.

This appears to cause an ICE in add_omp_infile_spec_func at:
   gcc_assert (offload_targets != NULL);

when you use something like -foffload='-save-temps -v -fdump-rtl-all 
-fdump-tree-all -fno-verbose-asm'

Is that use ill-formed?

nathan


* Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
  2015-08-21 16:13                                           ` Nathan Sidwell
@ 2015-08-21 16:21                                             ` Joseph Myers
  2015-08-24 18:05                                               ` Joseph Myers
  0 siblings, 1 reply; 62+ messages in thread
From: Joseph Myers @ 2015-08-21 16:21 UTC (permalink / raw)
  To: Nathan Sidwell; +Cc: Thomas Schwinge, GCC Patches

On Fri, 21 Aug 2015, Nathan Sidwell wrote:

> this appears to cause an ICE in add_omp_infile_spec_func at;
>   gcc_assert (offload_targets != NULL);
> 
> when you use something like -foffload='-save-temps -v -fdump-rtl-all
> -fdump-tree-all -fno-verbose-asm'
> 
> Is that use ill-formed?

I'll need to reverse-engineer the question of what's a well-formed 
-foffload= option (bug 67300 filed yesterday for the lack of any 
documentation of that option).

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
  2015-08-21 16:21                                             ` Joseph Myers
@ 2015-08-24 18:05                                               ` Joseph Myers
  2015-08-24 22:50                                                 ` Joseph Myers
  0 siblings, 1 reply; 62+ messages in thread
From: Joseph Myers @ 2015-08-24 18:05 UTC (permalink / raw)
  To: Nathan Sidwell; +Cc: Thomas Schwinge, GCC Patches

On Fri, 21 Aug 2015, Joseph Myers wrote:

> On Fri, 21 Aug 2015, Nathan Sidwell wrote:
> 
> > this appears to cause an ICE in add_omp_infile_spec_func at;
> >   gcc_assert (offload_targets != NULL);
> > 
> > when you use something like -foffload='-save-temps -v -fdump-rtl-all
> > -fdump-tree-all -fno-verbose-asm'
> > 
> > Is that use ill-formed?
> 
> I'll need to reverse-engineer the question of what's a well-formed 
> -foffload= option (bug 67300 filed yesterday for the lack of any 
> documentation of that option).

Although there is no documentation for the -foffload options in the
manual, I found something at <https://gcc.gnu.org/wiki/Offloading>
that I hope is current.

It turns out the problem wasn't in the assertion, but in how the
default -foffload option was generated.  The default was generated via
specs, so if every -foffload option on the command line gave only
options and no target (i.e., options applicable to all the configured
offload targets), the offload_targets variable was never set.  That
caused the assertion failure (and also meant OFFLOAD_TARGET_NAMES was
not exported).  Rather than trying to make the specs produce something
if no -foffload=* options other than -foffload=-* options were passed,
I'm testing this patch to default the offload targets after the
original command line is processed (and before extra options from
these specs are processed, so before the assertion is executed), and
will commit it if tests are OK.
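
To illustrate the ordering described above, here is a minimal sketch
in plain C.  It is not the gcc.c code: every sketch_* name and the
SKETCH_OFFLOAD_TARGETS value are made up for illustration, and the
parsing is reduced to "an argument starting with '-' names no target"
(multiple targets and -foffload=disable are ignored).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the configured OFFLOAD_TARGETS value.  */
#define SKETCH_OFFLOAD_TARGETS "nvptx-none"

/* Models the driver's offload_targets variable: NULL until some
   -foffload= argument names a target.  */
static char *sketch_offload_targets = NULL;

/* Simplified model of handling one -foffload= argument.  An argument
   starting with '-' only carries options for all targets, so it does
   not set the target list.  Otherwise the text before any '=' is
   recorded as the target.  */
static void
sketch_handle_foffload (const char *arg)
{
  if (arg[0] == '-')
    return;

  size_t len = strcspn (arg, "=");
  char *copy = malloc (len + 1);
  if (copy == NULL)
    abort ();
  memcpy (copy, arg, len);
  copy[len] = '\0';
  free (sketch_offload_targets);
  sketch_offload_targets = copy;
}

/* Process the user's -foffload= arguments first, then fall back to the
   configured default if none of them named a target.  Doing the
   fallback here, rather than in a later spec, is what guarantees the
   variable is set before anything asserts on it.  */
static void
sketch_process_command (const char *const *foffload_args, size_t count)
{
  free (sketch_offload_targets);
  sketch_offload_targets = NULL;

  for (size_t i = 0; i < count; i++)
    sketch_handle_foffload (foffload_args[i]);

  if (sketch_offload_targets == NULL)
    sketch_handle_foffload (SKETCH_OFFLOAD_TARGETS);

  assert (sketch_offload_targets != NULL);
}
```

With only an options-style argument such as "-save-temps -v", the loop
leaves the variable NULL and the fallback fills in the default, so the
final assertion holds.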

2015-08-24  Joseph Myers  <joseph@codesourcery.com>

	* gcc.c (driver_self_specs) [ENABLE_OFFLOADING]: Don't generate a
	-foffload option.
	(process_command): Call handle_foffload_option (OFFLOAD_TARGETS)
	if no offload target specified.

Index: gcc/gcc.c
===================================================================
--- gcc/gcc.c	(revision 227045)
+++ gcc/gcc.c	(working copy)
@@ -1064,9 +1064,6 @@ static const char *const multilib_defaults_raw[] =
 static const char *const driver_self_specs[] = {
   "%{fdump-final-insns:-fdump-final-insns=.} %<fdump-final-insns",
 #ifdef ENABLE_OFFLOADING
-  /* If the user didn't specify any, default to all configured offload
-     targets.  */
-  "%{!foffload=*:-foffload=" OFFLOAD_TARGETS "}",
   /* If linking against libgomp, add a setup file.  */
   "%{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*} 1):" \
   "%:add-omp-infile()}",
@@ -4291,6 +4288,11 @@ process_command (unsigned int decoded_options_coun
 			   CL_DRIVER, &handlers, global_dc);
     }
 
+  /* If the user didn't specify any, default to all configured offload
+     targets.  */
+  if (offload_targets == NULL)
+    handle_foffload_option (OFFLOAD_TARGETS);
+
   if (output_file
       && strcmp (output_file, "-") != 0
       && strcmp (output_file, HOST_BIT_BUCKET) != 0)

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
  2015-08-24 18:05                                               ` Joseph Myers
@ 2015-08-24 22:50                                                 ` Joseph Myers
  2015-08-24 23:26                                                   ` Nathan Sidwell
  0 siblings, 1 reply; 62+ messages in thread
From: Joseph Myers @ 2015-08-24 22:50 UTC (permalink / raw)
  To: Nathan Sidwell; +Cc: Thomas Schwinge, GCC Patches

On Mon, 24 Aug 2015, Joseph Myers wrote:

> I'm testing this patch to default the offload targets after the
> original command line is processed (and before extra options from
> these specs are processed, so before the assertion is executed), and
> will commit it if tests are OK.

Now committed to gomp-4_0-branch.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
  2015-08-24 22:50                                                 ` Joseph Myers
@ 2015-08-24 23:26                                                   ` Nathan Sidwell
  0 siblings, 0 replies; 62+ messages in thread
From: Nathan Sidwell @ 2015-08-24 23:26 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Thomas Schwinge, GCC Patches

On 08/24/15 18:22, Joseph Myers wrote:
> On Mon, 24 Aug 2015, Joseph Myers wrote:
>
>> I'm testing this patch to default the offload targets after the
>> original command line is processed (and before extra options from
>> these specs are processed, so before the assertion is executed), and
>> will commit it if tests are OK.
>
> Now committed to gomp-4_0-branch.

thanks!


* Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
  2015-08-20 23:38                                         ` Joseph Myers
  2015-08-21 16:13                                           ` Nathan Sidwell
@ 2015-08-25 15:04                                           ` Joseph Myers
  2018-05-20 20:30                                           ` [og7] " Thomas Schwinge
  2 siblings, 0 replies; 62+ messages in thread
From: Joseph Myers @ 2015-08-25 15:04 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: GCC Patches, Nathan Sidwell

On reviewing in more detail the changes to pass offloading targets
from the driver to libgomp at link time to identify the minimal
self-contained pieces that can go to trunk, I found that the use of
fnmatch to match against target names was completely unnecessary; the
ISO C90 functions strstr and strncmp could be used instead, so
avoiding the need to add configure tests for fnmatch.  This patch duly
removes both the use of fnmatch and the configure tests for it.

Will commit to gomp-4_0-branch subject to test results.
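
As a standalone illustration of why the two glob patterns need nothing
beyond ISO C90, here is a small sketch.  The name sketch_plugin_name is
made up, and unlike the libgomp function it returns NULL for an unknown
target instead of calling gomp_fatal, so that it can be tried outside
libgomp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map an offload target name to a plugin name, or return NULL if the
   target is not recognized (hypothetical behaviour for this sketch).  */
static const char *
sketch_plugin_name (const char *offload_target)
{
  assert (offload_target != NULL);

  /* Equivalent to the glob "*-intelmic*": the substring may appear
     anywhere in the target name.  */
  if (strstr (offload_target, "-intelmic") != NULL)
    return "intelmic";

  /* Equivalent to the glob "nvptx*": only the first five characters
     have to match.  */
  if (strncmp (offload_target, "nvptx", 5) == 0)
    return "nvptx";

  return NULL;
}
```

Both patterns are either "contains" or "starts with", which is exactly
what strstr and strncmp test, so no general glob matcher is needed.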

2015-08-25  Joseph Myers  <joseph@codesourcery.com>

	* plugin/configfrag.ac: Don't test for fnmatch.h or fnmatch.
	* configure, config.h.in: Regenerate.
	* target.c [PLUGIN_SUPPORT]: Don't include <fnmatch.h>.
	(offload_target_to_plugin_name): Use strstr and strncmp instead of
	fnmatch.

Index: libgomp/config.h.in
===================================================================
--- libgomp/config.h.in	(revision 227169)
+++ libgomp/config.h.in	(working copy)
@@ -24,12 +24,6 @@
 /* Define to 1 if you have the <dlfcn.h> header file. */
 #undef HAVE_DLFCN_H
 
-/* Define to 1 if you have the `fnmatch' function. */
-#undef HAVE_FNMATCH
-
-/* Define to 1 if you have the <fnmatch.h> header file. */
-#undef HAVE_FNMATCH_H
-
 /* Define to 1 if you have the `getloadavg' function. */
 #undef HAVE_GETLOADAVG
 
Index: libgomp/target.c
===================================================================
--- libgomp/target.c	(revision 227169)
+++ libgomp/target.c	(working copy)
@@ -41,7 +41,6 @@
 
 #ifdef PLUGIN_SUPPORT
 #include <dlfcn.h>
-#include <fnmatch.h>
 #include "plugin-suffix.h"
 #endif
 
@@ -1271,9 +1270,9 @@
 static const char *
 offload_target_to_plugin_name (const char *offload_target)
 {
-  if (fnmatch ("*-intelmic*", offload_target, 0) == 0)
+  if (strstr (offload_target, "-intelmic") != NULL)
     return "intelmic";
-  if (fnmatch ("nvptx*", offload_target, 0) == 0)
+  if (strncmp (offload_target, "nvptx", 5) == 0)
     return "nvptx";
   gomp_fatal ("Unknown offload target: %s", offload_target);
 }
Index: libgomp/configure
===================================================================
--- libgomp/configure	(revision 227169)
+++ libgomp/configure	(working copy)
@@ -15119,33 +15119,6 @@
 offload_targets=
 
 plugin_support=yes
-for ac_header in fnmatch.h
-do :
-  ac_fn_c_check_header_mongrel "$LINENO" "fnmatch.h" "ac_cv_header_fnmatch_h" "$ac_includes_default"
-if test "x$ac_cv_header_fnmatch_h" = x""yes; then :
-  cat >>confdefs.h <<_ACEOF
-#define HAVE_FNMATCH_H 1
-_ACEOF
-
-else
-  plugin_support=no
-fi
-
-done
-
-for ac_func in fnmatch
-do :
-  ac_fn_c_check_func "$LINENO" "fnmatch" "ac_cv_func_fnmatch"
-if test "x$ac_cv_func_fnmatch" = x""yes; then :
-  cat >>confdefs.h <<_ACEOF
-#define HAVE_FNMATCH 1
-_ACEOF
-
-else
-  plugin_support=no
-fi
-done
-
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for dlsym in -ldl" >&5
 $as_echo_n "checking for dlsym in -ldl... " >&6; }
 if test "${ac_cv_lib_dl_dlsym+set}" = set; then :
Index: libgomp/plugin/configfrag.ac
===================================================================
--- libgomp/plugin/configfrag.ac	(revision 227169)
+++ libgomp/plugin/configfrag.ac	(working copy)
@@ -29,8 +29,6 @@
 offload_targets=
 AC_SUBST(offload_targets)
 plugin_support=yes
-AC_CHECK_HEADERS([fnmatch.h], , [plugin_support=no])
-AC_CHECK_FUNCS([fnmatch], , [plugin_support=no])
 AC_CHECK_LIB(dl, dlsym, , [plugin_support=no])
 if test x"$plugin_support" = xyes; then
   AC_DEFINE(PLUGIN_SUPPORT, 1,

-- 
Joseph S. Myers
joseph@codesourcery.com


* Pass -foffload targets from driver to libgomp at link time
@ 2015-08-27 20:58 Joseph Myers
  2015-09-03 14:58 ` Ping " Joseph Myers
  0 siblings, 1 reply; 62+ messages in thread
From: Joseph Myers @ 2015-08-27 20:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas, Nathan Sidwell, jakub

This patch, a version of
<https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01264.html> cleaned up
for trunk, arranges for the -foffload= targets specified at link time
to be passed to libgomp via a constructor function generated by the
driver.
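
For readers unfamiliar with the mechanism, the following sketch shows
the general shape of such a constructor.  It is not the file the driver
writes: sketch_set_offload_targets is a local stub standing in for
GOMP_set_offload_targets so the example links without libgomp, the
"nvptx-none" string is just an example value, and
__attribute__ ((constructor)) is a GCC/Clang extension rather than ISO C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for libgomp's internal list of offload targets.  */
static const char *sketch_targets = NULL;

/* Stub for the libgomp entry point; like the real one, it expects to
   be called only once.  */
static void
sketch_set_offload_targets (const char *offload_targets)
{
  assert (sketch_targets == NULL);
  sketch_targets = offload_targets;
}

/* Runs before main, so the target list is in place before the program
   executes any offloading code.  */
static __attribute__ ((constructor)) void
sketch_init (void)
{
  sketch_set_offload_targets ("nvptx-none");
}

/* Helper for checking which target list was recorded.  */
static int
sketch_targets_are (const char *expected)
{
  return sketch_targets != NULL && strcmp (sketch_targets, expected) == 0;
}
```

By the time main starts, sketch_targets_are ("nvptx-none") is already
true, which is the property the driver-generated file relies on.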

In this patch, I've tried to remove all the miscellaneous cleanups in
the gomp-4_0-branch version that didn't appear to be strictly required
for passing -foffload from the driver to libgomp.  Thus, when next
merging from trunk to gomp-4_0-branch, care should be taken not to
lose those cleanups where patch conflicts arise, in those cases where
the cleanups are still desired for merging to trunk separately.  It's
possible I missed some such changes; thus, this patch should be
reviewed carefully to make sure there isn't anything unrelated mixed
in.

This patch uses GOMP_4.0.2 as the symbol version for the new function
GOMP_set_offload_targets (where the gomp-4_0-branch patch had
GOACC_2.0.GOMP_4_BRANCH).  I hope this is the correct version for a
GOMP_* function that is new in GCC 6.

Tested with no regressions for x86_64-none-linux-gnu, offloading to
nvptx-none; 24 libgomp test FAILs start to pass with the patch.  OK to
commit?

gcc:
2015-08-27  Thomas Schwinge  <thomas@codesourcery.com>
	    Joseph Myers  <joseph@codesourcery.com>

	* gcc.c (offload_targets): Update comment.
	(add_omp_infile_spec_func, spec_lang_mask_accept): New.
	(driver_self_specs) [ENABLE_OFFLOADING]: Add spec to use
	%:add-omp-infile().
	(static_spec_functions): Add add-omp-infile.
	(struct switchstr): Add lang_mask field.
	(struct infile): Add lang_mask field.
	(add_infile, save_switch, do_spec): Add lang_mask argument.
	(driver_unknown_option_callback, driver_wrong_lang_callback)
	(driver_handle_option, process_command, do_self_spec)
	(driver::do_spec_on_infiles): All callers changed.
	(process_command): Call handle_foffload_option (OFFLOAD_TARGETS)
	if no offload target specified.
	(give_switch): Check languages of switch against
	spec_lang_mask_accept.
	(driver::maybe_putenv_OFFLOAD_TARGETS): Do not use intermediate
	targets variable.
	* gcc.h (do_spec): Update prototype.

gcc/fortran:
2015-08-27  Joseph Myers  <joseph@codesourcery.com>

	* gfortranspec.c (lang_specific_pre_link): Update call to do_spec.

gcc/java:
2015-08-27  Joseph Myers  <joseph@codesourcery.com>

	* jvspec.c (lang_specific_pre_link): Update call to do_spec.

libgomp:
2015-08-27  Thomas Schwinge  <thomas@codesourcery.com>
	    Joseph Myers  <joseph@codesourcery.com>

	* plugin/configfrag.ac (tgt_name): Do not set.
	(offload_targets): Separate with colons not commas.
	* config.h.in, configure: Regenerate.
	* libgomp.map (GOMP_4.0.2): Add GOMP_set_offload_targets.
	* libgomp_g.h (GOMP_set_offload_targets): New prototype.
	* target.c (offload_target_to_plugin_name, gomp_offload_targets)
	(gomp_offload_targets_init, GOMP_set_offload_targets): New.
	(gomp_target_init): Use gomp_offload_targets instead of
	OFFLOAD_TARGETS.  Handle and rewrite colon-separated string.
	* testsuite/lib/libgomp.exp: Expect offload targets to be
	colon-separated.  Adjust matching of offload targets.
	(libgomp_init)
	(check_effective_target_openacc_nvidia_accel_supported)
	(check_effective_target_openacc_host_selected): Adjust checks of
	offload target names.
	* testsuite/libgomp.oacc-c++/c++.exp: Adjust set of offload
	targets.  Use -foffload=.
	* testsuite/libgomp.oacc-c/c.exp: Adjust set of offload targets.
	Use -foffload=.
	* testsuite/libgomp.oacc-fortran/fortran.exp: Adjust set of
	offload targets.  Use -foffload=.
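
The gomp_target_init entry above ("Handle and rewrite colon-separated
string") can be pictured with the following sketch.  It is a
simplification under stated assumptions, not the libgomp code: the
SKETCH_PREFIX and SKETCH_SUFFIX values, the sketch_* names and the
fixed-size buffer are all illustrative, and unknown targets yield 0
instead of a fatal error.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_PREFIX "libgomp-plugin-"	/* Assumed plugin file prefix.  */
#define SKETCH_SUFFIX ".so.1"		/* Assumed shared-object suffix.  */

/* Translate a full offload target name to a short plugin name, or
   return NULL if it is not recognized.  */
static const char *
sketch_plugin_name (const char *offload_target)
{
  if (strstr (offload_target, "-intelmic") != NULL)
    return "intelmic";
  if (strncmp (offload_target, "nvptx", 5) == 0)
    return "nvptx";
  return NULL;
}

/* Write the plugin file name for entry INDEX of the colon-separated
   LIST into OUT.  Return 1 on success, 0 if there is no such entry,
   the target is unknown, or the result does not fit.  */
static int
sketch_plugin_file (const char *list, size_t index,
		    char *out, size_t out_size)
{
  const char *cur = list;

  /* Skip INDEX entries.  */
  for (size_t i = 0; i < index; i++)
    {
      cur = strchr (cur, ':');
      if (cur == NULL)
	return 0;
      cur++;
    }

  /* Copy the current entry so it can be NUL-terminated.  */
  const char *next = strchr (cur, ':');
  size_t cur_len = next ? (size_t) (next - cur) : strlen (cur);
  char target[64];
  if (cur_len == 0 || cur_len >= sizeof target)
    return 0;
  memcpy (target, cur, cur_len);
  target[cur_len] = '\0';

  /* Rewrite the full target name to the short plugin name.  */
  const char *name = sketch_plugin_name (target);
  if (name == NULL)
    return 0;

  int n = snprintf (out, out_size, "%s%s%s",
		    SKETCH_PREFIX, name, SKETCH_SUFFIX);
  return n >= 0 && (size_t) n < out_size;
}
```

The point of the rewrite step is that the list now carries full target
names (as given to -foffload=), while plugin files are still named
after the short plugin name.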

Index: libgomp/config.h.in
===================================================================
--- libgomp/config.h.in	(revision 227194)
+++ libgomp/config.h.in	(working copy)
@@ -95,7 +95,7 @@
    */
 #undef LT_OBJDIR
 
-/* Define to hold the list of target names suitable for offloading. */
+/* Define to hold the list of offload targets, separated by colons. */
 #undef OFFLOAD_TARGETS
 
 /* Name of package */
Index: libgomp/target.c
===================================================================
--- libgomp/target.c	(revision 227194)
+++ libgomp/target.c	(working copy)
@@ -1209,6 +1209,41 @@ gomp_load_plugin_for_device (struct gomp_device_de
   return 0;
 }
 
+/* Return the corresponding plugin name for the offload target name
+   OFFLOAD_TARGET.  */
+
+static const char *
+offload_target_to_plugin_name (const char *offload_target)
+{
+  if (strstr (offload_target, "-intelmic") != NULL)
+    return "intelmic";
+  if (strncmp (offload_target, "nvptx", 5) == 0)
+    return "nvptx";
+  gomp_fatal ("Unknown offload target: %s", offload_target);
+}
+
+/* List of offload targets, separated by colon.  Defaults to the list
+   determined when configuring libgomp.  */
+static const char *gomp_offload_targets = OFFLOAD_TARGETS;
+static bool gomp_offload_targets_init = false;
+
+/* Override the list of offload targets with OFFLOAD_TARGETS, the set
+   passed to the compiler at link time.  This must be called early,
+   and only once.  */
+
+void
+GOMP_set_offload_targets (const char *offload_targets)
+{
+  gomp_debug (0, "%s (\"%s\")\n", __FUNCTION__, offload_targets);
+
+  /* Make sure this gets called early.  */
+  assert (gomp_is_initialized == PTHREAD_ONCE_INIT);
+  /* Make sure this only gets called once.  */
+  assert (!gomp_offload_targets_init);
+  gomp_offload_targets_init = true;
+  gomp_offload_targets = offload_targets;
+}
+
 /* This function initializes the runtime needed for offloading.
    It parses the list of offload targets and tries to load the plugins for
    these targets.  On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
@@ -1228,26 +1263,45 @@ gomp_target_init (void)
   num_devices = 0;
   devices = NULL;
 
-  cur = OFFLOAD_TARGETS;
+  cur = gomp_offload_targets;
   if (*cur)
     do
       {
 	struct gomp_device_descr current_device;
 
-	next = strchr (cur, ',');
-
-	plugin_name = (char *) malloc (1 + (next ? next - cur : strlen (cur))
-				       + strlen (prefix) + strlen (suffix));
+	next = strchr (cur, ':');
+	size_t prefix_len = strlen (prefix);
+	size_t cur_len = next ? next - cur : strlen (cur);
+	size_t suffix_len = strlen (suffix);
+	plugin_name = (char *) malloc (prefix_len
+				       + cur_len
+				       + suffix_len
+				       + 1);
 	if (!plugin_name)
 	  {
 	    num_devices = 0;
 	    break;
 	  }
+	memcpy (plugin_name, prefix, prefix_len);
+	memcpy (plugin_name + prefix_len, cur, cur_len);
+	/* NUL-terminate the string here...  */
+	plugin_name[prefix_len + cur_len] = '\0';
+	/* ..., so that we can then use it to translate the offload target to
+	   the plugin name...  */
+	const char *cur_plugin_name
+	  = offload_target_to_plugin_name (plugin_name
+					   + prefix_len);
+	size_t cur_plugin_name_len = strlen (cur_plugin_name);
+	assert (cur_plugin_name_len <= cur_len);
+	/* ..., and then rewrite it.  */
+	memcpy (plugin_name + prefix_len,
+		cur_plugin_name, cur_plugin_name_len);
+	memcpy (plugin_name + prefix_len + cur_plugin_name_len,
+		suffix, suffix_len);
+	plugin_name[prefix_len
+		    + cur_plugin_name_len
+		    + suffix_len] = '\0';
 
-	strcpy (plugin_name, prefix);
-	strncat (plugin_name, cur, next ? next - cur : strlen (cur));
-	strcat (plugin_name, suffix);
-
 	if (gomp_load_plugin_for_device (&current_device, plugin_name))
 	  {
 	    new_num_devices = current_device.get_num_devices_func ();
Index: libgomp/configure
===================================================================
--- libgomp/configure	(revision 227194)
+++ libgomp/configure	(working copy)
@@ -15236,10 +15236,8 @@ if test x"$enable_offload_targets" != x; then
     tgt=`echo $tgt | sed 's/=.*//'`
     case $tgt in
       *-intelmic-* | *-intelmicemul-*)
-	tgt_name=intelmic
 	;;
       nvptx*)
-        tgt_name=nvptx
 	PLUGIN_NVPTX=$tgt
 	PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
 	PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
@@ -15282,9 +15280,9 @@ rm -f core conftest.err conftest.$ac_objext \
 	;;
     esac
     if test x"$offload_targets" = x; then
-      offload_targets=$tgt_name
+      offload_targets=$tgt
     else
-      offload_targets=$offload_targets,$tgt_name
+      offload_targets=$offload_targets:$tgt
     fi
     if test x"$tgt_dir" != x; then
       offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
Index: libgomp/libgomp_g.h
===================================================================
--- libgomp/libgomp_g.h	(revision 227194)
+++ libgomp/libgomp_g.h	(working copy)
@@ -206,6 +206,7 @@ extern void GOMP_single_copy_end (void *);
 
 /* target.c */
 
+extern void GOMP_set_offload_targets (const char *);
 extern void GOMP_target (int, void (*) (void *), const void *,
 			 size_t, void **, size_t *, unsigned char *);
 extern void GOMP_target_data (int, const void *,
Index: libgomp/plugin/configfrag.ac
===================================================================
--- libgomp/plugin/configfrag.ac	(revision 227194)
+++ libgomp/plugin/configfrag.ac	(working copy)
@@ -92,10 +92,8 @@ if test x"$enable_offload_targets" != x; then
     tgt=`echo $tgt | sed 's/=.*//'`
     case $tgt in
       *-intelmic-* | *-intelmicemul-*)
-	tgt_name=intelmic
 	;;
       nvptx*)
-        tgt_name=nvptx
 	PLUGIN_NVPTX=$tgt
 	PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
 	PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
@@ -127,9 +125,9 @@ if test x"$enable_offload_targets" != x; then
 	;;
     esac
     if test x"$offload_targets" = x; then
-      offload_targets=$tgt_name
+      offload_targets=$tgt
     else
-      offload_targets=$offload_targets,$tgt_name
+      offload_targets=$offload_targets:$tgt
     fi
     if test x"$tgt_dir" != x; then
       offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
@@ -141,7 +139,7 @@ if test x"$enable_offload_targets" != x; then
   done
 fi
 AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$offload_targets",
-  [Define to hold the list of target names suitable for offloading.])
+  [Define to hold the list of offload targets, separated by colons.])
 AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
   [Define to 1 if the NVIDIA plugin is built, 0 if not.])
Index: libgomp/libgomp.map
===================================================================
--- libgomp/libgomp.map	(revision 227194)
+++ libgomp/libgomp.map	(working copy)
@@ -238,6 +238,7 @@ GOMP_4.0.2 {
   global:
 	GOMP_offload_register_ver;
 	GOMP_offload_unregister_ver;
+	GOMP_set_offload_targets;
 } GOMP_4.0.1;
 
 OACC_2.0 {
Index: libgomp/testsuite/libgomp.oacc-c++/c++.exp
===================================================================
--- libgomp/testsuite/libgomp.oacc-c++/c++.exp	(revision 227194)
+++ libgomp/testsuite/libgomp.oacc-c++/c++.exp	(working copy)
@@ -75,13 +75,12 @@ if { $lang_test_file_found } {
 
     # Test OpenACC with available accelerators.
     foreach offload_target_openacc $offload_targets_s_openacc {
-	set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
-
-	switch $offload_target_openacc {
-	    host {
+	switch -glob $offload_target_openacc {
+	    disable {
 		set acc_mem_shared 1
+		set tagopt "-DACC_DEVICE_TYPE_host=1"
 	    }
-	    nvidia {
+	    nvptx* {
 		if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		    # Don't bother; execution testing is going to FAIL.
 		    untested "$subdir $offload_target_openacc offloading"
@@ -95,15 +94,14 @@ if { $lang_test_file_found } {
 		lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/libgomp.oacc-c-c++-common"
 
 		set acc_mem_shared 0
+		set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	    }
 	    default {
 		set acc_mem_shared 0
 	    }
 	}
-	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
+	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
-	setenv ACC_DEVICE_TYPE $offload_target_openacc
-
 	dg-runtest $tests "$tagopt" "$libstdcxx_includes $DEFAULT_CFLAGS"
     }
 }
Index: libgomp/testsuite/lib/libgomp.exp
===================================================================
--- libgomp/testsuite/lib/libgomp.exp	(revision 227194)
+++ libgomp/testsuite/lib/libgomp.exp	(working copy)
@@ -36,24 +36,21 @@ load_gcc_lib fortran-modules.exp
 load_file libgomp-test-support.exp
 
 # Populate offload_targets_s (offloading targets separated by a space), and
-# offload_targets_s_openacc (the same, but with OpenACC names; OpenACC spells
-# some of them a little differently).
-set offload_targets_s [split $offload_targets ","]
+# offload_targets_s_openacc (those suitable for OpenACC).
+set offload_targets_s [split $offload_targets ":"]
 set offload_targets_s_openacc {}
 foreach offload_target_openacc $offload_targets_s {
-    switch $offload_target_openacc {
-	intelmic {
+    switch -glob $offload_target_openacc {
+	*-intelmic* {
 	    # Skip; will all FAIL because of missing
 	    # GOMP_OFFLOAD_CAP_OPENACC_200.
 	    continue
 	}
-	nvptx {
-	    set offload_target_openacc "nvidia"
-	}
     }
     lappend offload_targets_s_openacc "$offload_target_openacc"
 }
-lappend offload_targets_s_openacc "host"
+# Host fallback.
+lappend offload_targets_s_openacc "disable"
 
 set dg-do-what-default run
 
@@ -134,7 +131,7 @@ proc libgomp_init { args } {
     # Add liboffloadmic build directory in LD_LIBRARY_PATH to support
     # non-fallback testing for Intel MIC targets
     global offload_targets
-    if { [string match "*,intelmic,*" ",$offload_targets,"] } {
+    if { [string match "*:*-intelmic*:*" ":$offload_targets:"] } {
 	append always_ld_library_path ":${blddir}/../liboffloadmic/.libs"
 	append always_ld_library_path ":${blddir}/../liboffloadmic/plugin/.libs"
 	# libstdc++ is required by liboffloadmic
@@ -332,15 +329,14 @@ proc check_effective_target_openacc_nvidia_accel_p
 }
 
 # Return 1 if at least one nvidia board is present, and the nvidia device type
-# is selected by default by means of setting the environment variable
-# ACC_DEVICE_TYPE.
+# is selected by default.
 
 proc check_effective_target_openacc_nvidia_accel_selected { } {
     if { ![check_effective_target_openacc_nvidia_accel_present] } {
 	return 0;
     }
     global offload_target_openacc
-    if { $offload_target_openacc == "nvidia" } {
+    if { [string match "nvptx*" $offload_target_openacc] } {
         return 1;
     }
     return 0;
@@ -350,7 +346,7 @@ proc check_effective_target_openacc_nvidia_accel_s
 
 proc check_effective_target_openacc_host_selected { } {
     global offload_target_openacc
-    if { $offload_target_openacc == "host" } {
+    if { $offload_target_openacc == "disable" } {
         return 1;
     }
     return 0;
Index: libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
===================================================================
--- libgomp/testsuite/libgomp.oacc-fortran/fortran.exp	(revision 227194)
+++ libgomp/testsuite/libgomp.oacc-fortran/fortran.exp	(working copy)
@@ -67,13 +67,12 @@ if { $lang_test_file_found } {
 
     # Test OpenACC with available accelerators.
     foreach offload_target_openacc $offload_targets_s_openacc {
-	set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
-
-	switch $offload_target_openacc {
-	    host {
+	switch -glob $offload_target_openacc {
+	    disable {
 		set acc_mem_shared 1
+		set tagopt "-DACC_DEVICE_TYPE_host=1"
 	    }
-	    nvidia {
+	    nvptx* {
 		if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		    # Don't bother; execution testing is going to FAIL.
 		    untested "$subdir $offload_target_openacc offloading"
@@ -81,15 +80,14 @@ if { $lang_test_file_found } {
 		}
 
 		set acc_mem_shared 0
+		set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	    }
 	    default {
 		set acc_mem_shared 0
 	    }
 	}
-	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
+	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
-	setenv ACC_DEVICE_TYPE $offload_target_openacc
-
 	# For Fortran we're doing torture testing, as Fortran has far more tests
 	# with arrays etc. that testing just -O0 or -O2 is insufficient, that is
 	# typically not the case for C/C++.
Index: libgomp/testsuite/libgomp.oacc-c/c.exp
===================================================================
--- libgomp/testsuite/libgomp.oacc-c/c.exp	(revision 227194)
+++ libgomp/testsuite/libgomp.oacc-c/c.exp	(working copy)
@@ -38,13 +38,13 @@ set_ld_library_path_env_vars
 set SAVE_ALWAYS_CFLAGS "$ALWAYS_CFLAGS"
 foreach offload_target_openacc $offload_targets_s_openacc {
     set ALWAYS_CFLAGS "$SAVE_ALWAYS_CFLAGS"
-    set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
 
-    switch $offload_target_openacc {
-	host {
+    switch -glob $offload_target_openacc {
+	disable {
 	    set acc_mem_shared 1
+	    set tagopt "-DACC_DEVICE_TYPE_host=1"
 	}
-	nvidia {
+	nvptx* {
 	    if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		# Don't bother; execution testing is going to FAIL.
 		untested "$subdir $offload_target_openacc offloading"
@@ -58,15 +58,14 @@ foreach offload_target_openacc $offload_targets_s_
 	    lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/libgomp.oacc-c-c++-common"
 
 	    set acc_mem_shared 0
+	    set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	}
 	default {
 	    set acc_mem_shared 0
 	}
     }
-    set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
+    set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
-    setenv ACC_DEVICE_TYPE $offload_target_openacc
-
     dg-runtest $tests "$tagopt" $DEFAULT_CFLAGS
 }
 
Index: gcc/java/jvspec.c
===================================================================
--- gcc/java/jvspec.c	(revision 227194)
+++ gcc/java/jvspec.c	(working copy)
@@ -629,7 +629,7 @@ lang_specific_pre_link (void)
      class name.  Append dummy `.c' that can be stripped by set_input so %b
      is correct.  */ 
   set_input (concat (main_class_name, "main.c", NULL));
-  err = do_spec (jvgenmain_spec);
+  err = do_spec (jvgenmain_spec, 0);
   if (err == 0)
     {
       /* Shift the outfiles array so the generated main comes first.
Index: gcc/gcc.c
===================================================================
--- gcc/gcc.c	(revision 227194)
+++ gcc/gcc.c	(working copy)
@@ -284,7 +284,7 @@ static const char *const spec_version = DEFAULT_TA
 static const char *spec_machine = DEFAULT_TARGET_MACHINE;
 static const char *spec_host_machine = DEFAULT_REAL_TARGET_MACHINE;
 
-/* List of offload targets.  */
+/* List of offload targets.  Empty string for -foffload=disable.  */
 
 static char *offload_targets = NULL;
 
@@ -400,6 +400,8 @@ static const char *compare_debug_auxbase_opt_spec_
 static const char *pass_through_libs_spec_func (int, const char **);
 static const char *replace_extension_spec_func (int, const char **);
 static const char *greater_than_spec_func (int, const char **);
+static const char *add_omp_infile_spec_func (int, const char **);
+
 static char *convert_white_space (char *);
 \f
 /* The Specs Language
@@ -1186,6 +1188,11 @@ static const char *const multilib_defaults_raw[] =
 
 static const char *const driver_self_specs[] = {
   "%{fdump-final-insns:-fdump-final-insns=.} %<fdump-final-insns",
+#ifdef ENABLE_OFFLOADING
+  /* If linking against libgomp, add a setup file.  */
+  "%{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*} 1):" \
+  "%:add-omp-infile()}",
+#endif /* ENABLE_OFFLOADING */
   DRIVER_SELF_SPECS, CONFIGURE_SPECS, GOMP_SELF_SPECS, GTM_SELF_SPECS,
   CILK_SELF_SPECS
 };
@@ -1613,6 +1620,7 @@ static const struct spec_function static_spec_func
   { "pass-through-libs",	pass_through_libs_spec_func },
   { "replace-extension",	replace_extension_spec_func },
   { "gt",			greater_than_spec_func },
+  { "add-omp-infile",		add_omp_infile_spec_func },
 #ifdef EXTRA_SPEC_FUNCTIONS
   EXTRA_SPEC_FUNCTIONS
 #endif
@@ -3209,7 +3217,8 @@ execute (void)
    The `validated' field describes whether any spec has looked at this switch;
    if it remains false at the end of the run, the switch must be meaningless.
    The `ordering' field is used to temporarily mark switches that have to be
-   kept in a specific order.  */
+   kept in a specific order.
+   The `lang_mask' field stores the flags associated with this option.  */
 
 #define SWITCH_LIVE    			(1 << 0)
 #define SWITCH_FALSE   			(1 << 1)
@@ -3225,6 +3234,7 @@ struct switchstr
   bool known;
   bool validated;
   bool ordering;
+  unsigned int lang_mask;
 };
 
 static struct switchstr *switches;
@@ -3233,6 +3243,10 @@ static int n_switches;
 
 static int n_switches_alloc;
 
+/* If nonzero, do not pass through switches for languages not matching
+   this mask.  */
+static unsigned int spec_lang_mask_accept;
+
 /* Set to zero if -fcompare-debug is disabled, positive if it's
    enabled and we're running the first compilation, negative if it's
    enabled and we're running the second compilation.  For most of the
@@ -3270,6 +3284,7 @@ struct infile
   const char *name;
   const char *language;
   struct compiler *incompiler;
+  unsigned int lang_mask;
   bool compiled;
   bool preprocessed;
 };
@@ -3463,15 +3478,16 @@ alloc_infile (void)
     }
 }
 
-/* Store an input file with the given NAME and LANGUAGE in
+/* Store an input file with the given NAME and LANGUAGE and LANG_MASK in
    infiles.  */
 
 static void
-add_infile (const char *name, const char *language)
+add_infile (const char *name, const char *language, unsigned int lang_mask)
 {
   alloc_infile ();
   infiles[n_infiles].name = name;
-  infiles[n_infiles++].language = language;
+  infiles[n_infiles].language = language;
+  infiles[n_infiles++].lang_mask = lang_mask;
 }
 
 /* Allocate space for a switch in switches.  */
@@ -3492,11 +3508,12 @@ alloc_switch (void)
 }
 
 /* Save an option OPT with N_ARGS arguments in array ARGS, marking it
-   as validated if VALIDATED and KNOWN if it is an internal switch.  */
+   as validated if VALIDATED and KNOWN if it is an internal switch.
+   LANG_MASK is the flags associated with this option.  */
 
 static void
 save_switch (const char *opt, size_t n_args, const char *const *args,
-	     bool validated, bool known)
+	     bool validated, bool known, unsigned int lang_mask)
 {
   alloc_switch ();
   switches[n_switches].part1 = opt + 1;
@@ -3513,6 +3530,7 @@ save_switch (const char *opt, size_t n_args, const
   switches[n_switches].validated = validated;
   switches[n_switches].known = known;
   switches[n_switches].ordering = 0;
+  switches[n_switches].lang_mask = lang_mask;
   n_switches++;
 }
 
@@ -3530,7 +3548,8 @@ driver_unknown_option_callback (const struct cl_de
 	 diagnosed only if there are warnings.  */
       save_switch (decoded->canonical_option[0],
 		   decoded->canonical_option_num_elements - 1,
-		   &decoded->canonical_option[1], false, true);
+		   &decoded->canonical_option[1], false, true,
+		   cl_options[decoded->opt_index].flags);
       return false;
     }
   if (decoded->opt_index == OPT_SPECIAL_unknown)
@@ -3538,7 +3557,8 @@ driver_unknown_option_callback (const struct cl_de
       /* Give it a chance to define it a spec file.  */
       save_switch (decoded->canonical_option[0],
 		   decoded->canonical_option_num_elements - 1,
-		   &decoded->canonical_option[1], false, false);
+		   &decoded->canonical_option[1], false, false,
+		   cl_options[decoded->opt_index].flags);
       return false;
     }
   else
@@ -3565,7 +3585,8 @@ driver_wrong_lang_callback (const struct cl_decode
   else
     save_switch (decoded->canonical_option[0],
 		 decoded->canonical_option_num_elements - 1,
-		 &decoded->canonical_option[1], false, true);
+		 &decoded->canonical_option[1], false, true,
+		 option->flags);
 }
 
 static const char *spec_lang = 0;
@@ -3815,7 +3836,8 @@ driver_handle_option (struct gcc_options *opts,
 	compare_debug_opt = NULL;
       else
 	compare_debug_opt = arg;
-      save_switch (compare_debug_replacement_opt, 0, NULL, validated, true);
+      save_switch (compare_debug_replacement_opt, 0, NULL, validated, true,
+		   cl_options[opt_index].flags);
       return true;
 
     case OPT_fdiagnostics_color_:
@@ -3870,17 +3892,17 @@ driver_handle_option (struct gcc_options *opts,
 	for (j = 0; arg[j]; j++)
 	  if (arg[j] == ',')
 	    {
-	      add_infile (save_string (arg + prev, j - prev), "*");
+	      add_infile (save_string (arg + prev, j - prev), "*", 0);
 	      prev = j + 1;
 	    }
 	/* Record the part after the last comma.  */
-	add_infile (arg + prev, "*");
+	add_infile (arg + prev, "*", 0);
       }
       do_save = false;
       break;
 
     case OPT_Xlinker:
-      add_infile (arg, "*");
+      add_infile (arg, "*", 0);
       do_save = false;
       break;
 
@@ -3897,19 +3919,21 @@ driver_handle_option (struct gcc_options *opts,
     case OPT_l:
       /* POSIX allows separation of -l and the lib arg; canonicalize
 	 by concatenating -l with its arg */
-      add_infile (concat ("-l", arg, NULL), "*");
+      add_infile (concat ("-l", arg, NULL), "*", 0);
       do_save = false;
       break;
 
     case OPT_L:
       /* Similarly, canonicalize -L for linkers that may not accept
 	 separate arguments.  */
-      save_switch (concat ("-L", arg, NULL), 0, NULL, validated, true);
+      save_switch (concat ("-L", arg, NULL), 0, NULL, validated, true,
+		   cl_options[opt_index].flags);
       return true;
 
     case OPT_F:
       /* Likewise -F.  */
-      save_switch (concat ("-F", arg, NULL), 0, NULL, validated, true);
+      save_switch (concat ("-F", arg, NULL), 0, NULL, validated, true,
+		   cl_options[opt_index].flags);
       return true;
 
     case OPT_save_temps:
@@ -4032,7 +4056,8 @@ driver_handle_option (struct gcc_options *opts,
       save_temps_prefix = xstrdup (arg);
       /* On some systems, ld cannot handle "-o" without a space.  So
 	 split the option from its argument.  */
-      save_switch ("-o", 1, &arg, validated, true);
+      save_switch ("-o", 1, &arg, validated, true,
+		   cl_options[opt_index].flags);
       return true;
 
 #ifdef ENABLE_DEFAULT_PIE
@@ -4068,7 +4093,8 @@ driver_handle_option (struct gcc_options *opts,
   if (do_save)
     save_switch (decoded->canonical_option[0],
 		 decoded->canonical_option_num_elements - 1,
-		 &decoded->canonical_option[1], validated, true);
+		 &decoded->canonical_option[1], validated, true,
+		 cl_options[opt_index].flags);
   return true;
 }
 
@@ -4365,7 +4391,7 @@ process_command (unsigned int decoded_options_coun
           if (strcmp (fname, "-") != 0 && access (fname, F_OK) < 0)
 	    perror_with_name (fname);
           else
-	    add_infile (arg, spec_lang);
+	    add_infile (arg, spec_lang, 0);
 
           free (fname);
 	  continue;
@@ -4376,6 +4402,11 @@ process_command (unsigned int decoded_options_coun
 			   CL_DRIVER, &handlers, global_dc);
     }
 
+  /* If the user didn't specify any, default to all configured offload
+     targets.  */
+  if (offload_targets == NULL)
+    handle_foffload_option (OFFLOAD_TARGETS);
+
   if (output_file
       && strcmp (output_file, "-") != 0
       && strcmp (output_file, HOST_BIT_BUCKET) != 0)
@@ -4507,7 +4538,8 @@ process_command (unsigned int decoded_options_coun
   if (compare_debug == 2 || compare_debug == 3)
     {
       const char *opt = concat ("-fcompare-debug=", compare_debug_opt, NULL);
-      save_switch (opt, 0, NULL, false, true);
+      save_switch (opt, 0, NULL, false, true,
+		   cl_options[OPT_fcompare_debug_].flags);
       compare_debug = 1;
     }
 
@@ -4518,7 +4550,7 @@ process_command (unsigned int decoded_options_coun
 
       /* Create a dummy input file, so that we can pass
 	 the help option on to the various sub-processes.  */
-      add_infile ("help-dummy", "c");
+      add_infile ("help-dummy", "c", 0);
     }
 
   alloc_switch ();
@@ -4719,13 +4751,15 @@ insert_wrapper (const char *wrapper)
 }
 
 /* Process the spec SPEC and run the commands specified therein.
+   If LANG_MASK is nonzero, switches for other languages are discarded.
    Returns 0 if the spec is successfully processed; -1 if failed.  */
 
 int
-do_spec (const char *spec)
+do_spec (const char *spec, unsigned int lang_mask)
 {
   int value;
 
+  spec_lang_mask_accept = lang_mask;
   value = do_spec_2 (spec);
 
   /* Force out any unfinished command.
@@ -4883,7 +4917,8 @@ do_self_spec (const char *spec)
 	      save_switch (decoded_options[j].canonical_option[0],
 			   (decoded_options[j].canonical_option_num_elements
 			    - 1),
-			   &decoded_options[j].canonical_option[1], false, true);
+			   &decoded_options[j].canonical_option[1], false, true,
+			   cl_options[decoded_options[j].opt_index].flags);
 	      break;
 
 	    default:
@@ -6479,6 +6514,14 @@ check_live_switch (int switchnum, int prefix_lengt
 static void
 give_switch (int switchnum, int omit_first_word)
 {
+  int lang_mask = switches[switchnum].lang_mask & ((1U << cl_lang_count) - 1);
+  unsigned int lang_mask_accept = (1U << cl_lang_count) - 1;
+  if (spec_lang_mask_accept != 0)
+    lang_mask_accept = spec_lang_mask_accept;
+  /* Drop switches specific to a language not in the given mask.  */
+  if (lang_mask != 0 && !(lang_mask & lang_mask_accept))
+    return;
+
   if ((switches[switchnum].live_cond & SWITCH_IGNORE) != 0)
     return;
 
@@ -7572,22 +7615,14 @@ driver::maybe_putenv_COLLECT_LTO_WRAPPER () const
 void
 driver::maybe_putenv_OFFLOAD_TARGETS () const
 {
-  const char *targets = offload_targets;
-
-  /* If no targets specified by -foffload, use all available targets.  */
-  if (!targets)
-    targets = OFFLOAD_TARGETS;
-
-  if (strlen (targets) > 0)
+  if (offload_targets && offload_targets[0] != '\0')
     {
       obstack_grow (&collect_obstack, "OFFLOAD_TARGET_NAMES=",
 		    sizeof ("OFFLOAD_TARGET_NAMES=") - 1);
-      obstack_grow (&collect_obstack, targets,
-		    strlen (targets) + 1);
+      obstack_grow (&collect_obstack, offload_targets,
+		    strlen (offload_targets) + 1);
       xputenv (XOBFINISH (&collect_obstack, char *));
     }
-
-  free (offload_targets);
 }
 
 /* Reject switches that no pass was interested in.  */
@@ -7891,7 +7926,8 @@ driver::do_spec_on_infiles () const
 		  debug_check_temp_file[1] = NULL;
 		}
 
-	      value = do_spec (input_file_compiler->spec);
+	      value = do_spec (input_file_compiler->spec,
+			       infiles[i].lang_mask);
 	      infiles[i].compiled = true;
 	      if (value < 0)
 		this_file_error = 1;
@@ -7905,7 +7941,8 @@ driver::do_spec_on_infiles () const
 		  n_switches_alloc = n_switches_alloc_debug_check[1];
 		  switches = switches_debug_check[1];
 
-		  value = do_spec (input_file_compiler->spec);
+		  value = do_spec (input_file_compiler->spec,
+				   infiles[i].lang_mask);
 
 		  compare_debug = -compare_debug;
 		  n_switches = n_switches_debug_check[0];
@@ -8060,7 +8097,7 @@ driver::maybe_run_linker (const char *argv0) const
 		    " to the linker.\n\n"));
 	  fflush (stdout);
 	}
-      int value = do_spec (link_command_spec);
+      int value = do_spec (link_command_spec, 0);
       if (value < 0)
 	errorcount = 1;
       linker_was_run = (tmp != execution_count);
@@ -9651,6 +9688,50 @@ greater_than_spec_func (int argc, const char **arg
   return NULL;
 }
 
+/* If applicable, generate a C source file containing a constructor call to
+   GOMP_set_offload_targets, to inform libgomp which offload targets have
+   actually been requested (-foffload=[...]), and add that as an infile.  */
+
+static const char *
+add_omp_infile_spec_func (int argc, const char **)
+{
+  gcc_assert (argc == 0);
+  gcc_assert (offload_targets != NULL);
+
+  /* Nothing to do if we're not actually linking.  */
+  if (have_c)
+    return NULL;
+
+  int err;
+  const char *tmp_filename;
+  tmp_filename = make_temp_file (".c");
+  record_temp_file (tmp_filename, !save_temps_flag, 0);
+  FILE *f = fopen (tmp_filename, "w");
+  if (f == NULL)
+    fatal_error (input_location,
+		 "could not open temporary file %s", tmp_filename);
+  /* As libgomp uses constructors internally, and this code is only added when
+     linking against libgomp, it is fine to use a constructor here.  */
+  err = fprintf (f,
+		 "extern void GOMP_set_offload_targets (const char *);\n"
+		 "static __attribute__ ((constructor)) void\n"
+		 "init (void)\n"
+		 "{\n"
+		 "  GOMP_set_offload_targets (\"%s\");\n"
+		 "}\n",
+		 offload_targets);
+  if (err < 0)
+    fatal_error (input_location,
+		 "could not write to temporary file %s", tmp_filename);
+  err = fclose (f);
+  if (err == EOF)
+    fatal_error (input_location,
+		 "could not close temporary file %s", tmp_filename);
+
+  add_infile (tmp_filename, "cpp-output", CL_C);
+  return NULL;
+}
+
 /* Insert backslash before spaces in ORIG (usually a file path), to 
    avoid being broken by spec parser.
 
Index: gcc/gcc.h
===================================================================
--- gcc/gcc.h	(revision 227194)
+++ gcc/gcc.h	(working copy)
@@ -68,7 +68,7 @@ struct spec_function
 };
 
 /* These are exported by gcc.c.  */
-extern int do_spec (const char *);
+extern int do_spec (const char *, unsigned int);
 extern void record_temp_file (const char *, int, int);
 extern void pfatal_with_name (const char *) ATTRIBUTE_NORETURN;
 extern void set_input (const char *);
Index: gcc/fortran/gfortranspec.c
===================================================================
--- gcc/fortran/gfortranspec.c	(revision 227194)
+++ gcc/fortran/gfortranspec.c	(working copy)
@@ -439,7 +439,7 @@ int
 lang_specific_pre_link (void)
 {
   if (library)
-    do_spec ("%:include(libgfortran.spec)");
+    do_spec ("%:include(libgfortran.spec)", 0);
 
   return 0;
 }
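
For reference, here is a minimal sketch of the text that
add_omp_infile_spec_func writes to the temporary file.  The helper name
format_offload_ctor and the use of snprintf into a caller-supplied buffer
are assumptions made for illustration (the patch itself uses fprintf on a
temporary file); the template string is copied from the patch.  The check
that the target list contains no double quote is likewise an assumption,
added because the list is pasted verbatim into a C string literal.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: format the constructor
   source into BUF using the same template as add_omp_infile_spec_func.
   Returns the snprintf result (number of characters needed).  */
static int
format_offload_ctor (char *buf, size_t size, const char *offload_targets)
{
  /* Mirrors the gcc_assert in the patch.  */
  assert (offload_targets != NULL);
  /* Assumption: target names never contain a double quote, since the
     list is inserted unescaped into a string literal.  */
  assert (strchr (offload_targets, '"') == NULL);

  return snprintf (buf, size,
                   "extern void GOMP_set_offload_targets (const char *);\n"
                   "static __attribute__ ((constructor)) void\n"
                   "init (void)\n"
                   "{\n"
                   "  GOMP_set_offload_targets (\"%s\");\n"
                   "}\n",
                   offload_targets);
}
```

Called with "nvptx-none", the formatted text contains the call
GOMP_set_offload_targets ("nvptx-none"); with an empty string (the
-foffload=disable case) the argument is "".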

-- 
Joseph S. Myers
joseph@codesourcery.com


* Ping Re: Pass -foffload targets from driver to libgomp at link time
  2015-08-27 20:58 Pass -foffload targets from driver to libgomp at link time Joseph Myers
@ 2015-09-03 14:58 ` Joseph Myers
  2015-09-10 14:01   ` Ping^2 " Joseph Myers
  0 siblings, 1 reply; 62+ messages in thread
From: Joseph Myers @ 2015-09-03 14:58 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas, Nathan Sidwell, jakub

Ping.  This patch 
<https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01748.html> is pending 
review.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Ping^2 Re: Pass -foffload targets from driver to libgomp at link time
  2015-09-03 14:58 ` Ping " Joseph Myers
@ 2015-09-10 14:01   ` Joseph Myers
  2015-09-10 14:03     ` Bernd Schmidt
  0 siblings, 1 reply; 62+ messages in thread
From: Joseph Myers @ 2015-09-10 14:01 UTC (permalink / raw)
  To: gcc-patches; +Cc: thomas, Nathan Sidwell, jakub

Ping^2.  This patch 
<https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01748.html> is still 
pending review.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Ping^2 Re: Pass -foffload targets from driver to libgomp at link time
  2015-09-10 14:01   ` Ping^2 " Joseph Myers
@ 2015-09-10 14:03     ` Bernd Schmidt
  2015-09-11 14:29       ` Joseph Myers
  0 siblings, 1 reply; 62+ messages in thread
From: Bernd Schmidt @ 2015-09-10 14:03 UTC (permalink / raw)
  To: Joseph Myers, gcc-patches; +Cc: thomas, Nathan Sidwell, jakub

On 09/10/2015 03:41 PM, Joseph Myers wrote:
> Ping^2.  This patch 
> <https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01748.html> is still 
> pending review.

No fundamental objections, but I have some questions. Could you describe
what the handling of flags/lang_mask accomplishes in this patch? Would
option handling be simpler if the creation/compilation of the extra file
happened in lto_wrapper (where we already do similar things through
mkoffload)?

I initially thought the information you're giving to
GOMP_set_offload_targets is already available implicitly, from the calls
to GOMP_offload_register. But digging through the archives it sounds
like the problem is that if there's no offloadable code, no offload
image will be generated. Is that correct?



Bernd


* Re: Ping^2 Re: Pass -foffload targets from driver to libgomp at link time
  2015-09-10 14:03     ` Bernd Schmidt
@ 2015-09-11 14:29       ` Joseph Myers
  2015-09-11 14:48         ` Bernd Schmidt
  0 siblings, 1 reply; 62+ messages in thread
From: Joseph Myers @ 2015-09-11 14:29 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, thomas, Nathan Sidwell, jakub

On Thu, 10 Sep 2015, Bernd Schmidt wrote:

> On 09/10/2015 03:41 PM, Joseph Myers wrote:
> > Ping^2.  This patch 
> > <https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01748.html> is still 
> > pending review.
> 
> No fundamental objections, but I have some questions. Could you describe
> what the handling of flags/lang_mask accomplishes in this patch? Would
> option handling be simpler if the creation/compilation of the extra file
> happened in lto_wrapper (where we already do similar things through
> mkoffload)?

The point of the lang_mask handling is that if, say, we're compiling C++ 
or Fortran code, with options that aren't valid for C, we mustn't pass 
those options to cc1 when building the constructor as C code, but we do 
still need to pass options valid for C (which might e.g. affect the ABI).
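
The filtering rule described above can be sketched in isolation as
follows.  This is only an illustration of the test that the patch adds to
give_switch: the LANG_* bits and the function name switch_passes are
hypothetical stand-ins (the driver uses the CL_* language flags and
cl_lang_count from the generated option tables).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical language bits for illustration only.  */
enum
{
  LANG_C = 1u << 0,
  LANG_CXX = 1u << 1,
  LANG_FORTRAN = 1u << 2,
  LANG_COUNT = 3
};

/* Return true if a switch whose option flags are OPTION_FLAGS should be
   passed on when compiling for the languages in ACCEPT_MASK.  An
   ACCEPT_MASK of zero means "no filtering", as in do_spec (spec, 0).  */
static bool
switch_passes (unsigned int option_flags, unsigned int accept_mask)
{
  const unsigned int all_langs = (1u << LANG_COUNT) - 1;
  unsigned int lang_mask = option_flags & all_langs;

  /* Only language bits are meaningful in the accept mask.  */
  assert ((accept_mask & ~all_langs) == 0);

  if (accept_mask == 0)
    accept_mask = all_langs;

  /* Language-independent switches always pass; language-specific ones
     pass only if they apply to at least one accepted language.  */
  return lang_mask == 0 || (lang_mask & accept_mask) != 0;
}
```

So when the generated constructor file is compiled as C, a C++-only
switch is dropped, while a switch valid for both C and C++ or a
language-independent switch is still passed to cc1.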

There's an argument that this sort of option filtering should be done more 
generally.  That is, if we have a mixed-language compilation in a single 
call to the driver, it should filter the options so that cc1 gets those 
options applicable for C, cc1plus those applicable to C++, etc., with 
options for inappropriate languages only being diagnosed if none of the 
source files are for that language.  I don't know if that's the right 
thing to do or not, but it's at least plausible.

I don't see lto-wrapper as being any easier as a place to do this; no 
doubt lto-wrapper or collect2 could create the file and call back into the 
driver to compile it, but I don't see the advantage in doing that over 
having the driver (which already has all the relevant information, since 
it's coming from the command line rather than inspection of object files 
being linked) do it.

> I initially thought the information you're giving to
> GOMP_set_offload_targets is already available implicitly, from the calls
> to GOMP_offload_register. But digging through the archives it sounds
> like the problem is that if there's no offloadable code, no offload
> image will be generated. Is that correct?

Yes.  In the message Thomas referred to, "On the other hand, for example, 
for -foffload=nvptx-none, even if user program code doesn't contain any 
offloaded data (and thus the offload machinery has not been run), the user 
program might still contain any executable directives or OpenACC runtime 
library calls, so we'd still like to use the libgomp nvptx plugin.  
However, we currently cannot detect this situation.".

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Ping^2 Re: Pass -foffload targets from driver to libgomp at link time
  2015-09-11 14:29       ` Joseph Myers
@ 2015-09-11 14:48         ` Bernd Schmidt
  2015-09-11 15:28           ` Joseph Myers
  0 siblings, 1 reply; 62+ messages in thread
From: Bernd Schmidt @ 2015-09-11 14:48 UTC (permalink / raw)
  To: Joseph Myers; +Cc: gcc-patches, thomas, Nathan Sidwell, jakub



On 09/11/2015 04:23 PM, Joseph Myers wrote:
> On Thu, 10 Sep 2015, Bernd Schmidt wrote:
>
>> On 09/10/2015 03:41 PM, Joseph Myers wrote:
>>> Ping^2.  This patch
>>> <https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01748.html> is still
>>> pending review.
>>
>> No fundamental objections, but I have some questions. Could you describe
>> what the handling of flags/lang_mask accomplishes in this patch? Would
>> option handling be simpler if the creation/compilation of the extra file
>> happened in lto_wrapper (where we already do similar things through
>> mkoffload)?
>
> The point of the lang_mask handling is that if, say, we're compiling C++
> or Fortran code, with options that aren't valid for C, we mustn't pass
> those options to cc1 when building the constructor as C code, but we do
> still need to pass options valid for C (which might e.g. affect the ABI).
[...]
> I don't see lto-wrapper as being any easier as a place to do this; no
> doubt lto-wrapper or collect2 could create the file and call back into the
> driver to compile it, but I don't see the advantage in doing that over
> having the driver (which already has all the relevant information, since
> it's coming from the command line rather than inspection of object files
> being linked) do it.

The point would be that lto_wrapper already produces such an appropriate 
set of options. But I guess if you're thinking ahead to using this 
filtering in gcc.c for other purposes then that's also a good argument. 
So, patch is ok, but please update the comment for give_switch (document 
the new behaviour and that it depends on a global variable).

I expect you know best what to do in the OpenACC testsuite driver, but 
you might want to run the libgomp.exp parts by Jakub. If the testsuite 
parts are independent of the rest of the patch, please repost them 
separately.


Bernd


* Re: Ping^2 Re: Pass -foffload targets from driver to libgomp at link time
  2015-09-11 14:48         ` Bernd Schmidt
@ 2015-09-11 15:28           ` Joseph Myers
  2015-09-11 15:47             ` Jakub Jelinek
  0 siblings, 1 reply; 62+ messages in thread
From: Joseph Myers @ 2015-09-11 15:28 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: gcc-patches, thomas, Nathan Sidwell, jakub

On Fri, 11 Sep 2015, Bernd Schmidt wrote:

> I expect you know best what to do in the OpenACC testsuite driver, but you
> might want to run the libgomp.exp parts by Jakub. If the testsuite parts are
> independent of the rest of the patch, please repost them separately.

Jakub?  The testsuite changes and the rest of the patch depend on each 
other.

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Ping^2 Re: Pass -foffload targets from driver to libgomp at link time
  2015-09-11 15:28           ` Joseph Myers
@ 2015-09-11 15:47             ` Jakub Jelinek
  2015-09-11 16:16               ` Joseph Myers
  2015-09-28 10:09               ` Thomas Schwinge
  0 siblings, 2 replies; 62+ messages in thread
From: Jakub Jelinek @ 2015-09-11 15:47 UTC (permalink / raw)
  To: Joseph Myers; +Cc: Bernd Schmidt, gcc-patches, thomas, Nathan Sidwell

On Fri, Sep 11, 2015 at 03:26:04PM +0000, Joseph Myers wrote:
> On Fri, 11 Sep 2015, Bernd Schmidt wrote:
> 
> > I expect you know best what to do in the OpenACC testsuite driver, but you
> > might want to run the libgomp.exp parts by Jakub. If the testsuite parts are
> > independent of the rest of the patch, please repost them separately.
> 
> Jakub?  The testsuite changes and the rest of the patch depend on each 
> other.

So, do I understand correctly that you'll call GOMP_set_offload_targets from
constructors of all shared libraries (and the binary) that contain offloaded
code?  If yes, that is surely going to fail the assertions in there.
You can dlopen such libraries etc.  What if you link one library with
-foffload=nvptx-none and another one with -foffload=x86_64-intelmicemul-linux?
Can't the -foffload= string be passed to GOMP_offload_register_ver
(or just derive the list of plugins that should be loaded or at least those
that should be tried first from the list of offloaded data that has been
registered so far)?
I mean, it is also quite possible that some program calls
omp_get_num_devices () etc., say from the main binary, and only then dlopens
shared libraries that contain offloaded regions and then attempts to offload
in those shared libraries.  So it would be better to always load all
possible plugins, but perhaps in an order determined by what has been
registered?

	Jakub


* Re: Ping^2 Re: Pass -foffload targets from driver to libgomp at link time
  2015-09-11 15:47             ` Jakub Jelinek
@ 2015-09-11 16:16               ` Joseph Myers
  2015-09-28 10:09               ` Thomas Schwinge
  1 sibling, 0 replies; 62+ messages in thread
From: Joseph Myers @ 2015-09-11 16:16 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Bernd Schmidt, gcc-patches, thomas, Nathan Sidwell

On Fri, 11 Sep 2015, Jakub Jelinek wrote:

> On Fri, Sep 11, 2015 at 03:26:04PM +0000, Joseph Myers wrote:
> > On Fri, 11 Sep 2015, Bernd Schmidt wrote:
> > 
> > > I expect you know best what to do in the OpenACC testsuite driver, but you
> > > might want to run the libgomp.exp parts by Jakub. If the testsuite parts are
> > > independent of the rest of the patch, please repost them separately.
> > 
> > Jakub?  The testsuite changes and the rest of the patch depend on each 
> > other.
> 
> So, do I understand correctly that you'll call GOMP_set_offload_targets from
> constructors of all shared libraries (and the binary) that contain offloaded
> code?  If yes, that is surely going to fail the assertions in there.
> You can dlopen such libraries etc.  What if you link one library with
> -foffload=nvptx-none and another one with -foffload=x86_64-intelmicemul-linux?

Thomas (I think you're back next week), any comments on how shared 
libraries with different offloading selected fit into your design 
(including the case where some but not all of the executable / shared 
libraries specify -foffload=disable)?

-- 
Joseph S. Myers
joseph@codesourcery.com


* Re: Ping^2 Re: Pass -foffload targets from driver to libgomp at link time
  2015-09-11 15:47             ` Jakub Jelinek
  2015-09-11 16:16               ` Joseph Myers
@ 2015-09-28 10:09               ` Thomas Schwinge
  2015-09-29  9:48                 ` Jakub Jelinek
  1 sibling, 1 reply; 62+ messages in thread
From: Thomas Schwinge @ 2015-09-28 10:09 UTC (permalink / raw)
  To: Jakub Jelinek, gcc-patches; +Cc: Bernd Schmidt, Nathan Sidwell, Joseph Myers


Hi!

On Fri, 11 Sep 2015 17:43:49 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> So, do I understand well that you'll call GOMP_set_offload_targets from
> construct[ors] of all shared libraries (and the binary) that contain offloaded
> code?  If yes, that is surely going to fail the assertions in there.

Indeed.  My original plan was to generate/invoke this constructor
only for/from the final executable and not for any shared libraries, but
it seems I didn't implement this correctly.

> You can dlopen such libraries etc.  What if you link one library with
> -foffload=nvptx-none and another one with -foffload=x86_64-intelmicemul-linux?

So, the first question to answer is: what do we expect to happen in this
case, or similarly, if the executable and any shared libraries are
compiled with different/incompatible -foffload options?

OpenMP's default-device-var ICV is per process (that is, not separate for
the executable and any shared library).  Thus, once libgomp has settled on
this ICV (by first execution of an offloading construct, for example), any
offloading attempt of code compiled with incompatible -foffload options
will have to fail, because the corresponding offloading device's code just
isn't available.  We can't avoid this situation, as it is not possible for
libgomp to simply switch to a different offloading device (or host
fallback, for that matter): libgomp doesn't have any knowledge of the
current state of data regions set up between the host and device(s), for
instance.
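
To illustrate why a per-process setting forces agreement, here is a
minimal sketch of a runtime that fixes its target list on the first
registration and can then only accept later registrations that name the
same list.  It is not the code from the patch: record_offload_targets and
recorded_targets are hypothetical names, the comparison is a plain string
compare (so the same targets in a different order would count as a
mismatch), and a real implementation might report a mismatch as a fatal
error rather than a return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical process-wide state: NULL until the first registration.  */
static const char *recorded_targets;

/* Record TARGETS on the first call.  On later calls, return true only if
   TARGETS is the same colon-separated list that was recorded first.
   The caller must keep the first TARGETS string alive.  */
static bool
record_offload_targets (const char *targets)
{
  assert (targets != NULL);

  if (recorded_targets == NULL)
    {
      recorded_targets = targets;
      return true;
    }

  return strcmp (recorded_targets, targets) == 0;
}
```

With this model, an executable and a shared library both linked with
-foffload=nvptx-none register successfully, whereas a library linked for a
different target is rejected once the first list has been recorded.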

For this, I propose that the only mode of operation that we currently can
support is that all of the executable and any shared libraries agree on
the offload targets specified by -foffload, and I thus propose the
following patch on top of what Joseph has posted before (passes the
testsuite, but not yet tested otherwise):

 libgomp/libgomp-plugin.h |    3 +-
 libgomp/target.c         |  157 +++++++++++++++++++++++++++++++++++++---------
 2 files changed, 130 insertions(+), 30 deletions(-)

diff --git libgomp/libgomp-plugin.h libgomp/libgomp-plugin.h
index 24fbb94..5da4fa7 100644
--- libgomp/libgomp-plugin.h
+++ libgomp/libgomp-plugin.h
@@ -48,7 +48,8 @@ enum offload_target_type
   OFFLOAD_TARGET_TYPE_HOST = 2,
   /* OFFLOAD_TARGET_TYPE_HOST_NONSHM = 3 removed.  */
   OFFLOAD_TARGET_TYPE_NVIDIA_PTX = 5,
-  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
+  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6,
+  OFFLOAD_TARGET_TYPE_HWM
 };
 
 /* Auxiliary struct, used for transferring pairs of addresses from plugin
diff --git libgomp/target.c libgomp/target.c
index 4dd5913..d1e794a 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -68,6 +68,9 @@ static struct offload_image_descr *offload_images;
 /* Total number of offload images.  */
 static int num_offload_images;
 
+/* List of offload targets, separated by colon.  */
+static const char *gomp_offload_targets;
+
 /* Array of descriptors for all available devices.  */
 static struct gomp_device_descr *devices;
 
@@ -1121,6 +1124,8 @@ static bool
 gomp_load_plugin_for_device (struct gomp_device_descr *device,
 			     const char *plugin_name)
 {
+  gomp_debug (0, "%s (\"%s\")\n", __FUNCTION__, plugin_name);
+
   const char *err = NULL, *last_missing = NULL;
 
   void *plugin_handle = dlopen (plugin_name, RTLD_LAZY);
@@ -1216,39 +1221,120 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   return 0;
 }
 
-/* Return the corresponding plugin name for the offload target name
-   OFFLOAD_TARGET.  */
+/* Return the corresponding offload target type for the offload target name
+   OFFLOAD_TARGET, or 0 if unknown.  */
 
-static const char *
-offload_target_to_plugin_name (const char *offload_target)
+static enum offload_target_type
+offload_target_to_type (const char *offload_target)
 {
   if (strstr (offload_target, "-intelmic") != NULL)
-    return "intelmic";
-  if (strncmp (offload_target, "nvptx", 5) == 0)
-    return "nvptx";
-  gomp_fatal ("Unknown offload target: %s", offload_target);
+    return OFFLOAD_TARGET_TYPE_INTEL_MIC;
+  else if (strncmp (offload_target, "nvptx", 5) == 0)
+    return OFFLOAD_TARGET_TYPE_NVIDIA_PTX;
+  else
+    return 0;
 }
 
-/* List of offload targets, separated by colon.  Defaults to the list
-   determined when configuring libgomp.  */
-static const char *gomp_offload_targets = OFFLOAD_TARGETS;
-static bool gomp_offload_targets_init = false;
+/* Return the corresponding plugin name for the offload target type TYPE, or
+   NULL if unknown.  */
+
+static const char *
+offload_target_type_to_plugin_name (enum offload_target_type type)
+{
+  switch (type)
+    {
+    case OFFLOAD_TARGET_TYPE_INTEL_MIC:
+      return "intelmic";
+    case OFFLOAD_TARGET_TYPE_NVIDIA_PTX:
+      return "nvptx";
+    default:
+      return NULL;
+    }
+}
 
 /* Override the list of offload targets with OFFLOAD_TARGETS, the set
-   passed to the compiler at link time.  This must be called early,
-   and only once.  */
+   passed to the compiler at link time.  */
 
 void
 GOMP_set_offload_targets (const char *offload_targets)
 {
   gomp_debug (0, "%s (\"%s\")\n", __FUNCTION__, offload_targets);
 
-  /* Make sure this gets called early.  */
-  assert (gomp_is_initialized == PTHREAD_ONCE_INIT);
-  /* Make sure this only gets called once.  */
-  assert (!gomp_offload_targets_init);
-  gomp_offload_targets_init = true;
-  gomp_offload_targets = offload_targets;
+  char *offload_targets_dup = strdup (offload_targets);
+  if (offload_targets_dup == NULL)
+    gomp_fatal ("Out of memory");
+
+  const char *err = NULL;
+
+  gomp_mutex_lock (&register_lock);
+
+  if (gomp_offload_targets == NULL)
+    {
+      /* Make sure this gets called early.  */
+      assert (gomp_is_initialized == PTHREAD_ONCE_INIT);
+
+      gomp_offload_targets = offload_targets_dup;
+    }
+  else
+    {
+      /* If this gets called multiple times, or gets called after
+	 initialization, make sure that the same set of offload targets has
+	 been specified.  */
+
+      bool offload_targets_registered[OFFLOAD_TARGET_TYPE_HWM];
+      bool offload_targets_requested[OFFLOAD_TARGET_TYPE_HWM];
+      for (int i = 0; i < OFFLOAD_TARGET_TYPE_HWM; ++i)
+	offload_targets_registered[i] = offload_targets_requested[i] = false;
+
+      char *cur = offload_targets_dup;
+      while (cur)
+	{
+	  char *next = strchr (cur, ':');
+	  if (next != NULL)
+	    {
+	      *next = '\0';
+	      ++next;
+	    }
+	  enum offload_target_type type = offload_target_to_type (cur);
+	  if (type == 0)
+	    {
+	      /* An unknown offload target has been requested; ignore it.  This
+		 makes us (future-)proof if offload targets are requested that
+		 are not supported in this build of libgomp.  */
+	    }
+	  else
+	    offload_targets_requested[type] = true;
+
+	  cur = next;
+	}
+
+      for (int i = 0; i < num_devices; ++i)
+	offload_targets_registered[devices[i].type] = true;
+
+      for (enum offload_target_type type = 0;
+	   type < OFFLOAD_TARGET_TYPE_HWM;
+	   ++type)
+	if (offload_targets_registered[type]
+	    != offload_targets_requested[type])
+	  {
+	    err = "terminating";
+
+	    const char *name = offload_target_type_to_plugin_name (type);
+	    if (offload_targets_requested[type])
+	      gomp_error ("offload target %s has been requested, "
+			  "but not registered before", name);
+	    else
+	      gomp_error ("offload target %s has been registered, "
+			  "but not requested now", name);
+	  }
+
+      free (offload_targets_dup);
+    }
+
+  gomp_mutex_unlock (&register_lock);
+
+  if (err != NULL)
+    gomp_fatal ("%s", err);
 }
 
 /* This function initializes the runtime needed for offloading.
@@ -1264,11 +1350,15 @@ gomp_target_init (void)
   const char *prefix ="libgomp-plugin-";
   const char *suffix = SONAME_SUFFIX (1);
   const char *cur, *next;
-  char *plugin_name;
   int i, new_num_devices;
 
   gomp_mutex_lock (&register_lock);
 
+  /* If no offload targets have been requested explicitly, default to those
+     determined when configuring libgomp.  */
+  if (gomp_offload_targets == NULL)
+    gomp_offload_targets = OFFLOAD_TARGETS;
+
   num_devices = 0;
   devices = NULL;
 
@@ -1276,16 +1366,14 @@ gomp_target_init (void)
   if (*cur)
     do
       {
-	struct gomp_device_descr current_device;
-
 	next = strchr (cur, ':');
 	size_t prefix_len = strlen (prefix);
 	size_t cur_len = next ? next - cur : strlen (cur);
 	size_t suffix_len = strlen (suffix);
-	plugin_name = (char *) malloc (prefix_len
-				       + cur_len
-				       + suffix_len
-				       + 1);
+	char *plugin_name = (char *) malloc (prefix_len
+					     + cur_len
+					     + suffix_len
+					     + 1);
 	if (!plugin_name)
 	  {
 	    num_devices = 0;
@@ -1297,9 +1385,18 @@ gomp_target_init (void)
 	plugin_name[prefix_len + cur_len] = '\0';
 	/* ..., so that we can then use it to translate the offload target to
 	   the plugin name...  */
+	enum offload_target_type type
+	  = offload_target_to_type (plugin_name + prefix_len);
 	const char *cur_plugin_name
-	  = offload_target_to_plugin_name (plugin_name
-					   + prefix_len);
+	  = offload_target_type_to_plugin_name (type);
+	if (cur_plugin_name == NULL)
+	  {
+	    /* An unknown offload target has been requested; ignore it.  This
+	       makes us (future-)proof if offload targets are requested that
+	       are not supported in this build of libgomp.  */
+	    gomp_debug (0, "skipping unknown offload target %d", (int) type);
+	    goto skip;
+	  }
 	size_t cur_plugin_name_len = strlen (cur_plugin_name);
 	assert (cur_plugin_name_len <= cur_len);
 	/* ..., and then rewrite it.  */
@@ -1311,6 +1408,7 @@ gomp_target_init (void)
 		    + cur_plugin_name_len
 		    + suffix_len] = '\0';
 
+	struct gomp_device_descr current_device;
 	if (gomp_load_plugin_for_device (&current_device, plugin_name))
 	  {
 	    new_num_devices = current_device.get_num_devices_func ();
@@ -1343,6 +1441,7 @@ gomp_target_init (void)
 	      }
 	  }
 
+      skip:
 	free (plugin_name);
 	cur = next + 1;
       }


Regards,
 Thomas


* Re: Ping^2 Re: Pass -foffload targets from driver to libgomp at link time
  2015-09-28 10:09               ` Thomas Schwinge
@ 2015-09-29  9:48                 ` Jakub Jelinek
  2015-09-30 16:15                   ` Thomas Schwinge
  0 siblings, 1 reply; 62+ messages in thread
From: Jakub Jelinek @ 2015-09-29  9:48 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches, Bernd Schmidt, Nathan Sidwell, Joseph Myers

On Mon, Sep 28, 2015 at 11:39:10AM +0200, Thomas Schwinge wrote:
> Hi!
> 
> On Fri, 11 Sep 2015 17:43:49 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> > So, do I understand well that you'll call GOMP_set_offload_targets from
> > construct[ors] of all shared libraries (and the binary) that contain offloaded
> > code?  If yes, that is surely going to fail the assertions in there.
> 
> Indeed.  My original plan has been to generate/invoke this constructor
> only for/from the final executable and not for any shared libraries, but
> it seems I didn't implement this correctly.

How would you mean to implement it?  -fopenmp or -fopenacc code with
offloading bits might not be in the final executable at all, nor in shared
libraries it is linked against; such libraries could be only dlopened,
consider say python plugin.  And this is not just made up, perhaps not with
offloading yet, but people regularly use OpenMP code in plugins and then we
get complaints that a forked child of the main program is not allowed to do
anything but call async-signal-safe functions.
> 
> > You can dlopen such libraries etc.  What if you link one library with
> > -fopenmp=nvptx-none and another one with -fopenmp=x86_64-intelmicemul-linux?
> 
> So, the first question to answer is: what do we expect to happen in this
> case, or similarly, if the executable and any shared libraries are
> compiled with different/incompatible -foffload options?

As the device numbers are per-process, the only possibility I see is that
all the physically available devices are always available, and just if you
try to offload from some code to a device that doesn't support it, you get
host fallback.  Because, one shared library could carefully use device(xyz)
to offload to say XeonPhi it is compiled for and supports, and another
library device(abc) to offload to PTX it is compiled for and supports.
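
A minimal sketch of the lookup described above, using a hypothetical model
rather than libgomp's real data structures: all physically available devices
are always enumerated, and a request falls back to the host whenever the
selected device does not match the target the calling code was compiled for.
The names `device_model`, `resolve_device_model` and `HOST_FALLBACK` are
assumptions made for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical marker for "run on the host instead"; not a libgomp value.  */
enum { HOST_FALLBACK = -1 };

/* Hypothetical description of one physically available device.  */
struct device_model
{
  const char *target;	/* Offload target it executes, e.g. "nvptx".  */
};

/* Return the device number to offload to, or HOST_FALLBACK if device
   REQUESTED does not exist or cannot run code built for IMAGE_TARGET.  */
int
resolve_device_model (const struct device_model *devices, int num_devices,
		      int requested, const char *image_target)
{
  if (requested < 0 || requested >= num_devices)
    return HOST_FALLBACK;
  if (strcmp (devices[requested].target, image_target) != 0)
    return HOST_FALLBACK;
  return requested;
}
```

In this model, a library built for "intelmic" that selects an "intelmic"
device keeps that device, while the same request from a library built for
"nvptx" resolves to the host.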

> For this, I propose that the only mode of operation that we currently can
> support is that all of the executable and any shared libraries agree on
> the offload targets specified by -foffload, and I thus propose the
> following patch on top of what Joseph has posted before (passes the
> testsuite, but not yet tested otherwise):

See above, no.

	Jakub

* Re: Ping^2 Re: Pass -foffload targets from driver to libgomp at link time
  2015-09-29  9:48                 ` Jakub Jelinek
@ 2015-09-30 16:15                   ` Thomas Schwinge
  2015-10-19 16:56                     ` Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) Thomas Schwinge
  0 siblings, 1 reply; 62+ messages in thread
From: Thomas Schwinge @ 2015-09-30 16:15 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Bernd Schmidt, Nathan Sidwell, Joseph Myers

Hi!

On Tue, 29 Sep 2015 10:18:14 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Sep 28, 2015 at 11:39:10AM +0200, Thomas Schwinge wrote:
> > On Fri, 11 Sep 2015 17:43:49 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> > > So, do I understand well that you'll call GOMP_set_offload_targets from
> > > construct[ors] of all shared libraries (and the binary) that contain offloaded
> > > code?  If yes, that is surely going to fail the assertions in there.
> > 
> > Indeed.  My original plan has been to generate/invoke this constructor
> > only for/from the final executable and not for any shared libraries, but
> > > it seems I didn't implement this correctly.
> 
> How would you mean to implement it?

I have come to realize that we need to generate/invoke this constructor
from everything that links against libgomp (which is what I implemented),
that is, executables as well as shared libraries.

> -fopenmp or -fopenacc code with
> offloading bits might not be in the final executable at all, nor in shared
> libraries it is linked against; such libraries could be only dlopened,
> consider say python plugin.  And this is not just made up, perhaps not with
> offloading yet, but people regularly use OpenMP code in plugins and then we
> get complaints that a forked child of the main program is not allowed to do
> anything but call async-signal-safe functions.

I'm not sure I'm completely understanding that paragraph?  Are you saying
that offloaded code can be in libraries that are not linked against
libgomp?  How would these register (GOMP_offload_register) their
offloaded code?  I think it's reasonable to expect that every shared
library that contains offloaded code must link against libgomp, which
will happen automatically given that it is built with -fopenmp/-fopenacc?

> > > You can dlopen such libraries etc.  What if you link one library with
> > > -fopenmp=nvptx-none and another one with -fopenmp=x86_64-intelmicemul-linux?
> > 
> > So, the first question to answer is: what do we expect to happen in this
> > case, or similarly, if the executable and any shared libraries are
> > compiled with different/incompatible -foffload options?
> 
> As the device numbers are per-process, the only possibility I see is that
> all the physically available devices are always available, and just if you
> try to offload from some code to a device that doesn't support it, you get
> host fallback.  Because, one shared library could carefully use device(xyz)
> to offload to say XeonPhi it is compiled for and supports, and another
> library device(abc) to offload to PTX it is compiled for and supports.

OK, I think I get that, and it makes sense.  Even though, I don't know
how you'd do that today: as far as I can tell, there is no specification
covering the OpenMP 4 target device IDs, so I have no idea how a user
program/library could reliably use them in practice?  For example, in
the current GCC implementation, the OpenMP 4 target device IDs depend on
the number of individual devices available in the system, and the order in
which libgomp loads the plugins, which is defined (arbitrarily) by the
GCC configuration?
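
To illustrate the concern about device IDs, here is a small sketch under the
assumption that IDs are handed out consecutively, plugin by plugin, in load
order.  The `plugin_model` type and `first_device_id` function are
hypothetical and only model the numbering; they are not part of libgomp.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one loaded plugin and the devices it reports.  */
struct plugin_model
{
  const char *name;	/* E.g. "intelmic" or "nvptx".  */
  int num_devices;	/* Number of devices this plugin found.  */
};

/* Return the first device ID that plugin NAME receives when the N plugins
   in PLUGINS are loaded in array order and IDs are assigned consecutively.
   Return -1 if NAME is not loaded or reports no devices.  */
int
first_device_id (const struct plugin_model *plugins, size_t n,
		 const char *name)
{
  int next_id = 0;
  for (size_t i = 0; i < n; ++i)
    {
      if (strcmp (plugins[i].name, name) == 0)
	return plugins[i].num_devices > 0 ? next_id : -1;
      next_id += plugins[i].num_devices;
    }
  return -1;
}
```

With two intelmic devices loaded before one nvptx device, the nvptx device
gets ID 2 in this model; swapping the load order gives it ID 0 and moves the
first intelmic device to ID 1, so a hard-coded device(N) clause would select
different hardware.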

> > For this, I propose that the only mode of operation that we currently can
> > support is that all of the executable and any shared libraries agree on
> > the offload targets specified by -foffload, and I thus propose the
> > following patch on top of what Joseph has posted before (passes the
> > testsuite, but not yet tested otherwise):
> 
> See above, no.

OK.

How's the following (complete patch instead of incremental patch; the
driver changes are still the same as before)?  The changes are:

  * libgomp/target.c:gomp_target_init again loads all the plugins.
  * libgomp/target.c:resolve_device and
    libgomp/oacc-init.c:resolve_device verify that a default device
    (OpenMP device-var ICV, and acc_device_default, respectively) is
    actually enabled, or resort to host fallback if not.
  * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used
    to enable devices specified by -foffload.  Can be called multiple
    times (executable, any shared libraries); the set of enabled devices
    is the union of all those ever requested.
  * GOMP_offload_register (but not the new GOMP_offload_register_ver)
    changed to enable all devices.  This is to maintain compatibility
    with old executables and shared libraries built without the -foffload
    constructor support.
  * IntelMIC mkoffload changed to use GOMP_offload_register_ver instead
    of GOMP_offload_register, and GOMP_offload_unregister_ver instead of
    GOMP_offload_unregister.  This avoids enabling all devices, which is
    what GOMP_offload_register now does.
  * New test cases to verify this (-foffload=disable, host fallback).
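
The "union of all those ever requested" behaviour from the list above can be
sketched as follows.  This is an illustrative model, not the code from the
patch: the bit values and the function `enable_offload_targets_model` are
hypothetical, and only the matching rules (a "-intelmic" substring, an
"nvptx" prefix, unknown names ignored) are taken from the patch's
offload_target_to_type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit per offload target; not libgomp's real encoding.  */
enum
{
  TARGET_BIT_INTELMIC = 1u << 0,
  TARGET_BIT_NVPTX = 1u << 1
};

/* Return true if NEEDLE occurs within the first LEN bytes of S.  */
static bool
contains (const char *s, size_t len, const char *needle)
{
  size_t n = strlen (needle);
  for (size_t i = 0; i + n <= len; ++i)
    if (memcmp (s + i, needle, n) == 0)
      return true;
  return false;
}

/* Map one target name of length LEN to its bit, or 0 if unknown.  */
static unsigned int
target_bit (const char *name, size_t len)
{
  if (contains (name, len, "-intelmic"))
    return TARGET_BIT_INTELMIC;
  if (len >= 5 && strncmp (name, "nvptx", 5) == 0)
    return TARGET_BIT_NVPTX;
  return 0;
}

/* Return ENABLED extended by every known target in the colon-separated
   LIST.  Calling this once per executable or shared library accumulates
   the union of all requests; unknown targets are silently ignored.  */
unsigned int
enable_offload_targets_model (unsigned int enabled, const char *list)
{
  const char *cur = list;
  while (*cur != '\0')
    {
      const char *next = strchr (cur, ':');
      size_t len = next ? (size_t) (next - cur) : strlen (cur);
      enabled |= target_bit (cur, len);
      if (next == NULL)
	break;
      cur = next + 1;
    }
  return enabled;
}
```

Each constructor call only ever adds bits in this model, so the order in
which the executable and its shared libraries are initialized does not change
the final set.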

Ilya, I'm aware of your work on additional changes (shared memory),
<http://news.gmane.org/find-root.php?message_id=%3CCADG%3DZ0EBuhj89WEZdmaNUPy%3DE%3D63BmWofS8An8nY7rygTmdJ_w%40mail.gmail.com%3E>,
but I think my patch is still an improvement already?

Jakub, is this OK as an incremental step forward?

 gcc/config/i386/intelmic-mkoffload.c               |  20 +-
 gcc/fortran/gfortranspec.c                         |   2 +-
 gcc/gcc.c                                          | 139 +++++++++++---
 gcc/gcc.h                                          |   2 +-
 gcc/java/jvspec.c                                  |   2 +-
 libgomp/config.h.in                                |   2 +-
 libgomp/configure                                  |   6 +-
 libgomp/libgomp-plugin.h                           |   3 +-
 libgomp/libgomp.h                                  |   1 +
 libgomp/libgomp.map                                |   1 +
 libgomp/libgomp_g.h                                |   1 +
 libgomp/oacc-init.c                                |  18 +-
 libgomp/plugin/configfrag.ac                       |   8 +-
 libgomp/target.c                                   | 205 +++++++++++++++++----
 libgomp/testsuite/lib/libgomp.exp                  |  24 +--
 .../libgomp.c++/target-1-foffload_disable.C        |   3 +
 .../libgomp.c++/target-foffload_disable.C          |   3 +
 .../libgomp.c/target-1-foffload_disable.c          |   3 +
 .../testsuite/libgomp.c/target-foffload_disable.c  |  18 ++
 .../libgomp.fortran/target-foffload_disable.f      |  14 ++
 .../libgomp.fortran/target1-foffload_disable.f90   |   3 +
 libgomp/testsuite/libgomp.oacc-c++/c++.exp         |  14 +-
 libgomp/testsuite/libgomp.oacc-c/c.exp             |  13 +-
 libgomp/testsuite/libgomp.oacc-fortran/fortran.exp |  14 +-
 24 files changed, 388 insertions(+), 131 deletions(-)

diff --git gcc/config/i386/intelmic-mkoffload.c gcc/config/i386/intelmic-mkoffload.c
index 5195d91..e1cfbf9 100644
--- gcc/config/i386/intelmic-mkoffload.c
+++ gcc/config/i386/intelmic-mkoffload.c
@@ -360,26 +360,34 @@ generate_host_descr_file (const char *host_compiler)
 	   "#ifdef __cplusplus\n"
 	   "extern \"C\"\n"
 	   "#endif\n"
-	   "void GOMP_offload_register (const void *, int, const void *);\n"
+	   "void GOMP_offload_register_ver "
+	   "(unsigned version, const void *, int, const void *);\n"
 	   "#ifdef __cplusplus\n"
 	   "extern \"C\"\n"
 	   "#endif\n"
-	   "void GOMP_offload_unregister (const void *, int, const void *);\n\n"
+	   "void GOMP_offload_unregister_ver "
+	   "(unsigned version, const void *, int, const void *);\n\n"
 
 	   "__attribute__((constructor))\n"
 	   "static void\n"
 	   "init (void)\n"
 	   "{\n"
-	   "  GOMP_offload_register (&__OFFLOAD_TABLE__, %d, __offload_target_data);\n"
-	   "}\n\n", GOMP_DEVICE_INTEL_MIC);
+	   "  GOMP_offload_register_ver (%#x, &__OFFLOAD_TABLE__, "
+	   "%d, __offload_target_data);\n"
+	   "}\n\n",
+	   GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_INTEL_MIC),
+	   GOMP_DEVICE_INTEL_MIC);
 
   fprintf (src_file,
 	   "__attribute__((destructor))\n"
 	   "static void\n"
 	   "fini (void)\n"
 	   "{\n"
-	   "  GOMP_offload_unregister (&__OFFLOAD_TABLE__, %d, __offload_target_data);\n"
-	   "}\n", GOMP_DEVICE_INTEL_MIC);
+	   "  GOMP_offload_unregister_ver (%#x, &__OFFLOAD_TABLE__, "
+	   "%d, __offload_target_data);\n"
+	   "}\n",
+	   GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_INTEL_MIC),
+	   GOMP_DEVICE_INTEL_MIC);
 
   fclose (src_file);
 
diff --git gcc/fortran/gfortranspec.c gcc/fortran/gfortranspec.c
index fe594db..e3e83ba 100644
--- gcc/fortran/gfortranspec.c
+++ gcc/fortran/gfortranspec.c
@@ -439,7 +439,7 @@ int
 lang_specific_pre_link (void)
 {
   if (library)
-    do_spec ("%:include(libgfortran.spec)");
+    do_spec ("%:include(libgfortran.spec)", 0);
 
   return 0;
 }
diff --git gcc/gcc.c gcc/gcc.c
index 55a7255..ce5cba1 100644
--- gcc/gcc.c
+++ gcc/gcc.c
@@ -401,6 +401,8 @@ static const char *compare_debug_auxbase_opt_spec_function (int, const char **);
 static const char *pass_through_libs_spec_func (int, const char **);
 static const char *replace_extension_spec_func (int, const char **);
 static const char *greater_than_spec_func (int, const char **);
+static const char *add_omp_infile_spec_func (int, const char **);
+
 static char *convert_white_space (char *);
 \f
 /* The Specs Language
@@ -1189,6 +1191,11 @@ static const char *const multilib_defaults_raw[] = MULTILIB_DEFAULTS;
 
 static const char *const driver_self_specs[] = {
   "%{fdump-final-insns:-fdump-final-insns=.} %<fdump-final-insns",
+#ifdef ENABLE_OFFLOADING
+  /* If linking against libgomp, add a setup file.  */
+  "%{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*} 1):" \
+  "%:add-omp-infile()}",
+#endif /* ENABLE_OFFLOADING */
   DRIVER_SELF_SPECS, CONFIGURE_SPECS, GOMP_SELF_SPECS, GTM_SELF_SPECS,
   CILK_SELF_SPECS
 };
@@ -1616,6 +1623,7 @@ static const struct spec_function static_spec_functions[] =
   { "pass-through-libs",	pass_through_libs_spec_func },
   { "replace-extension",	replace_extension_spec_func },
   { "gt",			greater_than_spec_func },
+  { "add-omp-infile",		add_omp_infile_spec_func },
 #ifdef EXTRA_SPEC_FUNCTIONS
   EXTRA_SPEC_FUNCTIONS
 #endif
@@ -3212,7 +3220,8 @@ execute (void)
    The `validated' field describes whether any spec has looked at this switch;
    if it remains false at the end of the run, the switch must be meaningless.
    The `ordering' field is used to temporarily mark switches that have to be
-   kept in a specific order.  */
+   kept in a specific order.
+   The `lang_mask' field stores the flags associated with this option.  */
 
 #define SWITCH_LIVE    			(1 << 0)
 #define SWITCH_FALSE   			(1 << 1)
@@ -3228,6 +3237,7 @@ struct switchstr
   bool known;
   bool validated;
   bool ordering;
+  unsigned int lang_mask;
 };
 
 static struct switchstr *switches;
@@ -3236,6 +3246,10 @@ static int n_switches;
 
 static int n_switches_alloc;
 
+/* If nonzero, do not pass through switches for languages not matching
+   this mask.  */
+static unsigned int spec_lang_mask_accept;
+
 /* Set to zero if -fcompare-debug is disabled, positive if it's
    enabled and we're running the first compilation, negative if it's
    enabled and we're running the second compilation.  For most of the
@@ -3273,6 +3287,7 @@ struct infile
   const char *name;
   const char *language;
   struct compiler *incompiler;
+  unsigned int lang_mask;
   bool compiled;
   bool preprocessed;
 };
@@ -3466,15 +3481,16 @@ alloc_infile (void)
     }
 }
 
-/* Store an input file with the given NAME and LANGUAGE in
+/* Store an input file with the given NAME and LANGUAGE and LANG_MASK in
    infiles.  */
 
 static void
-add_infile (const char *name, const char *language)
+add_infile (const char *name, const char *language, unsigned int lang_mask)
 {
   alloc_infile ();
   infiles[n_infiles].name = name;
-  infiles[n_infiles++].language = language;
+  infiles[n_infiles].language = language;
+  infiles[n_infiles++].lang_mask = lang_mask;
 }
 
 /* Allocate space for a switch in switches.  */
@@ -3495,11 +3511,12 @@ alloc_switch (void)
 }
 
 /* Save an option OPT with N_ARGS arguments in array ARGS, marking it
-   as validated if VALIDATED and KNOWN if it is an internal switch.  */
+   as validated if VALIDATED and KNOWN if it is an internal switch.
+   LANG_MASK is the flags associated with this option.  */
 
 static void
 save_switch (const char *opt, size_t n_args, const char *const *args,
-	     bool validated, bool known)
+	     bool validated, bool known, unsigned int lang_mask)
 {
   alloc_switch ();
   switches[n_switches].part1 = opt + 1;
@@ -3516,6 +3533,7 @@ save_switch (const char *opt, size_t n_args, const char *const *args,
   switches[n_switches].validated = validated;
   switches[n_switches].known = known;
   switches[n_switches].ordering = 0;
+  switches[n_switches].lang_mask = lang_mask;
   n_switches++;
 }
 
@@ -3533,7 +3551,8 @@ driver_unknown_option_callback (const struct cl_decoded_option *decoded)
 	 diagnosed only if there are warnings.  */
       save_switch (decoded->canonical_option[0],
 		   decoded->canonical_option_num_elements - 1,
-		   &decoded->canonical_option[1], false, true);
+		   &decoded->canonical_option[1], false, true,
+		   cl_options[decoded->opt_index].flags);
       return false;
     }
   if (decoded->opt_index == OPT_SPECIAL_unknown)
@@ -3541,7 +3560,8 @@ driver_unknown_option_callback (const struct cl_decoded_option *decoded)
       /* Give it a chance to define it a spec file.  */
       save_switch (decoded->canonical_option[0],
 		   decoded->canonical_option_num_elements - 1,
-		   &decoded->canonical_option[1], false, false);
+		   &decoded->canonical_option[1], false, false,
+		   cl_options[decoded->opt_index].flags);
       return false;
     }
   else
@@ -3568,7 +3588,8 @@ driver_wrong_lang_callback (const struct cl_decoded_option *decoded,
   else
     save_switch (decoded->canonical_option[0],
 		 decoded->canonical_option_num_elements - 1,
-		 &decoded->canonical_option[1], false, true);
+		 &decoded->canonical_option[1], false, true,
+		 option->flags);
 }
 
 static const char *spec_lang = 0;
@@ -3817,7 +3838,8 @@ driver_handle_option (struct gcc_options *opts,
 	compare_debug_opt = NULL;
       else
 	compare_debug_opt = arg;
-      save_switch (compare_debug_replacement_opt, 0, NULL, validated, true);
+      save_switch (compare_debug_replacement_opt, 0, NULL, validated, true,
+		   cl_options[opt_index].flags);
       return true;
 
     case OPT_fdiagnostics_color_:
@@ -3872,17 +3894,17 @@ driver_handle_option (struct gcc_options *opts,
 	for (j = 0; arg[j]; j++)
 	  if (arg[j] == ',')
 	    {
-	      add_infile (save_string (arg + prev, j - prev), "*");
+	      add_infile (save_string (arg + prev, j - prev), "*", 0);
 	      prev = j + 1;
 	    }
 	/* Record the part after the last comma.  */
-	add_infile (arg + prev, "*");
+	add_infile (arg + prev, "*", 0);
       }
       do_save = false;
       break;
 
     case OPT_Xlinker:
-      add_infile (arg, "*");
+      add_infile (arg, "*", 0);
       do_save = false;
       break;
 
@@ -3899,19 +3921,21 @@ driver_handle_option (struct gcc_options *opts,
     case OPT_l:
       /* POSIX allows separation of -l and the lib arg; canonicalize
 	 by concatenating -l with its arg */
-      add_infile (concat ("-l", arg, NULL), "*");
+      add_infile (concat ("-l", arg, NULL), "*", 0);
       do_save = false;
       break;
 
     case OPT_L:
       /* Similarly, canonicalize -L for linkers that may not accept
 	 separate arguments.  */
-      save_switch (concat ("-L", arg, NULL), 0, NULL, validated, true);
+      save_switch (concat ("-L", arg, NULL), 0, NULL, validated, true,
+		   cl_options[opt_index].flags);
       return true;
 
     case OPT_F:
       /* Likewise -F.  */
-      save_switch (concat ("-F", arg, NULL), 0, NULL, validated, true);
+      save_switch (concat ("-F", arg, NULL), 0, NULL, validated, true,
+		   cl_options[opt_index].flags);
       return true;
 
     case OPT_save_temps:
@@ -4034,7 +4058,8 @@ driver_handle_option (struct gcc_options *opts,
       save_temps_prefix = xstrdup (arg);
       /* On some systems, ld cannot handle "-o" without a space.  So
 	 split the option from its argument.  */
-      save_switch ("-o", 1, &arg, validated, true);
+      save_switch ("-o", 1, &arg, validated, true,
+		   cl_options[opt_index].flags);
       return true;
 
 #ifdef ENABLE_DEFAULT_PIE
@@ -4070,7 +4095,8 @@ driver_handle_option (struct gcc_options *opts,
   if (do_save)
     save_switch (decoded->canonical_option[0],
 		 decoded->canonical_option_num_elements - 1,
-		 &decoded->canonical_option[1], validated, true);
+		 &decoded->canonical_option[1], validated, true,
+		 cl_options[opt_index].flags);
   return true;
 }
 
@@ -4367,7 +4393,7 @@ process_command (unsigned int decoded_options_count,
           if (strcmp (fname, "-") != 0 && access (fname, F_OK) < 0)
 	    perror_with_name (fname);
           else
-	    add_infile (arg, spec_lang);
+	    add_infile (arg, spec_lang, 0);
 
           free (fname);
 	  continue;
@@ -4516,7 +4542,8 @@ process_command (unsigned int decoded_options_count,
   if (compare_debug == 2 || compare_debug == 3)
     {
       const char *opt = concat ("-fcompare-debug=", compare_debug_opt, NULL);
-      save_switch (opt, 0, NULL, false, true);
+      save_switch (opt, 0, NULL, false, true,
+		   cl_options[OPT_fcompare_debug_].flags);
       compare_debug = 1;
     }
 
@@ -4527,7 +4554,7 @@ process_command (unsigned int decoded_options_count,
 
       /* Create a dummy input file, so that we can pass
 	 the help option on to the various sub-processes.  */
-      add_infile ("help-dummy", "c");
+      add_infile ("help-dummy", "c", 0);
     }
 
   alloc_switch ();
@@ -4728,13 +4755,15 @@ insert_wrapper (const char *wrapper)
 }
 
 /* Process the spec SPEC and run the commands specified therein.
+   If LANG_MASK is nonzero, switches for other languages are discarded.
    Returns 0 if the spec is successfully processed; -1 if failed.  */
 
 int
-do_spec (const char *spec)
+do_spec (const char *spec, unsigned int lang_mask)
 {
   int value;
 
+  spec_lang_mask_accept = lang_mask;
   value = do_spec_2 (spec);
 
   /* Force out any unfinished command.
@@ -4892,7 +4921,8 @@ do_self_spec (const char *spec)
 	      save_switch (decoded_options[j].canonical_option[0],
 			   (decoded_options[j].canonical_option_num_elements
 			    - 1),
-			   &decoded_options[j].canonical_option[1], false, true);
+			   &decoded_options[j].canonical_option[1], false, true,
+			   cl_options[decoded_options[j].opt_index].flags);
 	      break;
 
 	    default:
@@ -6488,6 +6518,14 @@ check_live_switch (int switchnum, int prefix_length)
 static void
 give_switch (int switchnum, int omit_first_word)
 {
+  int lang_mask = switches[switchnum].lang_mask & ((1U << cl_lang_count) - 1);
+  unsigned int lang_mask_accept = (1U << cl_lang_count) - 1;
+  if (spec_lang_mask_accept != 0)
+    lang_mask_accept = spec_lang_mask_accept;
+  /* Drop switches specific to a language not in the given mask.  */
+  if (lang_mask != 0 && !(lang_mask & lang_mask_accept))
+    return;
+
   if ((switches[switchnum].live_cond & SWITCH_IGNORE) != 0)
     return;
 
@@ -7589,9 +7627,6 @@ driver::maybe_putenv_OFFLOAD_TARGETS () const
 		    strlen (offload_targets) + 1);
       xputenv (XOBFINISH (&collect_obstack, char *));
     }
-
-  free (offload_targets);
-  offload_targets = NULL;
 }
 
 /* Reject switches that no pass was interested in.  */
@@ -7895,7 +7930,8 @@ driver::do_spec_on_infiles () const
 		  debug_check_temp_file[1] = NULL;
 		}
 
-	      value = do_spec (input_file_compiler->spec);
+	      value = do_spec (input_file_compiler->spec,
+			       infiles[i].lang_mask);
 	      infiles[i].compiled = true;
 	      if (value < 0)
 		this_file_error = 1;
@@ -7909,7 +7945,8 @@ driver::do_spec_on_infiles () const
 		  n_switches_alloc = n_switches_alloc_debug_check[1];
 		  switches = switches_debug_check[1];
 
-		  value = do_spec (input_file_compiler->spec);
+		  value = do_spec (input_file_compiler->spec,
+				   infiles[i].lang_mask);
 
 		  compare_debug = -compare_debug;
 		  n_switches = n_switches_debug_check[0];
@@ -8064,7 +8101,7 @@ driver::maybe_run_linker (const char *argv0) const
 		    " to the linker.\n\n"));
 	  fflush (stdout);
 	}
-      int value = do_spec (link_command_spec);
+      int value = do_spec (link_command_spec, 0);
       if (value < 0)
 	errorcount = 1;
       linker_was_run = (tmp != execution_count);
@@ -9655,6 +9692,50 @@ greater_than_spec_func (int argc, const char **argv)
   return NULL;
 }
 
+/* If applicable, generate a C source file containing a constructor call to
+   GOMP_enable_offload_targets, to inform libgomp which offload targets have
+   actually been requested (-foffload=[...]), and add that as an infile.  */
+
+static const char *
+add_omp_infile_spec_func (int argc, const char **)
+{
+  gcc_assert (argc == 0);
+  gcc_assert (offload_targets != NULL);
+
+  /* Nothing to do if we're not actually linking.  */
+  if (have_c)
+    return NULL;
+
+  int err;
+  const char *tmp_filename;
+  tmp_filename = make_temp_file (".c");
+  record_temp_file (tmp_filename, !save_temps_flag, 0);
+  FILE *f = fopen (tmp_filename, "w");
+  if (f == NULL)
+    fatal_error (input_location,
+		 "could not open temporary file %s", tmp_filename);
+  /* As libgomp uses constructors internally, and this code is only added when
+     linking against libgomp, it is fine to use a constructor here.  */
+  err = fprintf (f,
+		 "extern void GOMP_enable_offload_targets (const char *);\n"
+		 "static __attribute__ ((constructor)) void\n"
+		 "init (void)\n"
+		 "{\n"
+		 "  GOMP_enable_offload_targets (\"%s\");\n"
+		 "}\n",
+		 offload_targets);
+  if (err < 0)
+    fatal_error (input_location,
+		 "could not write to temporary file %s", tmp_filename);
+  err = fclose (f);
+  if (err == EOF)
+    fatal_error (input_location,
+		 "could not close temporary file %s", tmp_filename);
+
+  add_infile (tmp_filename, "cpp-output", CL_C);
+  return NULL;
+}
+
 /* Insert backslash before spaces in ORIG (usually a file path), to 
    avoid being broken by spec parser.
 
diff --git gcc/gcc.h gcc/gcc.h
index e1abe43..c71582d 100644
--- gcc/gcc.h
+++ gcc/gcc.h
@@ -68,7 +68,7 @@ struct spec_function
 };
 
 /* These are exported by gcc.c.  */
-extern int do_spec (const char *);
+extern int do_spec (const char *, unsigned int);
 extern void record_temp_file (const char *, int, int);
 extern void pfatal_with_name (const char *) ATTRIBUTE_NORETURN;
 extern void set_input (const char *);
diff --git gcc/java/jvspec.c gcc/java/jvspec.c
index d4efb73..518aa4d 100644
--- gcc/java/jvspec.c
+++ gcc/java/jvspec.c
@@ -629,7 +629,7 @@ lang_specific_pre_link (void)
      class name.  Append dummy `.c' that can be stripped by set_input so %b
      is correct.  */ 
   set_input (concat (main_class_name, "main.c", NULL));
-  err = do_spec (jvgenmain_spec);
+  err = do_spec (jvgenmain_spec, 0);
   if (err == 0)
     {
       /* Shift the outfiles array so the generated main comes first.
diff --git libgomp/config.h.in libgomp/config.h.in
index 2e4c698..d63e56a 100644
--- libgomp/config.h.in
+++ libgomp/config.h.in
@@ -95,7 +95,7 @@
    */
 #undef LT_OBJDIR
 
-/* Define to offload targets, separated by commas. */
+/* Define to offload targets, separated by colons. */
 #undef OFFLOAD_TARGETS
 
 /* Name of package */
diff --git libgomp/configure libgomp/configure
index 74d4e82..36ae548 100755
--- libgomp/configure
+++ libgomp/configure
@@ -15236,10 +15236,8 @@ if test x"$enable_offload_targets" != x; then
     tgt=`echo $tgt | sed 's/=.*//'`
     case $tgt in
       *-intelmic-* | *-intelmicemul-*)
-	tgt_name=intelmic
 	;;
       nvptx*)
-        tgt_name=nvptx
 	PLUGIN_NVPTX=$tgt
 	PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
 	PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
@@ -15282,9 +15280,9 @@ rm -f core conftest.err conftest.$ac_objext \
 	;;
     esac
     if test x"$offload_targets" = x; then
-      offload_targets=$tgt_name
+      offload_targets=$tgt
     else
-      offload_targets=$offload_targets,$tgt_name
+      offload_targets=$offload_targets:$tgt
     fi
     if test x"$tgt_dir" != x; then
       offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
diff --git libgomp/libgomp-plugin.h libgomp/libgomp-plugin.h
index 24fbb94..5da4fa7 100644
--- libgomp/libgomp-plugin.h
+++ libgomp/libgomp-plugin.h
@@ -48,7 +48,8 @@ enum offload_target_type
   OFFLOAD_TARGET_TYPE_HOST = 2,
   /* OFFLOAD_TARGET_TYPE_HOST_NONSHM = 3 removed.  */
   OFFLOAD_TARGET_TYPE_NVIDIA_PTX = 5,
-  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
+  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6,
+  OFFLOAD_TARGET_TYPE_HWM
 };
 
 /* Auxiliary struct, used for transferring pairs of addresses from plugin
diff --git libgomp/libgomp.h libgomp/libgomp.h
index 04262c4..61cb9f3 100644
--- libgomp/libgomp.h
+++ libgomp/libgomp.h
@@ -636,6 +636,7 @@ extern void gomp_free_thread (void *);
 
 extern void gomp_init_targets_once (void);
 extern int gomp_get_num_devices (void);
+extern bool gomp_offload_target_enabled_p (enum offload_target_type);
 
 typedef struct splay_tree_node_s *splay_tree_node;
 typedef struct splay_tree_s *splay_tree;
diff --git libgomp/libgomp.map libgomp/libgomp.map
index 3b3e0c2..5d65dc1 100644
--- libgomp/libgomp.map
+++ libgomp/libgomp.map
@@ -236,6 +236,7 @@ GOMP_4.0.1 {
 
 GOMP_4.0.2 {
   global:
+	GOMP_enable_offload_targets;
 	GOMP_offload_register_ver;
 	GOMP_offload_unregister_ver;
 } GOMP_4.0.1;
diff --git libgomp/libgomp_g.h libgomp/libgomp_g.h
index e7f4eff..d8e7149 100644
--- libgomp/libgomp_g.h
+++ libgomp/libgomp_g.h
@@ -206,6 +206,7 @@ extern void GOMP_single_copy_end (void *);
 
 /* target.c */
 
+extern void GOMP_enable_offload_targets (const char *);
 extern void GOMP_target (int, void (*) (void *), const void *,
 			 size_t, void **, size_t *, unsigned char *);
 extern void GOMP_target_data (int, const void *,
diff --git libgomp/oacc-init.c libgomp/oacc-init.c
index 28b9e7a..c4114cc 100644
--- libgomp/oacc-init.c
+++ libgomp/oacc-init.c
@@ -122,7 +122,9 @@ resolve_device (acc_device_t d, bool fail_is_error)
       {
 	if (goacc_device_type)
 	  {
-	    /* Lookup the named device.  */
+	    /* Lookup the device that has been explicitly named, so do not pay
+	       attention to gomp_offload_target_enabled_p.  (That is, hard
+	       error if not actually enabled.)  */
 	    while (++d != _ACC_device_hwm)
 	      if (dispatchers[d]
 		  && !strcasecmp (goacc_device_type,
@@ -148,8 +150,14 @@ resolve_device (acc_device_t d, bool fail_is_error)
     case acc_device_not_host:
       /* Find the first available device after acc_device_not_host.  */
       while (++d != _ACC_device_hwm)
-	if (dispatchers[d] && dispatchers[d]->get_num_devices_func () > 0)
+	if (dispatchers[d]
+	    && dispatchers[d]->get_num_devices_func () > 0
+	    /* No device has been explicitly named, so pay attention to
+	       gomp_offload_target_enabled_p, to not decide on an offload
+	       target that has not been enabled.  */
+	    && gomp_offload_target_enabled_p (dispatchers[d]->type))
 	  goto found;
+      /* No non-host device found.  */
       if (d_arg == acc_device_default)
 	{
 	  d = acc_device_host;
@@ -164,9 +172,6 @@ resolve_device (acc_device_t d, bool fail_is_error)
         return NULL;
       break;
 
-    case acc_device_host:
-      break;
-
     default:
       if (d > _ACC_device_hwm)
 	{
@@ -181,7 +186,8 @@ resolve_device (acc_device_t d, bool fail_is_error)
 
   assert (d != acc_device_none
 	  && d != acc_device_default
-	  && d != acc_device_not_host);
+	  && d != acc_device_not_host
+	  && d < _ACC_device_hwm);
 
   if (dispatchers[d] == NULL && fail_is_error)
     {
diff --git libgomp/plugin/configfrag.ac libgomp/plugin/configfrag.ac
index ad70dd1..a1bfec6 100644
--- libgomp/plugin/configfrag.ac
+++ libgomp/plugin/configfrag.ac
@@ -92,10 +92,8 @@ if test x"$enable_offload_targets" != x; then
     tgt=`echo $tgt | sed 's/=.*//'`
     case $tgt in
       *-intelmic-* | *-intelmicemul-*)
-	tgt_name=intelmic
 	;;
       nvptx*)
-        tgt_name=nvptx
 	PLUGIN_NVPTX=$tgt
 	PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
 	PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
@@ -127,9 +125,9 @@ if test x"$enable_offload_targets" != x; then
 	;;
     esac
     if test x"$offload_targets" = x; then
-      offload_targets=$tgt_name
+      offload_targets=$tgt
     else
-      offload_targets=$offload_targets,$tgt_name
+      offload_targets=$offload_targets:$tgt
     fi
     if test x"$tgt_dir" != x; then
       offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
@@ -141,7 +139,7 @@ if test x"$enable_offload_targets" != x; then
   done
 fi
 AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$offload_targets",
-  [Define to offload targets, separated by commas.])
+  [Define to offload targets, separated by colons.])
 AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
   [Define to 1 if the NVIDIA plugin is built, 0 if not.])
diff --git libgomp/target.c libgomp/target.c
index 6f0a339..1c8f337 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -71,6 +71,9 @@ static int num_offload_images;
 /* Array of descriptors for all available devices.  */
 static struct gomp_device_descr *devices;
 
+/* Set of enabled devices.  */
+static bool devices_enabled[OFFLOAD_TARGET_TYPE_HWM];
+
 /* Total number of available devices.  */
 static int num_devices;
 
@@ -124,19 +127,30 @@ gomp_get_num_devices (void)
 }
 
 static struct gomp_device_descr *
-resolve_device (int device_id)
+resolve_device (int device)
 {
-  if (device_id == GOMP_DEVICE_ICV)
+  int device_id;
+  if (device == GOMP_DEVICE_ICV)
     {
       struct gomp_task_icv *icv = gomp_icv (false);
       device_id = icv->default_device_var;
     }
+  else
+    device_id = device;
 
   if (device_id < 0 || device_id >= gomp_get_num_devices ())
     return NULL;
 
   /* As it is immutable once it has been initialized, it's safe to access
      devices without register_lock held.  */
+
+  /* If the device specified by the device-var ICV is not actually enabled,
+     don't try to use it (which will fail if it doesn't have offload data
+     available), and use host fallback instead.  */
+  if (device == GOMP_DEVICE_ICV
+      && !gomp_offload_target_enabled_p (devices[device_id].type))
+    return NULL;
+
   return &devices[device_id];
 }
 
@@ -799,6 +813,8 @@ void
 GOMP_offload_register_ver (unsigned version, const void *host_table,
 			   int target_type, const void *target_data)
 {
+  gomp_debug(0, "%s (%#x, %d)\n", __FUNCTION__, version, target_type);
+
   int i;
 
   if (GOMP_VERSION_LIB (version) > GOMP_VERSION)
@@ -836,6 +852,18 @@ void
 GOMP_offload_register (const void *host_table, int target_type,
 		       const void *target_data)
 {
+  gomp_debug(0, "%s (%d)\n", __FUNCTION__, target_type);
+
+  gomp_mutex_lock (&register_lock);
+  /* If we're seeing this function called, then default to the old behavior of
+     enabling all offload targets: this is what old executables and shared
+     libraries expect.  */
+  for (enum offload_target_type type = 0;
+       type < OFFLOAD_TARGET_TYPE_HWM;
+       ++type)
+    devices_enabled[type] = true;
+  gomp_mutex_unlock (&register_lock);
+
   GOMP_offload_register_ver (0, host_table, target_type, target_data);
 }
 
@@ -847,6 +875,8 @@ void
 GOMP_offload_unregister_ver (unsigned version, const void *host_table,
 			     int target_type, const void *target_data)
 {
+  gomp_debug(0, "%s (%#x, %d)\n", __FUNCTION__, version, target_type);
+
   int i;
 
   gomp_mutex_lock (&register_lock);
@@ -877,6 +907,8 @@ void
 GOMP_offload_unregister (const void *host_table, int target_type,
 			 const void *target_data)
 {
+  gomp_debug(0, "%s (%d)\n", __FUNCTION__, target_type);
+
   GOMP_offload_unregister_ver (0, host_table, target_type, target_data);
 }
 
@@ -952,6 +984,18 @@ gomp_fini_device (struct gomp_device_descr *devicep)
   devicep->is_initialized = false;
 }
 
+/* Has the offload target type TYPE been enabled?
+
+   We cannot verify that *all* offload data is available that could possibly be
+   required, so if we later find any offload data missing for this offload
+   target, then that's user error.  */
+
+attribute_hidden bool
+gomp_offload_target_enabled_p (enum offload_target_type type)
+{
+  return devices_enabled[type];
+}
+
 /* Called when encountering a target directive.  If DEVICE
    is GOMP_DEVICE_ICV, it means use device-var ICV.  If it is
    GOMP_DEVICE_HOST_FALLBACK (or any value
@@ -1121,6 +1165,8 @@ static bool
 gomp_load_plugin_for_device (struct gomp_device_descr *device,
 			     const char *plugin_name)
 {
+  gomp_debug (0, "%s (\"%s\")\n", __FUNCTION__, plugin_name);
+
   const char *err = NULL, *last_missing = NULL;
 
   void *plugin_handle = dlopen (plugin_name, RTLD_LAZY);
@@ -1216,6 +1262,78 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   return 0;
 }
 
+/* Return the corresponding offload target type for the offload target name
+   OFFLOAD_TARGET, or 0 if unknown.  */
+
+static enum offload_target_type
+offload_target_to_type (const char *offload_target)
+{
+  if (strstr (offload_target, "-intelmic") != NULL)
+    return OFFLOAD_TARGET_TYPE_INTEL_MIC;
+  else if (strncmp (offload_target, "nvptx", 5) == 0)
+    return OFFLOAD_TARGET_TYPE_NVIDIA_PTX;
+  else
+    return 0;
+}
+
+/* Return the corresponding plugin name for the offload target type TYPE, or
+   NULL if unknown.  */
+
+static const char *
+offload_target_type_to_plugin_name (enum offload_target_type type)
+{
+  switch (type)
+    {
+    case OFFLOAD_TARGET_TYPE_INTEL_MIC:
+      return "intelmic";
+    case OFFLOAD_TARGET_TYPE_NVIDIA_PTX:
+      return "nvptx";
+    default:
+      return NULL;
+    }
+}
+
+/* Enable the specified OFFLOAD_TARGETS, the set passed to the compiler at link
+   time.  */
+
+void
+GOMP_enable_offload_targets (const char *offload_targets)
+{
+  gomp_debug (0, "%s (\"%s\")\n", __FUNCTION__, offload_targets);
+
+  char *offload_targets_dup = strdup (offload_targets);
+  if (offload_targets_dup == NULL)
+    gomp_fatal ("Out of memory");
+
+  gomp_mutex_lock (&register_lock);
+
+  char *cur = offload_targets_dup;
+  while (cur)
+    {
+      char *next = strchr (cur, ':');
+      if (next != NULL)
+	{
+	  *next = '\0';
+	  ++next;
+	}
+      enum offload_target_type type = offload_target_to_type (cur);
+      if (type == 0)
+	{
+	  /* An unknown offload target has been requested; ignore it.  This
+	     makes us (future-)proof if offload targets are requested that
+	     are not supported in this build of libgomp.  */
+	}
+      else
+	devices_enabled[type] = true;
+
+      cur = next;
+    }
+
+  gomp_mutex_unlock (&register_lock);
+
+  free (offload_targets_dup);
+}
+
 /* This function initializes the runtime needed for offloading.
    It parses the list of offload targets and tries to load the plugins for
    these targets.  On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
@@ -1223,13 +1341,13 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
    corresponding devices, first the GOMP_OFFLOAD_CAP_OPENMP_400 ones, follows
    by the others.  */
 
+static const char *gomp_plugin_prefix ="libgomp-plugin-";
+static const char *gomp_plugin_suffix = SONAME_SUFFIX (1);
+
 static void
 gomp_target_init (void)
 {
-  const char *prefix ="libgomp-plugin-";
-  const char *suffix = SONAME_SUFFIX (1);
   const char *cur, *next;
-  char *plugin_name;
   int i, new_num_devices;
 
   gomp_mutex_lock (&register_lock);
@@ -1241,44 +1359,58 @@ gomp_target_init (void)
   if (*cur)
     do
       {
-	struct gomp_device_descr current_device;
-
-	next = strchr (cur, ',');
-
-	plugin_name = (char *) malloc (1 + (next ? next - cur : strlen (cur))
-				       + strlen (prefix) + strlen (suffix));
-	if (!plugin_name)
-	  {
-	    num_devices = 0;
-	    break;
-	  }
-
-	strcpy (plugin_name, prefix);
-	strncat (plugin_name, cur, next ? next - cur : strlen (cur));
-	strcat (plugin_name, suffix);
+	next = strchr (cur, ':');
+	/* If no other offload target following...  */
+	if (next == NULL)
+	  /* ..., point to the terminating NUL character.  */
+	  next = strchr (cur, '\0');
+
+	size_t gomp_plugin_prefix_len = strlen (gomp_plugin_prefix);
+	size_t cur_len = next - cur;
+	size_t gomp_plugin_suffix_len = strlen (gomp_plugin_suffix);
+	char *plugin_name
+	  = gomp_realloc_unlock (NULL, (gomp_plugin_prefix_len
+					+ cur_len
+					+ gomp_plugin_suffix_len
+					+ 1));
+	memcpy (plugin_name, gomp_plugin_prefix, gomp_plugin_prefix_len);
+	memcpy (plugin_name + gomp_plugin_prefix_len, cur, cur_len);
+	/* NUL-terminate the string here...  */
+	plugin_name[gomp_plugin_prefix_len + cur_len] = '\0';
+	/* ..., so that we can then use it to translate the offload target to
+	   the plugin name...  */
+	enum offload_target_type type
+	  = offload_target_to_type (plugin_name + gomp_plugin_prefix_len);
+	const char *cur_plugin_name
+	  = offload_target_type_to_plugin_name (type);
+	size_t cur_plugin_name_len = strlen (cur_plugin_name);
+	assert (cur_plugin_name_len <= cur_len);
+	/* ..., and then rewrite it.  */
+	memcpy (plugin_name + gomp_plugin_prefix_len,
+		cur_plugin_name, cur_plugin_name_len);
+	memcpy (plugin_name + gomp_plugin_prefix_len + cur_plugin_name_len,
+		gomp_plugin_suffix, gomp_plugin_suffix_len);
+	plugin_name[gomp_plugin_prefix_len
+		    + cur_plugin_name_len
+		    + gomp_plugin_suffix_len] = '\0';
 
+	struct gomp_device_descr current_device;
 	if (gomp_load_plugin_for_device (&current_device, plugin_name))
 	  {
 	    new_num_devices = current_device.get_num_devices_func ();
 	    if (new_num_devices >= 1)
 	      {
-		/* Augment DEVICES and NUM_DEVICES.  */
-
-		devices = realloc (devices, (num_devices + new_num_devices)
-				   * sizeof (struct gomp_device_descr));
-		if (!devices)
-		  {
-		    num_devices = 0;
-		    free (plugin_name);
-		    break;
-		  }
-
 		current_device.name = current_device.get_name_func ();
 		/* current_device.capabilities has already been set.  */
 		current_device.type = current_device.get_type_func ();
 		current_device.mem_map.root = NULL;
 		current_device.is_initialized = false;
 		current_device.openacc.data_environ = NULL;
+
+		/* Augment DEVICES and NUM_DEVICES.  */
+		devices = gomp_realloc_unlock
+		  (devices, ((num_devices + new_num_devices)
+			     * sizeof (struct gomp_device_descr)));
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.target_id = i;
@@ -1292,18 +1424,13 @@ gomp_target_init (void)
 	free (plugin_name);
 	cur = next + 1;
       }
-    while (next);
+    while (*next);
 
   /* In DEVICES, sort the GOMP_OFFLOAD_CAP_OPENMP_400 ones first, and set
      NUM_DEVICES_OPENMP.  */
   struct gomp_device_descr *devices_s
-    = malloc (num_devices * sizeof (struct gomp_device_descr));
-  if (!devices_s)
-    {
-      num_devices = 0;
-      free (devices);
-      devices = NULL;
-    }
+    = gomp_realloc_unlock (NULL,
+			   num_devices * sizeof (struct gomp_device_descr));
   num_devices_openmp = 0;
   for (i = 0; i < num_devices; i++)
     if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
diff --git libgomp/testsuite/lib/libgomp.exp libgomp/testsuite/lib/libgomp.exp
index f04b163..3b4c515 100644
--- libgomp/testsuite/lib/libgomp.exp
+++ libgomp/testsuite/lib/libgomp.exp
@@ -36,24 +36,21 @@ load_gcc_lib fortran-modules.exp
 load_file libgomp-test-support.exp
 
 # Populate offload_targets_s (offloading targets separated by a space), and
-# offload_targets_s_openacc (the same, but with OpenACC names; OpenACC spells
-# some of them a little differently).
-set offload_targets_s [split $offload_targets ","]
+# offload_targets_s_openacc (those suitable for OpenACC).
+set offload_targets_s [split $offload_targets ":"]
 set offload_targets_s_openacc {}
 foreach offload_target_openacc $offload_targets_s {
-    switch $offload_target_openacc {
-	intelmic {
+    switch -glob $offload_target_openacc {
+	*-intelmic* {
 	    # Skip; will all FAIL because of missing
 	    # GOMP_OFFLOAD_CAP_OPENACC_200.
 	    continue
 	}
-	nvptx {
-	    set offload_target_openacc "nvidia"
-	}
     }
     lappend offload_targets_s_openacc "$offload_target_openacc"
 }
-lappend offload_targets_s_openacc "host"
+# Host fallback.
+lappend offload_targets_s_openacc "disable"
 
 set dg-do-what-default run
 
@@ -134,7 +131,7 @@ proc libgomp_init { args } {
     # Add liboffloadmic build directory in LD_LIBRARY_PATH to support
     # non-fallback testing for Intel MIC targets
     global offload_targets
-    if { [string match "*,intelmic,*" ",$offload_targets,"] } {
+    if { [string match "*:*-intelmic*:*" ":$offload_targets:"] } {
 	append always_ld_library_path ":${blddir}/../liboffloadmic/.libs"
 	append always_ld_library_path ":${blddir}/../liboffloadmic/plugin/.libs"
 	# libstdc++ is required by liboffloadmic
@@ -332,15 +329,14 @@ proc check_effective_target_openacc_nvidia_accel_present { } {
 }
 
 # Return 1 if at least one nvidia board is present, and the nvidia device type
-# is selected by default by means of setting the environment variable
-# ACC_DEVICE_TYPE.
+# is selected by default.
 
 proc check_effective_target_openacc_nvidia_accel_selected { } {
     if { ![check_effective_target_openacc_nvidia_accel_present] } {
 	return 0;
     }
     global offload_target_openacc
-    if { $offload_target_openacc == "nvidia" } {
+    if { [string match "nvptx*" $offload_target_openacc] } {
         return 1;
     }
     return 0;
@@ -350,7 +346,7 @@ proc check_effective_target_openacc_nvidia_accel_selected { } {
 
 proc check_effective_target_openacc_host_selected { } {
     global offload_target_openacc
-    if { $offload_target_openacc == "host" } {
+    if { $offload_target_openacc == "disable" } {
         return 1;
     }
     return 0;
diff --git libgomp/testsuite/libgomp.c++/target-1-foffload_disable.C libgomp/testsuite/libgomp.c++/target-1-foffload_disable.C
new file mode 100644
index 0000000..15b9432
--- /dev/null
+++ libgomp/testsuite/libgomp.c++/target-1-foffload_disable.C
@@ -0,0 +1,3 @@
+/* { dg-additional-options "-foffload=disable" } */
+
+#include "target-1.C"
diff --git libgomp/testsuite/libgomp.c++/target-foffload_disable.C libgomp/testsuite/libgomp.c++/target-foffload_disable.C
new file mode 100644
index 0000000..c07dea1
--- /dev/null
+++ libgomp/testsuite/libgomp.c++/target-foffload_disable.C
@@ -0,0 +1,3 @@
+/* { dg-additional-options "-foffload=disable" } */
+
+#include "../libgomp.c/target-foffload_disable.c"
diff --git libgomp/testsuite/libgomp.c/target-1-foffload_disable.c libgomp/testsuite/libgomp.c/target-1-foffload_disable.c
new file mode 100644
index 0000000..177cceb
--- /dev/null
+++ libgomp/testsuite/libgomp.c/target-1-foffload_disable.c
@@ -0,0 +1,3 @@
+/* { dg-additional-options "-foffload=disable" } */
+
+#include "target-1.c"
diff --git libgomp/testsuite/libgomp.c/target-foffload_disable.c libgomp/testsuite/libgomp.c/target-foffload_disable.c
new file mode 100644
index 0000000..4a712da
--- /dev/null
+++ libgomp/testsuite/libgomp.c/target-foffload_disable.c
@@ -0,0 +1,18 @@
+/* { dg-additional-options "-foffload=disable" } */
+
+#include <omp.h>
+
+int main()
+{
+  if (!omp_is_initial_device())
+    __builtin_abort();
+#pragma omp target
+  {
+    if (!omp_is_initial_device())
+      __builtin_abort();
+  }
+  if (!omp_is_initial_device())
+    __builtin_abort();
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.fortran/target-foffload_disable.f libgomp/testsuite/libgomp.fortran/target-foffload_disable.f
new file mode 100644
index 0000000..0d60534
--- /dev/null
+++ libgomp/testsuite/libgomp.fortran/target-foffload_disable.f
@@ -0,0 +1,14 @@
+!     { dg-additional-options "-foffload=disable" }
+
+      PROGRAM MAIN
+      IMPLICIT NONE
+
+      INCLUDE "omp_lib.h"
+
+      IF (.NOT. OMP_IS_INITIAL_DEVICE()) CALL ABORT
+!$OMP TARGET
+      IF (.NOT. OMP_IS_INITIAL_DEVICE()) CALL ABORT
+!$OMP END TARGET
+      IF (.NOT. OMP_IS_INITIAL_DEVICE()) CALL ABORT
+
+      END
diff --git libgomp/testsuite/libgomp.fortran/target1-foffload_disable.f90 libgomp/testsuite/libgomp.fortran/target1-foffload_disable.f90
new file mode 100644
index 0000000..005328e
--- /dev/null
+++ libgomp/testsuite/libgomp.fortran/target1-foffload_disable.f90
@@ -0,0 +1,3 @@
+! { dg-additional-options "-cpp -foffload=disable" }
+
+#include "target1.f90"
diff --git libgomp/testsuite/libgomp.oacc-c++/c++.exp libgomp/testsuite/libgomp.oacc-c++/c++.exp
index 88b0269..aa545a2 100644
--- libgomp/testsuite/libgomp.oacc-c++/c++.exp
+++ libgomp/testsuite/libgomp.oacc-c++/c++.exp
@@ -75,13 +75,12 @@ if { $lang_test_file_found } {
 
     # Test OpenACC with available accelerators.
     foreach offload_target_openacc $offload_targets_s_openacc {
-	set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
-
-	switch $offload_target_openacc {
-	    host {
+	switch -glob $offload_target_openacc {
+	    disable {
 		set acc_mem_shared 1
+		set tagopt "-DACC_DEVICE_TYPE_host=1"
 	    }
-	    nvidia {
+	    nvptx* {
 		if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		    # Don't bother; execution testing is going to FAIL.
 		    untested "$subdir $offload_target_openacc offloading"
@@ -95,14 +94,13 @@ if { $lang_test_file_found } {
 		lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/libgomp.oacc-c-c++-common"
 
 		set acc_mem_shared 0
+		set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	    }
 	    default {
 		set acc_mem_shared 0
 	    }
 	}
-	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
-
-	setenv ACC_DEVICE_TYPE $offload_target_openacc
+	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
 	dg-runtest $tests "$tagopt" "$libstdcxx_includes $DEFAULT_CFLAGS"
     }
diff --git libgomp/testsuite/libgomp.oacc-c/c.exp libgomp/testsuite/libgomp.oacc-c/c.exp
index 5020e6a..9d2065f 100644
--- libgomp/testsuite/libgomp.oacc-c/c.exp
+++ libgomp/testsuite/libgomp.oacc-c/c.exp
@@ -38,13 +38,13 @@ set_ld_library_path_env_vars
 set SAVE_ALWAYS_CFLAGS "$ALWAYS_CFLAGS"
 foreach offload_target_openacc $offload_targets_s_openacc {
     set ALWAYS_CFLAGS "$SAVE_ALWAYS_CFLAGS"
-    set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
 
-    switch $offload_target_openacc {
-	host {
+    switch -glob $offload_target_openacc {
+	disable {
 	    set acc_mem_shared 1
+	    set tagopt "-DACC_DEVICE_TYPE_host=1"
 	}
-	nvidia {
+	nvptx* {
 	    if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		# Don't bother; execution testing is going to FAIL.
 		untested "$subdir $offload_target_openacc offloading"
@@ -58,14 +58,13 @@ foreach offload_target_openacc $offload_targets_s_openacc {
 	    lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/libgomp.oacc-c-c++-common"
 
 	    set acc_mem_shared 0
+	    set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	}
 	default {
 	    set acc_mem_shared 0
 	}
     }
-    set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
-
-    setenv ACC_DEVICE_TYPE $offload_target_openacc
+    set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
     dg-runtest $tests "$tagopt" $DEFAULT_CFLAGS
 }
diff --git libgomp/testsuite/libgomp.oacc-fortran/fortran.exp libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
index 2d6b647..3f678ba 100644
--- libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
+++ libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
@@ -67,13 +67,12 @@ if { $lang_test_file_found } {
 
     # Test OpenACC with available accelerators.
     foreach offload_target_openacc $offload_targets_s_openacc {
-	set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
-
-	switch $offload_target_openacc {
-	    host {
+	switch -glob $offload_target_openacc {
+	    disable {
 		set acc_mem_shared 1
+		set tagopt "-DACC_DEVICE_TYPE_host=1"
 	    }
-	    nvidia {
+	    nvptx* {
 		if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		    # Don't bother; execution testing is going to FAIL.
 		    untested "$subdir $offload_target_openacc offloading"
@@ -81,14 +80,13 @@ if { $lang_test_file_found } {
 		}
 
 		set acc_mem_shared 0
+		set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	    }
 	    default {
 		set acc_mem_shared 0
 	    }
 	}
-	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
-
-	setenv ACC_DEVICE_TYPE $offload_target_openacc
+	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
 	# For Fortran we're doing torture testing, as Fortran has far more tests
 	# with arrays etc. that testing just -O0 or -O2 is insufficient, that is


Grüße,
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]


* Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
  2015-09-30 16:15                   ` Thomas Schwinge
@ 2015-10-19 16:56                     ` Thomas Schwinge
  2015-10-20 10:03                       ` Jakub Jelinek
  0 siblings, 1 reply; 62+ messages in thread
From: Thomas Schwinge @ 2015-10-19 16:56 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Bernd Schmidt, Nathan Sidwell, Joseph Myers

[-- Attachment #1: Type: text/plain, Size: 51045 bytes --]

Hi!

Ping...

On Wed, 30 Sep 2015 17:54:07 +0200, I wrote:
> On Tue, 29 Sep 2015 10:18:14 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Mon, Sep 28, 2015 at 11:39:10AM +0200, Thomas Schwinge wrote:
> > > On Fri, 11 Sep 2015 17:43:49 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> > > > So, do I understand well that you'll call GOMP_set_offload_targets from
> > > > construct[ors] of all shared libraries (and the binary) that contain offloaded
> > > > code?  If yes, that is surely going to fail the assertions in there.
> > > 
> > > Indeed.  My original plan has been to generate/invoke this constructor
> > > only for/from the final executable and not for any shared libraries, but
> > > it seems I didn't implement this correctly.
> > 
> > How would you mean to implement it?
> 
> I have come to realize that we need to generate/invoke this constructor
> from everything that links against libgomp (which is what I implemented),
> that is, executables as well as shared libraries.
> 
> > -fopenmp or -fopenacc code with
> > offloading bits might not be in the final executable at all, nor in shared
> > libraries it is linked against; such libraries could be only dlopened,
> > consider say python plugin.  And this is not just made up, perhaps not with
> > offloading yet, but people regularly use OpenMP code in plugins and then we
> > get complaints that fork child of the main program is not allowed to do
> > anything but async-signal-safe functions.
> 
> I'm not sure I'm completely understanding that paragraph?  Are you saying
> that offloaded code can be in libraries that are not linked against
> libgomp?  How would these register (GOMP_offload_register) their
> offloaded code?  I think it's reasonable to expect that every shared
> library that contains offloaded code must link against libgomp, which
> will happen automatically given that it is built with -fopenmp/-fopenacc?
> 
> > > > You can dlopen such libraries etc.  What if you link one library with
> > > > -fopenmp=nvptx-none and another one with -fopenmp=x86_64-intelmicemul-linux?
> > > 
> > > So, the first question to answer is: what do we expect to happen in this
> > > case, or similarly, if the executable and any shared libraries are
> > > compiled with different/incompatible -foffload options?
> > 
> > As the device numbers are per-process, the only possibility I see is that
> > all the physically available devices are always available, and just if you
> > try to offload from some code to a device that doesn't support it, you get
> > host fallback.  Because, one shared library could carefully use device(xyz)
> > to offload to say XeonPhi it is compiled for and supports, and another
> > library device(abc) to offload to PTX it is compiled for and supports.
> 
> OK, I think I get that, and it makes sense.  Even though, I don't know
> how you'd do that today: as far as I can tell, there is no specification
> covering the OpenMP 4 target device IDs, so I have no idea how a user
> program/library could reliably use them in practice?  For example, in
> the current GCC implementation, the OpenMP 4 target device IDs depend on
> the number of individual devices available in the system, and the order in
> which libgomp loads the plugins, which is defined (arbitrarily) by the
> GCC configuration?
> 
> > > For this, I propose that the only mode of operation that we currently can
> > > support is that all of the executable and any shared libraries agree on
> > > the offload targets specified by -foffload, and I thus propose the
> > > following patch on top of what Joseph has posted before (passes the
> > > testsuite, but not yet tested otherwise):
> > 
> > See above, no.
> 
> OK.
> 
> How's the following (complete patch instead of incremental patch; the
> driver changes are still the same as before)?  The changes are:
> 
>   * libgomp/target.c:gomp_target_init again loads all the plugins.
>   * libgomp/target.c:resolve_device and
>     libgomp/oacc-init.c:resolve_device verify that a default device
>     (OpenMP device-var ICV, and acc_device_default, respectively) is
>     actually enabled, or resort to host fallback if not.
>   * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used
>     to enable devices specified by -foffload.  Can be called multiple
>     times (executable, any shared libraries); the set of enabled devices
>     is the union of all those ever requested.
>   * GOMP_offload_register (but not the new GOMP_offload_register_ver)
>     changed to enable all devices.  This is to maintain compatibility
>     with old executables and shared libraries built without the -foffload
>     constructor support.
>   * IntelMIC mkoffload changed to use GOMP_offload_register_ver instead
>     of GOMP_offload_register, and GOMP_offload_unregister_ver instead of
>     GOMP_offload_unregister.  This avoids enabling all devices, which is
>     what a call to GOMP_offload_register now does.
>   * New test cases to verify this (-foffload=disable, host fallback).

(Will write ChangeLog once the general approach has been approved.)
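
To illustrate the "union of all those ever requested" semantics from the
list above, here is a minimal standalone sketch.  It is not the libgomp
code: the names target_type, enabled, name_to_type, enable_targets and
target_enabled_p are hypothetical stand-ins, and the locking that the
patch does around register_lock is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for libgomp's offload target types; 0 means
   "unknown", mirroring the convention used in the patch.  */
enum target_type
{
  TARGET_UNKNOWN = 0,
  TARGET_NVPTX,
  TARGET_INTELMIC,
  TARGET_HWM
};

/* Set of enabled target types; only ever grows.  */
static bool enabled[TARGET_HWM];

/* Map an offload target name to its type, or TARGET_UNKNOWN.  */
static enum target_type
name_to_type (const char *name)
{
  if (strstr (name, "-intelmic") != NULL)
    return TARGET_INTELMIC;
  if (strncmp (name, "nvptx", 5) == 0)
    return TARGET_NVPTX;
  return TARGET_UNKNOWN;
}

/* Enable every known target in the colon-separated list TARGETS.
   Unknown names are silently ignored.  Calling this several times
   (executable, shared libraries) accumulates the union.  */
static void
enable_targets (const char *targets)
{
  size_t len = strlen (targets) + 1;
  char *dup = malloc (len);
  if (dup == NULL)
    abort ();
  memcpy (dup, targets, len);

  for (char *cur = dup; cur != NULL; )
    {
      /* Split off the current name at the next colon, if any.  */
      char *next = strchr (cur, ':');
      if (next != NULL)
	*next++ = '\0';

      enum target_type type = name_to_type (cur);
      if (type != TARGET_UNKNOWN)
	enabled[type] = true;

      cur = next;
    }

  free (dup);
}

/* Has the target type TYPE been enabled by any caller so far?  */
static bool
target_enabled_p (enum target_type type)
{
  assert (type < TARGET_HWM);
  return enabled[type];
}
```

With this sketch, one constructor calling enable_targets ("nvptx-none") and
another calling enable_targets ("x86_64-intelmicemul-linux-gnu") leave both
types enabled.  A default device whose type was never enabled would then
resolve to host fallback.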

> Ilya, I'm aware of your work on additional changes (shared memory),
> <http://news.gmane.org/find-root.php?message_id=%3CCADG%3DZ0EBuhj89WEZdmaNUPy%3DE%3D63BmWofS8An8nY7rygTmdJ_w%40mail.gmail.com%3E>,
> but I think my patch is still an improvement already?
> 
> Jakub, is this OK as an incremental step forward?

Rebased on top of current trunk:

 gcc/config/i386/intelmic-mkoffload.c               |  20 +-
 gcc/fortran/gfortranspec.c                         |   2 +-
 gcc/gcc.c                                          | 139 +++++++++++---
 gcc/gcc.h                                          |   2 +-
 gcc/java/jvspec.c                                  |   2 +-
 libgomp/config.h.in                                |   2 +-
 libgomp/configure                                  |   6 +-
 libgomp/libgomp-plugin.h                           |   3 +-
 libgomp/libgomp.h                                  |   1 +
 libgomp/libgomp.map                                |   1 +
 libgomp/libgomp_g.h                                |   1 +
 libgomp/oacc-init.c                                |  18 +-
 libgomp/plugin/configfrag.ac                       |   8 +-
 libgomp/target.c                                   | 210 +++++++++++++++++----
 libgomp/testsuite/lib/libgomp.exp                  |  24 +--
 .../libgomp.c++/target-1-foffload_disable.C        |   3 +
 .../libgomp.c++/target-foffload_disable.C          |   3 +
 .../libgomp.c/target-1-foffload_disable.c          |   3 +
 .../testsuite/libgomp.c/target-foffload_disable.c  |  18 ++
 .../libgomp.fortran/target-foffload_disable.f      |  14 ++
 .../libgomp.fortran/target1-foffload_disable.f90   |   3 +
 libgomp/testsuite/libgomp.oacc-c++/c++.exp         |  14 +-
 libgomp/testsuite/libgomp.oacc-c/c.exp             |  13 +-
 libgomp/testsuite/libgomp.oacc-fortran/fortran.exp |  14 +-
 24 files changed, 393 insertions(+), 131 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.c++/target-1-foffload_disable.C
 create mode 100644 libgomp/testsuite/libgomp.c++/target-foffload_disable.C
 create mode 100644 libgomp/testsuite/libgomp.c/target-1-foffload_disable.c
 create mode 100644 libgomp/testsuite/libgomp.c/target-foffload_disable.c
 create mode 100644 libgomp/testsuite/libgomp.fortran/target-foffload_disable.f
 create mode 100644 libgomp/testsuite/libgomp.fortran/target1-foffload_disable.f90

diff --git a/gcc/config/i386/intelmic-mkoffload.c b/gcc/config/i386/intelmic-mkoffload.c
index 828b415..a4960a2 100644
--- a/gcc/config/i386/intelmic-mkoffload.c
+++ b/gcc/config/i386/intelmic-mkoffload.c
@@ -370,26 +370,34 @@ generate_host_descr_file (const char *host_compiler)
 	   "#ifdef __cplusplus\n"
 	   "extern \"C\"\n"
 	   "#endif\n"
-	   "void GOMP_offload_register (const void *, int, const void *);\n"
+	   "void GOMP_offload_register_ver "
+	   "(unsigned version, const void *, int, const void *);\n"
 	   "#ifdef __cplusplus\n"
 	   "extern \"C\"\n"
 	   "#endif\n"
-	   "void GOMP_offload_unregister (const void *, int, const void *);\n\n"
+	   "void GOMP_offload_unregister_ver "
+	   "(unsigned version, const void *, int, const void *);\n\n"
 
 	   "__attribute__((constructor))\n"
 	   "static void\n"
 	   "init (void)\n"
 	   "{\n"
-	   "  GOMP_offload_register (&__OFFLOAD_TABLE__, %d, __offload_target_data);\n"
-	   "}\n\n", GOMP_DEVICE_INTEL_MIC);
+	   "  GOMP_offload_register_ver (%#x, &__OFFLOAD_TABLE__, "
+	   "%d, __offload_target_data);\n"
+	   "}\n\n",
+	   GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_INTEL_MIC),
+	   GOMP_DEVICE_INTEL_MIC);
 
   fprintf (src_file,
 	   "__attribute__((destructor))\n"
 	   "static void\n"
 	   "fini (void)\n"
 	   "{\n"
-	   "  GOMP_offload_unregister (&__OFFLOAD_TABLE__, %d, __offload_target_data);\n"
-	   "}\n", GOMP_DEVICE_INTEL_MIC);
+	   "  GOMP_offload_unregister_ver (%#x, &__OFFLOAD_TABLE__, "
+	   "%d, __offload_target_data);\n"
+	   "}\n",
+	   GOMP_VERSION_PACK (GOMP_VERSION, GOMP_VERSION_INTEL_MIC),
+	   GOMP_DEVICE_INTEL_MIC);
 
   fclose (src_file);
 
diff --git a/gcc/fortran/gfortranspec.c b/gcc/fortran/gfortranspec.c
index fe594db..e3e83ba 100644
--- a/gcc/fortran/gfortranspec.c
+++ b/gcc/fortran/gfortranspec.c
@@ -439,7 +439,7 @@ int
 lang_specific_pre_link (void)
 {
   if (library)
-    do_spec ("%:include(libgfortran.spec)");
+    do_spec ("%:include(libgfortran.spec)", 0);
 
   return 0;
 }
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 7f5a36e..02795e7 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -401,6 +401,8 @@ static const char *compare_debug_auxbase_opt_spec_function (int, const char **);
 static const char *pass_through_libs_spec_func (int, const char **);
 static const char *replace_extension_spec_func (int, const char **);
 static const char *greater_than_spec_func (int, const char **);
+static const char *add_omp_infile_spec_func (int, const char **);
+
 static char *convert_white_space (char *);
 \f
 /* The Specs Language
@@ -1193,6 +1195,11 @@ static const char *const multilib_defaults_raw[] = MULTILIB_DEFAULTS;
 
 static const char *const driver_self_specs[] = {
   "%{fdump-final-insns:-fdump-final-insns=.} %<fdump-final-insns",
+#ifdef ENABLE_OFFLOADING
+  /* If linking against libgomp, add a setup file.  */
+  "%{fopenacc|fopenmp|%:gt(%{ftree-parallelize-loops=*} 1):" \
+  "%:add-omp-infile()}",
+#endif /* ENABLE_OFFLOADING */
   DRIVER_SELF_SPECS, CONFIGURE_SPECS, GOMP_SELF_SPECS, GTM_SELF_SPECS,
   CILK_SELF_SPECS
 };
@@ -1620,6 +1627,7 @@ static const struct spec_function static_spec_functions[] =
   { "pass-through-libs",	pass_through_libs_spec_func },
   { "replace-extension",	replace_extension_spec_func },
   { "gt",			greater_than_spec_func },
+  { "add-omp-infile",		add_omp_infile_spec_func },
 #ifdef EXTRA_SPEC_FUNCTIONS
   EXTRA_SPEC_FUNCTIONS
 #endif
@@ -3216,7 +3224,8 @@ execute (void)
    The `validated' field describes whether any spec has looked at this switch;
    if it remains false at the end of the run, the switch must be meaningless.
    The `ordering' field is used to temporarily mark switches that have to be
-   kept in a specific order.  */
+   kept in a specific order.
+   The `lang_mask' field stores the flags associated with this option.  */
 
 #define SWITCH_LIVE    			(1 << 0)
 #define SWITCH_FALSE   			(1 << 1)
@@ -3232,6 +3241,7 @@ struct switchstr
   bool known;
   bool validated;
   bool ordering;
+  unsigned int lang_mask;
 };
 
 static struct switchstr *switches;
@@ -3240,6 +3250,10 @@ static int n_switches;
 
 static int n_switches_alloc;
 
+/* If nonzero, do not pass through switches for languages not matching
+   this mask.  */
+static unsigned int spec_lang_mask_accept;
+
 /* Set to zero if -fcompare-debug is disabled, positive if it's
    enabled and we're running the first compilation, negative if it's
    enabled and we're running the second compilation.  For most of the
@@ -3277,6 +3291,7 @@ struct infile
   const char *name;
   const char *language;
   struct compiler *incompiler;
+  unsigned int lang_mask;
   bool compiled;
   bool preprocessed;
 };
@@ -3470,15 +3485,16 @@ alloc_infile (void)
     }
 }
 
-/* Store an input file with the given NAME and LANGUAGE in
+/* Store an input file with the given NAME and LANGUAGE and LANG_MASK in
    infiles.  */
 
 static void
-add_infile (const char *name, const char *language)
+add_infile (const char *name, const char *language, unsigned int lang_mask)
 {
   alloc_infile ();
   infiles[n_infiles].name = name;
-  infiles[n_infiles++].language = language;
+  infiles[n_infiles].language = language;
+  infiles[n_infiles++].lang_mask = lang_mask;
 }
 
 /* Allocate space for a switch in switches.  */
@@ -3499,11 +3515,12 @@ alloc_switch (void)
 }
 
 /* Save an option OPT with N_ARGS arguments in array ARGS, marking it
-   as validated if VALIDATED and KNOWN if it is an internal switch.  */
+   as validated if VALIDATED and KNOWN if it is an internal switch.
+   LANG_MASK is the flags associated with this option.  */
 
 static void
 save_switch (const char *opt, size_t n_args, const char *const *args,
-	     bool validated, bool known)
+	     bool validated, bool known, unsigned int lang_mask)
 {
   alloc_switch ();
   switches[n_switches].part1 = opt + 1;
@@ -3520,6 +3537,7 @@ save_switch (const char *opt, size_t n_args, const char *const *args,
   switches[n_switches].validated = validated;
   switches[n_switches].known = known;
   switches[n_switches].ordering = 0;
+  switches[n_switches].lang_mask = lang_mask;
   n_switches++;
 }
 
@@ -3537,7 +3555,8 @@ driver_unknown_option_callback (const struct cl_decoded_option *decoded)
 	 diagnosed only if there are warnings.  */
       save_switch (decoded->canonical_option[0],
 		   decoded->canonical_option_num_elements - 1,
-		   &decoded->canonical_option[1], false, true);
+		   &decoded->canonical_option[1], false, true,
+		   cl_options[decoded->opt_index].flags);
       return false;
     }
   if (decoded->opt_index == OPT_SPECIAL_unknown)
@@ -3545,7 +3564,8 @@ driver_unknown_option_callback (const struct cl_decoded_option *decoded)
       /* Give it a chance to define it a spec file.  */
       save_switch (decoded->canonical_option[0],
 		   decoded->canonical_option_num_elements - 1,
-		   &decoded->canonical_option[1], false, false);
+		   &decoded->canonical_option[1], false, false,
+		   cl_options[decoded->opt_index].flags);
       return false;
     }
   else
@@ -3572,7 +3592,8 @@ driver_wrong_lang_callback (const struct cl_decoded_option *decoded,
   else
     save_switch (decoded->canonical_option[0],
 		 decoded->canonical_option_num_elements - 1,
-		 &decoded->canonical_option[1], false, true);
+		 &decoded->canonical_option[1], false, true,
+		 option->flags);
 }
 
 static const char *spec_lang = 0;
@@ -3821,7 +3842,8 @@ driver_handle_option (struct gcc_options *opts,
 	compare_debug_opt = NULL;
       else
 	compare_debug_opt = arg;
-      save_switch (compare_debug_replacement_opt, 0, NULL, validated, true);
+      save_switch (compare_debug_replacement_opt, 0, NULL, validated, true,
+		   cl_options[opt_index].flags);
       return true;
 
     case OPT_fdiagnostics_color_:
@@ -3876,17 +3898,17 @@ driver_handle_option (struct gcc_options *opts,
 	for (j = 0; arg[j]; j++)
 	  if (arg[j] == ',')
 	    {
-	      add_infile (save_string (arg + prev, j - prev), "*");
+	      add_infile (save_string (arg + prev, j - prev), "*", 0);
 	      prev = j + 1;
 	    }
 	/* Record the part after the last comma.  */
-	add_infile (arg + prev, "*");
+	add_infile (arg + prev, "*", 0);
       }
       do_save = false;
       break;
 
     case OPT_Xlinker:
-      add_infile (arg, "*");
+      add_infile (arg, "*", 0);
       do_save = false;
       break;
 
@@ -3903,19 +3925,21 @@ driver_handle_option (struct gcc_options *opts,
     case OPT_l:
       /* POSIX allows separation of -l and the lib arg; canonicalize
 	 by concatenating -l with its arg */
-      add_infile (concat ("-l", arg, NULL), "*");
+      add_infile (concat ("-l", arg, NULL), "*", 0);
       do_save = false;
       break;
 
     case OPT_L:
       /* Similarly, canonicalize -L for linkers that may not accept
 	 separate arguments.  */
-      save_switch (concat ("-L", arg, NULL), 0, NULL, validated, true);
+      save_switch (concat ("-L", arg, NULL), 0, NULL, validated, true,
+		   cl_options[opt_index].flags);
       return true;
 
     case OPT_F:
       /* Likewise -F.  */
-      save_switch (concat ("-F", arg, NULL), 0, NULL, validated, true);
+      save_switch (concat ("-F", arg, NULL), 0, NULL, validated, true,
+		   cl_options[opt_index].flags);
       return true;
 
     case OPT_save_temps:
@@ -4038,7 +4062,8 @@ driver_handle_option (struct gcc_options *opts,
       save_temps_prefix = xstrdup (arg);
       /* On some systems, ld cannot handle "-o" without a space.  So
 	 split the option from its argument.  */
-      save_switch ("-o", 1, &arg, validated, true);
+      save_switch ("-o", 1, &arg, validated, true,
+		   cl_options[opt_index].flags);
       return true;
 
 #ifdef ENABLE_DEFAULT_PIE
@@ -4074,7 +4099,8 @@ driver_handle_option (struct gcc_options *opts,
   if (do_save)
     save_switch (decoded->canonical_option[0],
 		 decoded->canonical_option_num_elements - 1,
-		 &decoded->canonical_option[1], validated, true);
+		 &decoded->canonical_option[1], validated, true,
+		 cl_options[opt_index].flags);
   return true;
 }
 
@@ -4371,7 +4397,7 @@ process_command (unsigned int decoded_options_count,
           if (strcmp (fname, "-") != 0 && access (fname, F_OK) < 0)
 	    perror_with_name (fname);
           else
-	    add_infile (arg, spec_lang);
+	    add_infile (arg, spec_lang, 0);
 
           free (fname);
 	  continue;
@@ -4520,7 +4546,8 @@ process_command (unsigned int decoded_options_count,
   if (compare_debug == 2 || compare_debug == 3)
     {
       const char *opt = concat ("-fcompare-debug=", compare_debug_opt, NULL);
-      save_switch (opt, 0, NULL, false, true);
+      save_switch (opt, 0, NULL, false, true,
+		   cl_options[OPT_fcompare_debug_].flags);
       compare_debug = 1;
     }
 
@@ -4531,7 +4558,7 @@ process_command (unsigned int decoded_options_count,
 
       /* Create a dummy input file, so that we can pass
 	 the help option on to the various sub-processes.  */
-      add_infile ("help-dummy", "c");
+      add_infile ("help-dummy", "c", 0);
     }
 
   alloc_switch ();
@@ -4732,13 +4759,15 @@ insert_wrapper (const char *wrapper)
 }
 
 /* Process the spec SPEC and run the commands specified therein.
+   If LANG_MASK is nonzero, switches for other languages are discarded.
    Returns 0 if the spec is successfully processed; -1 if failed.  */
 
 int
-do_spec (const char *spec)
+do_spec (const char *spec, unsigned int lang_mask)
 {
   int value;
 
+  spec_lang_mask_accept = lang_mask;
   value = do_spec_2 (spec);
 
   /* Force out any unfinished command.
@@ -4896,7 +4925,8 @@ do_self_spec (const char *spec)
 	      save_switch (decoded_options[j].canonical_option[0],
 			   (decoded_options[j].canonical_option_num_elements
 			    - 1),
-			   &decoded_options[j].canonical_option[1], false, true);
+			   &decoded_options[j].canonical_option[1], false, true,
+			   cl_options[decoded_options[j].opt_index].flags);
 	      break;
 
 	    default:
@@ -6492,6 +6522,14 @@ check_live_switch (int switchnum, int prefix_length)
 static void
 give_switch (int switchnum, int omit_first_word)
 {
+  int lang_mask = switches[switchnum].lang_mask & ((1U << cl_lang_count) - 1);
+  unsigned int lang_mask_accept = (1U << cl_lang_count) - 1;
+  if (spec_lang_mask_accept != 0)
+    lang_mask_accept = spec_lang_mask_accept;
+  /* Drop switches specific to a language not in the given mask.  */
+  if (lang_mask != 0 && !(lang_mask & lang_mask_accept))
+    return;
+
   if ((switches[switchnum].live_cond & SWITCH_IGNORE) != 0)
     return;
 
@@ -7593,9 +7631,6 @@ driver::maybe_putenv_OFFLOAD_TARGETS () const
 		    strlen (offload_targets) + 1);
       xputenv (XOBFINISH (&collect_obstack, char *));
     }
-
-  free (offload_targets);
-  offload_targets = NULL;
 }
 
 /* Reject switches that no pass was interested in.  */
@@ -7899,7 +7934,8 @@ driver::do_spec_on_infiles () const
 		  debug_check_temp_file[1] = NULL;
 		}
 
-	      value = do_spec (input_file_compiler->spec);
+	      value = do_spec (input_file_compiler->spec,
+			       infiles[i].lang_mask);
 	      infiles[i].compiled = true;
 	      if (value < 0)
 		this_file_error = 1;
@@ -7913,7 +7949,8 @@ driver::do_spec_on_infiles () const
 		  n_switches_alloc = n_switches_alloc_debug_check[1];
 		  switches = switches_debug_check[1];
 
-		  value = do_spec (input_file_compiler->spec);
+		  value = do_spec (input_file_compiler->spec,
+				   infiles[i].lang_mask);
 
 		  compare_debug = -compare_debug;
 		  n_switches = n_switches_debug_check[0];
@@ -8068,7 +8105,7 @@ driver::maybe_run_linker (const char *argv0) const
 		    " to the linker.\n\n"));
 	  fflush (stdout);
 	}
-      int value = do_spec (link_command_spec);
+      int value = do_spec (link_command_spec, 0);
       if (value < 0)
 	errorcount = 1;
       linker_was_run = (tmp != execution_count);
@@ -9659,6 +9696,50 @@ greater_than_spec_func (int argc, const char **argv)
   return NULL;
 }
 
+/* If applicable, generate a C source file containing a constructor call to
+   GOMP_enable_offload_targets, to inform libgomp which offload targets have
+   actually been requested (-foffload=[...]), and add that as an infile.  */
+
+static const char *
+add_omp_infile_spec_func (int argc, const char **)
+{
+  gcc_assert (argc == 0);
+  gcc_assert (offload_targets != NULL);
+
+  /* Nothing to do if we're not actually linking.  */
+  if (have_c)
+    return NULL;
+
+  int err;
+  const char *tmp_filename;
+  tmp_filename = make_temp_file (".c");
+  record_temp_file (tmp_filename, !save_temps_flag, 0);
+  FILE *f = fopen (tmp_filename, "w");
+  if (f == NULL)
+    fatal_error (input_location,
+		 "could not open temporary file %s", tmp_filename);
+  /* As libgomp uses constructors internally, and this code is only added when
+     linking against libgomp, it is fine to use a constructor here.  */
+  err = fprintf (f,
+		 "extern void GOMP_enable_offload_targets (const char *);\n"
+		 "static __attribute__ ((constructor)) void\n"
+		 "init (void)\n"
+		 "{\n"
+		 "  GOMP_enable_offload_targets (\"%s\");\n"
+		 "}\n",
+		 offload_targets);
+  if (err < 0)
+    fatal_error (input_location,
+		 "could not write to temporary file %s", tmp_filename);
+  err = fclose (f);
+  if (err == EOF)
+    fatal_error (input_location,
+		 "could not close temporary file %s", tmp_filename);
+
+  add_infile (tmp_filename, "cpp-output", CL_C);
+  return NULL;
+}
+
 /* Insert backslash before spaces in ORIG (usually a file path), to 
    avoid being broken by spec parser.
 
diff --git a/gcc/gcc.h b/gcc/gcc.h
index e1abe43..c71582d 100644
--- a/gcc/gcc.h
+++ b/gcc/gcc.h
@@ -68,7 +68,7 @@ struct spec_function
 };
 
 /* These are exported by gcc.c.  */
-extern int do_spec (const char *);
+extern int do_spec (const char *, unsigned int);
 extern void record_temp_file (const char *, int, int);
 extern void pfatal_with_name (const char *) ATTRIBUTE_NORETURN;
 extern void set_input (const char *);
diff --git a/gcc/java/jvspec.c b/gcc/java/jvspec.c
index d4efb73..518aa4d 100644
--- a/gcc/java/jvspec.c
+++ b/gcc/java/jvspec.c
@@ -629,7 +629,7 @@ lang_specific_pre_link (void)
      class name.  Append dummy `.c' that can be stripped by set_input so %b
      is correct.  */ 
   set_input (concat (main_class_name, "main.c", NULL));
-  err = do_spec (jvgenmain_spec);
+  err = do_spec (jvgenmain_spec, 0);
   if (err == 0)
     {
       /* Shift the outfiles array so the generated main comes first.
diff --git a/libgomp/config.h.in b/libgomp/config.h.in
index 2e4c698..d63e56a 100644
--- a/libgomp/config.h.in
+++ b/libgomp/config.h.in
@@ -95,7 +95,7 @@
    */
 #undef LT_OBJDIR
 
-/* Define to offload targets, separated by commas. */
+/* Define to offload targets, separated by colons. */
 #undef OFFLOAD_TARGETS
 
 /* Name of package */
diff --git a/libgomp/configure b/libgomp/configure
index 74d4e82..36ae548 100755
--- a/libgomp/configure
+++ b/libgomp/configure
@@ -15236,10 +15236,8 @@ if test x"$enable_offload_targets" != x; then
     tgt=`echo $tgt | sed 's/=.*//'`
     case $tgt in
       *-intelmic-* | *-intelmicemul-*)
-	tgt_name=intelmic
 	;;
       nvptx*)
-        tgt_name=nvptx
 	PLUGIN_NVPTX=$tgt
 	PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
 	PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
@@ -15282,9 +15280,9 @@ rm -f core conftest.err conftest.$ac_objext \
 	;;
     esac
     if test x"$offload_targets" = x; then
-      offload_targets=$tgt_name
+      offload_targets=$tgt
     else
-      offload_targets=$offload_targets,$tgt_name
+      offload_targets=$offload_targets:$tgt
     fi
     if test x"$tgt_dir" != x; then
       offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
index 24fbb94..5da4fa7 100644
--- a/libgomp/libgomp-plugin.h
+++ b/libgomp/libgomp-plugin.h
@@ -48,7 +48,8 @@ enum offload_target_type
   OFFLOAD_TARGET_TYPE_HOST = 2,
   /* OFFLOAD_TARGET_TYPE_HOST_NONSHM = 3 removed.  */
   OFFLOAD_TARGET_TYPE_NVIDIA_PTX = 5,
-  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
+  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6,
+  OFFLOAD_TARGET_TYPE_HWM
 };
 
 /* Auxiliary struct, used for transferring pairs of addresses from plugin
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 9c8b1fb..e945851 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -739,6 +739,7 @@ extern void gomp_free_thread (void *);
 
 extern void gomp_init_targets_once (void);
 extern int gomp_get_num_devices (void);
+extern bool gomp_offload_target_enabled_p (enum offload_target_type);
 extern void gomp_target_task_fn (void *);
 
 typedef struct splay_tree_node_s *splay_tree_node;
diff --git a/libgomp/libgomp.map b/libgomp/libgomp.map
index 2153661..05d5195 100644
--- a/libgomp/libgomp.map
+++ b/libgomp/libgomp.map
@@ -286,6 +286,7 @@ GOMP_4.5 {
 	GOMP_loop_ull_doacross_static_start;
 	GOMP_doacross_ull_post;
 	GOMP_doacross_ull_wait;
+	GOMP_enable_offload_targets;
 } GOMP_4.0.1;
 
 OACC_2.0 {
diff --git a/libgomp/libgomp_g.h b/libgomp/libgomp_g.h
index c28ad21..cc19767 100644
--- a/libgomp/libgomp_g.h
+++ b/libgomp/libgomp_g.h
@@ -247,6 +247,7 @@ extern void GOMP_single_copy_end (void *);
 
 /* target.c */
 
+extern void GOMP_enable_offload_targets (const char *);
 extern void GOMP_target (int, void (*) (void *), const void *,
 			 size_t, void **, size_t *, unsigned char *);
 extern void GOMP_target_41 (int, void (*) (void *), size_t, void **, size_t *,
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index a0e62a4..2b357e1 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -122,7 +122,9 @@ resolve_device (acc_device_t d, bool fail_is_error)
       {
 	if (goacc_device_type)
 	  {
-	    /* Lookup the named device.  */
+	    /* Lookup the device that has been explicitly named, so do not pay
+	       attention to gomp_offload_target_enabled_p.  (That is, hard
+	       error if not actually enabled.)  */
 	    while (++d != _ACC_device_hwm)
 	      if (dispatchers[d]
 		  && !strcasecmp (goacc_device_type,
@@ -148,8 +150,14 @@ resolve_device (acc_device_t d, bool fail_is_error)
     case acc_device_not_host:
       /* Find the first available device after acc_device_not_host.  */
       while (++d != _ACC_device_hwm)
-	if (dispatchers[d] && dispatchers[d]->get_num_devices_func () > 0)
+	if (dispatchers[d]
+	    && dispatchers[d]->get_num_devices_func () > 0
+	    /* No device has been explicitly named, so pay attention to
+	       gomp_offload_target_enabled_p, to not decide on an offload
+	       target that has not been enabled.  */
+	    && gomp_offload_target_enabled_p (dispatchers[d]->type))
 	  goto found;
+      /* No non-host device found.  */
       if (d_arg == acc_device_default)
 	{
 	  d = acc_device_host;
@@ -164,9 +172,6 @@ resolve_device (acc_device_t d, bool fail_is_error)
         return NULL;
       break;
 
-    case acc_device_host:
-      break;
-
     default:
       if (d > _ACC_device_hwm)
 	{
@@ -181,7 +186,8 @@ resolve_device (acc_device_t d, bool fail_is_error)
 
   assert (d != acc_device_none
 	  && d != acc_device_default
-	  && d != acc_device_not_host);
+	  && d != acc_device_not_host
+	  && d < _ACC_device_hwm);
 
   if (dispatchers[d] == NULL && fail_is_error)
     {
diff --git a/libgomp/plugin/configfrag.ac b/libgomp/plugin/configfrag.ac
index ad70dd1..a1bfec6 100644
--- a/libgomp/plugin/configfrag.ac
+++ b/libgomp/plugin/configfrag.ac
@@ -92,10 +92,8 @@ if test x"$enable_offload_targets" != x; then
     tgt=`echo $tgt | sed 's/=.*//'`
     case $tgt in
       *-intelmic-* | *-intelmicemul-*)
-	tgt_name=intelmic
 	;;
       nvptx*)
-        tgt_name=nvptx
 	PLUGIN_NVPTX=$tgt
 	PLUGIN_NVPTX_CPPFLAGS=$CUDA_DRIVER_CPPFLAGS
 	PLUGIN_NVPTX_LDFLAGS=$CUDA_DRIVER_LDFLAGS
@@ -127,9 +125,9 @@ if test x"$enable_offload_targets" != x; then
 	;;
     esac
     if test x"$offload_targets" = x; then
-      offload_targets=$tgt_name
+      offload_targets=$tgt
     else
-      offload_targets=$offload_targets,$tgt_name
+      offload_targets=$offload_targets:$tgt
     fi
     if test x"$tgt_dir" != x; then
       offload_additional_options="$offload_additional_options -B$tgt_dir/libexec/gcc/\$(target_alias)/\$(gcc_version) -B$tgt_dir/bin"
@@ -141,7 +139,7 @@ if test x"$enable_offload_targets" != x; then
   done
 fi
 AC_DEFINE_UNQUOTED(OFFLOAD_TARGETS, "$offload_targets",
-  [Define to offload targets, separated by commas.])
+  [Define to offload targets, separated by colons.])
 AM_CONDITIONAL([PLUGIN_NVPTX], [test $PLUGIN_NVPTX = 1])
 AC_DEFINE_UNQUOTED([PLUGIN_NVPTX], [$PLUGIN_NVPTX],
   [Define to 1 if the NVIDIA plugin is built, 0 if not.])
diff --git a/libgomp/target.c b/libgomp/target.c
index b767410..df51bfb 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -72,6 +72,9 @@ static int num_offload_images;
 /* Array of descriptors for all available devices.  */
 static struct gomp_device_descr *devices;
 
+/* Set of enabled devices.  */
+static bool devices_enabled[OFFLOAD_TARGET_TYPE_HWM];
+
 /* Total number of available devices.  */
 static int num_devices;
 
@@ -123,17 +126,27 @@ gomp_get_num_devices (void)
 }
 
 static struct gomp_device_descr *
-resolve_device (int device_id)
+resolve_device (int device)
 {
-  if (device_id == GOMP_DEVICE_ICV)
+  int device_id;
+  if (device == GOMP_DEVICE_ICV)
     {
       struct gomp_task_icv *icv = gomp_icv (false);
       device_id = icv->default_device_var;
     }
+  else
+    device_id = device;
 
   if (device_id < 0 || device_id >= gomp_get_num_devices ())
     return NULL;
 
+  /* If the device specified by the device-var ICV is not actually enabled,
+     don't try to use it (which will fail if it doesn't have offload data
+     available), and use host fallback instead.  */
+  if (device == GOMP_DEVICE_ICV
+      && !gomp_offload_target_enabled_p (devices[device_id].type))
+    return NULL;
+
   gomp_mutex_lock (&devices[device_id].lock);
   if (!devices[device_id].is_initialized)
     gomp_init_device (&devices[device_id]);
@@ -1063,6 +1076,8 @@ void
 GOMP_offload_register_ver (unsigned version, const void *host_table,
 			   int target_type, const void *target_data)
 {
+  gomp_debug (0, "%s (%#x, %d)\n", __FUNCTION__, version, target_type);
+
   int i;
 
   if (GOMP_VERSION_LIB (version) > GOMP_VERSION)
@@ -1100,6 +1115,18 @@ void
 GOMP_offload_register (const void *host_table, int target_type,
 		       const void *target_data)
 {
+  gomp_debug (0, "%s (%d)\n", __FUNCTION__, target_type);
+
+  gomp_mutex_lock (&register_lock);
+  /* If we're seeing this function called, then default to the old behavior of
+     enabling all offload targets: this is what old executables and shared
+     libraries expect.  */
+  for (enum offload_target_type type = 0;
+       type < OFFLOAD_TARGET_TYPE_HWM;
+       ++type)
+    devices_enabled[type] = true;
+  gomp_mutex_unlock (&register_lock);
+
   GOMP_offload_register_ver (0, host_table, target_type, target_data);
 }
 
@@ -1111,6 +1138,8 @@ void
 GOMP_offload_unregister_ver (unsigned version, const void *host_table,
 			     int target_type, const void *target_data)
 {
+  gomp_debug (0, "%s (%#x, %d)\n", __FUNCTION__, version, target_type);
+
   int i;
 
   gomp_mutex_lock (&register_lock);
@@ -1141,6 +1170,8 @@ void
 GOMP_offload_unregister (const void *host_table, int target_type,
 			 const void *target_data)
 {
+  gomp_debug (0, "%s (%d)\n", __FUNCTION__, target_type);
+
   GOMP_offload_unregister_ver (0, host_table, target_type, target_data);
 }
 
@@ -1213,6 +1244,24 @@ gomp_fini_device (struct gomp_device_descr *devicep)
   devicep->is_initialized = false;
 }
 
+/* Has the offload target type TYPE been enabled?
+
+   We cannot verify that *all* offload data is available that could possibly be
+   required, so if we later find any offload data missing for this offload
+   target, then that's user error.  */
+
+attribute_hidden bool
+gomp_offload_target_enabled_p (enum offload_target_type type)
+{
+  bool ret;
+
+  gomp_mutex_lock (&register_lock);
+  ret = devices_enabled[type];
+  gomp_mutex_unlock (&register_lock);
+
+  return ret;
+}
+
 /* Host fallback for GOMP_target{,_41} routines.  */
 
 static void
@@ -2071,6 +2120,8 @@ static bool
 gomp_load_plugin_for_device (struct gomp_device_descr *device,
 			     const char *plugin_name)
 {
+  gomp_debug (0, "%s (\"%s\")\n", __FUNCTION__, plugin_name);
+
   const char *err = NULL, *last_missing = NULL;
 
   void *plugin_handle = dlopen (plugin_name, RTLD_LAZY);
@@ -2169,6 +2220,78 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   return 0;
 }
 
+/* Return the corresponding offload target type for the offload target name
+   OFFLOAD_TARGET, or 0 if unknown.  */
+
+static enum offload_target_type
+offload_target_to_type (const char *offload_target)
+{
+  if (strstr (offload_target, "-intelmic") != NULL)
+    return OFFLOAD_TARGET_TYPE_INTEL_MIC;
+  else if (strncmp (offload_target, "nvptx", 5) == 0)
+    return OFFLOAD_TARGET_TYPE_NVIDIA_PTX;
+  else
+    return 0;
+}
+
+/* Return the corresponding plugin name for the offload target type TYPE, or
+   NULL if unknown.  */
+
+static const char *
+offload_target_type_to_plugin_name (enum offload_target_type type)
+{
+  switch (type)
+    {
+    case OFFLOAD_TARGET_TYPE_INTEL_MIC:
+      return "intelmic";
+    case OFFLOAD_TARGET_TYPE_NVIDIA_PTX:
+      return "nvptx";
+    default:
+      return NULL;
+    }
+}
+
+/* Enable the specified OFFLOAD_TARGETS, the set passed to the compiler at link
+   time.  */
+
+void
+GOMP_enable_offload_targets (const char *offload_targets)
+{
+  gomp_debug (0, "%s (\"%s\")\n", __FUNCTION__, offload_targets);
+
+  char *offload_targets_dup = strdup (offload_targets);
+  if (offload_targets_dup == NULL)
+    gomp_fatal ("Out of memory");
+
+  gomp_mutex_lock (&register_lock);
+
+  char *cur = offload_targets_dup;
+  while (cur)
+    {
+      char *next = strchr (cur, ':');
+      if (next != NULL)
+	{
+	  *next = '\0';
+	  ++next;
+	}
+      enum offload_target_type type = offload_target_to_type (cur);
+      if (type == 0)
+	{
+	  /* An unknown offload target has been requested; ignore it.  This
+	     makes us (future-)proof if offload targets are requested that
+	     are not supported in this build of libgomp.  */
+	}
+      else
+	devices_enabled[type] = true;
+
+      cur = next;
+    }
+
+  gomp_mutex_unlock (&register_lock);
+
+  free (offload_targets_dup);
+}
+
 /* This function initializes the runtime needed for offloading.
    It parses the list of offload targets and tries to load the plugins for
    these targets.  On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
@@ -2176,13 +2299,13 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
    corresponding devices, first the GOMP_OFFLOAD_CAP_OPENMP_400 ones, follows
    by the others.  */
 
+static const char *gomp_plugin_prefix ="libgomp-plugin-";
+static const char *gomp_plugin_suffix = SONAME_SUFFIX (1);
+
 static void
 gomp_target_init (void)
 {
-  const char *prefix ="libgomp-plugin-";
-  const char *suffix = SONAME_SUFFIX (1);
   const char *cur, *next;
-  char *plugin_name;
   int i, new_num_devices;
 
   num_devices = 0;
@@ -2192,44 +2315,58 @@ gomp_target_init (void)
   if (*cur)
     do
       {
-	struct gomp_device_descr current_device;
-
-	next = strchr (cur, ',');
-
-	plugin_name = (char *) malloc (1 + (next ? next - cur : strlen (cur))
-				       + strlen (prefix) + strlen (suffix));
-	if (!plugin_name)
-	  {
-	    num_devices = 0;
-	    break;
-	  }
-
-	strcpy (plugin_name, prefix);
-	strncat (plugin_name, cur, next ? next - cur : strlen (cur));
-	strcat (plugin_name, suffix);
+	next = strchr (cur, ':');
+	/* If no other offload target following...  */
+	if (next == NULL)
+	  /* ..., point to the terminating NUL character.  */
+	  next = strchr (cur, '\0');
+
+	size_t gomp_plugin_prefix_len = strlen (gomp_plugin_prefix);
+	size_t cur_len = next - cur;
+	size_t gomp_plugin_suffix_len = strlen (gomp_plugin_suffix);
+	char *plugin_name
+	  = gomp_realloc_unlock (NULL, (gomp_plugin_prefix_len
+					+ cur_len
+					+ gomp_plugin_suffix_len
+					+ 1));
+	memcpy (plugin_name, gomp_plugin_prefix, gomp_plugin_prefix_len);
+	memcpy (plugin_name + gomp_plugin_prefix_len, cur, cur_len);
+	/* NUL-terminate the string here...  */
+	plugin_name[gomp_plugin_prefix_len + cur_len] = '\0';
+	/* ..., so that we can then use it to translate the offload target to
+	   the plugin name...  */
+	enum offload_target_type type
+	  = offload_target_to_type (plugin_name + gomp_plugin_prefix_len);
+	const char *cur_plugin_name
+	  = offload_target_type_to_plugin_name (type);
+	size_t cur_plugin_name_len = strlen (cur_plugin_name);
+	assert (cur_plugin_name_len <= cur_len);
+	/* ..., and then rewrite it.  */
+	memcpy (plugin_name + gomp_plugin_prefix_len,
+		cur_plugin_name, cur_plugin_name_len);
+	memcpy (plugin_name + gomp_plugin_prefix_len + cur_plugin_name_len,
+		gomp_plugin_suffix, gomp_plugin_suffix_len);
+	plugin_name[gomp_plugin_prefix_len
+		    + cur_plugin_name_len
+		    + gomp_plugin_suffix_len] = '\0';
 
+	struct gomp_device_descr current_device;
 	if (gomp_load_plugin_for_device (&current_device, plugin_name))
 	  {
 	    new_num_devices = current_device.get_num_devices_func ();
 	    if (new_num_devices >= 1)
 	      {
-		/* Augment DEVICES and NUM_DEVICES.  */
-
-		devices = realloc (devices, (num_devices + new_num_devices)
-				   * sizeof (struct gomp_device_descr));
-		if (!devices)
-		  {
-		    num_devices = 0;
-		    free (plugin_name);
-		    break;
-		  }
-
 		current_device.name = current_device.get_name_func ();
 		/* current_device.capabilities has already been set.  */
 		current_device.type = current_device.get_type_func ();
 		current_device.mem_map.root = NULL;
 		current_device.is_initialized = false;
 		current_device.openacc.data_environ = NULL;
+
+		/* Augment DEVICES and NUM_DEVICES.  */
+		devices = gomp_realloc_unlock
+		  (devices, ((num_devices + new_num_devices)
+			     * sizeof (struct gomp_device_descr)));
 		for (i = 0; i < new_num_devices; i++)
 		  {
 		    current_device.target_id = i;
@@ -2243,18 +2380,13 @@ gomp_target_init (void)
 	free (plugin_name);
 	cur = next + 1;
       }
-    while (next);
+    while (*next);
 
   /* In DEVICES, sort the GOMP_OFFLOAD_CAP_OPENMP_400 ones first, and set
      NUM_DEVICES_OPENMP.  */
   struct gomp_device_descr *devices_s
-    = malloc (num_devices * sizeof (struct gomp_device_descr));
-  if (!devices_s)
-    {
-      num_devices = 0;
-      free (devices);
-      devices = NULL;
-    }
+    = gomp_realloc_unlock (NULL,
+			   num_devices * sizeof (struct gomp_device_descr));
   num_devices_openmp = 0;
   for (i = 0; i < num_devices; i++)
     if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
diff --git a/libgomp/testsuite/lib/libgomp.exp b/libgomp/testsuite/lib/libgomp.exp
index 6dc1e8e..07f85ef 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -37,24 +37,21 @@ load_gcc_lib fortran-modules.exp
 load_file libgomp-test-support.exp
 
 # Populate offload_targets_s (offloading targets separated by a space), and
-# offload_targets_s_openacc (the same, but with OpenACC names; OpenACC spells
-# some of them a little differently).
-set offload_targets_s [split $offload_targets ","]
+# offload_targets_s_openacc (those suitable for OpenACC).
+set offload_targets_s [split $offload_targets ":"]
 set offload_targets_s_openacc {}
 foreach offload_target_openacc $offload_targets_s {
-    switch $offload_target_openacc {
-	intelmic {
+    switch -glob $offload_target_openacc {
+	*-intelmic* {
 	    # Skip; will all FAIL because of missing
 	    # GOMP_OFFLOAD_CAP_OPENACC_200.
 	    continue
 	}
-	nvptx {
-	    set offload_target_openacc "nvidia"
-	}
     }
     lappend offload_targets_s_openacc "$offload_target_openacc"
 }
-lappend offload_targets_s_openacc "host"
+# Host fallback.
+lappend offload_targets_s_openacc "disable"
 
 set dg-do-what-default run
 
@@ -135,7 +132,7 @@ proc libgomp_init { args } {
     # Add liboffloadmic build directory in LD_LIBRARY_PATH to support
     # non-fallback testing for Intel MIC targets
     global offload_targets
-    if { [string match "*,intelmic,*" ",$offload_targets,"] } {
+    if { [string match "*:*-intelmic*:*" ":$offload_targets:"] } {
 	append always_ld_library_path ":${blddir}/../liboffloadmic/.libs"
 	append always_ld_library_path ":${blddir}/../liboffloadmic/plugin/.libs"
 	# libstdc++ is required by liboffloadmic
@@ -346,15 +343,14 @@ proc check_effective_target_openacc_nvidia_accel_present { } {
 }
 
 # Return 1 if at least one nvidia board is present, and the nvidia device type
-# is selected by default by means of setting the environment variable
-# ACC_DEVICE_TYPE.
+# is selected by default.
 
 proc check_effective_target_openacc_nvidia_accel_selected { } {
     if { ![check_effective_target_openacc_nvidia_accel_present] } {
 	return 0;
     }
     global offload_target_openacc
-    if { $offload_target_openacc == "nvidia" } {
+    if { [string match "nvptx*" $offload_target_openacc] } {
         return 1;
     }
     return 0;
@@ -364,7 +360,7 @@ proc check_effective_target_openacc_nvidia_accel_selected { } {
 
 proc check_effective_target_openacc_host_selected { } {
     global offload_target_openacc
-    if { $offload_target_openacc == "host" } {
+    if { $offload_target_openacc == "disable" } {
         return 1;
     }
     return 0;
diff --git a/libgomp/testsuite/libgomp.c++/target-1-foffload_disable.C b/libgomp/testsuite/libgomp.c++/target-1-foffload_disable.C
new file mode 100644
index 0000000..15b9432
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/target-1-foffload_disable.C
@@ -0,0 +1,3 @@
+/* { dg-additional-options "-foffload=disable" } */
+
+#include "target-1.C"
diff --git a/libgomp/testsuite/libgomp.c++/target-foffload_disable.C b/libgomp/testsuite/libgomp.c++/target-foffload_disable.C
new file mode 100644
index 0000000..c07dea1
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c++/target-foffload_disable.C
@@ -0,0 +1,3 @@
+/* { dg-additional-options "-foffload=disable" } */
+
+#include "../libgomp.c/target-foffload_disable.c"
diff --git a/libgomp/testsuite/libgomp.c/target-1-foffload_disable.c b/libgomp/testsuite/libgomp.c/target-1-foffload_disable.c
new file mode 100644
index 0000000..177cceb
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-1-foffload_disable.c
@@ -0,0 +1,3 @@
+/* { dg-additional-options "-foffload=disable" } */
+
+#include "target-1.c"
diff --git a/libgomp/testsuite/libgomp.c/target-foffload_disable.c b/libgomp/testsuite/libgomp.c/target-foffload_disable.c
new file mode 100644
index 0000000..4a712da
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-foffload_disable.c
@@ -0,0 +1,18 @@
+/* { dg-additional-options "-foffload=disable" } */
+
+#include <omp.h>
+
+int main()
+{
+  if (!omp_is_initial_device())
+    __builtin_abort();
+#pragma omp target
+  {
+    if (!omp_is_initial_device())
+      __builtin_abort();
+  }
+  if (!omp_is_initial_device())
+    __builtin_abort();
+
+  return 0;
+}
diff --git a/libgomp/testsuite/libgomp.fortran/target-foffload_disable.f b/libgomp/testsuite/libgomp.fortran/target-foffload_disable.f
new file mode 100644
index 0000000..0d60534
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/target-foffload_disable.f
@@ -0,0 +1,14 @@
+!     { dg-additional-options "-foffload=disable" }
+
+      PROGRAM MAIN
+      IMPLICIT NONE
+
+      INCLUDE "omp_lib.h"
+
+      IF (.NOT. OMP_IS_INITIAL_DEVICE()) CALL ABORT
+!$OMP TARGET
+      IF (.NOT. OMP_IS_INITIAL_DEVICE()) CALL ABORT
+!$OMP END TARGET
+      IF (.NOT. OMP_IS_INITIAL_DEVICE()) CALL ABORT
+
+      END
diff --git a/libgomp/testsuite/libgomp.fortran/target1-foffload_disable.f90 b/libgomp/testsuite/libgomp.fortran/target1-foffload_disable.f90
new file mode 100644
index 0000000..005328e
--- /dev/null
+++ b/libgomp/testsuite/libgomp.fortran/target1-foffload_disable.f90
@@ -0,0 +1,3 @@
+! { dg-additional-options "-cpp -foffload=disable" }
+
+#include "target1.f90"
diff --git a/libgomp/testsuite/libgomp.oacc-c++/c++.exp b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
index 88b0269..aa545a2 100644
--- a/libgomp/testsuite/libgomp.oacc-c++/c++.exp
+++ b/libgomp/testsuite/libgomp.oacc-c++/c++.exp
@@ -75,13 +75,12 @@ if { $lang_test_file_found } {
 
     # Test OpenACC with available accelerators.
     foreach offload_target_openacc $offload_targets_s_openacc {
-	set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
-
-	switch $offload_target_openacc {
-	    host {
+	switch -glob $offload_target_openacc {
+	    disable {
 		set acc_mem_shared 1
+		set tagopt "-DACC_DEVICE_TYPE_host=1"
 	    }
-	    nvidia {
+	    nvptx* {
 		if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		    # Don't bother; execution testing is going to FAIL.
 		    untested "$subdir $offload_target_openacc offloading"
@@ -95,14 +94,13 @@ if { $lang_test_file_found } {
 		lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/libgomp.oacc-c-c++-common"
 
 		set acc_mem_shared 0
+		set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	    }
 	    default {
 		set acc_mem_shared 0
 	    }
 	}
-	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
-
-	setenv ACC_DEVICE_TYPE $offload_target_openacc
+	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
 	dg-runtest $tests "$tagopt" "$libstdcxx_includes $DEFAULT_CFLAGS"
     }
diff --git a/libgomp/testsuite/libgomp.oacc-c/c.exp b/libgomp/testsuite/libgomp.oacc-c/c.exp
index 5020e6a..9d2065f 100644
--- a/libgomp/testsuite/libgomp.oacc-c/c.exp
+++ b/libgomp/testsuite/libgomp.oacc-c/c.exp
@@ -38,13 +38,13 @@ set_ld_library_path_env_vars
 set SAVE_ALWAYS_CFLAGS "$ALWAYS_CFLAGS"
 foreach offload_target_openacc $offload_targets_s_openacc {
     set ALWAYS_CFLAGS "$SAVE_ALWAYS_CFLAGS"
-    set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
 
-    switch $offload_target_openacc {
-	host {
+    switch -glob $offload_target_openacc {
+	disable {
 	    set acc_mem_shared 1
+	    set tagopt "-DACC_DEVICE_TYPE_host=1"
 	}
-	nvidia {
+	nvptx* {
 	    if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		# Don't bother; execution testing is going to FAIL.
 		untested "$subdir $offload_target_openacc offloading"
@@ -58,14 +58,13 @@ foreach offload_target_openacc $offload_targets_s_openacc {
 	    lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/libgomp.oacc-c-c++-common"
 
 	    set acc_mem_shared 0
+	    set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	}
 	default {
 	    set acc_mem_shared 0
 	}
     }
-    set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
-
-    setenv ACC_DEVICE_TYPE $offload_target_openacc
+    set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
     dg-runtest $tests "$tagopt" $DEFAULT_CFLAGS
 }
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
index 2d6b647..3f678ba 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
+++ b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
@@ -67,13 +67,12 @@ if { $lang_test_file_found } {
 
     # Test OpenACC with available accelerators.
     foreach offload_target_openacc $offload_targets_s_openacc {
-	set tagopt "-DACC_DEVICE_TYPE_$offload_target_openacc=1"
-
-	switch $offload_target_openacc {
-	    host {
+	switch -glob $offload_target_openacc {
+	    disable {
 		set acc_mem_shared 1
+		set tagopt "-DACC_DEVICE_TYPE_host=1"
 	    }
-	    nvidia {
+	    nvptx* {
 		if { ![check_effective_target_openacc_nvidia_accel_present] } {
 		    # Don't bother; execution testing is going to FAIL.
 		    untested "$subdir $offload_target_openacc offloading"
@@ -81,14 +80,13 @@ if { $lang_test_file_found } {
 		}
 
 		set acc_mem_shared 0
+		set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
 	    }
 	    default {
 		set acc_mem_shared 0
 	    }
 	}
-	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared"
-
-	setenv ACC_DEVICE_TYPE $offload_target_openacc
+	set tagopt "$tagopt -DACC_MEM_SHARED=$acc_mem_shared -foffload=$offload_target_openacc"
 
 	# For Fortran we're doing torture testing, as Fortran has far more tests
 	# with arrays etc. that testing just -O0 or -O2 is insufficient, that is
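
For readers following the gomp_target_init hunk above, here is a minimal standalone sketch of the same idea: split a colon-separated -foffload list and build one plugin file name per entry.  All sketch_* names, the ".so.1" suffix, and the matching rules are assumptions made for illustration, not libgomp's actual code.  Unlike the hunk, which passes the result of offload_target_type_to_plugin_name straight to strlen, the sketch checks for a NULL mapping first, since that function returns NULL in its default case.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for gomp_plugin_prefix and SONAME_SUFFIX (1).  */
static const char *sketch_prefix = "libgomp-plugin-";
static const char *sketch_suffix = ".so.1";

/* Return 1 if NEEDLE occurs within the first LEN bytes of S.  */
static int
sketch_contains (const char *s, size_t len, const char *needle)
{
  size_t n = strlen (needle);
  for (size_t i = 0; i + n <= len; i++)
    if (memcmp (s + i, needle, n) == 0)
      return 1;
  return 0;
}

/* Hypothetical mapping from an offload target such as "nvptx-none" to a
   plugin name.  Returns NULL for targets this sketch does not know.  */
static const char *
sketch_target_to_plugin (const char *target, size_t len)
{
  if (len >= 5 && memcmp (target, "nvptx", 5) == 0)
    return "nvptx";
  if (sketch_contains (target, len, "-intelmic"))
    return "intelmic";
  return NULL;
}

/* Build the plugin file name for the target spanning [CUR, CUR + LEN).
   Returns a malloc'ed string, or NULL for an unknown target or on OOM.  */
static char *
sketch_plugin_file_name (const char *cur, size_t len)
{
  assert (cur != NULL);
  const char *plugin = sketch_target_to_plugin (cur, len);
  if (plugin == NULL)
    return NULL;
  size_t size = (strlen (sketch_prefix) + strlen (plugin)
		 + strlen (sketch_suffix) + 1);
  char *name = malloc (size);
  if (name != NULL)
    snprintf (name, size, "%s%s%s", sketch_prefix, plugin, sketch_suffix);
  return name;
}

/* Walk a colon-separated LIST and count the entries that map to a plugin.  */
static int
sketch_count_known_targets (const char *list)
{
  int known = 0;
  const char *cur = list;
  while (*cur)
    {
      const char *next = strchr (cur, ':');
      if (next == NULL)
	next = cur + strlen (cur);
      char *name = sketch_plugin_file_name (cur, (size_t) (next - cur));
      if (name != NULL)
	{
	  known++;
	  free (name);
	}
      if (*next == '\0')
	break;
      cur = next + 1;
    }
  return known;
}
```

Skipping unknown entries mirrors what GOMP_enable_offload_targets does in the patch; whether gomp_target_init should do the same or treat them as an error is a separate design decision.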


Regards
 Thomas

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
  2015-10-19 16:56                     ` Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) Thomas Schwinge
@ 2015-10-20 10:03                       ` Jakub Jelinek
  2015-10-20 10:44                         ` Bernd Schmidt
  2015-10-20 11:18                         ` Thomas Schwinge
  0 siblings, 2 replies; 62+ messages in thread
From: Jakub Jelinek @ 2015-10-20 10:03 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches, Bernd Schmidt, Nathan Sidwell, Joseph Myers

On Mon, Oct 19, 2015 at 06:44:40PM +0200, Thomas Schwinge wrote:
> > How's the following (complete patch instead of incremental patch; the
> > driver changes are still the same as before)?  The changes are:
> > 
> >   * libgomp/target.c:gomp_target_init again loads all the plugins.
> >   * libgomp/target.c:resolve_device and
> >     libgomp/oacc-init.c:resolve_device verify that a default device
> >     (OpenMP device-var ICV, and acc_device_default, respectively) is
> >     actually enabled, or resort to host fallback if not.
> >   * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used
> >     to enable devices specified by -foffload.  Can be called multiple
> >     times (executable, any shared libraries); the set of enabled devices
> >     is the union of all those ever requested.
> >   * GOMP_offload_register (but not the new GOMP_offload_register_ver)
> >     changed to enable all devices.  This is to maintain compatibility
> >     with old executables and shared libraries built without the -foffload
> >     constructor support.

Any reason not to pass the bitmask of the enabled targets to
GOMP_offload_register_ver instead, to decrease the amount of ctors and
the times you lock the various locks during initialization, or just enable
automatically the devices you load data for during GOMP_offload_register_ver?
I mean, GOMP_offload_register would enable for compatibility all devices,
GOMP_offload_register_ver would enable the device it is registered for.
For -foffload=disable on all shared libraries/binaries, naturally you would
not register anything, thus would not enable any devices (only host fallback
would work).

Or are you worried about the case where one shared library is compiled
with say -foffload=intelmic,ptx but doesn't actually contain any
#pragma omp target/#pragma omp declare target (or similar OpenACC
directives), but only contains #pragma omp target data and/or the device
query/copying routines, then dlopens some other shared library that actually
has the offloading device code?
That could be solved by adding the call you are talking about, but
if we really should care about that unlikely case, it would be better to
only arrange for it if really needed by the shared library (i.e. if it calls
one of the OpenMP or OpenACC library routines that talk to the devices, or
has #pragma omp target data or similar constructs).
I'd strongly prefer not to have constructors in code that just got compiled
with -fopenmp, even in configuration where some offloading is configured by
default, when nothing in the code really cares about offloading.

> --- a/gcc/gcc.c
> +++ b/gcc/gcc.c
> @@ -401,6 +401,8 @@ static const char *compare_debug_auxbase_opt_spec_function (int, const char **);
>  static const char *pass_through_libs_spec_func (int, const char **);
>  static const char *replace_extension_spec_func (int, const char **);
>  static const char *greater_than_spec_func (int, const char **);
> +static const char *add_omp_infile_spec_func (int, const char **);
> +
>  static char *convert_white_space (char *);
>  \f
>  /* The Specs Language

I'd like to defer review of the driver bits, can Joseph or Bernd please have
a look at those?

> diff --git a/libgomp/libgomp-plugin.h b/libgomp/libgomp-plugin.h
> index 24fbb94..5da4fa7 100644
> --- a/libgomp/libgomp-plugin.h
> +++ b/libgomp/libgomp-plugin.h
> @@ -48,7 +48,8 @@ enum offload_target_type
>    OFFLOAD_TARGET_TYPE_HOST = 2,
>    /* OFFLOAD_TARGET_TYPE_HOST_NONSHM = 3 removed.  */
>    OFFLOAD_TARGET_TYPE_NVIDIA_PTX = 5,
> -  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
> +  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6,
> +  OFFLOAD_TARGET_TYPE_HWM

What is HWM?  Is that OFFLOAD_TARGET_TYPE_LAST what you mean?

> diff --git a/libgomp/target.c b/libgomp/target.c
> index b767410..df51bfb 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -72,6 +72,9 @@ static int num_offload_images;
>  /* Array of descriptors for all available devices.  */
>  static struct gomp_device_descr *devices;
>  
> +/* Set of enabled devices.  */
> +static bool devices_enabled[OFFLOAD_TARGET_TYPE_HWM];

I must say I don't like the locking for this.
If all you ever change on this is that you change it from 0 to 1,
then supposedly just storing it with __atomic_store, perhaps with
rel semantics, and reading it as __atomic_load, with acquire semantics,
would be good enough?  And perhaps change it into int array,
so that it is actually atomic even on the old Alphas (if there are any
around).
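
The lock-free variant suggested here could look roughly like the following sketch.  It assumes the GCC __atomic builtins and an int array; the sketch_* names and the array size are placeholders, not the names used in the patch.

```c
#include <assert.h>

/* Placeholder for the number of offload target types.  */
enum { SKETCH_NUM_TYPES = 8 };

/* An int array rather than a bool array, so that each element can be
   stored atomically even on targets without byte-sized atomic stores.  */
static int sketch_devices_enabled[SKETCH_NUM_TYPES];

/* Mark TYPE as enabled.  The flag only ever goes from 0 to 1, so a
   release store is sufficient and no lock is needed.  */
static void
sketch_enable_device (int type)
{
  assert (type >= 0 && type < SKETCH_NUM_TYPES);
  __atomic_store_n (&sketch_devices_enabled[type], 1, __ATOMIC_RELEASE);
}

/* Return nonzero if TYPE has been enabled; pairs with the release store.  */
static int
sketch_device_enabled_p (int type)
{
  assert (type >= 0 && type < SKETCH_NUM_TYPES);
  return __atomic_load_n (&sketch_devices_enabled[type], __ATOMIC_ACQUIRE);
}
```

This only covers the flag array itself; parsing the target string would still need its own private copy, as in the patch.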

	Jakub

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
  2015-10-20 10:03                       ` Jakub Jelinek
@ 2015-10-20 10:44                         ` Bernd Schmidt
  2015-10-20 11:18                         ` Thomas Schwinge
  1 sibling, 0 replies; 62+ messages in thread
From: Bernd Schmidt @ 2015-10-20 10:44 UTC (permalink / raw)
  To: Jakub Jelinek, Thomas Schwinge; +Cc: gcc-patches, Nathan Sidwell, Joseph Myers

On 10/20/2015 12:02 PM, Jakub Jelinek wrote:
> I'd like to defer review of the driver bits, can Joseph or Bernd please have
> a look at those?

Last time around I think I asked for some minor changes, like updated 
documentation for give_switch. Other than that, I'm ok with the patch 
iff you are happy with the overall approach.


Bernd


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
  2015-10-20 10:03                       ` Jakub Jelinek
  2015-10-20 10:44                         ` Bernd Schmidt
@ 2015-10-20 11:18                         ` Thomas Schwinge
  2015-10-20 11:49                           ` Bernd Schmidt
  2015-10-20 11:52                           ` Jakub Jelinek
  1 sibling, 2 replies; 62+ messages in thread
From: Thomas Schwinge @ 2015-10-20 11:18 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Bernd Schmidt, Nathan Sidwell, Joseph Myers

Hi Jakub!

Thanks for the review.

On Tue, 20 Oct 2015 12:02:45 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> On Mon, Oct 19, 2015 at 06:44:40PM +0200, Thomas Schwinge wrote:
> > > How's the following (complete patch instead of incremental patch; the
> > > driver changes are still the same as before)?  The changes are:
> > > 
> > >   * libgomp/target.c:gomp_target_init again loads all the plugins.
> > >   * libgomp/target.c:resolve_device and
> > >     libgomp/oacc-init.c:resolve_device verify that a default device
> > >     (OpenMP device-var ICV, and acc_device_default, respectively) is
> > >     actually enabled, or resort to host fallback if not.
> > >   * GOMP_set_offload_targets renamed to GOMP_enable_offload_targets; used
> > >     to enable devices specified by -foffload.  Can be called multiple
> > >     times (executable, any shared libraries); the set of enabled devices
> > >     is the union of all those ever requested.
> > >   * GOMP_offload_register (but not the new GOMP_offload_register_ver)
> > >     changed to enable all devices.  This is to maintain compatibility
> > >     with old executables and shared libraries built without the -foffload
> > >     constructor support.
> 
> Any reason not to pass the bitmask of the enabled targets to
> GOMP_offload_register_ver instead, to decrease the amount of ctors and
> the times you lock the various locks during initialization, or just enable
> automatically the devices you load data for during GOMP_offload_register_ver?
> I mean, GOMP_offload_register would enable for compatibility all devices,
> GOMP_offload_register_ver would enable the device it is registered for.
> For -foffload=disable on all shared libraries/binaries, naturally you would
> not register anything, thus would not enable any devices (only host fallback
> would work).

As explained a few times already: GOMP_offload_register_ver constructors
will only be generated if there actually are offloaded code regions, but
for example:

    #include <openacc.h>
    int main()
    {
      __builtin_printf("%d\n", acc_get_num_devices(acc_device_nvidia));
      return 0;
    }

... is a valid OpenACC program (untested), which doesn't contain any
offloaded code regions.  As a user I'd expect it to return different
answers if compiled with -foffload=nvptx-none in contrast to
-foffload=disable.  Actually, I can foresee exactly such code to be used
to probe for offloading being available, for example in testsuites.  And,
I guess we agree that under -foffload=disable we'd like the
compilation/runtime system to be configured in a way that no offloading
will happen?

Always creating (dummy) GOMP_offload_register_ver constructors has been
another suggestion that I had voiced much earlier in this thread (months
ago), but everyone (including me) taking part in the discussion agreed
that it'd cause even higher compile-time overhead.

> Or are you worried about the case where one shared library is compiled
> with say -foffload=intelmic,ptx but doesn't actually contain any
> #pragma omp target/#pragma omp declare target (or similar OpenACC
> directives), but only contains #pragma omp target data and/or the device
> #directives), but only contains #pragma omp target data and/or the device
> query/copying routines, then dlopens some other shared library that actually
> has the offloading device code?

That's another example, yes.

> That could be solved by adding the call you are talking about, but
> if we really should care about that unlikely case, it would be better to
> only arrange for it if really needed by the shared library (i.e. if it calls
> one of the OpenMP or OpenACC library routines that talk to the devices, or
> has #pragma omp target data or similar constructs).
> I'd strongly prefer not to have constructors in code that just got compiled
> with -fopenmp, even in configuration where some offloading is configured by
> default, when nothing in the code really cares about offloading.

So, how to resolve our different opinions?  I mean, for any serious
program code, there will be constructor calls into libgomp already; are
you expecting that adding one more really will cause any noticeable
overhead?

I agree that enabling devices for GOMP_offload_register_ver calls makes
sense.  (I indeed had considered this earlier, but it didn't lead to
solving the problem completely -- see above.)  Can we come up with a scheme
to do it this way, and only generate the GOMP_enable_offload_targets
constructor if no GOMP_offload_register_ver constructors have been
generated?  But I have no idea how to implement that in a non-convoluted
way.  (And, it sounds excessive to me in terms of implementation overhead
on our side, in contrast to execution overhead of one libgomp constructor
call.)

> > --- a/gcc/gcc.c
> > +++ b/gcc/gcc.c
> > @@ -401,6 +401,8 @@ static const char *compare_debug_auxbase_opt_spec_function (int, const char **);
> >  static const char *pass_through_libs_spec_func (int, const char **);
> >  static const char *replace_extension_spec_func (int, const char **);
> >  static const char *greater_than_spec_func (int, const char **);
> > +static const char *add_omp_infile_spec_func (int, const char **);
> > +
> >  static char *convert_white_space (char *);
> >  \f
> >  /* The Specs Language
> 
> I'd like to defer review of the driver bits, can Joseph or Bernd please have
> a look at those?

Joseph has already been working on this code, completing my earlier WIP
patch while I've been out of office, and has submitted it for trunk
inclusion, so I'm assuming these changes do have his blessing.

> > --- a/libgomp/libgomp-plugin.h
> > +++ b/libgomp/libgomp-plugin.h
> > @@ -48,7 +48,8 @@ enum offload_target_type
> >    OFFLOAD_TARGET_TYPE_HOST = 2,
> >    /* OFFLOAD_TARGET_TYPE_HOST_NONSHM = 3 removed.  */
> >    OFFLOAD_TARGET_TYPE_NVIDIA_PTX = 5,
> > -  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6
> > +  OFFLOAD_TARGET_TYPE_INTEL_MIC = 6,
> > +  OFFLOAD_TARGET_TYPE_HWM
> 
> What is HWM?  Is that OFFLOAD_TARGET_TYPE_LAST what you mean?

Nathan has used this term before (libgomp/openacc.h:acc_device_t), and he
told me this means "High Water Mark".  I have no strong opinion on the
name to use, just want to mention that "*_LAST" sounds to me like that
one still is part of the accepted set, whereas in this case it'd be the
first enumerator outside of the accepted ones.  (And I guess, we agree
that "OFFLOAD_TARGET_TYPE_INTEL_LAST = 6" followed by
"OFFLOAD_TARGET_TYPE_INTEL_MIC = OFFLOAD_TARGET_TYPE_INTEL_LAST" is
ugly?)
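
To illustrate the naming point, here is a small sketch of the "one past the end" enumerator pattern.  The SKETCH_* names are made up for the example; only the numeric values are taken from the quoted enum.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_target_type
{
  SKETCH_TARGET_TYPE_HOST = 2,
  SKETCH_TARGET_TYPE_NVIDIA_PTX = 5,
  SKETCH_TARGET_TYPE_INTEL_MIC = 6,
  /* Not a target itself: implicitly 7, one past the largest real value.  */
  SKETCH_TARGET_TYPE_HWM
};

/* The sentinel sizes the array, so every real enumerator is a valid index.  */
static bool sketch_enabled[SKETCH_TARGET_TYPE_HWM];

/* Return true if T can index sketch_enabled.  This is a range check only;
   it does not say that T names an actual target type.  */
static bool
sketch_index_in_range_p (int t)
{
  return t >= 0 && t < SKETCH_TARGET_TYPE_HWM;
}

/* Enable the target type T.  */
static void
sketch_enable (enum sketch_target_type t)
{
  assert (sketch_index_in_range_p ((int) t));
  sketch_enabled[t] = true;
}
```

Whatever the enumerator ends up being called, the array declaration and the range check stay the same.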

> > --- a/libgomp/target.c
> > +++ b/libgomp/target.c
> > @@ -72,6 +72,9 @@ static int num_offload_images;
> >  /* Array of descriptors for all available devices.  */
> >  static struct gomp_device_descr *devices;
> >  
> > +/* Set of enabled devices.  */
> > +static bool devices_enabled[OFFLOAD_TARGET_TYPE_HWM];
> 
> I must say I don't like the locking for this.

Are you worried about the performance issues of a very short locking
cycle that in the majority of all cases should happen without blocking,
in comparison to performance issues related to host/device memory
transfers or kernel launches that will follow after the call to
gomp_offload_target_enabled_p?  I don't really think that is reasonable
to worry about.

> If all you ever change on this is that you change it from 0 to 1,
> then supposedly just storing it with __atomic_store, perhaps with
> rel semantics, and reading it as __atomic_load, with acquire semantics,
> would be good enough?  And perhaps change it into int array,
> so that it is actually atomic even on the old Alphas (if there are any
> around).

If you're really worried about this, I can look into that, but to me that
sounds like unwarranted code complexity/premature optimization...


Grüße
 Thomas

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
  2015-10-20 11:18                         ` Thomas Schwinge
@ 2015-10-20 11:49                           ` Bernd Schmidt
  2015-10-20 12:13                             ` Jakub Jelinek
  2015-10-20 11:52                           ` Jakub Jelinek
  1 sibling, 1 reply; 62+ messages in thread
From: Bernd Schmidt @ 2015-10-20 11:49 UTC (permalink / raw)
  To: Thomas Schwinge, Jakub Jelinek; +Cc: gcc-patches, Nathan Sidwell, Joseph Myers

On 10/20/2015 01:17 PM, Thomas Schwinge wrote:
>
> As explained a few times already: GOMP_offload_register_ver constructors
> will only be generated if there actually are offloaded code regions, but
> for example:
>
>      #include <openacc.h>
>      int main()
>      {
>        __builtin_printf("%d\n", acc_get_num_devices(acc_device_nvidia));
>        return 0;
>      }
>
> ... is a valid OpenACC program (untested), which doesn't contain any
> offloaded code regions.  As a user I'd expect it to return different
> answers if compiled with -foffload=nvptx-none in contrast to
> -foffload=disable.  Actually, I can foresee exactly such code to be used
> to probe for offloading being available, for example in testsuites.  And,
> I guess we agree that under -foffload=disable we'd like the
> compilation/runtime system to be configured in a way that no offloading
> will happen?

Both of you can ignore me if you feel I'm not making sense, but what 
exactly is the use case for -foffload=disable? Isn't it slightly 
redundant with -fno-openacc? IMO it's not an option that alters the 
available devices, that's a question that is answered at run-time and 
doesn't (or shouldn't) really depend on compiler switches. As a user I'd 
expect -foffload=disable to just prevent generation of offloaded code 
for the things I'm compiling. As Jakub pointed out, shared libraries may 
still contain other pieces that are offloadable.

I guess I don't fully understand why you want to go to great lengths to 
disable devices at run-time based on a compile-time switch. What's the 
reasoning here?

> Nathan has used this term before (libgomp/openacc.h:acc_device_t), and he
> told me this means "High Water Mark".  I have no strong opinion on the
> name to use, just want to mention that "*_LAST" sounds to me like that
> one still is part of the accepted set, whereas in this case it'd be the
> first enumerator outside of the accepted ones.  (And I guess, we agree
> that "OFFLOAD_TARGET_TYPE_INTEL_LAST = 6" followed by
> "OFFLOAD_TARGET_TYPE_INTEL_MIC = OFFLOAD_TARGET_TYPE_INTEL_LAST" is
> ugly?)

Nah, just rename HWM to LAST, that's fairly common usage I think.


Bernd

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
  2015-10-20 11:18                         ` Thomas Schwinge
  2015-10-20 11:49                           ` Bernd Schmidt
@ 2015-10-20 11:52                           ` Jakub Jelinek
  1 sibling, 0 replies; 62+ messages in thread
From: Jakub Jelinek @ 2015-10-20 11:52 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches, Bernd Schmidt, Nathan Sidwell, Joseph Myers

On Tue, Oct 20, 2015 at 01:17:45PM +0200, Thomas Schwinge wrote:
> Always creating (dummy) GOMP_offload_register_ver constructors has been
> another suggestion that I had voiced much earlier in this thread (months
> ago), but everyone (including me) taking part in the discussion agreed
> that it'd cause even higher compile-time overhead.

I'd prefer to just set a flag like "force creation of the GOMP offloading
sections" whenever you see one of the APIs or constructs used in the TU,
and if that flag is set, even when there are no offloaded vars or
functions/kernels, force creation of the corresponding data sections.
Either it can be standard offloading LTO sections, just not containing
anything, or, if you want to improve compile-time, it could be special too,
so that the linker plugin can quickly identify those that only need
offloading support, but don't have any offloaded vars or code.
But that can certainly be done as an incremental optimization.

For OpenMP that would be whenever
#pragma omp target{, data, enter data, exit data} construct is seen
(e.g. during gimplification or OMP region nesting checking even better),
or for

omp_set_default_device
omp_get_default_device
omp_get_num_devices
omp_is_initial_device
omp_get_initial_device
omp_target_alloc
omp_target_free
omp_target_is_present
omp_target_memcpy
omp_target_memcpy_rect
omp_target_associate_ptr
omp_target_disassociate_ptr

calls.  I guess for OpenACC you have a similar set of calls.
The thing is, while the OpenACC standard is pretty much solely about offloading,
OpenMP is not, and in many cases programs just use host OpenMP parallelization
(at least right now, I bet such programs are a significantly larger set
than programs that use OpenACC or OpenMP offloading together).
Distributions and others will eventually configure the compilers they are
shipping to enable the offloading, and if that forces a constructor into every
TU or even every shared library just because it has been compiled with
-fopenmp, that is unacceptable overhead.
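
A minimal sketch of that trigger, assuming a per-TU flag and two hooks that
are hypothetical (force_offload_sections, note_call and
note_target_construct are illustration names, not existing GCC functions).
Only the routine names come from the list above:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Device-related OpenMP API routines named in the list above.  */
static const char *const offload_api_names[] = {
  "omp_set_default_device", "omp_get_default_device",
  "omp_get_num_devices", "omp_is_initial_device",
  "omp_get_initial_device", "omp_target_alloc", "omp_target_free",
  "omp_target_is_present", "omp_target_memcpy",
  "omp_target_memcpy_rect", "omp_target_associate_ptr",
  "omp_target_disassociate_ptr",
};

/* Hypothetical per-TU flag: "force creation of the GOMP offloading
   sections".  */
bool force_offload_sections = false;

/* Return true if NAME is one of the routines listed above.  */
bool
is_offload_api (const char *name)
{
  size_t n = sizeof offload_api_names / sizeof offload_api_names[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (name, offload_api_names[i]) == 0)
      return true;
  return false;
}

/* Hypothetical hook: called for the callee name of each call in the TU.  */
void
note_call (const char *callee)
{
  if (is_offload_api (callee))
    force_offload_sections = true;
}

/* Hypothetical hook: called when a "#pragma omp target ..." construct is
   seen.  */
void
note_target_construct (void)
{
  force_offload_sections = true;
}
```

A TU that only uses host constructs never sets the flag, so it would get
neither the sections nor a constructor.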

For the vendor-shipped binary compilers, I envision the ideal would be to
be able to configure gcc for many offloading targets, then build the main
compiler and the offloading target compilers, but package them separately (one
package (or set of packages) for the base compiler, and then another package
(or set of them) for each offloading target).  What -foffload= actually will
be in the end, from the linked shared library or binary POV, would depend both
on the configured offloading targets and on whether the mkoffload
binaries are found (or whatever else is needed first from the offloading
target).  That would mean that we'd not issue a hard error or any kind of
diagnostic if mkoffload is missing.  Is that acceptable, or should that
e.g. be limited just to the compiled-in configure default (i.e. explicit
-foffload= would error if the requested mkoffload is missing, default
-foffload= would silently skip unavailable ones; I guess this would be my
preference), or should we have two ways of configuring the offloading
targets, as hard requirements and as optional support?
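
The preferred variant in the parenthesis can be written down as a small
decision function.  This is only a sketch of the policy as stated; the enum
and the function name are hypothetical, and how the driver would actually
detect a missing mkoffload is left out:

```c
#include <stdbool.h>

/* Hypothetical outcome for one offloading target.  */
enum offload_decision
{
  OFFLOAD_USE,    /* mkoffload found: offload for this target.  */
  OFFLOAD_SKIP,   /* Silently drop this target.  */
  OFFLOAD_ERROR   /* Diagnose a hard error.  */
};

/* EXPLICITLY_REQUESTED is true if the target came from an explicit
   -foffload= option, false if it came from the configured default.  */
enum offload_decision
decide_offload_target (bool explicitly_requested, bool mkoffload_found)
{
  if (mkoffload_found)
    return OFFLOAD_USE;
  return explicitly_requested ? OFFLOAD_ERROR : OFFLOAD_SKIP;
}
```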

> So, how to resolve our different opinions?  I mean, for any serious
> program code, there will be constructor calls into libgomp already; are
> you expecting that adding one more really will cause any noticeable
> overhead?

See above, that is really not the case.  Most OpenMP code doesn't have
any constructor calls into libgomp at all, the only exception is
GOMP_offload_register{,_ver} at this point.

> > What is HWM?  Is that OFFLOAD_TARGET_TYPE_LAST what you mean?
> 
> Nathan has used this term before (libgomp/openacc.h:acc_device_t), and he
> told me this means "High Water Mark".  I have no strong opinion on the
> name to use, just want to mention that "*_LAST" sounds to me like that
> one still is part of the accepted set, whereas in this case it'd be the
> first enumerator outside of the accepted ones.  (And I guess, we agree
> that "OFFLOAD_TARGET_TYPE_INTEL_LAST = 6" followed by
> "OFFLOAD_TARGET_TYPE_INTEL_MIC = OFFLOAD_TARGET_TYPE_INTEL_LAST" is
> ugly?)

*_LAST or *_last is actually what we use pretty much everywhere, see e.g.
lots of places in tree-core.h.
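
For illustration, the convention looks roughly like this.  The enumerator
values and the helper are hypothetical and only show that the trailing
*_LAST enumerator is one past the accepted set, which is what makes it
usable as an array bound and a range check:

```c
#include <stdbool.h>

/* Hypothetical enumerators with default numbering; only the trailing
   sentinel matters here.  */
enum offload_target_type
{
  OFFLOAD_TARGET_TYPE_HOST,
  OFFLOAD_TARGET_TYPE_NVIDIA_PTX,
  OFFLOAD_TARGET_TYPE_INTEL_MIC,
  OFFLOAD_TARGET_TYPE_LAST   /* One past the last accepted value.  */
};

/* The sentinel sizes per-type tables without a separate count.  */
bool offload_target_seen[OFFLOAD_TARGET_TYPE_LAST];

/* Return true if TYPE is one of the accepted enumerators.  */
bool
offload_target_type_valid_p (int type)
{
  return type >= 0 && type < OFFLOAD_TARGET_TYPE_LAST;
}
```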

> Are you worried about the performance issues of a very short locking
> cycle that in the majority of all cases should happen without blocking,
> in comparison to performance issues related to host/device memory
> transfers or kernel launches that will follow after the call to
> gomp_offload_target_enabled_p?  I don't really think that is reasonable
> to worry about.

Yes, I'm worried about that.  The lock could be contended, and if you take
the lock many times for each construct, it can show up; I'm worried about
cache effects etc.  It is already bad enough that we take/release the locks
for the same device e.g. in each of:
  void *fn_addr = gomp_get_target_fn_addr (devicep, fn);

  struct target_mem_desc *tgt_vars
    = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
                     GOMP_MAP_VARS_TARGET);

	Jakub

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
  2015-10-20 11:49                           ` Bernd Schmidt
@ 2015-10-20 12:13                             ` Jakub Jelinek
  0 siblings, 0 replies; 62+ messages in thread
From: Jakub Jelinek @ 2015-10-20 12:13 UTC (permalink / raw)
  To: Bernd Schmidt; +Cc: Thomas Schwinge, gcc-patches, Nathan Sidwell, Joseph Myers

On Tue, Oct 20, 2015 at 01:45:37PM +0200, Bernd Schmidt wrote:
> Both of you can ignore me if you feel I'm not making sense, but what exactly
> is the use case for -foffload=disable? Isn't it slightly redundant with
> -fno-openacc? IMO it's not an option that alters the available devices,
> that's a question that is answered at run-time and doesn't (or shouldn't)
> really depend on compiler switches. As a user I'd expect -foffload=disable
> to just prevent generation of offloaded code for the things I'm compiling.
> As Jakub pointed out, shared libraries may still contain other pieces that
> are offloadable.
> 
> I guess I don't fully understand why you want to go to great lengths to
> disable devices at run-time based on a compile-time switch. What's the
> reasoning here?

At least for OpenMP, I'm also happy with what we do now (except for the
ability to configure offloading targets as optional, i.e. dynamically
configure the default based on which packages the user installs rather than
just on how it has been configured, so that e.g. just because it has been
configured for PTX offloading the host GCC itself doesn't have to have a
dependency on the proprietary CUDA stuff in any way).
I believe in OpenMP nobody says that if the device HW is available, but the
user chose not to compile offloading code/variables for that particular device,
it can't show up among omp_get_num_devices ().  And I think it is
entirely fine if say target data map succeeds to that device, but then
target is offloaded, if that is caused by the user's configure or command-line
choice.  Maybe OpenACC has different requirements; is it required to
terminate the program if it can't fulfill the requested offloading?

In any case, I'm fine with something I've noted in the last mail, or with
the status quo, but not with running constructors in TUs or even shared
libraries just because they have been compiled with -fopenmp (and either
haven't used any OpenMP code at all, or just the non-*target* directives).

	Jakub

^ permalink raw reply	[flat|nested] 62+ messages in thread

* [og7] Re: Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time)
  2015-08-20 23:38                                         ` Joseph Myers
  2015-08-21 16:13                                           ` Nathan Sidwell
  2015-08-25 15:04                                           ` Joseph Myers
@ 2018-05-20 20:30                                           ` Thomas Schwinge
  2 siblings, 0 replies; 62+ messages in thread
From: Thomas Schwinge @ 2018-05-20 20:30 UTC (permalink / raw)
  To: gcc-patches

Hi!

(This whole idea/patch still needs an overall re-work, as discussed, but
here is a small incremental improvement/bug fix.)

On Thu, 20 Aug 2015 22:52:58 +0000, Joseph Myers <joseph@codesourcery.com> wrote:
> On Tue, 18 Aug 2015, Thomas Schwinge wrote:
> > [...] here is my current messy WIP patch [...]

> +/* List of offload targets, separated by colon.  Defaults to the list
> +   determined when configuring libgomp.  */
> +static const char *gomp_offload_targets = OFFLOAD_TARGETS;
> +static bool gomp_offload_targets_init = false;
> +
> +/* Override the list of offload targets.  This must be called early, and only
> +   once.  */
> +
> +void
> +GOMP_set_offload_targets (const char *offload_targets)
> +{
> +  gomp_debug (0, "%s (\"%s\")\n", __FUNCTION__, offload_targets);
> +
> +  /* Make sure this gets called early.  */
> +  assert (gomp_is_initialized == PTHREAD_ONCE_INIT);
> +  /* Make sure this only gets called once.  */
> +  assert (!gomp_offload_targets_init);
> +  gomp_offload_targets_init = true;
> +  gomp_offload_targets = offload_targets;
> +}

This will obviously fail as soon as there are shared libraries involved,
compiled for offloading, which contain additional
GOMP_set_offload_targets constructor calls.  Thus pushed to
openacc-gcc-7-branch:

commit 917e247055a37f912129ed545719182de0046adb
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Sun May 20 21:31:01 2018 +0200

    [PR81886] Avoid "GOMP_set_offload_targets: Assertion `!gomp_offload_targets_init' failed"
    
            PR libgomp/81886
            * openacc.h (enum acc_device_t): Add _acc_device_intel_mic,
            _acc_device_hsa.
            * oacc-init.c (get_openacc_name): Handle these.
            (resolve_device): Debugging output.
            * target.c (resolve_device, gomp_init_device)
            (gomp_offload_target_available_p): Likewise.
            (GOMP_set_offload_targets): Rewrite.
            * testsuite/libgomp.oacc-c++/c++.exp: Provide offload target in
            "-DACC_DEVICE_TYPE_host", and "-DACC_DEVICE_TYPE_nvidia".
            * testsuite/libgomp.oacc-c/c.exp: Likewise.
            * testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
            * testsuite/libgomp.oacc-c/offload-targets-1.c: New file.
            * testsuite/libgomp.oacc-c/offload-targets-2.c: Likewise.
            * testsuite/libgomp.oacc-c/offload-targets-3.c: Likewise.
            * testsuite/libgomp.oacc-c/offload-targets-4.c: Likewise.
            * testsuite/libgomp.oacc-c/offload-targets-5.c: Likewise.
            * testsuite/libgomp.oacc-c/offload-targets-6.c: Likewise.
            * testsuite/libgomp.oacc-c-c++-common/acc-on-device-2.c: Adjust.
            * testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c: Likewise.
            * testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: Likewise.
            * testsuite/libgomp.oacc-c-c++-common/pr85381-3.c: Likewise.
            * testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: Likewise.
            * testsuite/libgomp.oacc-c-c++-common/pr85381-5.c: Likewise.
            * testsuite/libgomp.oacc-c-c++-common/pr85381.c: Likewise.
            * testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Likewise.
            * testsuite/libgomp.oacc-c-c++-common/pr85486-3.c: Likewise.
            * testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise.
            * testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Likewise.
            * testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
            * testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
---
 libgomp/ChangeLog.openacc                          |  34 ++++
 libgomp/oacc-init.c                                |   7 +
 libgomp/openacc.h                                  |   2 +
 libgomp/target.c                                   | 178 +++++++++++++++++++--
 libgomp/testsuite/libgomp.oacc-c++/c++.exp         |   4 +-
 .../libgomp.oacc-c-c++-common/acc-on-device-2.c    |   2 +-
 .../libgomp.oacc-c-c++-common/acc_on_device-1.c    |   4 +-
 .../libgomp.oacc-c-c++-common/pr85381-2.c          |   3 +-
 .../libgomp.oacc-c-c++-common/pr85381-3.c          |   3 +-
 .../libgomp.oacc-c-c++-common/pr85381-4.c          |   3 +-
 .../libgomp.oacc-c-c++-common/pr85381-5.c          |   3 +-
 .../testsuite/libgomp.oacc-c-c++-common/pr85381.c  |   3 +-
 .../libgomp.oacc-c-c++-common/pr85486-2.c          |   3 +-
 .../libgomp.oacc-c-c++-common/pr85486-3.c          |   3 +-
 .../testsuite/libgomp.oacc-c-c++-common/pr85486.c  |   3 +-
 libgomp/testsuite/libgomp.oacc-c/c.exp             |   4 +-
 .../testsuite/libgomp.oacc-c/offload-targets-1.c   | 119 ++++++++++++++
 .../testsuite/libgomp.oacc-c/offload-targets-2.c   |   2 +
 .../testsuite/libgomp.oacc-c/offload-targets-3.c   |  10 ++
 .../testsuite/libgomp.oacc-c/offload-targets-4.c   |  11 ++
 .../testsuite/libgomp.oacc-c/offload-targets-5.c   |  10 ++
 .../testsuite/libgomp.oacc-c/offload-targets-6.c   |  11 ++
 .../libgomp.oacc-fortran/acc_on_device-1-1.f90     |   4 +-
 .../libgomp.oacc-fortran/acc_on_device-1-2.f       |   4 +-
 .../libgomp.oacc-fortran/acc_on_device-1-3.f       |   4 +-
 libgomp/testsuite/libgomp.oacc-fortran/fortran.exp |   4 +-
 26 files changed, 400 insertions(+), 38 deletions(-)
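
The list merging that the rewritten GOMP_set_offload_targets performs before
initialization (see the target.c hunk below) can be sketched standalone as
follows.  This is an illustration under assumptions, not the committed code:
merge_offload_targets and list_contains are hypothetical names, the result is
always freshly allocated, and list elements are compared as whole elements,
whereas the strncmp (haystack, cur, cur_len) test in the patch also matches
when cur is a prefix of a listed target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Return true if the token [TOK, TOK + LEN) is an element of the
   colon-separated LIST.  Whole elements are compared, not prefixes.  */
static bool
list_contains (const char *list, const char *tok, size_t len)
{
  const char *p = list;
  while (true)
    {
      const char *end = strchr (p, ':');
      size_t plen = end ? (size_t) (end - p) : strlen (p);
      if (plen == len && memcmp (p, tok, len) == 0)
        return true;
      if (end == NULL)
        return false;
      p = end + 1;
    }
}

/* Return a newly allocated list holding HAVE followed by every element of
   WANT that is not already present.  The caller frees the result.  */
char *
merge_offload_targets (const char *have, const char *want)
{
  size_t have_len = strlen (have);
  /* Upper bound: both lists, one ':' separator, the terminating NUL.  */
  size_t cap = have_len + 1 + strlen (want) + 1;
  char *out = malloc (cap);
  if (out == NULL)
    abort ();
  memcpy (out, have, have_len + 1);
  char *next = out + have_len;

  const char *cur = want;
  while (*cur)
    {
      const char *end = strchr (cur, ':');
      if (end == NULL)
        end = cur + strlen (cur);
      size_t len = (size_t) (end - cur);
      if (len != 0 && !list_contains (out, cur, len))
        {
          /* Not yet listed; append it, with a separator if needed.  */
          if (next != out)
            *next++ = ':';
          assert (next + len + 1 <= out + cap);
          memcpy (next, cur, len);
          next += len;
          *next = '\0';
        }
      if (*end == '\0')
        break;
      cur = end + 1;
    }
  return out;
}
```

With this sketch, merging "nvptx-none:hsa" into "nvptx-none" gives
"nvptx-none:hsa", and merging a list into itself leaves it unchanged, which
corresponds to the "same again" case that the new offload-targets tests
exercise.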

diff --git libgomp/ChangeLog.openacc libgomp/ChangeLog.openacc
index d43b259..48b1f96 100644
--- libgomp/ChangeLog.openacc
+++ libgomp/ChangeLog.openacc
@@ -1,3 +1,37 @@
+2018-05-20  Thomas Schwinge  <thomas@codesourcery.com>
+
+	PR libgomp/81886
+	* openacc.h (enum acc_device_t): Add _acc_device_intel_mic,
+	_acc_device_hsa.
+	* oacc-init.c (get_openacc_name): Handle these.
+	(resolve_device): Debugging output.
+	* target.c (resolve_device, gomp_init_device)
+	(gomp_offload_target_available_p): Likewise.
+	(GOMP_set_offload_targets): Rewrite.
+	* testsuite/libgomp.oacc-c++/c++.exp: Provide offload target in
+	"-DACC_DEVICE_TYPE_host", and "-DACC_DEVICE_TYPE_nvidia".
+	* testsuite/libgomp.oacc-c/c.exp: Likewise.
+	* testsuite/libgomp.oacc-fortran/fortran.exp: Likewise.
+	* testsuite/libgomp.oacc-c/offload-targets-1.c: New file.
+	* testsuite/libgomp.oacc-c/offload-targets-2.c: Likewise.
+	* testsuite/libgomp.oacc-c/offload-targets-3.c: Likewise.
+	* testsuite/libgomp.oacc-c/offload-targets-4.c: Likewise.
+	* testsuite/libgomp.oacc-c/offload-targets-5.c: Likewise.
+	* testsuite/libgomp.oacc-c/offload-targets-6.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/acc-on-device-2.c: Adjust.
+	* testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/pr85381-3.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/pr85381-5.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/pr85381.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/pr85486-3.c: Likewise.
+	* testsuite/libgomp.oacc-c-c++-common/pr85486.c: Likewise.
+	* testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Likewise.
+	* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
+	* testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
+
 2018-05-18  Cesar Philippidis  <cesar@codesourcery.com>
 
 	Backport from mainline
diff --git libgomp/oacc-init.c libgomp/oacc-init.c
index d8348c0..19c2687 100644
--- libgomp/oacc-init.c
+++ libgomp/oacc-init.c
@@ -92,6 +92,8 @@ goacc_register (struct gomp_device_descr *disp)
 static const char *
 get_openacc_name (const char *name)
 {
+  /* not supported: _acc_device_intel_mic */
+  /* not supported: _acc_device_hsa */
   if (strcmp (name, "nvptx") == 0)
     return "nvidia";
   else
@@ -108,6 +110,8 @@ name_of_acc_device_t (enum acc_device_t type)
     case acc_device_host: return "host";
     case acc_device_not_host: return "not_host";
     case acc_device_nvidia: return "nvidia";
+    case /* not supported */ _acc_device_intel_mic:
+    case /* not supported */ _acc_device_hsa:
     default: gomp_fatal ("unknown device type %u", (unsigned) type);
     }
 }
@@ -119,6 +123,8 @@ name_of_acc_device_t (enum acc_device_t type)
 static struct gomp_device_descr *
 resolve_device (acc_device_t d, bool fail_is_error)
 {
+  gomp_debug (0, "%s (%d)\n", __FUNCTION__, (int) d);
+
   acc_device_t d_arg = d;
 
   switch (d)
@@ -203,6 +209,7 @@ resolve_device (acc_device_t d, bool fail_is_error)
       gomp_fatal ("device type %s not supported", name_of_acc_device_t (d));
     }
 
+  gomp_debug (0, "  %s: %d: %p\n", __FUNCTION__, (int) d, dispatchers[d]);
   return dispatchers[d];
 }
 
diff --git libgomp/openacc.h libgomp/openacc.h
index 102723a..3d6d57e 100644
--- libgomp/openacc.h
+++ libgomp/openacc.h
@@ -55,6 +55,8 @@ typedef enum acc_device_t {
   /* acc_device_host_nonshm = 3 removed.  */
   acc_device_not_host = 4,
   acc_device_nvidia = 5,
+  /* not supported */ _acc_device_intel_mic = 6,
+  /* not supported */ _acc_device_hsa = 7,
   _ACC_device_hwm,
   /* Ensure enumeration is layout compatible with int.  */
   _ACC_highest = __INT_MAX__,
diff --git libgomp/target.c libgomp/target.c
index aa27dc8..b5f86c8 100644
--- libgomp/target.c
+++ libgomp/target.c
@@ -108,6 +108,8 @@ gomp_get_num_devices (void)
 static struct gomp_device_descr *
 resolve_device (int device)
 {
+  gomp_debug (0, "%s (%d)\n", __FUNCTION__, device);
+
   int device_id;
   if (device == GOMP_DEVICE_ICV)
     {
@@ -137,6 +139,7 @@ resolve_device (int device)
       && !gomp_offload_target_available_p (devices[device_id].type))
     return NULL;
 
+  gomp_debug (0, "  %s (%d): %d\n", __FUNCTION__, device, device_id);
   return &devices[device_id];
 }
 
@@ -1883,6 +1886,9 @@ GOMP_offload_unregister (const void *host_table, int target_type,
 attribute_hidden void
 gomp_init_device (struct gomp_device_descr *devicep)
 {
+  gomp_debug (0, "%s (%s; %d; %d)\n", __FUNCTION__,
+	      devicep->name, (int) devicep->type, devicep->target_id);
+
   int i;
   if (!devicep->init_device_func (devicep->target_id))
     {
@@ -1946,6 +1952,8 @@ gomp_unload_device (struct gomp_device_descr *devicep)
 attribute_hidden bool
 gomp_offload_target_available_p (int type)
 {
+  gomp_debug (0, "%s (%d)\n", __FUNCTION__, type);
+
   bool available = false;
 
   /* Has the offload target already been initialized?  */
@@ -1987,6 +1995,7 @@ gomp_offload_target_available_p (int type)
       gomp_mutex_unlock (&register_lock);
     }
 
+  gomp_debug (0, "  %s (%d): %d\n", __FUNCTION__, type, (int) available);
   return available;
 }
 
@@ -3157,25 +3166,170 @@ offload_target_to_plugin_name (const char *offload_target)
   gomp_fatal ("Unknown offload target: %s", offload_target);
 }
 
-/* List of offload targets, separated by colon.  Defaults to the list
+/* List of requested offload targets, separated by colon.  Defaults to the list
    determined when configuring libgomp.  */
 static const char *gomp_offload_targets = OFFLOAD_TARGETS;
-static bool gomp_offload_targets_init = false;
+static bool gomp_offload_targets_set = false;
+static bool gomp_offload_targets_malloced = false;
 
-/* Override the list of offload targets.  This must be called early, and only
-   once.  */
+/* This function frees gomp_offload_targets.  */
+
+static void
+free_gomp_offload_targets (void)
+{
+  free ((char *) gomp_offload_targets);
+}
+
+/* Override the list of requested offload targets.  This must be called
+   early, before gomp_target_init.  */
 
 void
 GOMP_set_offload_targets (const char *offload_targets)
 {
-  gomp_debug (0, "%s (\"%s\")\n", __FUNCTION__, offload_targets);
-
-  /* Make sure this gets called early.  */
-  assert (gomp_is_initialized == PTHREAD_ONCE_INIT);
-  /* Make sure this only gets called once.  */
-  assert (!gomp_offload_targets_init);
-  gomp_offload_targets_init = true;
-  gomp_offload_targets = offload_targets;
+  gomp_debug (0, "%s (\"%s\"): %s\n", __FUNCTION__,
+	      offload_targets, gomp_offload_targets);
+
+  /* TODO: multithreading, locking.  */
+  /* TODO: this should not (sometimes) keep a copy of the offload_target
+     pointer, so that the caller knows what to expect.  */
+  /* TODO: What actually is supposed to happen if some parts of a program are
+     compiled with, for example, "-foffload=disable" (that is, when called with
+     the empty string for offload_targets), and others for other actual
+     (possibly different) offload targets?  */
+  if (gomp_is_initialized == PTHREAD_ONCE_INIT)
+    {
+      /* If we have not yet initialized, we capture all the offload targets
+	 requested.  We do not worry that the set of requested offload targets
+	 vs. the set of available offload data will eventually match; any such
+	 inconsistencies would be user error.  (See also
+	 gomp_offload_target_available_p.)  */
+      if (!gomp_offload_targets_set)
+	gomp_offload_targets = offload_targets;
+      else if (gomp_offload_targets == offload_targets
+	       || strcmp (gomp_offload_targets, offload_targets) == 0)
+	/* Nothing to do if the same.  */;
+      else
+	{
+	  /* Merge offload_targets into gomp_offload_targets.  */
+	  /* TODO: this could be simpler if we had the data available in a
+	     different form.  */
+	  size_t gomp_offload_targets_len = strlen (gomp_offload_targets);
+	  /* Maximum length.  */
+	  size_t len = (gomp_offload_targets_len + /* ":" */ 1
+			+ strlen (offload_targets) + /* '\0' */ 1);
+	  char *gomp_offload_targets_new = gomp_malloc (len);
+	  memcpy (gomp_offload_targets_new,
+		  gomp_offload_targets, gomp_offload_targets_len);
+	  char *gomp_offload_targets_new_next
+	    = gomp_offload_targets_new + gomp_offload_targets_len;
+	  *gomp_offload_targets_new_next = '\0';
+	  const char *cur = offload_targets;
+	  while (*cur)
+	    {
+	      const char *cur_end = strchr (cur, ':');
+	      /* If no other offload target following...  */
+	      if (cur_end == NULL)
+		/* ..., point to the terminating NUL character.  */
+		cur_end = cur + strlen (cur);
+	      size_t cur_len = cur_end - cur;
+
+	      /* Do we already have this one listed?  */
+	      const char *haystack = gomp_offload_targets_new;
+	      while (haystack != NULL)
+		{
+		  if (strncmp (haystack, cur, cur_len) == 0)
+		    break;
+		  else
+		    {
+		      haystack = strchr (haystack, ':');
+		      if (haystack != NULL)
+			haystack += /* ':' */ 1;
+		    }
+		}
+	      if (haystack == NULL)
+		{
+		  /* Not yet listed; add it.  */
+		  if (gomp_offload_targets_new_next != gomp_offload_targets_new)
+		    *gomp_offload_targets_new_next++ = ':';
+		  assert (gomp_offload_targets_new_next + cur_len + /* '\0' */ 1
+			  <= gomp_offload_targets_new + len);
+		  memcpy (gomp_offload_targets_new_next, cur, cur_len);
+		  gomp_offload_targets_new_next += cur_len;
+		  *gomp_offload_targets_new_next = '\0';
+		}
+
+	      if (*cur_end == '\0')
+		break;
+	      cur = cur_end + /* : */ 1;
+	    }
+
+	  if (gomp_offload_targets_malloced)
+	    free ((char *) gomp_offload_targets);
+	  else
+	    {
+	      if (atexit (free_gomp_offload_targets) != 0)
+		gomp_fatal ("atexit failed");
+	    }
+
+	  gomp_offload_targets = gomp_offload_targets_new;
+	  gomp_offload_targets_malloced = true;
+	}
+    }
+  else
+    {
+      /* If we have already initialized (which can happen only if a shared
+	 library with another GOMP_set_offload_targets constructor call gets
+	 loaded dynamically), and the user is now requesting offload targets
+	 that were not requested previously, then we're out of luck: we can't
+	 load new plugins now.  Otherwise, we're all set.  */
+      if (gomp_offload_targets == offload_targets
+	  || strcmp (gomp_offload_targets, offload_targets) == 0)
+	/* All fine if the same.  */;
+      else
+	{
+	  /* Check offload_targets against gomp_offload_targets.  */
+	  /* TODO: this could be simpler if we had the data available in a
+	     different form.  */
+	  const char *cur = offload_targets;
+	  while (*cur)
+	    {
+	      const char *cur_end = strchr (cur, ':');
+	      /* If no other offload target following...  */
+	      if (cur_end == NULL)
+		/* ..., point to the terminating NUL character.  */
+		cur_end = cur + strlen (cur);
+	      size_t cur_len = cur_end - cur;
+
+	      /* Do we have this one listed?  */
+	      const char *haystack = gomp_offload_targets;
+	      while (haystack != NULL)
+		{
+		  if (strncmp (haystack, cur, cur_len) == 0)
+		    break;
+		  else
+		    {
+		      haystack = strchr (haystack, ':');
+		      if (haystack != NULL)
+			haystack += /* ':' */ 1;
+		    }
+		}
+	      if (haystack == NULL)
+		{
+		  /* Not listed.  */
+		  gomp_fatal ("Can't satisfy request for offload targets: %s; have loaded: %s",
+			      offload_targets, gomp_offload_targets);
+		}
+
+	      if (*cur_end == '\0')
+		break;
+	      cur = cur_end + /* : */ 1;
+	    }
+	}
+    }
+  gomp_offload_targets_set = true;
+
+  gomp_debug (0, "  %s (\"%s\"): %s\n", __FUNCTION__,
+	      offload_targets, gomp_offload_targets);
 }
 
 /* This function initializes the runtime needed for offloading.
diff --git libgomp/testsuite/libgomp.oacc-c++/c++.exp libgomp/testsuite/libgomp.oacc-c++/c++.exp
index 695b96d..2e17504 100644
--- libgomp/testsuite/libgomp.oacc-c++/c++.exp
+++ libgomp/testsuite/libgomp.oacc-c++/c++.exp
@@ -86,7 +86,7 @@ if { $lang_test_file_found } {
 	switch -glob $offload_target_openacc {
 	    disable {
 		set acc_mem_shared 1
-		set tagopt "-DACC_DEVICE_TYPE_host=1"
+		set tagopt "-DACC_DEVICE_TYPE_host=\"\""
 	    }
 	    nvptx* {
 		if { ![check_effective_target_openacc_nvidia_accel_present] } {
@@ -102,7 +102,7 @@ if { $lang_test_file_found } {
 		lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/libgomp.oacc-c-c++-common"
 
 		set acc_mem_shared 0
-		set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
+		set tagopt "-DACC_DEVICE_TYPE_nvidia=\"$offload_target_openacc\""
 	    }
 	    default {
 		set acc_mem_shared 0
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/acc-on-device-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/acc-on-device-2.c
index bfcb67d..758b1fc 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/acc-on-device-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/acc-on-device-2.c
@@ -14,7 +14,7 @@ int main ()
 
   int expect = 1;
   
-#if  ACC_DEVICE_TYPE_host
+#ifdef ACC_DEVICE_TYPE_host
   expect = 0;
 #endif
   
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c
index 8112745..0270d06 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c
@@ -37,7 +37,7 @@ main (int argc, char *argv[])
   }
 
 
-#if !ACC_DEVICE_TYPE_host
+#ifndef ACC_DEVICE_TYPE_host
 
   /* Offloaded.  */
 
@@ -49,7 +49,7 @@ main (int argc, char *argv[])
       abort ();
     if (!acc_on_device (acc_device_not_host))
       abort ();
-#if ACC_DEVICE_TYPE_nvidia
+#ifdef ACC_DEVICE_TYPE_nvidia
     if (!acc_on_device (acc_device_nvidia))
       abort ();
 #else
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c
index e5d02cf..6570c64 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c
@@ -1,5 +1,6 @@
 /* { dg-additional-options "-save-temps" } */
-/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_DEVICE_TYPE_nvidia=1 -O2" } } */
+/* { dg-do run { target openacc_nvidia_accel_selected } }
+   { dg-skip-if "" { *-*-* } { "*" } { "-O2" } } */
 
 int
 main (void)
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-3.c
index 7d9ba1b..c5d1c5a 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-3.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-3.c
@@ -1,5 +1,6 @@
 /* { dg-additional-options "-save-temps -w" } */
-/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_DEVICE_TYPE_nvidia=1 -O2" } } */
+/* { dg-do run { target openacc_nvidia_accel_selected } }
+   { dg-skip-if "" { *-*-* } { "*" } { "-O2" } } */
 
 int a;
 #pragma acc declare create(a)
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c
index 477297d..d955d79 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c
@@ -1,5 +1,6 @@
 /* { dg-additional-options "-save-temps -w" } */
-/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_DEVICE_TYPE_nvidia=1 -O2" } } */
+/* { dg-do run { target openacc_nvidia_accel_selected } }
+   { dg-skip-if "" { *-*-* } { "*" } { "-O2" } } */
 
 #define n 1024
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-5.c libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-5.c
index 4653009..61e7e48 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-5.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-5.c
@@ -1,5 +1,6 @@
 /* { dg-additional-options "-save-temps" } */
-/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_DEVICE_TYPE_nvidia=1 -O2" } } */
+/* { dg-do run { target openacc_nvidia_accel_selected } }
+   { dg-skip-if "" { *-*-* } { "*" } { "-O2" } } */
 
 #define n 1024
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381.c libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381.c
index f585ae5..2864dfc 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381.c
@@ -1,5 +1,6 @@
 /* { dg-additional-options "-save-temps" } */
-/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_DEVICE_TYPE_nvidia=1 -O2" } } */
+/* { dg-do run { target openacc_nvidia_accel_selected } }
+   { dg-skip-if "" { *-*-* } { "*" } { "-O2" } } */
 
 int
 main (void)
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
index a92b5dd..0f74921 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
@@ -1,5 +1,4 @@
-/* { dg-do run } */
-/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_DEVICE_TYPE_nvidia=1" } } */
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
 /* { dg-additional-options "-fopenacc-dim=-:-:128" } */
 
 /* Minimized from ref-1.C.  */
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-3.c libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-3.c
index ae62206..b4ef878 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-3.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-3.c
@@ -1,5 +1,4 @@
-/* { dg-do run } */
-/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_DEVICE_TYPE_nvidia=1" } } */
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
 /* { dg-additional-options "-fopenacc-dim=-:-:-" } */
 /* { dg-set-target-env-var "GOMP_OPENACC_DIM" "-:-:128" } */
 
diff --git libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c
index f91dee0..99c0805 100644
--- libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486.c
@@ -1,5 +1,4 @@
-/* { dg-do run } */
-/* { dg-skip-if "" { *-*-* } { "*" } { "-DACC_DEVICE_TYPE_nvidia=1" } } */
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
 
 /* Minimized from ref-1.C.  */
 
diff --git libgomp/testsuite/libgomp.oacc-c/c.exp libgomp/testsuite/libgomp.oacc-c/c.exp
index 16f8295..73a7a5a 100644
--- libgomp/testsuite/libgomp.oacc-c/c.exp
+++ libgomp/testsuite/libgomp.oacc-c/c.exp
@@ -47,7 +47,7 @@ foreach offload_target_openacc $offload_targets_s_openacc {
     switch -glob $offload_target_openacc {
 	disable {
 	    set acc_mem_shared 1
-	    set tagopt "-DACC_DEVICE_TYPE_host=1"
+	    set tagopt "-DACC_DEVICE_TYPE_host=\"\""
 	}
 	nvptx* {
 	    if { ![check_effective_target_openacc_nvidia_accel_present] } {
@@ -63,7 +63,7 @@ foreach offload_target_openacc $offload_targets_s_openacc {
 	    lappend ALWAYS_CFLAGS "additional_flags=-I${srcdir}/libgomp.oacc-c-c++-common"
 
 	    set acc_mem_shared 0
-	    set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
+	    set tagopt "-DACC_DEVICE_TYPE_nvidia=\"$offload_target_openacc\""
 	}
 	default {
 	    set acc_mem_shared 0
diff --git libgomp/testsuite/libgomp.oacc-c/offload-targets-1.c libgomp/testsuite/libgomp.oacc-c/offload-targets-1.c
new file mode 100644
index 0000000..b62a587
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c/offload-targets-1.c
@@ -0,0 +1,119 @@
+/* Test what happens for repeated GOMP_set_offload_targets calls, which happens
+   when shared libraries are involved, for example.  As in the libgomp
+   testsuite infrastructure, it is difficult to build and link against shared
+   libraries, we simulate that by replicating some relevant
+   GOMP_set_offload_targets calls.  */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <openacc.h>
+#include "libgomp_g.h"
+
+int main ()
+{
+  /* Before getting here, GOMP_set_offload_targets already got called via a
+     constructor.  */
+
+  bool acc_device_types_requested[_ACC_device_hwm];
+  for (int i = 0; i < _ACC_device_hwm; ++i)
+    acc_device_types_requested[i] = false;
+
+  /* We're building for only one offload target ("-foffload=[...]") which is
+     the following.  */
+  const char *offload_target_requested;
+  acc_device_t acc_device_type_requested;
+#if defined ACC_DEVICE_TYPE_nvidia
+  offload_target_requested = ACC_DEVICE_TYPE_nvidia;
+  acc_device_type_requested = acc_device_nvidia;
+#elif defined ACC_DEVICE_TYPE_host
+  offload_target_requested = ACC_DEVICE_TYPE_host;
+  acc_device_type_requested = acc_device_host;
+#else
+# error Not ported to this ACC_DEVICE_TYPE
+#endif
+  acc_device_types_requested[acc_device_type_requested] = true;
+
+#ifdef OFFLOAD_TARGETS_SAME_AGAIN
+  /* Call again; will have no noticeable difference.  */
+  GOMP_set_offload_targets (offload_target_requested);
+#endif
+
+#ifdef OFFLOAD_TARGETS_ADD_EARLY
+  /* Request a (non-existing) offloading target (which will result in a
+     non-fatal diagnostic).  */
+  GOMP_set_offload_targets (OFFLOAD_TARGETS_ADD);
+#endif
+
+#ifdef OFFLOAD_TARGETS_SAME_AGAIN
+  /* Call again; will have no noticeable difference.  */
+  GOMP_set_offload_targets (offload_target_requested);
+  char *s;
+  {
+    size_t len = 3 * (strlen (offload_target_requested) + 1);
+# ifdef OFFLOAD_TARGETS_ADD_EARLY
+    len += 3 * (strlen (OFFLOAD_TARGETS_ADD) + 1);
+# endif
+    s = malloc (len);
+    if (s == NULL)
+      __builtin_abort ();
+    size_t len_;
+# ifndef OFFLOAD_TARGETS_ADD_EARLY
+    len_ = sprintf (s, "%s:%s:%s",
+		    offload_target_requested,
+		    offload_target_requested,
+		    offload_target_requested);
+# else
+    len_ = sprintf (s, "%s:%s:%s:%s:%s:%s",
+		    offload_target_requested,
+		    offload_target_requested,
+		    OFFLOAD_TARGETS_ADD,
+		    OFFLOAD_TARGETS_ADD,
+		    offload_target_requested,
+		    OFFLOAD_TARGETS_ADD);
+# endif
+    if (len_ + 1 != len)
+      __builtin_abort ();
+    GOMP_set_offload_targets (s);
+  }
+#endif
+
+  /* Calling acc_get_num_devices will implicitly initialize offloading.  */
+#if defined OFFLOAD_TARGETS_ADD_EARLY
+  fprintf (stderr, "CheCKpOInT1\n");
+#endif
+  /* acc_device_host is always available.  */
+  if ((acc_get_num_devices (acc_device_host) > 0) == false)
+    __builtin_abort ();
+#if defined OFFLOAD_TARGETS_ADD_EARLY
+  fprintf (stderr, "WrONg WAy1\n");
+#endif
+  for (acc_device_t acc_device_type = acc_device_not_host + 1;
+       acc_device_type < _ACC_device_hwm;
+       ++acc_device_type)
+    {
+      /* The requested device type must be available.  Any other device types
+	 must not be available.  */
+      if ((acc_get_num_devices (acc_device_type) > 0)
+	  != acc_device_types_requested[acc_device_type])
+	__builtin_abort ();
+    }
+
+#ifdef OFFLOAD_TARGETS_SAME_AGAIN
+  /* Request the same again; will have no noticeable difference.  */
+  GOMP_set_offload_targets (offload_target_requested);
+#endif
+#if defined OFFLOAD_TARGETS_ADD_LATE
+  fprintf (stderr, "CheCKpOInT2\n");
+  GOMP_set_offload_targets (OFFLOAD_TARGETS_ADD);
+  fprintf (stderr, "WrONg WAy2\n");
+#endif
+#ifdef OFFLOAD_TARGETS_SAME_AGAIN
+  GOMP_set_offload_targets (s);
+
+  /* Implementation detail: OK to "free (s)", in this case.  */
+  free (s);
+#endif
+
+  return 0;
+}
diff --git libgomp/testsuite/libgomp.oacc-c/offload-targets-2.c libgomp/testsuite/libgomp.oacc-c/offload-targets-2.c
new file mode 100644
index 0000000..977c559
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c/offload-targets-2.c
@@ -0,0 +1,2 @@
+#define OFFLOAD_TARGETS_SAME_AGAIN
+#include "offload-targets-1.c"
diff --git libgomp/testsuite/libgomp.oacc-c/offload-targets-3.c libgomp/testsuite/libgomp.oacc-c/offload-targets-3.c
new file mode 100644
index 0000000..1eb080b
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c/offload-targets-3.c
@@ -0,0 +1,10 @@
+#define OFFLOAD_TARGETS_ADD "XYZ"
+#define OFFLOAD_TARGETS_ADD_EARLY
+#include "offload-targets-1.c"
+
+/*
+  { dg-output "CheCKpOInT1(\n|\r\n|\r)+" }
+  { dg-output "libgomp: Unknown offload target: XYZ(\n|\r\n|\r)+" }
+  { dg-output "$" }
+  { dg-shouldfail ""  }
+*/
diff --git libgomp/testsuite/libgomp.oacc-c/offload-targets-4.c libgomp/testsuite/libgomp.oacc-c/offload-targets-4.c
new file mode 100644
index 0000000..2bb7204
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c/offload-targets-4.c
@@ -0,0 +1,11 @@
+#define OFFLOAD_TARGETS_SAME_AGAIN
+#define OFFLOAD_TARGETS_ADD "XYZ"
+#define OFFLOAD_TARGETS_ADD_EARLY
+#include "offload-targets-1.c"
+
+/*
+  { dg-output "CheCKpOInT1(\n|\r\n|\r)+" }
+  { dg-output "libgomp: Unknown offload target: XYZ(\n|\r\n|\r)+" }
+  { dg-output "$" }
+  { dg-shouldfail ""  }
+*/
diff --git libgomp/testsuite/libgomp.oacc-c/offload-targets-5.c libgomp/testsuite/libgomp.oacc-c/offload-targets-5.c
new file mode 100644
index 0000000..8ba0792
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c/offload-targets-5.c
@@ -0,0 +1,10 @@
+#define OFFLOAD_TARGETS_ADD "XYZ"
+#define OFFLOAD_TARGETS_ADD_LATE
+#include "offload-targets-1.c"
+
+/*
+  { dg-output "CheCKpOInT2(\n|\r\n|\r)+" }
+  { dg-output "libgomp: Can't satisfy request for offload targets: XYZ; have loaded: \[a-z-\]*(\n|\r\n|\r)+" }
+  { dg-output "$" }
+  { dg-shouldfail ""  }
+*/
diff --git libgomp/testsuite/libgomp.oacc-c/offload-targets-6.c libgomp/testsuite/libgomp.oacc-c/offload-targets-6.c
new file mode 100644
index 0000000..4b15582
--- /dev/null
+++ libgomp/testsuite/libgomp.oacc-c/offload-targets-6.c
@@ -0,0 +1,11 @@
+#define OFFLOAD_TARGETS_SAME_AGAIN
+#define OFFLOAD_TARGETS_ADD "XYZ"
+#define OFFLOAD_TARGETS_ADD_LATE
+#include "offload-targets-1.c"
+
+/*
+  { dg-output "CheCKpOInT2(\n|\r\n|\r)+" }
+  { dg-output "libgomp: Can't satisfy request for offload targets: XYZ; have loaded: \[a-z-\]*(\n|\r\n|\r)+" }
+  { dg-output "$" }
+  { dg-shouldfail ""  }
+*/
diff --git libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90 libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
index 1a10f32..f57a2f2 100644
--- libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
+++ libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
@@ -25,7 +25,7 @@ if (acc_on_device (acc_device_nvidia)) call abort
 !$acc end parallel
 
 
-#if !ACC_DEVICE_TYPE_host
+#ifndef ACC_DEVICE_TYPE_host
 
 ! Offloaded.
 
@@ -33,7 +33,7 @@ if (acc_on_device (acc_device_nvidia)) call abort
 if (acc_on_device (acc_device_none)) call abort
 if (acc_on_device (acc_device_host)) call abort
 if (.not. acc_on_device (acc_device_not_host)) call abort
-#if ACC_DEVICE_TYPE_nvidia
+#ifdef ACC_DEVICE_TYPE_nvidia
 if (.not. acc_on_device (acc_device_nvidia)) call abort
 #else
 if (acc_on_device (acc_device_nvidia)) call abort
diff --git libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
index cbd1dd9..6209d12 100644
--- libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
+++ libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
@@ -26,7 +26,7 @@
 !$ACC END PARALLEL
 
 
-#if !ACC_DEVICE_TYPE_host
+#ifndef ACC_DEVICE_TYPE_host
 
 ! Offloaded.
 
@@ -34,7 +34,7 @@
       IF (ACC_ON_DEVICE (ACC_DEVICE_NONE)) CALL ABORT
       IF (ACC_ON_DEVICE (ACC_DEVICE_HOST)) CALL ABORT
       IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NOT_HOST)) CALL ABORT
-#if ACC_DEVICE_TYPE_nvidia
+#ifdef ACC_DEVICE_TYPE_nvidia
       IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NVIDIA)) CALL ABORT
 #else
       IF (ACC_ON_DEVICE (ACC_DEVICE_NVIDIA)) CALL ABORT
diff --git libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f
index c391776..90d567f 100644
--- libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f
+++ libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f
@@ -25,7 +25,7 @@
 !$ACC END PARALLEL
 
 
-#if !ACC_DEVICE_TYPE_host
+#ifndef ACC_DEVICE_TYPE_host
 
 ! Offloaded.
 
@@ -33,7 +33,7 @@
       IF (ACC_ON_DEVICE (ACC_DEVICE_NONE)) CALL ABORT
       IF (ACC_ON_DEVICE (ACC_DEVICE_HOST)) CALL ABORT
       IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NOT_HOST)) CALL ABORT
-#if ACC_DEVICE_TYPE_nvidia
+#ifdef ACC_DEVICE_TYPE_nvidia
       IF (.NOT. ACC_ON_DEVICE (ACC_DEVICE_NVIDIA)) CALL ABORT
 #else
       IF (ACC_ON_DEVICE (ACC_DEVICE_NVIDIA)) CALL ABORT
diff --git libgomp/testsuite/libgomp.oacc-fortran/fortran.exp libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
index d78ce55..865c704 100644
--- libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
+++ libgomp/testsuite/libgomp.oacc-fortran/fortran.exp
@@ -71,7 +71,7 @@ if { $lang_test_file_found } {
 	switch -glob $offload_target_openacc {
 	    disable {
 		set acc_mem_shared 1
-		set tagopt "-DACC_DEVICE_TYPE_host=1"
+		set tagopt "-DACC_DEVICE_TYPE_host=\"\""
 	    }
 	    nvptx* {
 		if { ![check_effective_target_openacc_nvidia_accel_present] } {
@@ -81,7 +81,7 @@ if { $lang_test_file_found } {
 		}
 
 		set acc_mem_shared 0
-		set tagopt "-DACC_DEVICE_TYPE_nvidia=1"
+		set tagopt "-DACC_DEVICE_TYPE_nvidia=\"$offload_target_openacc\""
 	    }
 	    default {
 		set acc_mem_shared 0


Regards
 Thomas


end of thread, other threads:[~2018-05-20 19:46 UTC | newest]

Thread overview: 62+ messages
2014-09-27 18:17 [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming Ilya Verbin
2014-09-29  1:10 ` Jan Hubicka
2014-09-29 17:37   ` Ilya Verbin
2014-09-30 11:40     ` Thomas Schwinge
2014-10-01 16:13       ` Ilya Verbin
2014-10-08  8:45         ` Jakub Jelinek
2014-10-08  9:13           ` Jakub Jelinek
2014-10-15 14:28         ` Richard Biener
2014-10-20 11:21           ` Ilya Verbin
2014-10-20 11:26             ` Jakub Jelinek
2014-10-24 14:16             ` Ilya Verbin
2014-10-24 14:29               ` Jakub Jelinek
2014-10-28 19:32                 ` Ilya Verbin
2014-11-03  9:24                   ` Jakub Jelinek
2014-11-05 12:47                     ` Ilya Verbin
2014-11-05 12:50                       ` Jakub Jelinek
2014-11-07 14:41                         ` Kirill Yukhin
2014-11-12  9:32                       ` Richard Biener
2014-11-12 14:11                         ` Kirill Yukhin
2014-11-12 14:23                           ` Richard Biener
2014-11-12 14:35                             ` Kirill Yukhin
2014-11-12 14:41                               ` Richard Biener
2014-11-12 17:38                                 ` Ilya Verbin
2014-11-13  8:51                                   ` Richard Biener
2015-07-31 15:37                       ` Thomas Schwinge
2015-07-31 15:43                         ` Ilya Verbin
2015-08-05  8:40                           ` Richard Biener
2015-08-05 15:09                             ` Ilya Verbin
2015-08-14  9:49                               ` Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) (was: [PATCH 2/n] OpenMP 4.0 offloading infrastructure: LTO streaming) Thomas Schwinge
2015-08-14 13:29                                 ` Ilya Verbin
2015-08-17 13:57                                   ` Martin Jambor
2015-08-14 17:08                                 ` Joseph Myers
2015-08-14 21:48                                   ` Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) Thomas Schwinge
2015-08-15  4:03                                     ` Joseph Myers
2015-08-18 16:55                                       ` Thomas Schwinge
2015-08-20 23:38                                         ` Joseph Myers
2015-08-21 16:13                                           ` Nathan Sidwell
2015-08-21 16:21                                             ` Joseph Myers
2015-08-24 18:05                                               ` Joseph Myers
2015-08-24 22:50                                                 ` Joseph Myers
2015-08-24 23:26                                                   ` Nathan Sidwell
2015-08-25 15:04                                           ` Joseph Myers
2018-05-20 20:30                                           ` [og7] " Thomas Schwinge
2015-08-27 20:58 Pass -foffload targets from driver to libgomp at link time Joseph Myers
2015-09-03 14:58 ` Ping " Joseph Myers
2015-09-10 14:01   ` Ping^2 " Joseph Myers
2015-09-10 14:03     ` Bernd Schmidt
2015-09-11 14:29       ` Joseph Myers
2015-09-11 14:48         ` Bernd Schmidt
2015-09-11 15:28           ` Joseph Myers
2015-09-11 15:47             ` Jakub Jelinek
2015-09-11 16:16               ` Joseph Myers
2015-09-28 10:09               ` Thomas Schwinge
2015-09-29  9:48                 ` Jakub Jelinek
2015-09-30 16:15                   ` Thomas Schwinge
2015-10-19 16:56                     ` Forwarding -foffload=[...] from the driver (compile-time) to libgomp (run-time) Thomas Schwinge
2015-10-20 10:03                       ` Jakub Jelinek
2015-10-20 10:44                         ` Bernd Schmidt
2015-10-20 11:18                         ` Thomas Schwinge
2015-10-20 11:49                           ` Bernd Schmidt
2015-10-20 12:13                             ` Jakub Jelinek
2015-10-20 11:52                           ` Jakub Jelinek
