public inbox for gcc-patches@gcc.gnu.org
* [RFC PATCH v2] cgraph support for late declare variant resolution
From: Jakub Jelinek @ 2020-05-13 11:16 UTC
  To: Jan Hubicka, Martin Jambor; +Cc: gcc-patches

Hi!

This is a new version of the
https://gcc.gnu.org/legacy-ml/gcc-patches/2019-11/msg01493.html
patch.  Unlike the previous version, this one actually works properly
except for LTO; it has also been bootstrapped/regtested on x86_64-linux
and i686-linux.

In short, #pragma omp declare variant is a directive which allows
redirecting direct calls to a certain function to calls to other functions
(variants), selected through a scoring system, and some of those decisions
need to be deferred until after IPA.  The patch represents the deferred
calls as calls to an artificial FUNCTION_DECL whose cgraph_node has
declare_variant_alt set.
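
For reference, here is a minimal illustration of the directive in the spirit
of the included testcase (f01, f04 and the isa selector are just made up for
this example, they are not part of the patch):

  int f01 (int);
  #pragma omp declare variant (f01) match (device={isa("avx512f")})
  int f04 (int);

  #pragma omp declare simd
  int
  test (int x)
  {
    /* Whether this resolves to f01 or f04 differs between the simd clones
       of test, which only exist after IPA, so the decision is deferred.  */
    return f04 (x);
  }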

Honza/Martin, are the cgraph related changes acceptable to you?

For LTO, the patch only saves/restores the two cgraph_node bits added in the
patch, but doesn't yet stream out and back in the on-the-side info for the
declare_variant_alt nodes.  For the LTO partitioning, I believe those
artificial FUNCTION_DECLs with declare_variant_alt need to go into a
partition together with anything that calls them (possibly duplicated); is
there any way to achieve that?  Say the declare variant artificial fn foobar
is directly called from all of foo, bar and baz but not from qux, and we
want 4 partitions, one for each of foo, bar, baz and qux; then foobar is
needed in the first 3 partitions, while the functions recorded as
IPA_REF_ADDRs on foobar (foobar1, foobar2, foobar3 or the non-artificial
foobar, with calls to which the foobar call will be replaced right after
IPA) can of course stay in different partitions if needed.

2020-05-13  Jakub Jelinek  <jakub@redhat.com>

	* Makefile.in (GTFILES): Add omp-general.c.
	* cgraph.h (struct cgraph_node): Add declare_variant_alt and
	calls_declare_variant_alt members and initialize them in the
	ctor.
	* ipa.c (symbol_table::remove_unreachable_nodes): Handle direct
	calls to declare_variant_alt nodes.
	* lto-cgraph.c (lto_output_node): Write declare_variant_alt
	and calls_declare_variant_alt.
	(input_overwrite_node): Read them back.
	* omp-simd-clone.c (simd_clone_create): Copy calls_declare_variant_alt
	bit.
	* tree-inline.c (expand_call_inline): Or in calls_declare_variant_alt
	bit.
	(tree_function_versioning): Copy calls_declare_variant_alt bit.
	* omp-offload.c (execute_omp_device_lower): Call
	omp_resolve_declare_variant on direct function calls.
	(pass_omp_device_lower::gate): Also enable for
	calls_declare_variant_alt functions.
	* omp-general.c (omp_maybe_offloaded): Return false after inlining.
	(omp_context_selector_matches): Handle the case when
	cfun->curr_properties has PROP_gimple_any bit set.
	(struct omp_declare_variant_entry): New type.
	(struct omp_declare_variant_base_entry): New type.
	(struct omp_declare_variant_hasher): New type. 
	(omp_declare_variant_hasher::hash, omp_declare_variant_hasher::equal):
	New methods.
	(omp_declare_variants): New variable.
	(struct omp_declare_variant_alt_hasher): New type.
	(omp_declare_variant_alt_hasher::hash,
	omp_declare_variant_alt_hasher::equal): New methods.
	(omp_declare_variant_alt): New variable.
	(omp_resolve_late_declare_variant): New function.
	(omp_resolve_declare_variant): Call omp_resolve_late_declare_variant
	when called late.  Create a magic declare_variant_alt fndecl and
	cgraph node and return that if decision needs to be deferred until
	after gimplification.
	* cgraph.c (symbol_table::create_edge): Or in calls_declare_variant_alt
	bit.

	* c-c++-common/gomp/declare-variant-14.c: New test.

--- gcc/Makefile.in.jj	2020-05-12 21:20:52.701547377 +0200
+++ gcc/Makefile.in	2020-05-13 11:34:54.869947514 +0200
@@ -2616,6 +2616,7 @@ GTFILES = $(CPPLIB_H) $(srcdir)/input.h
   $(srcdir)/omp-offload.h \
   $(srcdir)/omp-offload.c \
   $(srcdir)/omp-expand.c \
+  $(srcdir)/omp-general.c \
   $(srcdir)/omp-low.c \
   $(srcdir)/targhooks.c $(out_file) $(srcdir)/passes.c $(srcdir)/cgraphunit.c \
   $(srcdir)/cgraphclones.c \
--- gcc/cgraph.h.jj	2020-05-12 21:20:47.433626426 +0200
+++ gcc/cgraph.h	2020-05-13 11:34:54.870947499 +0200
@@ -937,7 +937,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cg
       split_part (false), indirect_call_target (false), local (false),
       versionable (false), can_change_signature (false),
       redefined_extern_inline (false), tm_may_enter_irr (false),
-      ipcp_clone (false), m_uid (uid), m_summary_id (-1)
+      ipcp_clone (false), declare_variant_alt (false),
+      calls_declare_variant_alt (false), m_uid (uid), m_summary_id (-1)
   {}
 
   /* Remove the node from cgraph and all inline clones inlined into it.
@@ -1539,6 +1540,11 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cg
   unsigned tm_may_enter_irr : 1;
   /* True if this was a clone created by ipa-cp.  */
   unsigned ipcp_clone : 1;
+  /* True if this is the deferred declare variant resolution artificial
+     function.  */
+  unsigned declare_variant_alt : 1;
+  /* True if the function calls declare_variant_alt functions.  */
+  unsigned calls_declare_variant_alt : 1;
 
 private:
   /* Unique id of the node.  */
--- gcc/ipa.c.jj	2020-05-12 21:20:47.463625977 +0200
+++ gcc/ipa.c	2020-05-13 11:34:54.871947484 +0200
@@ -450,6 +450,9 @@ symbol_table::remove_unreachable_nodes (
 			reachable.add (body);
 		      reachable.add (e->callee);
 		    }
+		  else if (e->callee->declare_variant_alt
+			   && !e->callee->in_other_partition)
+		    reachable.add (e->callee);
 		  enqueue_node (e->callee, &first, &reachable);
 		}
 
--- gcc/lto-cgraph.c.jj	2020-05-12 21:20:47.472625842 +0200
+++ gcc/lto-cgraph.c	2020-05-13 11:34:54.871947484 +0200
@@ -535,6 +535,8 @@ lto_output_node (struct lto_simple_outpu
   bp_pack_value (&bp, node->merged_extern_inline, 1);
   bp_pack_value (&bp, node->thunk.thunk_p, 1);
   bp_pack_value (&bp, node->parallelized_function, 1);
+  bp_pack_value (&bp, node->declare_variant_alt, 1);
+  bp_pack_value (&bp, node->calls_declare_variant_alt, 1);
   bp_pack_enum (&bp, ld_plugin_symbol_resolution,
 	        LDPR_NUM_KNOWN,
 		/* When doing incremental link, we will get new resolution
@@ -1186,6 +1188,8 @@ input_overwrite_node (struct lto_file_de
   node->merged_extern_inline = bp_unpack_value (bp, 1);
   node->thunk.thunk_p = bp_unpack_value (bp, 1);
   node->parallelized_function = bp_unpack_value (bp, 1);
+  node->declare_variant_alt = bp_unpack_value (bp, 1);
+  node->calls_declare_variant_alt = bp_unpack_value (bp, 1);
   node->resolution = bp_unpack_enum (bp, ld_plugin_symbol_resolution,
 				     LDPR_NUM_KNOWN);
   node->split_part = bp_unpack_value (bp, 1);
--- gcc/omp-simd-clone.c.jj	2020-05-12 21:20:47.500625421 +0200
+++ gcc/omp-simd-clone.c	2020-05-13 11:34:54.872947469 +0200
@@ -477,6 +477,7 @@ simd_clone_create (struct cgraph_node *o
      the old node.  */
   new_node->local = old_node->local;
   new_node->externally_visible = old_node->externally_visible;
+  new_node->calls_declare_variant_alt = old_node->calls_declare_variant_alt;
 
   return new_node;
 }
--- gcc/tree-inline.c.jj	2020-05-12 21:20:47.518625152 +0200
+++ gcc/tree-inline.c	2020-05-13 11:34:54.874947439 +0200
@@ -4900,6 +4900,8 @@ expand_call_inline (basic_block bb, gimp
   if (src_properties != prop_mask)
     dst_cfun->curr_properties &= src_properties | ~prop_mask;
   dst_cfun->calls_eh_return |= id->src_cfun->calls_eh_return;
+  id->dst_node->calls_declare_variant_alt
+    |= id->src_node->calls_declare_variant_alt;
 
   gcc_assert (!id->src_cfun->after_inlining);
 
@@ -6231,6 +6233,8 @@ tree_function_versioning (tree old_decl,
   DECL_ARGUMENTS (new_decl) = DECL_ARGUMENTS (old_decl);
   initialize_cfun (new_decl, old_decl,
 		   new_entry ? new_entry->count : old_entry_block->count);
+  new_version_node->calls_declare_variant_alt
+    = old_version_node->calls_declare_variant_alt;
   if (DECL_STRUCT_FUNCTION (new_decl)->gimple_df)
     DECL_STRUCT_FUNCTION (new_decl)->gimple_df->ipa_pta
       = id.src_cfun->gimple_df->ipa_pta;
--- gcc/omp-offload.c.jj	2020-05-12 21:20:47.499625436 +0200
+++ gcc/omp-offload.c	2020-05-13 11:34:54.874947439 +0200
@@ -2038,12 +2038,28 @@ execute_omp_device_lower ()
   bool regimplify = false;
   basic_block bb;
   gimple_stmt_iterator gsi;
+  bool calls_declare_variant_alt
+    = cgraph_node::get (cfun->decl)->calls_declare_variant_alt;
   FOR_EACH_BB_FN (bb, cfun)
     for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
       {
 	gimple *stmt = gsi_stmt (gsi);
-	if (!is_gimple_call (stmt) || !gimple_call_internal_p (stmt))
+	if (!is_gimple_call (stmt))
 	  continue;
+	if (!gimple_call_internal_p (stmt))
+	  {
+	    if (calls_declare_variant_alt)
+	      if (tree fndecl = gimple_call_fndecl (stmt))
+		{
+		  tree new_fndecl = omp_resolve_declare_variant (fndecl);
+		  if (new_fndecl != fndecl)
+		    {
+		      gimple_call_set_fndecl (stmt, new_fndecl);
+		      update_stmt (stmt);
+		    }
+		}
+	    continue;
+	  }
 	tree lhs = gimple_call_lhs (stmt), rhs = NULL_TREE;
 	tree type = lhs ? TREE_TYPE (lhs) : integer_type_node;
 	switch (gimple_call_internal_fn (stmt))
@@ -2137,7 +2153,9 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *fun)
     {
-      return !(fun->curr_properties & PROP_gimple_lomp_dev);
+      return (!(fun->curr_properties & PROP_gimple_lomp_dev)
+	      || (flag_openmp
+		  && cgraph_node::get (fun->decl)->calls_declare_variant_alt));
     }
   virtual unsigned int execute (function *)
     {
--- gcc/omp-general.c.jj	2020-05-12 21:20:47.498625451 +0200
+++ gcc/omp-general.c	2020-05-13 11:59:16.658204810 +0200
@@ -642,6 +642,8 @@ omp_maybe_offloaded (void)
   if (symtab->state == PARSING)
     /* Maybe.  */
     return true;
+  if (cfun && cfun->after_inlining)
+    return false;
   if (current_function_decl
       && lookup_attribute ("omp declare target",
 			   DECL_ATTRIBUTES (current_function_decl)))
@@ -694,8 +696,7 @@ omp_context_selector_matches (tree ctx)
 	     (so in most of the cases), and we'd need to maintain set of
 	     surrounding OpenMP constructs, which is better handled during
 	     gimplification.  */
-	  if (symtab->state == PARSING
-	      || (cfun->curr_properties & PROP_gimple_any) != 0)
+	  if (symtab->state == PARSING)
 	    {
 	      ret = -1;
 	      continue;
@@ -704,6 +705,28 @@ omp_context_selector_matches (tree ctx)
 	  enum tree_code constructs[5];
 	  int nconstructs
 	    = omp_constructor_traits_to_codes (TREE_VALUE (t1), constructs);
+
+	  if (cfun && (cfun->curr_properties & PROP_gimple_any) != 0)
+	    {
+	      if (!cfun->after_inlining)
+		{
+		  ret = -1;
+		  continue;
+		}
+	      int i;
+	      for (i = 0; i < nconstructs; ++i)
+		if (constructs[i] == OMP_SIMD)
+		  break;
+	      if (i < nconstructs)
+		{
+		  ret = -1;
+		  continue;
+		}
+	      /* If there is no simd, assume it is ok after IPA,
+		 constructs should have been checked before.  */
+	      continue;
+	    }
+
 	  int r = omp_construct_selector_matches (constructs, nconstructs,
 						  NULL);
 	  if (r == 0)
@@ -738,6 +761,9 @@ omp_context_selector_matches (tree ctx)
 	    case 'a':
 	      if (set == 'i' && !strcmp (sel, "atomic_default_mem_order"))
 		{
+		  if (cfun && (cfun->curr_properties & PROP_gimple_any) != 0)
+		    break;
+
 		  enum omp_memory_order omo
 		    = ((enum omp_memory_order)
 		       (omp_requires_mask
@@ -816,6 +842,9 @@ omp_context_selector_matches (tree ctx)
 	    case 'u':
 	      if (set == 'i' && !strcmp (sel, "unified_address"))
 		{
+		  if (cfun && (cfun->curr_properties & PROP_gimple_any) != 0)
+		    break;
+
 		  if ((omp_requires_mask & OMP_REQUIRES_UNIFIED_ADDRESS) == 0)
 		    {
 		      if (symtab->state == PARSING)
@@ -827,6 +856,9 @@ omp_context_selector_matches (tree ctx)
 		}
 	      if (set == 'i' && !strcmp (sel, "unified_shared_memory"))
 		{
+		  if (cfun && (cfun->curr_properties & PROP_gimple_any) != 0)
+		    break;
+
 		  if ((omp_requires_mask
 		       & OMP_REQUIRES_UNIFIED_SHARED_MEMORY) == 0)
 		    {
@@ -841,6 +873,9 @@ omp_context_selector_matches (tree ctx)
 	    case 'd':
 	      if (set == 'i' && !strcmp (sel, "dynamic_allocators"))
 		{
+		  if (cfun && (cfun->curr_properties & PROP_gimple_any) != 0)
+		    break;
+
 		  if ((omp_requires_mask
 		       & OMP_REQUIRES_DYNAMIC_ALLOCATORS) == 0)
 		    {
@@ -855,6 +890,9 @@ omp_context_selector_matches (tree ctx)
 	    case 'r':
 	      if (set == 'i' && !strcmp (sel, "reverse_offload"))
 		{
+		  if (cfun && (cfun->curr_properties & PROP_gimple_any) != 0)
+		    break;
+
 		  if ((omp_requires_mask & OMP_REQUIRES_REVERSE_OFFLOAD) == 0)
 		    {
 		      if (symtab->state == PARSING)
@@ -944,7 +982,8 @@ omp_context_selector_matches (tree ctx)
 			   #pragma omp declare simd on it, some simd clones
 			   might have the isa added later on.  */
 			if (r == -1
-			    && targetm.simd_clone.compute_vecsize_and_simdlen)
+			    && targetm.simd_clone.compute_vecsize_and_simdlen
+			    && (cfun == NULL || !cfun->after_inlining))
 			  {
 			    tree attrs
 			      = DECL_ATTRIBUTES (current_function_decl);
@@ -1415,6 +1454,191 @@ omp_context_compute_score (tree ctx, wid
   return ret;
 }
 
+/* Class describing a single variant.  */
+struct GTY(()) omp_declare_variant_entry {
+  /* NODE of the variant.  */
+  cgraph_node *variant;
+  /* Score if not in declare simd clone.  */
+  widest_int score;
+  /* Score if in declare simd clone.  */
+  widest_int score_in_declare_simd_clone;
+  /* Context selector for the variant.  */
+  tree ctx;
+  /* True if the context selector is known to match already.  */
+  bool matches;
+};
+
+/* Class describing a function with variants.  */
+struct GTY((for_user)) omp_declare_variant_base_entry {
+  /* NODE of the base function.  */
+  cgraph_node *base;
+  /* NODE of the artificial function created for the deferred variant
+     resolution.  */
+  cgraph_node *node;
+  /* Vector of the variants.  */
+  vec<omp_declare_variant_entry, va_gc> *variants;
+};
+
+struct omp_declare_variant_hasher
+  : ggc_ptr_hash<omp_declare_variant_base_entry> {
+  static hashval_t hash (omp_declare_variant_base_entry *);
+  static bool equal (omp_declare_variant_base_entry *,
+		     omp_declare_variant_base_entry *);
+};
+
+hashval_t
+omp_declare_variant_hasher::hash (omp_declare_variant_base_entry *x)
+{
+  inchash::hash hstate;
+  hstate.add_int (DECL_UID (x->base->decl));
+  hstate.add_int (x->variants->length ());
+  omp_declare_variant_entry *variant;
+  unsigned int i;
+  FOR_EACH_VEC_SAFE_ELT (x->variants, i, variant)
+    {
+      hstate.add_int (DECL_UID (variant->variant->decl));
+      hstate.add_wide_int (variant->score);
+      hstate.add_wide_int (variant->score_in_declare_simd_clone);
+      hstate.add_ptr (variant->ctx);
+      hstate.add_int (variant->matches);
+    }
+  return hstate.end ();
+}
+
+bool
+omp_declare_variant_hasher::equal (omp_declare_variant_base_entry *x,
+				   omp_declare_variant_base_entry *y)
+{
+  if (x->base != y->base
+      || x->variants->length () != y->variants->length ())
+    return false;
+  omp_declare_variant_entry *variant;
+  unsigned int i;
+  FOR_EACH_VEC_SAFE_ELT (x->variants, i, variant)
+    if (variant->variant != (*y->variants)[i].variant
+	|| variant->score != (*y->variants)[i].score
+	|| (variant->score_in_declare_simd_clone
+	    != (*y->variants)[i].score_in_declare_simd_clone)
+	|| variant->ctx != (*y->variants)[i].ctx
+	|| variant->matches != (*y->variants)[i].matches)
+      return false;
+  return true;
+}
+
+static GTY(()) hash_table<omp_declare_variant_hasher> *omp_declare_variants;
+
+struct omp_declare_variant_alt_hasher
+  : ggc_ptr_hash<omp_declare_variant_base_entry> {
+  static hashval_t hash (omp_declare_variant_base_entry *);
+  static bool equal (omp_declare_variant_base_entry *,
+		     omp_declare_variant_base_entry *);
+};
+
+hashval_t
+omp_declare_variant_alt_hasher::hash (omp_declare_variant_base_entry *x)
+{
+  return DECL_UID (x->node->decl);
+}
+
+bool
+omp_declare_variant_alt_hasher::equal (omp_declare_variant_base_entry *x,
+				       omp_declare_variant_base_entry *y)
+{
+  return x->node == y->node;
+}
+
+static GTY(()) hash_table<omp_declare_variant_alt_hasher>
+  *omp_declare_variant_alt;
+
+/* Try to resolve declare variant after gimplification.  */
+
+static tree
+omp_resolve_late_declare_variant (tree alt)
+{
+  cgraph_node *node = cgraph_node::get (alt);
+  cgraph_node *cur_node = cgraph_node::get (cfun->decl);
+  if (node == NULL
+      || !node->declare_variant_alt
+      || !cfun->after_inlining)
+    return alt;
+
+  omp_declare_variant_base_entry entry;
+  entry.base = NULL;
+  entry.node = node;
+  entry.variants = NULL;
+  omp_declare_variant_base_entry *entryp
+    = omp_declare_variant_alt->find_with_hash (&entry, DECL_UID (alt));
+
+  unsigned int i, j;
+  omp_declare_variant_entry *varentry1, *varentry2;
+  auto_vec <bool, 16> matches;
+  unsigned int nmatches = 0;
+  FOR_EACH_VEC_SAFE_ELT (entryp->variants, i, varentry1)
+    {
+      if (varentry1->matches)
+	{
+	  /* This has been checked to be ok already.  */
+	  matches.safe_push (true);
+	  nmatches++;
+	  continue;
+	}
+      switch (omp_context_selector_matches (varentry1->ctx))
+	{
+	case 0:
+          matches.safe_push (false);
+	  break;
+	case -1:
+	  return alt;
+	default:
+	  matches.safe_push (true);
+	  nmatches++;
+	  break;
+	}
+    }
+
+  if (nmatches == 0)
+    return entryp->base->decl;
+
+  /* A context selector that is a strict subset of another context selector
+     has a score of zero.  */
+  FOR_EACH_VEC_SAFE_ELT (entryp->variants, i, varentry1)
+    if (matches[i])
+      {
+        for (j = i + 1;
+	     vec_safe_iterate (entryp->variants, j, &varentry2); ++j)
+	  if (matches[j])
+	    {
+	      int r = omp_context_selector_compare (varentry1->ctx,
+						    varentry2->ctx);
+	      if (r == -1)
+		{
+		  /* ctx1 is a strict subset of ctx2, ignore ctx1.  */
+		  matches[i] = false;
+		  break;
+		}
+	      else if (r == 1)
+		/* ctx2 is a strict subset of ctx1, remove ctx2.  */
+		matches[j] = false;
+	    }
+      }
+
+  widest_int max_score = -1;
+  varentry2 = NULL;
+  FOR_EACH_VEC_SAFE_ELT (entryp->variants, i, varentry1)
+    if (matches[i])
+      {
+	widest_int score
+	  = (cur_node->simdclone ? varentry1->score_in_declare_simd_clone
+	     : varentry1->score);
+	if (score > max_score)
+	  {
+	    max_score = score;
+	    varentry2 = varentry1;
+	  }
+      }
+  return varentry2->variant->decl;
+}
+
 /* Try to resolve declare variant, return the variant decl if it should
    be used instead of base, or base otherwise.  */
 
@@ -1422,6 +1646,9 @@ tree
 omp_resolve_declare_variant (tree base)
 {
   tree variant1 = NULL_TREE, variant2 = NULL_TREE;
+  if (cfun && (cfun->curr_properties & PROP_gimple_any) != 0)
+    return omp_resolve_late_declare_variant (base);
+
   auto_vec <tree, 16> variants;
   auto_vec <bool, 16> defer;
   bool any_deferred = false;
@@ -1459,6 +1686,10 @@ omp_resolve_declare_variant (tree base)
       bool first = true;
       unsigned int i;
       tree attr1, attr2;
+      omp_declare_variant_base_entry entry;
+      entry.base = cgraph_node::get_create (base);
+      entry.node = NULL;
+      vec_alloc (entry.variants, variants.length ());
       FOR_EACH_VEC_ELT (variants, i, attr1)
 	{
 	  widest_int score1;
@@ -1498,6 +1729,14 @@ omp_resolve_declare_variant (tree base)
 		  variant2 = defer[i] ? NULL_TREE : attr1;
 		}
 	    }
+	  omp_declare_variant_entry varentry;
+	  varentry.variant
+	    = cgraph_node::get_create (TREE_PURPOSE (TREE_VALUE (attr1)));
+	  varentry.score = score1;
+	  varentry.score_in_declare_simd_clone = score2;
+	  varentry.ctx = ctx;
+	  varentry.matches = !defer[i];
+	  entry.variants->quick_push (varentry);
 	}
 
       /* If there is a clear winner variant with the score which is not
@@ -1522,17 +1761,67 @@ omp_resolve_declare_variant (tree base)
 		}
 	    }
 	  if (variant1)
-	    return TREE_PURPOSE (TREE_VALUE (variant1));
+	    {
+	      vec_free (entry.variants);
+	      return TREE_PURPOSE (TREE_VALUE (variant1));
+	    }
+	}
+
+      if (omp_declare_variants == NULL)
+	omp_declare_variants
+	  = hash_table<omp_declare_variant_hasher>::create_ggc (64);
+      omp_declare_variant_base_entry **slot
+	= omp_declare_variants->find_slot (&entry, INSERT);
+      if (*slot != NULL)
+	{
+	  vec_free (entry.variants);
+	  return (*slot)->node->decl;
 	}
 
-      return base;
+      *slot = ggc_cleared_alloc<omp_declare_variant_base_entry> ();
+      (*slot)->base = entry.base;
+      (*slot)->node = entry.base;
+      (*slot)->variants = entry.variants;
+      tree alt = build_decl (DECL_SOURCE_LOCATION (base), FUNCTION_DECL,
+			     DECL_NAME (base), TREE_TYPE (base));
+      DECL_ARTIFICIAL (alt) = 1;
+      DECL_IGNORED_P (alt) = 1;
+      TREE_STATIC (alt) = 1;
+      tree attributes = DECL_ATTRIBUTES (base);
+      if (lookup_attribute ("noipa", attributes) == NULL)
+	{
+	  attributes = tree_cons (get_identifier ("noipa"), NULL, attributes);
+	  if (lookup_attribute ("noinline", attributes) == NULL)
+	    attributes = tree_cons (get_identifier ("noinline"), NULL,
+				    attributes);
+	  if (lookup_attribute ("noclone", attributes) == NULL)
+	    attributes = tree_cons (get_identifier ("noclone"), NULL,
+				    attributes);
+	  if (lookup_attribute ("no_icf", attributes) == NULL)
+	    attributes = tree_cons (get_identifier ("no_icf"), NULL,
+				    attributes);
+	}
+      DECL_ATTRIBUTES (alt) = attributes;
+      DECL_INITIAL (alt) = error_mark_node;
+      (*slot)->node = cgraph_node::create (alt);
+      (*slot)->node->declare_variant_alt = 1;
+      (*slot)->node->create_reference (entry.base, IPA_REF_ADDR);
+      omp_declare_variant_entry *varentry;
+      FOR_EACH_VEC_SAFE_ELT (entry.variants, i, varentry)
+	(*slot)->node->create_reference (varentry->variant, IPA_REF_ADDR);
+      if (omp_declare_variant_alt == NULL)
+	omp_declare_variant_alt
+	  = hash_table<omp_declare_variant_alt_hasher>::create_ggc (64);
+      *omp_declare_variant_alt->find_slot_with_hash (*slot, DECL_UID (alt),
+						     INSERT) = *slot;
+      return alt;
     }
 
   if (variants.length () == 1)
     return TREE_PURPOSE (TREE_VALUE (variants[0]));
 
-  /* A context selector that is a strict subset of another context selector has a score
-     of zero.  */
+  /* A context selector that is a strict subset of another context selector
+     has a score of zero.  */
   tree attr1, attr2;
   unsigned int i, j;
   FOR_EACH_VEC_ELT (variants, i, attr1)
@@ -1948,3 +2237,5 @@ oacc_get_ifn_dim_arg (const gimple *stmt
   gcc_checking_assert (axis >= 0 && axis < GOMP_DIM_MAX);
   return (int) axis;
 }
+
+#include "gt-omp-general.h"
--- gcc/cgraph.c.jj	2020-05-12 21:20:47.405626848 +0200
+++ gcc/cgraph.c	2020-05-12 23:08:40.606680879 +0200
@@ -915,6 +915,8 @@ symbol_table::create_edge (cgraph_node *
 				      caller->decl);
   else
     edge->in_polymorphic_cdtor = caller->thunk.thunk_p;
+  if (callee)
+    caller->calls_declare_variant_alt |= callee->declare_variant_alt;
 
   if (callee && symtab->state != LTO_STREAMING
       && edge->callee->comdat_local_p ())
--- gcc/testsuite/c-c++-common/gomp/declare-variant-14.c.jj	2020-05-13 11:34:54.876947408 +0200
+++ gcc/testsuite/c-c++-common/gomp/declare-variant-14.c	2020-05-13 11:34:54.876947408 +0200
@@ -0,0 +1,28 @@
+/* { dg-do compile { target vect_simd_clones } } */
+/* { dg-additional-options "-fdump-tree-gimple -fdump-tree-optimized" } */
+/* { dg-additional-options "-mno-sse3" { target { i?86-*-* x86_64-*-* } } } */
+
+int f01 (int);
+int f02 (int);
+int f03 (int);
+#pragma omp declare variant (f01) match (device={isa("avx512f")}) /* 4 or 8 */
+#pragma omp declare variant (f02) match (implementation={vendor(score(3):gnu)},device={kind(cpu)}) /* (1 or 2) + 3 */
+#pragma omp declare variant (f03) match (implementation={vendor(score(5):gnu)},device={kind(host)}) /* (1 or 2) + 5 */
+int f04 (int);
+
+#pragma omp declare simd
+int
+test1 (int x)
+{
+  /* At gimplification time, we can't decide yet which function to call.  */
+  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" } } */
+  /* After simd clones are created, the original non-clone test1 shall
+     call f03 (score 6), the sse2/avx/avx2 clones too, but avx512f clones
+     shall call f01 with score 8.  */
+  /* { dg-final { scan-tree-dump-not "f04 \\\(x" "optimized" } } */
+  /* { dg-final { scan-tree-dump-times "f03 \\\(x" 14 "optimized" } } */
+  /* { dg-final { scan-tree-dump-times "f01 \\\(x" 4 "optimized" } } */
+  int a = f04 (x);
+  int b = f04 (x);
+  return a + b;
+}

	Jakub



* [RFH] LTO cgraph support for late declare variant resolution
From: Jakub Jelinek @ 2020-05-14 12:17 UTC
  To: Jan Hubicka, Martin Jambor; +Cc: gcc-patches

Hi!

I've committed the patch, so that the rest can be handled incrementally.

On Wed, May 13, 2020 at 01:16:42PM +0200, Jakub Jelinek wrote:
> Honza/Martin, are the cgraph related changes acceptable to you?
> 
> For LTO, the patch only saves/restores the two cgraph_node bits added in the
> patch, but doesn't yet stream out and back in the on-the-side info for the
> declare_variant_alt nodes.  For the LTO partitioning, I believe those
> artificial FUNCTION_DECLs with declare_variant_alt need to go into a
> partition together with anything that calls them (possibly duplicated); is
> there any way to achieve that?  Say the declare variant artificial fn foobar
> is directly called from all of foo, bar and baz but not from qux, and we
> want 4 partitions, one for each of foo, bar, baz and qux; then foobar is
> needed in the first 3 partitions, while the functions recorded as
> IPA_REF_ADDRs on foobar (foobar1, foobar2, foobar3 or the non-artificial
> foobar, with calls to which the foobar call will be replaced right after
> IPA) can of course stay in different partitions if needed.

I've tried to add the saving/restoring next to the ipa refs saving/restoring,
as the declare variant alt stuff is kind of an extension of those.
Unfortunately, the following doesn't compile, because I need to also write or
read a tree there (ctx is a portion of DECL_ATTRIBUTES of the base function),
but the ipa refs write/read back functions don't have arguments that can be
used for that.

Any idea where to do it instead?  For all cgraph_nodes with
declare_variant_alt set, a function needs to be called to write out the
on-the-side info, which contains a few other cgraph_nodes (also duplicated
in the ipa_refs), some widest_ints, one tree and some booleans.

Also, do I need to do anything special to avoid LTO merging those artificial
decls?  It is just fine if their ipa refs are merged, but merging the
artificial decls themselves would only be fine if they are the same (the
other hash table could be used for that).

--- gcc/symtab.c.jj	2020-04-20 15:51:19.005560662 +0200
+++ gcc/symtab.c	2020-05-14 12:25:41.530745061 +0200
@@ -1984,7 +1984,7 @@ symtab_node::get_partitioning_class (voi
   if (DECL_ABSTRACT_P (decl))
     return SYMBOL_EXTERNAL;
 
-  if (cnode && cnode->inlined_to)
+  if (cnode && (cnode->inlined_to || cnode->declare_variant_alt))
     return SYMBOL_DUPLICATE;
 
   /* Transparent aliases are always duplicated.  */
--- gcc/lto-cgraph.c.jj	2020-05-14 09:58:21.353412170 +0200
+++ gcc/lto-cgraph.c	2020-05-14 12:39:01.592642219 +0200
@@ -766,6 +766,9 @@ output_refs (lto_symtab_encoder_t encode
 	  for (int i = 0; node->iterate_reference (i, ref); i++)
 	    lto_output_ref (ob, ref, encoder);
 	}
+      if (cgraph_node *cnode = dyn_cast <cgraph_node *> (node))
+	if (cnode->declare_variant_alt)
+	  omp_lto_output_declare_variant_alt (ob, cnode, encoder);
     }
 
   streamer_write_uhwi_stream (ob->main_stream, 0);
@@ -1610,6 +1613,9 @@ input_refs (class lto_input_block *ib,
 	  input_ref (ib, node, nodes);
 	  count--;
 	}
+      if (cgraph_node *cnode = dyn_cast <cgraph_node *> (node))
+	if (cnode->declare_variant_alt)
+	  omp_lto_input_declare_variant_alt (ib, cnode, nodes);
     }
 }
 	    
--- gcc/omp-general.c.jj	2020-05-14 09:58:21.394411547 +0200
+++ gcc/omp-general.c	2020-05-14 13:14:09.338841298 +0200
@@ -42,6 +42,8 @@ along with GCC; see the file COPYING3.
 #include "hsa-common.h"
 #include "tree-pass.h"
 #include "omp-device-properties.h"
+#include "data-streamer.h"
+#include "streamer-hooks.h"
 
 enum omp_requires omp_requires_mask;
 
@@ -1898,6 +1900,91 @@ omp_resolve_declare_variant (tree base)
 	  ? TREE_PURPOSE (TREE_VALUE (variant1)) : base);
 }
 
+void
+omp_lto_output_declare_variant_alt (lto_simple_output_block *ob,
+				    cgraph_node *node,
+				    lto_symtab_encoder_t encoder)
+{
+  gcc_assert (node->declare_variant_alt);
+
+  omp_declare_variant_base_entry entry;
+  entry.base = NULL;
+  entry.node = node;
+  entry.variants = NULL;
+  omp_declare_variant_base_entry *entryp
+    = omp_declare_variant_alt->find_with_hash (&entry, DECL_UID (node->decl));
+  gcc_assert (entryp);
+
+  int nbase = lto_symtab_encoder_lookup (encoder, entryp->base);
+  gcc_assert (nbase != LCC_NOT_FOUND);
+  streamer_write_hwi_stream (ob->main_stream, nbase);
+
+  streamer_write_hwi_stream (ob->main_stream, entryp->variants->length ());
+
+  unsigned int i;
+  omp_declare_variant_entry *varentry;
+  FOR_EACH_VEC_SAFE_ELT (entryp->variants, i, varentry)
+    {
+      int nvar = lto_symtab_encoder_lookup (encoder, varentry->variant);
+      gcc_assert (nvar != LCC_NOT_FOUND);
+      streamer_write_hwi_stream (ob->main_stream, nvar);
+
+      for (widest_int *w = &varentry->score; ;
+	   w = &varentry->score_in_declare_simd_clone)
+	{
+	  unsigned len = w->get_len ();
+	  streamer_write_hwi_stream (ob->main_stream, len);
+	  const HOST_WIDE_INT *val = w->get_val ();
+	  for (unsigned j = 0; j < len; j++)
+	    streamer_write_hwi_stream (ob->main_stream, val[j]);
+	  if (w == &varentry->score_in_declare_simd_clone)
+	    break;
+	}
+
+      stream_write_tree (ob, varentry->ctx, false);
+      streamer_write_hwi_stream (ob->main_stream, varentry->matches);
+    }
+}
+
+void
+omp_lto_input_declare_variant_alt (lto_input_block *ib, cgraph_node *node,
+				   vec<symtab_node *> nodes)
+{
+  gcc_assert (node->declare_variant_alt);
+  omp_declare_variant_base_entry *entryp
+    = ggc_cleared_alloc<omp_declare_variant_base_entry> ();
+  entryp->base = dyn_cast<cgraph_node *> (nodes[streamer_read_hwi (ib)]);
+  entryp->node = node;
+  unsigned int len = streamer_read_hwi (ib);
+  vec_alloc (entryp->variants, len);
+
+  for (unsigned int i = 0; i < len; i++)
+    {
+      omp_declare_variant_entry varentry;
+      varentry.variant
+	= dyn_cast<cgraph_node *> (nodes[streamer_read_hwi (ib)]);
+      for (widest_int *w = &varentry.score; ;
+	   w = &varentry.score_in_declare_simd_clone)
+	{
+	  unsigned len2 = streamer_read_hwi (ib);
+	  HOST_WIDE_INT arr[WIDE_INT_MAX_ELTS];
+	  gcc_assert (len2 <= WIDE_INT_MAX_ELTS);
+	  for (unsigned int j = 0; j < len2; j++)
+	    arr[j] = streamer_read_hwi (ib);
+	  *w = widest_int::from_array (arr, len2, true);
+	  if (w == &varentry.score_in_declare_simd_clone)
+	    break;
+	}
+      varentry.ctx = stream_read_tree (ib, /*data_in*/NULL);
+      varentry.matches = streamer_read_hwi (ib) != 0;
+      entryp->variants->quick_push (varentry);
+    }
+  if (omp_declare_variant_alt == NULL)
+    omp_declare_variant_alt
+      = hash_table<omp_declare_variant_alt_hasher>::create_ggc (64);
+  *omp_declare_variant_alt->find_slot_with_hash (entryp, DECL_UID (node->decl),
+						 INSERT) = entryp;
+}
 
 /* Encode an oacc launch argument.  This matches the GOMP_LAUNCH_PACK
    macro on gomp-constants.h.  We do not check for overflow.  */


	Jakub



* [PATCH] lto: LTO cgraph support for late declare variant resolution
From: Jakub Jelinek @ 2020-10-22 11:27 UTC
  To: Jan Hubicka, Martin Jambor, Richard Biener; +Cc: gcc-patches

On Thu, May 14, 2020 at 02:17:45PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > For LTO, the patch only saves/restores the two cgraph_node bits added in the
> > patch, but doesn't yet stream out and back in the on-the-side info for the
> > declare_variant_alt nodes.  For the LTO partitioning, I believe those
> > artificial FUNCTION_DECLs with declare_variant_alt need to go into a
> > partition together with anything that calls them (possibly duplicated); is
> > there any way to achieve that?  Say the declare variant artificial fn foobar
> > is directly called from all of foo, bar and baz but not from qux, and we
> > want 4 partitions, one for each of foo, bar, baz and qux; then foobar is
> > needed in the first 3 partitions, while the functions recorded as
> > IPA_REF_ADDRs on foobar (foobar1, foobar2, foobar3 or the non-artificial
> > foobar, with calls to which the foobar call will be replaced right after
> > IPA) can of course stay in different partitions if needed.
> 
> I've tried to add the saving/restoring next to the ipa refs saving/restoring,
> as the declare variant alt stuff is kind of an extension of those.
> Unfortunately, the following doesn't compile, because I need to also write or
> read a tree there (ctx is a portion of DECL_ATTRIBUTES of the base function),
> but the ipa refs write/read back functions don't have arguments that can be
> used for that.

This patch adds streaming out and in of the omp_declare_variant_alt hash
table's on-the-side data for the declare_variant_alt cgraph_nodes and, for
LTO purposes, treats the declare_variant_alt nodes (which have no body) as
if they contained a body that calls all the possible variants.
After IPA, all calls to these magic declare_variant_alt functions are
replaced with a call to one of the variants, depending on which one has the
highest score in the context.
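
Schematically (just a sketch of the intent, not actual dump output; the
names here are made up):

  /* Before IPA: the deferred call is a direct call to the artificial
     declare_variant_alt node, which has no real body.  */
  x = f04_alt (a);
  /* After IPA (in the omp_device_lower pass): the call is redirected to
     whichever variant scores highest in this context, e.g.  */
  x = f01 (a);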

Honza, any comments/suggestions on this?

So far tested just on the new testcase.

2020-10-22  Jakub Jelinek  <jakub@redhat.com>

gcc/
	* lto-streamer.h (omp_lto_output_declare_variant_alt,
	omp_lto_input_declare_variant_alt): Declare.
	* symtab.c (symtab_node::get_partitioning_class): Return
	SYMBOL_DUPLICATE for declare_variant_alt nodes.
	* passes.c (ipa_write_summaries): Add declare_variant_alt to
	partition.
	* lto-cgraph.c (output_refs): Call omp_lto_output_declare_variant_alt
	on declare_variant_alt nodes.
	(input_refs): Call omp_lto_input_declare_variant_alt on
	declare_variant_alt nodes.
	* lto-streamer-out.c (output_function): Don't call
	collect_block_tree_leafs if DECL_INITIAL is error_mark_node.
	(lto_output): Call output_function even for declare_variant_alt
	nodes.
	* omp-general.c (omp_lto_output_declare_variant_alt,
	omp_lto_input_declare_variant_alt): New functions.
gcc/lto/
	* lto-common.c (lto_fixup_prevailing_decls): Don't use
	LTO_NO_PREVAIL on TREE_LIST's TREE_PURPOSE.
	* lto-partition.c (lto_balanced_map): Treat declare_variant_alt
	nodes like definitions.
libgomp/
	* testsuite/libgomp.c/declare-variant-1.c: New test.

--- gcc/lto-streamer.h.jj	2020-10-20 13:11:56.669053784 +0200
+++ gcc/lto-streamer.h	2020-10-22 11:17:37.806472939 +0200
@@ -927,6 +927,12 @@ bool reachable_from_this_partition_p (st
 lto_symtab_encoder_t compute_ltrans_boundary (lto_symtab_encoder_t encoder);
 void select_what_to_stream (void);
 
+/* In omp-general.c.  */
+void omp_lto_output_declare_variant_alt (lto_simple_output_block *,
+					 cgraph_node *, lto_symtab_encoder_t);
+void omp_lto_input_declare_variant_alt (lto_input_block *, cgraph_node *,
+					vec<symtab_node *>);
+
 /* In options-save.c.  */
 void cl_target_option_stream_out (struct output_block *, struct bitpack_d *,
 				  struct cl_target_option *);
--- gcc/symtab.c.jj	2020-10-20 13:11:56.670053770 +0200
+++ gcc/symtab.c	2020-10-22 11:17:37.824472676 +0200
@@ -1998,7 +1998,7 @@ symtab_node::get_partitioning_class (voi
   if (DECL_ABSTRACT_P (decl))
     return SYMBOL_EXTERNAL;
 
-  if (cnode && cnode->inlined_to)
+  if (cnode && (cnode->inlined_to || cnode->declare_variant_alt))
     return SYMBOL_DUPLICATE;
 
   /* Transparent aliases are always duplicated.  */
--- gcc/passes.c.jj	2020-08-27 18:42:35.622711897 +0200
+++ gcc/passes.c	2020-10-22 12:25:38.173798438 +0200
@@ -2722,7 +2722,8 @@ ipa_write_summaries (void)
     {
       struct cgraph_node *node = order[i];
 
-      if (node->definition && node->need_lto_streaming)
+      if ((node->definition || node->declare_variant_alt)
+	  && node->need_lto_streaming)
 	{
 	  if (gimple_has_body_p (node->decl))
 	    lto_prepare_function_for_streaming (node);
--- gcc/lto-cgraph.c.jj	2020-10-20 13:11:56.664053856 +0200
+++ gcc/lto-cgraph.c	2020-10-22 11:17:37.806472939 +0200
@@ -766,6 +766,9 @@ output_refs (lto_symtab_encoder_t encode
 	  for (int i = 0; node->iterate_reference (i, ref); i++)
 	    lto_output_ref (ob, ref, encoder);
 	}
+      if (cgraph_node *cnode = dyn_cast <cgraph_node *> (node))
+	if (cnode->declare_variant_alt)
+	  omp_lto_output_declare_variant_alt (ob, cnode, encoder);
     }
 
   streamer_write_uhwi_stream (ob->main_stream, 0);
@@ -1614,6 +1617,9 @@ input_refs (class lto_input_block *ib,
 	  input_ref (ib, node, nodes);
 	  count--;
 	}
+      if (cgraph_node *cnode = dyn_cast <cgraph_node *> (node))
+	if (cnode->declare_variant_alt)
+	  omp_lto_input_declare_variant_alt (ib, cnode, nodes);
     }
 }
 	    
--- gcc/lto-streamer-out.c.jj	2020-09-10 20:55:43.000000000 +0200
+++ gcc/lto-streamer-out.c	2020-10-22 12:30:40.129382789 +0200
@@ -2424,7 +2424,7 @@ output_function (struct cgraph_node *nod
   /* As we do not recurse into BLOCK_SUBBLOCKS but only BLOCK_SUPERCONTEXT
      collect block tree leafs and stream those.  */
   auto_vec<tree> block_tree_leafs;
-  if (DECL_INITIAL (function))
+  if (DECL_INITIAL (function) && DECL_INITIAL (function) != error_mark_node)
     collect_block_tree_leafs (DECL_INITIAL (function), block_tree_leafs);
   streamer_write_uhwi (ob, block_tree_leafs.length ());
   for (unsigned i = 0; i < block_tree_leafs.length (); ++i)
@@ -2788,7 +2788,8 @@ lto_output (void)
 		  && flag_incremental_link != INCREMENTAL_LINK_LTO)
 	      /* Thunks have no body but they may be synthetized
 		 at WPA time.  */
-	      || DECL_ARGUMENTS (cnode->decl)))
+	      || DECL_ARGUMENTS (cnode->decl)
+	      || cnode->declare_variant_alt))
 	output_function (cnode);
       else if ((vnode = dyn_cast <varpool_node *> (snode))
 	       && (DECL_INITIAL (vnode->decl) != error_mark_node
--- gcc/lto/lto-common.c.jj	2020-08-27 18:42:35.620711925 +0200
+++ gcc/lto/lto-common.c	2020-10-22 11:18:14.388937402 +0200
@@ -2592,7 +2592,6 @@ lto_fixup_prevailing_decls (tree t)
 	case TREE_LIST:
 	  LTO_SET_PREVAIL (TREE_VALUE (t));
 	  LTO_SET_PREVAIL (TREE_PURPOSE (t));
-	  LTO_NO_PREVAIL (TREE_PURPOSE (t));
 	  break;
 	default:
 	  gcc_unreachable ();
--- gcc/lto/lto-partition.c.jj	2020-01-17 09:31:28.000000000 +0100
+++ gcc/lto/lto-partition.c	2020-10-22 12:50:07.522311010 +0200
@@ -593,7 +593,8 @@ lto_balanced_map (int n_lto_partitions,
 
 	      last_visited_node++;
 
-	      gcc_assert (node->definition || node->weakref);
+	      gcc_assert (node->definition || node->weakref
+			  || node->declare_variant_alt);
 
 	      /* Compute boundary cost of callgraph edges.  */
 	      for (edge = node->callees; edge; edge = edge->next_callee)
@@ -704,7 +705,7 @@ lto_balanced_map (int n_lto_partitions,
 		int index;
 
 		node = dyn_cast <cgraph_node *> (ref->referring);
-		gcc_assert (node->definition);
+		gcc_assert (node->definition || node->declare_variant_alt);
 		index = lto_symtab_encoder_lookup (partition->encoder,
 						   node);
 		if (index != LCC_NOT_FOUND
--- gcc/omp-general.c.jj	2020-10-20 13:11:56.669053784 +0200
+++ gcc/omp-general.c	2020-10-22 11:17:37.807472924 +0200
@@ -42,6 +42,8 @@ along with GCC; see the file COPYING3.
 #include "tree-pass.h"
 #include "omp-device-properties.h"
 #include "tree-iterator.h"
+#include "data-streamer.h"
+#include "streamer-hooks.h"
 
 enum omp_requires omp_requires_mask;
 
@@ -2337,6 +2339,125 @@ omp_resolve_declare_variant (tree base)
 	  ? TREE_PURPOSE (TREE_VALUE (variant1)) : base);
 }
 
+void
+omp_lto_output_declare_variant_alt (lto_simple_output_block *ob,
+				    cgraph_node *node,
+				    lto_symtab_encoder_t encoder)
+{
+  gcc_assert (node->declare_variant_alt);
+
+  omp_declare_variant_base_entry entry;
+  entry.base = NULL;
+  entry.node = node;
+  entry.variants = NULL;
+  omp_declare_variant_base_entry *entryp
+    = omp_declare_variant_alt->find_with_hash (&entry, DECL_UID (node->decl));
+  gcc_assert (entryp);
+
+  int nbase = lto_symtab_encoder_lookup (encoder, entryp->base);
+  gcc_assert (nbase != LCC_NOT_FOUND);
+  streamer_write_hwi_stream (ob->main_stream, nbase);
+
+  streamer_write_hwi_stream (ob->main_stream, entryp->variants->length ());
+
+  unsigned int i;
+  omp_declare_variant_entry *varentry;
+  FOR_EACH_VEC_SAFE_ELT (entryp->variants, i, varentry)
+    {
+      int nvar = lto_symtab_encoder_lookup (encoder, varentry->variant);
+      gcc_assert (nvar != LCC_NOT_FOUND);
+      streamer_write_hwi_stream (ob->main_stream, nvar);
+
+      for (widest_int *w = &varentry->score; ;
+	   w = &varentry->score_in_declare_simd_clone)
+	{
+	  unsigned len = w->get_len ();
+	  streamer_write_hwi_stream (ob->main_stream, len);
+	  const HOST_WIDE_INT *val = w->get_val ();
+	  for (unsigned j = 0; j < len; j++)
+	    streamer_write_hwi_stream (ob->main_stream, val[j]);
+	  if (w == &varentry->score_in_declare_simd_clone)
+	    break;
+	}
+
+      HOST_WIDE_INT cnt = -1;
+      HOST_WIDE_INT i = varentry->matches ? 1 : 0;
+      for (tree attr = DECL_ATTRIBUTES (entryp->base->decl);
+	   attr; attr = TREE_CHAIN (attr), i += 2)
+	{
+	  attr = lookup_attribute ("omp declare variant base", attr);
+	  if (attr == NULL_TREE)
+	    break;
+
+	  if (varentry->ctx == TREE_VALUE (TREE_VALUE (attr)))
+	    {
+	      cnt = i;
+	      break;
+	    }
+	}
+
+      gcc_assert (cnt != -1);
+      streamer_write_hwi_stream (ob->main_stream, cnt);
+    }
+}
+
+void
+omp_lto_input_declare_variant_alt (lto_input_block *ib, cgraph_node *node,
+				   vec<symtab_node *> nodes)
+{
+  gcc_assert (node->declare_variant_alt);
+  omp_declare_variant_base_entry *entryp
+    = ggc_cleared_alloc<omp_declare_variant_base_entry> ();
+  entryp->base = dyn_cast<cgraph_node *> (nodes[streamer_read_hwi (ib)]);
+  entryp->node = node;
+  unsigned int len = streamer_read_hwi (ib);
+  vec_alloc (entryp->variants, len);
+
+  for (unsigned int i = 0; i < len; i++)
+    {
+      omp_declare_variant_entry varentry;
+      varentry.variant
+	= dyn_cast<cgraph_node *> (nodes[streamer_read_hwi (ib)]);
+      for (widest_int *w = &varentry.score; ;
+	   w = &varentry.score_in_declare_simd_clone)
+	{
+	  unsigned len2 = streamer_read_hwi (ib);
+	  HOST_WIDE_INT arr[WIDE_INT_MAX_ELTS];
+	  gcc_assert (len2 <= WIDE_INT_MAX_ELTS);
+	  for (unsigned int j = 0; j < len2; j++)
+	    arr[j] = streamer_read_hwi (ib);
+	  *w = widest_int::from_array (arr, len2, true);
+	  if (w == &varentry.score_in_declare_simd_clone)
+	    break;
+	}
+
+      HOST_WIDE_INT cnt = streamer_read_hwi (ib);
+      HOST_WIDE_INT j = 0;
+      varentry.ctx = NULL_TREE;
+      varentry.matches = (cnt & 1) ? true : false;
+      cnt &= ~HOST_WIDE_INT_1;
+      for (tree attr = DECL_ATTRIBUTES (entryp->base->decl);
+	   attr; attr = TREE_CHAIN (attr), j += 2)
+	{
+	  attr = lookup_attribute ("omp declare variant base", attr);
+	  if (attr == NULL_TREE)
+	    break;
+
+	  if (cnt == j)
+	    {
+	      varentry.ctx = TREE_VALUE (TREE_VALUE (attr));
+	      break;
+	    }
+	}
+      gcc_assert (varentry.ctx != NULL_TREE);
+      entryp->variants->quick_push (varentry);
+    }
+  if (omp_declare_variant_alt == NULL)
+    omp_declare_variant_alt
+      = hash_table<omp_declare_variant_alt_hasher>::create_ggc (64);
+  *omp_declare_variant_alt->find_slot_with_hash (entryp, DECL_UID (node->decl),
+						 INSERT) = entryp;
+}
 
 /* Encode an oacc launch argument.  This matches the GOMP_LAUNCH_PACK
    macro on gomp-constants.h.  We do not check for overflow.  */
--- libgomp/testsuite/libgomp.c/declare-variant-1.c.jj	2020-10-22 12:37:19.528542176 +0200
+++ libgomp/testsuite/libgomp.c/declare-variant-1.c	2020-10-22 13:01:52.875987620 +0200
@@ -0,0 +1,54 @@
+/* { dg-do link { target vect_simd_clones } } */
+/* { dg-require-effective-target lto } */
+/* { dg-require-effective-target fpic } */
+/* { dg-require-effective-target shared } */
+/* { dg-additional-options "-fdump-tree-gimple -fdump-tree-optimized -O2 -fPIC -shared -flto -flto-partition=one" } */
+/* { dg-additional-options "-mno-sse3" { target { i?86-*-* x86_64-*-* } } } */
+
+int
+f01 (int a)
+{
+  asm volatile ("" : "+g" (a) : "g" (1) : "memory");
+  return a;
+}
+
+int
+f02 (int a)
+{
+  asm volatile ("" : "+g" (a) : "g" (2) : "memory");
+  return a;
+}
+
+int
+f03 (int a)
+{
+  asm volatile ("" : "+g" (a) : "g" (3) : "memory");
+  return a;
+}
+
+#pragma omp declare variant (f01) match (device={isa("avx512f")}) /* 4 or 8 */
+#pragma omp declare variant (f02) match (implementation={vendor(score(3):gnu)},device={kind(cpu)}) /* (1 or 2) + 3 */
+#pragma omp declare variant (f03) match (implementation={vendor(score(5):gnu)},device={kind(host)}) /* (1 or 2) + 5 */
+int
+f04 (int a)
+{
+  asm volatile ("" : "+g" (a) : "g" (4) : "memory");
+  return a;
+}
+
+#pragma omp declare simd
+int
+test1 (int x)
+{
+  /* At gimplification time, we can't decide yet which function to call.  */
+  /* { dg-final { scan-tree-dump-times "f04 \\\(x" 2 "gimple" } } */
+  /* After simd clones are created, the original non-clone test1 shall
+     call f03 (score 6), the sse2/avx/avx2 clones too, but avx512f clones
+     shall call f01 with score 8.  */
+  /* { dg-final { scan-ltrans-tree-dump-not "f04 \\\(x" "optimized" } } */
+  /* { dg-final { scan-ltrans-tree-dump-times "f03 \\\(x" 14 "optimized" } } */
+  /* { dg-final { scan-ltrans-tree-dump-times "f01 \\\(x" 4 "optimized" } } */
+  int a = f04 (x);
+  int b = f04 (x);
+  return a + b;
+}


	Jakub

