public inbox for gcc-patches@gcc.gnu.org
* [gomp4.1] Handle new form of #pragma omp declare target
@ 2015-07-17 13:43 Jakub Jelinek
  2015-07-17 15:48 ` James Norris
                   ` (3 more replies)
  0 siblings, 4 replies; 48+ messages in thread
From: Jakub Jelinek @ 2015-07-17 13:43 UTC (permalink / raw)
  To: Ilya Verbin, Thomas Schwinge; +Cc: gcc-patches

Hi!

As the testcases show, #pragma omp declare target has now a new form (well,
two; with some issues on it pending), where it is used just as a single
declarative directive rather than a pair of them and allows marking
vars and functions by name as "omp declare target" vars/functions (which the
middle-end etc. already handles), but also "omp declare target link", which
is a deferred var, that is not initially mapped (on devices without shared
memory with host), but has to be mapped explicitly.

This patch only marks them with the new attribute, the actual middle-end
implementation needs to be implemented.
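
To make the marking step concrete, here is a minimal standalone sketch of
the to/link bookkeeping that the parser hunks below perform on
DECL_ATTRIBUTES.  It is not GCC code: struct decl, enum clause_kind and
mark_decl are hypothetical names, and two booleans stand in for the
"omp declare target" and "omp declare target link" attributes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a declaration and its attribute list.  */
struct decl
{
  const char *name;
  bool declare_target;       /* models "omp declare target" */
  bool declare_target_link;  /* models "omp declare target link" */
};

enum clause_kind { CLAUSE_TO, CLAUSE_LINK };

/* Mark D for clause KIND.  Returns false and prints a diagnostic if D
   already carries the opposite marking; marking twice with the same
   kind is harmless.  */
bool
mark_decl (struct decl *d, enum clause_kind kind)
{
  assert (d != NULL);
  bool *wanted = (kind == CLAUSE_LINK
		  ? &d->declare_target_link : &d->declare_target);
  bool *other = (kind == CLAUSE_LINK
		 ? &d->declare_target : &d->declare_target_link);
  if (*other)
    {
      fprintf (stderr, "'%s' specified both in declare target 'link' "
		       "and 'to' clauses\n", d->name);
      return false;
    }
  *wanted = true;
  return true;
}
```

Marking the same declaration first with CLAUSE_TO and then with
CLAUSE_LINK fails, which corresponds to the "to (b) link (b)" case in
declare-target-2.c.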

I believe OpenACC has something similar, but no idea if it is already
implemented.

Anyway, I think the implementation should be that in some pass running on
the ACCEL_COMPILER side (guarded by separate address space aka non-HSA)
we actually replace the variables with pointers to variables, then need
to somehow also mark those in the offloading tables, so that the library
registers them (the locations of the pointers to the vars), but also marks
them for special treatment, and then when actually trying to map them
(or their parts, guess that needs to be discussed) we allocate them or
whatever is requested and store the device pointer into the corresponding
variable.
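
A rough illustration of the rewrite described above, simulated in plain C
on the host.  All names here (a_link_ptr, foo_device, runtime_map_a) are
hypothetical; nothing like this exists in the patch, which only adds the
attribute.

```c
#include <assert.h>
#include <stddef.h>

/* User source:   int a = 1;
		  #pragma omp declare target link (a)
   On the offload side the variable itself would not be emitted; only a
   pointer slot that the runtime fills in when a is mapped.  */
static int *a_link_ptr = NULL;	/* hypothetical artificial pointer */

/* User source:   int foo (void) { return a++; }
   After the pass every use of a is a dereference of the pointer.  */
int
foo_device (void)
{
  /* Using a while it is not mapped is not meaningful; the sketch
     asserts instead of dereferencing a null pointer.  */
  assert (a_link_ptr != NULL);
  return (*a_link_ptr)++;
}

/* Stand-in for the library storing the device address of the mapped
   object into the registered pointer location.  */
void
runtime_map_a (int *device_copy)
{
  a_link_ptr = device_copy;
}
```

Calling runtime_map_a with the address of some storage and then calling
foo_device reads and increments that storage, which is the behaviour the
device code would see once the library has stored the device pointer.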

Ilya, Thomas, thoughts on this?

2015-07-17  Jakub Jelinek  <jakub@redhat.com>

	* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_TO_DECLARE
	and OMP_CLAUSE_LINK.
	* tree.c (omp_clause_num_ops, omp_clause_code_name): Add entries for
	OMP_CLAUSE_{TO_DECLARE,LINK}.
	(walk_tree_1): Handle OMP_CLAUSE_{TO_DECLARE,LINK}.
	* tree-nested.c (convert_nonlocal_omp_clauses,
	convert_local_omp_clauses): Likewise.
	* tree-pretty-print.c (dump_omp_clause): Likewise.
c-family/
	* c-pragma.h (enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_LINK.
c/
	* c-parser.c (c_parser_omp_clause_name): Handle link clause.
	(c_parser_omp_variable_list): Formatting fix.
	(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_LINK.
	For PRAGMA_OMP_CLAUSE_TO, parse it as OMP_CLAUSE_TO_DECLARE
	rather than OMP_CLAUSE_TO if it is a declare target directive clause.
	(OMP_DECLARE_TARGET_CLAUSE_MASK): Define.
	(c_parser_omp_declare_target): Parse directive with clauses forms.
	* c-typeck.c (c_finish_omp_clauses): Handle
	OMP_CLAUSE_{TO_DECLARE,LINK}.
cp/
	* parser.c (cp_parser_omp_clause_name): Handle link clause.
	(cp_parser_omp_var_list_no_open): Formatting fix.
	(cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_LINK.
	For PRAGMA_OMP_CLAUSE_TO, parse it as OMP_CLAUSE_TO_DECLARE
	rather than OMP_CLAUSE_TO if it is a declare target directive clause.
	(OMP_DECLARE_TARGET_CLAUSE_MASK): Define.
	(cp_parser_omp_declare_target): Parse directive with clauses forms.
	* semantics.c (finish_omp_clauses): Handle
	OMP_CLAUSE_{TO_DECLARE,LINK}.
testsuite/
	* c-c++-common/gomp/declare-target-1.c: New test.
	* c-c++-common/gomp/declare-target-2.c: New test.

--- gcc/tree-core.h.jj	2015-07-15 13:02:31.000000000 +0200
+++ gcc/tree-core.h	2015-07-17 09:30:44.944431669 +0200
@@ -256,6 +256,13 @@ enum omp_clause_code {
   /* OpenMP clause: uniform (argument-list).  */
   OMP_CLAUSE_UNIFORM,
 
+  /* OpenMP clause: to (extended-list).
+     Only when it appears in declare target.  */
+  OMP_CLAUSE_TO_DECLARE,
+
+  /* OpenMP clause: link (variable-list).  */
+  OMP_CLAUSE_LINK,
+
   /* OpenMP clause: from (variable-list).  */
   OMP_CLAUSE_FROM,
 
--- gcc/tree.c.jj	2015-07-14 14:49:57.000000000 +0200
+++ gcc/tree.c	2015-07-17 09:33:51.270692623 +0200
@@ -288,6 +288,8 @@ unsigned const char omp_clause_num_ops[]
   2, /* OMP_CLAUSE_ALIGNED  */
   1, /* OMP_CLAUSE_DEPEND  */
   1, /* OMP_CLAUSE_UNIFORM  */
+  1, /* OMP_CLAUSE_TO_DECLARE  */
+  1, /* OMP_CLAUSE_LINK  */
   2, /* OMP_CLAUSE_FROM  */
   2, /* OMP_CLAUSE_TO  */
   2, /* OMP_CLAUSE_MAP  */
@@ -357,6 +359,8 @@ const char * const omp_clause_code_name[
   "aligned",
   "depend",
   "uniform",
+  "to",
+  "link",
   "from",
   "to",
   "map",
@@ -11392,6 +11396,8 @@ walk_tree_1 (tree *tp, walk_tree_fn func
 	case OMP_CLAUSE_GRAINSIZE:
 	case OMP_CLAUSE_NUM_TASKS:
 	case OMP_CLAUSE_HINT:
+	case OMP_CLAUSE_TO_DECLARE:
+	case OMP_CLAUSE_LINK:
 	case OMP_CLAUSE_USE_DEVICE_PTR:
 	case OMP_CLAUSE_IS_DEVICE_PTR:
 	case OMP_CLAUSE__LOOPTEMP_:
--- gcc/tree-nested.c.jj	2015-07-14 14:49:57.000000000 +0200
+++ gcc/tree-nested.c	2015-07-17 09:35:11.905507270 +0200
@@ -1098,6 +1098,8 @@ convert_nonlocal_omp_clauses (tree *pcla
 	case OMP_CLAUSE_FIRSTPRIVATE:
 	case OMP_CLAUSE_COPYPRIVATE:
 	case OMP_CLAUSE_SHARED:
+	case OMP_CLAUSE_TO_DECLARE:
+	case OMP_CLAUSE_LINK:
 	case OMP_CLAUSE_USE_DEVICE_PTR:
 	case OMP_CLAUSE_IS_DEVICE_PTR:
 	do_decl_clause:
@@ -1745,6 +1747,8 @@ convert_local_omp_clauses (tree *pclause
 	case OMP_CLAUSE_FIRSTPRIVATE:
 	case OMP_CLAUSE_COPYPRIVATE:
 	case OMP_CLAUSE_SHARED:
+	case OMP_CLAUSE_TO_DECLARE:
+	case OMP_CLAUSE_LINK:
 	case OMP_CLAUSE_USE_DEVICE_PTR:
 	case OMP_CLAUSE_IS_DEVICE_PTR:
 	do_decl_clause:
--- gcc/tree-pretty-print.c.jj	2015-07-15 13:02:31.000000000 +0200
+++ gcc/tree-pretty-print.c	2015-07-17 09:36:30.822347172 +0200
@@ -344,6 +344,12 @@ dump_omp_clause (pretty_printer *pp, tre
     case OMP_CLAUSE_USE_DEVICE:
       name = "use_device";
       goto print_remap;
+    case OMP_CLAUSE_TO_DECLARE:
+      name = "to";
+      goto print_remap;
+    case OMP_CLAUSE_LINK:
+      name = "link";
+      goto print_remap;
   print_remap:
       pp_string (pp, name);
       pp_left_paren (pp);
--- gcc/c-family/c-pragma.h.jj	2015-07-14 14:49:57.000000000 +0200
+++ gcc/c-family/c-pragma.h	2015-07-17 09:21:03.190983600 +0200
@@ -101,6 +101,7 @@ typedef enum pragma_omp_clause {
   PRAGMA_OMP_CLAUSE_IS_DEVICE_PTR,
   PRAGMA_OMP_CLAUSE_LASTPRIVATE,
   PRAGMA_OMP_CLAUSE_LINEAR,
+  PRAGMA_OMP_CLAUSE_LINK,
   PRAGMA_OMP_CLAUSE_MAP,
   PRAGMA_OMP_CLAUSE_MERGEABLE,
   PRAGMA_OMP_CLAUSE_NOGROUP,
--- gcc/c/c-parser.c.jj	2015-07-16 18:09:25.000000000 +0200
+++ gcc/c/c-parser.c	2015-07-17 14:11:08.553694975 +0200
@@ -9953,6 +9953,8 @@ c_parser_omp_clause_name (c_parser *pars
 	    result = PRAGMA_OMP_CLAUSE_LASTPRIVATE;
 	  else if (!strcmp ("linear", p))
 	    result = PRAGMA_OMP_CLAUSE_LINEAR;
+	  else if (!strcmp ("link", p))
+	    result = PRAGMA_OMP_CLAUSE_LINK;
 	  break;
 	case 'm':
 	  if (!strcmp ("map", p))
@@ -10235,7 +10237,7 @@ c_parser_omp_variable_list (c_parser *pa
 			  && !TREE_READONLY (low_bound))
 			{
 			  error_at (clause_loc,
-					"%qD is not a constant", low_bound);
+				    "%qD is not a constant", low_bound);
 			  t = error_mark_node;
 			}
 
@@ -10243,7 +10245,7 @@ c_parser_omp_variable_list (c_parser *pa
 			  && !TREE_READONLY (length))
 			{
 			  error_at (clause_loc,
-					"%qD is not a constant", length);
+				    "%qD is not a constant", length);
 			  t = error_mark_node;
 			}
 		    }
@@ -12600,8 +12602,18 @@ c_parser_omp_all_clauses (c_parser *pars
 	  if (!first)
 	    goto clause_not_first;
 	  break;
+	case PRAGMA_OMP_CLAUSE_LINK:
+	  clauses
+	    = c_parser_omp_var_list_parens (parser, OMP_CLAUSE_LINK, clauses);
+	  c_name = "link";
+	  break;
 	case PRAGMA_OMP_CLAUSE_TO:
-	  clauses = c_parser_omp_clause_to (parser, clauses);
+	  if ((mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_LINK)) != 0)
+	    clauses
+	      = c_parser_omp_var_list_parens (parser, OMP_CLAUSE_TO_DECLARE,
+					      clauses);
+	  else
+	    clauses = c_parser_omp_clause_to (parser, clauses);
 	  c_name = "to";
 	  break;
 	case PRAGMA_OMP_CLAUSE_FROM:
@@ -15313,13 +15325,64 @@ c_finish_omp_declare_simd (c_parser *par
 /* OpenMP 4.0:
    # pragma omp declare target new-line
    declarations and definitions
-   # pragma omp end declare target new-line  */
+   # pragma omp end declare target new-line
+
+   OpenMP 4.1:
+   # pragma omp declare target ( extended-list ) new-line
+
+   # pragma omp declare target declare-target-clauses[seq] new-line  */
+
+#define OMP_DECLARE_TARGET_CLAUSE_MASK				\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_TO)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_LINK))
 
 static void
 c_parser_omp_declare_target (c_parser *parser)
 {
-  c_parser_skip_to_pragma_eol (parser);
-  current_omp_declare_target_attribute++;
+  location_t loc = c_parser_peek_token (parser)->location;
+  tree clauses = NULL_TREE;
+  if (c_parser_next_token_is (parser, CPP_NAME))
+    clauses = c_parser_omp_all_clauses (parser, OMP_DECLARE_TARGET_CLAUSE_MASK,
+					"#pragma omp declare target");
+  else if (c_parser_next_token_is (parser, CPP_OPEN_PAREN))
+    {
+      clauses = c_parser_omp_var_list_parens (parser, OMP_CLAUSE_TO_DECLARE,
+					      clauses);
+      c_parser_skip_to_pragma_eol (parser);
+    }
+  else
+    {
+      c_parser_skip_to_pragma_eol (parser);
+      current_omp_declare_target_attribute++;
+      return;
+    }
+  if (current_omp_declare_target_attribute)
+    error_at (loc, "%<#pragma omp declare target%> with clauses in between "
+		   "%<#pragma omp declare target%> without clauses and "
+		   "%<#pragma omp end declare target%>");
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      tree t = OMP_CLAUSE_DECL (c), id;
+      tree at1 = lookup_attribute ("omp declare target", DECL_ATTRIBUTES (t));
+      tree at2 = lookup_attribute ("omp declare target link",
+				   DECL_ATTRIBUTES (t));
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LINK)
+	{
+	  id = get_identifier ("omp declare target link");
+	  std::swap (at1, at2);
+	}
+      else
+	id = get_identifier ("omp declare target");
+      if (at2)
+	{
+	  error_at (OMP_CLAUSE_LOCATION (c),
+		    "%qD specified both in declare target %<link%> and %<to%>"
+		    " clauses", t);
+	  continue;
+	}
+      if (!at1)
+	DECL_ATTRIBUTES (t) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+    }
 }
 
 static void
--- gcc/c/c-typeck.c.jj	2015-07-15 13:00:32.000000000 +0200
+++ gcc/c/c-typeck.c	2015-07-17 13:06:58.297769199 +0200
@@ -12576,6 +12576,36 @@ c_finish_omp_clauses (tree clauses, bool
 	    bitmap_set_bit (&map_head, DECL_UID (t));
 	  break;
 
+	case OMP_CLAUSE_TO_DECLARE:
+	  t = OMP_CLAUSE_DECL (c);
+	  if (TREE_CODE (t) == FUNCTION_DECL)
+	    break;
+	  /* FALLTHRU */
+	case OMP_CLAUSE_LINK:
+	  t = OMP_CLAUSE_DECL (c);
+	  if (!VAR_P (t))
+	    {
+	      error_at (OMP_CLAUSE_LOCATION (c),
+			"%qE is not a variable in clause %qs", t,
+			omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
+	      remove = true;
+	    }
+	  else if (DECL_THREAD_LOCAL_P (t))
+	    {
+	      error_at (OMP_CLAUSE_LOCATION (c),
+			"%qD is threadprivate variable in %qs clause", t,
+			omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
+	      remove = true;
+	    }
+	  else if (!lang_hooks.types.omp_mappable_type (TREE_TYPE (t)))
+	    {
+	      error_at (OMP_CLAUSE_LOCATION (c),
+			"%qD does not have a mappable type in %qs clause", t,
+			omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
+	      remove = true;
+	    }
+	  break;
+
 	case OMP_CLAUSE_UNIFORM:
 	  t = OMP_CLAUSE_DECL (c);
 	  if (TREE_CODE (t) != PARM_DECL)
--- gcc/cp/parser.c.jj	2015-07-16 18:09:25.000000000 +0200
+++ gcc/cp/parser.c	2015-07-17 14:04:34.945101113 +0200
@@ -27748,6 +27748,8 @@ cp_parser_omp_clause_name (cp_parser *pa
 	    result = PRAGMA_OMP_CLAUSE_LASTPRIVATE;
 	  else if (!strcmp ("linear", p))
 	    result = PRAGMA_OMP_CLAUSE_LINEAR;
+	  else if (!strcmp ("link", p))
+	    result = PRAGMA_OMP_CLAUSE_LINK;
 	  break;
 	case 'm':
 	  if (!strcmp ("map", p))
@@ -27987,7 +27989,7 @@ cp_parser_omp_var_list_no_open (cp_parse
 			  && !TREE_READONLY (low_bound))
 			{
 			  error_at (token->location,
-					"%qD is not a constant", low_bound);
+				    "%qD is not a constant", low_bound);
 			  decl = error_mark_node;
 			}
 
@@ -27995,7 +27997,7 @@ cp_parser_omp_var_list_no_open (cp_parse
 			  && !TREE_READONLY (length))
 			{
 			  error_at (token->location,
-					"%qD is not a constant", length);
+				    "%qD is not a constant", length);
 			  decl = error_mark_node;
 			}
 		    }
@@ -30198,14 +30200,20 @@ cp_parser_omp_all_clauses (cp_parser *pa
 	  if (!first)
 	    goto clause_not_first;
 	  break;
+	case PRAGMA_OMP_CLAUSE_LINK:
+	  clauses = cp_parser_omp_var_list (parser, OMP_CLAUSE_LINK, clauses);
+	  c_name = "link";
+	  break;
 	case PRAGMA_OMP_CLAUSE_TO:
-	  clauses = cp_parser_omp_var_list (parser, OMP_CLAUSE_TO,
-					    clauses);
+	  if ((mask & (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_LINK)) != 0)
+	    clauses = cp_parser_omp_var_list (parser, OMP_CLAUSE_TO_DECLARE,
+					      clauses);
+	  else
+	    clauses = cp_parser_omp_var_list (parser, OMP_CLAUSE_TO, clauses);
 	  c_name = "to";
 	  break;
 	case PRAGMA_OMP_CLAUSE_FROM:
-	  clauses = cp_parser_omp_var_list (parser, OMP_CLAUSE_FROM,
-					    clauses);
+	  clauses = cp_parser_omp_var_list (parser, OMP_CLAUSE_FROM, clauses);
 	  c_name = "from";
 	  break;
 	case PRAGMA_OMP_CLAUSE_UNIFORM:
@@ -33168,13 +33176,65 @@ cp_parser_late_parsing_omp_declare_simd
 /* OpenMP 4.0:
    # pragma omp declare target new-line
    declarations and definitions
-   # pragma omp end declare target new-line  */
+   # pragma omp end declare target new-line
+
+   OpenMP 4.1:
+   # pragma omp declare target ( extended-list ) new-line
+
+   # pragma omp declare target declare-target-clauses[seq] new-line  */
+
+#define OMP_DECLARE_TARGET_CLAUSE_MASK				\
+	( (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_TO)		\
+	| (OMP_CLAUSE_MASK_1 << PRAGMA_OMP_CLAUSE_LINK))
 
 static void
 cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok)
 {
-  cp_parser_skip_to_pragma_eol (parser, pragma_tok);
-  scope_chain->omp_declare_target_attribute++;
+  tree clauses = NULL_TREE;
+  if (cp_lexer_next_token_is (parser->lexer, CPP_NAME))
+    clauses
+      = cp_parser_omp_all_clauses (parser, OMP_DECLARE_TARGET_CLAUSE_MASK,
+				   "#pragma omp declare target", pragma_tok);
+  else if (cp_lexer_next_token_is (parser->lexer, CPP_OPEN_PAREN))
+    {
+      clauses = cp_parser_omp_var_list (parser, OMP_CLAUSE_TO_DECLARE,
+					clauses);
+      cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+    }
+  else
+    {
+      cp_parser_skip_to_pragma_eol (parser, pragma_tok);
+      scope_chain->omp_declare_target_attribute++;
+      return;
+    }
+  if (scope_chain->omp_declare_target_attribute)
+    error_at (pragma_tok->location,
+	      "%<#pragma omp declare target%> with clauses in between "
+	      "%<#pragma omp declare target%> without clauses and "
+	      "%<#pragma omp end declare target%>");
+  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
+    {
+      tree t = OMP_CLAUSE_DECL (c), id;
+      tree at1 = lookup_attribute ("omp declare target", DECL_ATTRIBUTES (t));
+      tree at2 = lookup_attribute ("omp declare target link",
+				   DECL_ATTRIBUTES (t));
+      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LINK)
+	{
+	  id = get_identifier ("omp declare target link");
+	  std::swap (at1, at2);
+	}
+      else
+	id = get_identifier ("omp declare target");
+      if (at2)
+	{
+	  error_at (OMP_CLAUSE_LOCATION (c),
+		    "%qD specified both in declare target %<link%> and %<to%>"
+		    " clauses", t);
+	  continue;
+	}
+      if (!at1)
+	DECL_ATTRIBUTES (t) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+    }
 }
 
 static void
--- gcc/cp/semantics.c.jj	2015-07-16 17:56:41.000000000 +0200
+++ gcc/cp/semantics.c	2015-07-17 13:59:27.177346223 +0200
@@ -6266,6 +6266,36 @@ finish_omp_clauses (tree clauses, bool a
 	    bitmap_set_bit (&map_head, DECL_UID (t));
 	  break;
 
+	case OMP_CLAUSE_TO_DECLARE:
+	  t = OMP_CLAUSE_DECL (c);
+	  if (TREE_CODE (t) == FUNCTION_DECL)
+	    break;
+	  /* FALLTHRU */
+	case OMP_CLAUSE_LINK:
+	  t = OMP_CLAUSE_DECL (c);
+	  if (!VAR_P (t))
+	    {
+	      error_at (OMP_CLAUSE_LOCATION (c),
+			"%qE is not a variable in clause %qs", t,
+			omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
+	      remove = true;
+	    }
+	  else if (DECL_THREAD_LOCAL_P (t))
+	    {
+	      error_at (OMP_CLAUSE_LOCATION (c),
+			"%qD is threadprivate variable in %qs clause", t,
+			omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
+	      remove = true;
+	    }
+	  else if (!cp_omp_mappable_type (TREE_TYPE (t)))
+	    {
+	      error_at (OMP_CLAUSE_LOCATION (c),
+			"%qD does not have a mappable type in %qs clause", t,
+			omp_clause_code_name[OMP_CLAUSE_CODE (c)]);
+	      remove = true;
+	    }
+	  break;
+
 	case OMP_CLAUSE_UNIFORM:
 	  t = OMP_CLAUSE_DECL (c);
 	  if (TREE_CODE (t) != PARM_DECL)
--- gcc/testsuite/c-c++-common/gomp/declare-target-1.c.jj	2015-07-17 14:07:10.523953776 +0200
+++ gcc/testsuite/c-c++-common/gomp/declare-target-1.c	2015-07-17 14:07:30.472678409 +0200
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+
+int foo (void), bar (void);
+extern int a;
+int b;
+char d;
+#pragma omp declare target
+long c;
+#pragma omp end declare target
+
+#pragma omp declare target (bar, a)
+#pragma omp declare target to (b) link (d) to (foo)
--- gcc/testsuite/c-c++-common/gomp/declare-target-2.c.jj	2015-07-17 14:23:16.246720738 +0200
+++ gcc/testsuite/c-c++-common/gomp/declare-target-2.c	2015-07-17 14:21:32.000000000 +0200
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-fopenmp" } */
+
+extern int a;
+#pragma omp declare target
+#pragma omp declare target to (a)		/* { dg-error "with clauses in between" } */
+#pragma omp end declare target
+int b;
+#pragma omp declare target to (b) link (b)	/* { dg-error "specified both in declare target" } */
+int c;
+#pragma omp declare target (c)
+#pragma omp declare target link (c)		/* { dg-error "specified both in declare target" } */
+int foo (void);
+#pragma omp declare target link (foo)		/* { dg-error "is not a variable in clause" } */
+struct S;
+extern struct S d[];				/* { dg-error "array type has incomplete element type" "" { target c } } */
+#pragma omp declare target to (d)		/* { dg-error "does not have a mappable type in" } */
+extern struct S e;
+#pragma omp declare target link (e)		/* { dg-error "does not have a mappable type in" } */
+extern int f[];
+#pragma omp declare target to (f)		/* { dg-error "does not have a mappable type in" } */
+int g, h;
+#pragma omp threadprivate (g, h)
+#pragma omp declare target to (g)		/* { dg-error "is threadprivate variable in" } */
+#pragma omp declare target link (h)		/* { dg-error "is threadprivate variable in" } */
+int j[10];
+#pragma omp declare target to (j[0:4])		/* { dg-error "expected" } */

	Jakub


* Re: [gomp4.1] Handle new form of #pragma omp declare target
  2015-07-17 13:43 [gomp4.1] Handle new form of #pragma omp declare target Jakub Jelinek
@ 2015-07-17 15:48 ` James Norris
  2015-10-26 18:45 ` Ilya Verbin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 48+ messages in thread
From: James Norris @ 2015-07-17 15:48 UTC (permalink / raw)
  To: Jakub Jelinek, Ilya Verbin, Thomas Schwinge; +Cc: gcc-patches

Jakub,

On 07/17/2015 08:05 AM, Jakub Jelinek wrote:
> Hi!
>
> ...
>
> I believe OpenACC has something similar, but no idea if it is already
> implemented.

Yes, it is implemented in gomp-4_0-branch.

While the purposes of 'omp declare target' and 'acc declare' are
similar, the data movement that the latter provides through its clauses
makes it very different from the former.

The data movement requires that data be moved at the entry and
exit of an 'associated region'. An associated region is either
a function, a subroutine, an entire program or a Fortran module.
I chose to implement this in the front-ends.

For discussion purposes, I'll use the C front-end: c_parser_oacc_declare 
and finish_oacc_declare.

As far as syntax goes, OpenMP is a lot easier to deal with than
OpenACC. The syntax handling is in c_parser_oacc_declare, which
also handles the numerous data movement clauses.
One in particular requires special handling: create. This can
be seen toward the end of the function. There is a libgomp
component, GOACC_register_static (oacc-parallel.c), that is used
in conjunction with the create clause.

The creation and deletion of the 'associated region' is done
in finish_oacc_declare. The handling differs depending on where
the directive was found, i.e., global variable scope versus
local variable scope. In addition, if there is data movement
from target -> host, this must be handled appropriately.

>
>...
>
> Ilya, Thomas, thoughts on this?
>

Jim answering at the behest of Thomas....

If the above explanation is not sufficient please yell. It may
make more sense to carve out the code in question and document
it more thoroughly for discussion purposes. Also the implementation 
approach in the front-ends may be entirely wrong. There may be an 
approach to do it in the 'middle'. However, my lack of experience in the 
middle may have caused me to go down the wrong path.

Jim




* Re: [gomp4.1] Handle new form of #pragma omp declare target
  2015-07-17 13:43 [gomp4.1] Handle new form of #pragma omp declare target Jakub Jelinek
  2015-07-17 15:48 ` James Norris
@ 2015-10-26 18:45 ` Ilya Verbin
  2015-10-26 19:11   ` Jakub Jelinek
  2015-10-27 21:15 ` [gomp4.1] Handle new form of #pragma omp declare target Ilya Verbin
  2015-11-23 11:33 ` Thomas Schwinge
  3 siblings, 1 reply; 48+ messages in thread
From: Ilya Verbin @ 2015-10-26 18:45 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Thomas Schwinge, gcc-patches

On Fri, Jul 17, 2015 at 15:05:59 +0200, Jakub Jelinek wrote:
> As the testcases show, #pragma omp declare target has now a new form (well,
> two; with some issues on it pending), where it is used just as a single
> declarative directive rather than a pair of them and allows marking
> vars and functions by name as "omp declare target" vars/functions (which the
> middle-end etc. already handles), but also "omp declare target link", which
> is a deferred var, that is not initially mapped (on devices without shared
> memory with host), but has to be mapped explicitly.

I don't quite understand how link should work.  OpenMP 4.5 says:

"The list items of a link clause are not mapped by the declare target directive.
Instead, their mapping is deferred until they are mapped by target data or
target constructs. They are mapped only for such regions."

But doesn't this mean that the example below should work identically
with/without USE_LINK defined?  Or is there some difference on other testcases?

int a = 1;

#ifdef USE_LINK
#pragma omp declare target link(a)
#endif

int main ()
{
  a = 2;
  int res;
  #pragma omp target map(to: a) map(from: res)
    res = a;
  return res;
}

> This patch only marks them with the new attribute, the actual middle-end
> implementation needs to be implemented.
> 
> I believe OpenACC has something similar, but no idea if it is already
> implemented.
> 
> Anyway, I think the implementation should be that in some pass running on
> the ACCEL_COMPILER side (guarded by separate address space aka non-HSA)

HSA does not define ACCEL_COMPILER, because it uses only one compiler.

> we actually replace the variables with pointers to variables, then need
> to somehow also mark those in the offloading tables, so that the library

I see 2 possible options: use the MSB of the size, or introduce the third field
for flags.
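
A sketch of the two options, assuming the variable table holds
(address, size) pairs of pointer-sized integers; that layout, and the
names LINK_BIT, encode_size, entry_is_link, entry_size and
struct var_entry3, are illustrative assumptions, not the current
libgomp format.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Option 1: steal the most significant bit of the size field.  */
#define LINK_BIT ((uintptr_t) 1 << (sizeof (uintptr_t) * CHAR_BIT - 1))

uintptr_t
encode_size (uintptr_t size, bool is_link)
{
  /* A real object size never has the top bit set.  */
  assert ((size & LINK_BIT) == 0);
  return is_link ? (size | LINK_BIT) : size;
}

bool
entry_is_link (uintptr_t encoded)
{
  return (encoded & LINK_BIT) != 0;
}

uintptr_t
entry_size (uintptr_t encoded)
{
  return encoded & ~LINK_BIT;
}

/* Option 2: widen each entry with an explicit flags word.  This changes
   the table layout, so old and new objects would have to be told
   apart somehow.  */
struct var_entry3
{
  uintptr_t addr;
  uintptr_t size;
  uintptr_t flags;
};
```

Option 1 keeps the two-word entry layout unchanged, at the cost of
every reader of the size field having to mask the flag out.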

> registers them (the locations of the pointers to the vars), but also marks
> them for special treatment, and then when actually trying to map them
> (or their parts, guess that needs to be discussed) we allocate them or
> whatever is requested and store the device pointer into the corresponding
> variable.
> 
> Ilya, Thomas, thoughts on this?

  -- Ilya


* Re: [gomp4.1] Handle new form of #pragma omp declare target
  2015-10-26 18:45 ` Ilya Verbin
@ 2015-10-26 19:11   ` Jakub Jelinek
  2015-10-26 19:49     ` Ilya Verbin
  0 siblings, 1 reply; 48+ messages in thread
From: Jakub Jelinek @ 2015-10-26 19:11 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Thomas Schwinge, gcc-patches

On Mon, Oct 26, 2015 at 09:35:52PM +0300, Ilya Verbin wrote:
> On Fri, Jul 17, 2015 at 15:05:59 +0200, Jakub Jelinek wrote:
> > As the testcases show, #pragma omp declare target has now a new form (well,
> > two; with some issues on it pending), where it is used just as a single
> > declarative directive rather than a pair of them and allows marking
> > vars and functions by name as "omp declare target" vars/functions (which the
> > middle-end etc. already handles), but also "omp declare target link", which
> > is a deferred var, that is not initially mapped (on devices without shared
> > memory with host), but has to be mapped explicitly.
> 
> I don't quite understand how link should work.  OpenMP 4.5 says:
> 
> "The list items of a link clause are not mapped by the declare target directive.
> Instead, their mapping is deferred until they are mapped by target data or
> target constructs. They are mapped only for such regions."
>
> But doesn't this mean that the example below should work identically
> with/without USE_LINK defined?  Or is there some difference on other testcases?

On your testcase, the end result is pretty much the same, the variable is
not mapped initially to the device, and at the beginning of omp target it is
mapped to device, at the end of the region it is unmapped from the device
(without copying back).

But consider:

int a = 1, b = 1;
#pragma omp declare target link (a) to (b)
int
foo (void)
{
  return a++ + b++;
}
#pragma omp declare target to (foo)
int
main ()
{
  a = 2;
  b = 2;
  int res;
  #pragma omp target map (to: a, b) map (from: res)
  {
    res = foo () + foo ();
  }
  // This assumes only non-shared address space, so would need to be guarded
  // for that.
  if (res != (2 + 1) + (3 + 2))
    __builtin_abort ();
  return 0;
}

Without declare target link or to, you can't use the global variables
in orphaned accelerated routines (unless you e.g. take the address of the
mapped variable in the region and pass it around).
The to variables (non-deferred) are always mapped and are initialized with
the original initializer, refcount is infinity.  link (deferred) work more
like the normal mapping, referencing those vars when they aren't explicitly
(or implicitly) mapped is unspecified behavior, if it is e.g. mapped freshly
with to kind, it gets the current value of the host var rather than the
original one.  But, beyond the mapping the compiler needs to ensure that
all uses of the link global var (or perhaps just all uses of the link global
var outside of the target construct body where it is mapped, because you
could use there the pointer you got from GOMP_target) are replaced by
dereference of some artificial pointer, so a becomes *a_tmp and &a becomes
&*a_tmp, and that the runtime library during registration of the tables is
told about the address of this artificial pointer.  During registration,
I'd expect it would stick an entry for this range into the table, with some
special flag or something similar, indicating that it is deferred mapping
and where the offloading device pointer is.  During mapping, it would map it
as any other not yet mapped object, but additionally would also set this
device pointer to the device address of the mapped object.  We also need to
ensure that when we drop the refcount of that mapping back to 0, we get it
back to the state where it is described as a range with registered deferred
mapping and where the device pointer is.
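
The lifecycle described above (register as deferred, map on first use,
return to the deferred state when the refcount drops to 0) can be
simulated on the host roughly as follows.  This is only a sketch:
struct link_entry and the link_* functions are hypothetical, and malloc
stands in for device allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One registered "declare target link" variable.  */
struct link_entry
{
  void *host_addr;	/* address of the host variable */
  size_t size;
  void **dev_ptr_slot;	/* where the artificial device pointer lives */
  void *dev_copy;	/* NULL while the mapping is deferred */
  unsigned refcount;
};

/* Registration: record the range as deferred, nothing is allocated.  */
void
link_register (struct link_entry *e, void *host, size_t size, void **slot)
{
  e->host_addr = host;
  e->size = size;
  e->dev_ptr_slot = slot;
  e->dev_copy = NULL;
  e->refcount = 0;
  *slot = NULL;
}

/* map (to: var): on the 0 -> 1 transition allocate, copy the current
   host value and publish the device address through the slot.  */
bool
link_map_to (struct link_entry *e)
{
  if (e->refcount == 0)
    {
      void *copy = malloc (e->size);
      if (copy == NULL)
	return false;
      memcpy (copy, e->host_addr, e->size);
      e->dev_copy = copy;
      *e->dev_ptr_slot = copy;
    }
  e->refcount++;
  return true;
}

/* End of the region: on the 1 -> 0 transition release the copy and go
   back to the registered-but-deferred state.  */
void
link_unmap (struct link_entry *e)
{
  assert (e->refcount > 0);
  if (--e->refcount == 0)
    {
      free (e->dev_copy);
      e->dev_copy = NULL;
      *e->dev_ptr_slot = NULL;
    }
}
```

With a host variable initialized to 1 and then assigned 2 before
link_map_to is called, the copy seen through the slot holds 2, i.e. the
current host value rather than the original initializer.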

> > This patch only marks them with the new attribute, the actual middle-end
> > implementation needs to be implemented.
> > 
> > I believe OpenACC has something similar, but no idea if it is already
> > implemented.
> > 
> > Anyway, I think the implementation should be that in some pass running on
> > the ACCEL_COMPILER side (guarded by separate address space aka non-HSA)
> 
> HSA does not define ACCEL_COMPILER, because it uses only one compiler.

HSA is a non-issue here, as it has shared address space, therefore map
clause does nothing, declare target to or link clauses also don't do
anything.

> > we actually replace the variables with pointers to variables, then need
> > to somehow also mark those in the offloading tables, so that the library
> 
> I see 2 possible options: use the MSB of the size, or introduce the third field
> for flags.

Well, it can be either recorded in the host variable tables (which contain
address and size pair, right), or in corresponding offloading device table
(which contains the pointer, something else?).

	Jakub


* Re: [gomp4.1] Handle new form of #pragma omp declare target
  2015-10-26 19:11   ` Jakub Jelinek
@ 2015-10-26 19:49     ` Ilya Verbin
  2015-10-26 19:55       ` Jakub Jelinek
  0 siblings, 1 reply; 48+ messages in thread
From: Ilya Verbin @ 2015-10-26 19:49 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: Thomas Schwinge, gcc-patches, Kirill Yukhin

On Mon, Oct 26, 2015 at 20:05:39 +0100, Jakub Jelinek wrote:
> On Mon, Oct 26, 2015 at 09:35:52PM +0300, Ilya Verbin wrote:
> > On Fri, Jul 17, 2015 at 15:05:59 +0200, Jakub Jelinek wrote:
> > > As the testcases show, #pragma omp declare target has now a new form (well,
> > > two; with some issues on it pending), where it is used just as a single
> > > declarative directive rather than a pair of them and allows marking
> > > vars and functions by name as "omp declare target" vars/functions (which the
> > > middle-end etc. already handles), but also "omp declare target link", which
> > > is a deferred var, that is not initially mapped (on devices without shared
> > > memory with host), but has to be mapped explicitly.
> > 
> > I don't quite understand how link should work.  OpenMP 4.5 says:
> > 
> > "The list items of a link clause are not mapped by the declare target directive.
> > Instead, their mapping is deferred until they are mapped by target data or
> > target constructs. They are mapped only for such regions."
> >
> > But doesn't this mean that the example below should work identically
> > with/without USE_LINK defined?  Or is there some difference on other testcases?
> 
> On your testcase, the end result is pretty much the same, the variable is
> not mapped initially to the device, and at the beginning of omp target it is
> mapped to device, at the end of the region it is unmapped from the device
> (without copying back).
> 
> But consider:
> 
> int a = 1, b = 1;
> #pragma omp declare target link (a) to (b)
> int
> foo (void)
> {
>   return a++ + b++;
> }
> #pragma omp declare target to (foo)
> int
> main ()
> {
>   a = 2;
>   b = 2;
>   int res;
>   #pragma omp target map (to: a, b) map (from: res)
>   {
>     res = foo () + foo ();
>   }
>   // This assumes only non-shared address space, so would need to be guarded
>   // for that.
>   if (res != (2 + 1) + (3 + 2))
>     __builtin_abort ();
>   return 0;
> }
> 
> Without declare target link or to, you can't use the global variables
> in orphaned accelerated routines (unless you e.g. take the address of the
> mapped variable in the region and pass it around).
> The to variables (non-deferred) are always mapped and are initialized with
> the original initializer, refcount is infinity.  link (deferred) work more
> like the normal mapping, referencing those vars when they aren't explicitly
> (or implicitly) mapped is unspecified behavior, if it is e.g. mapped freshly
> with to kind, it gets the current value of the host var rather than the
> original one.  But, beyond the mapping the compiler needs to ensure that
> all uses of the link global var (or perhaps just all uses of the link global
> var outside of the target construct body where it is mapped, because you
> could use there the pointer you got from GOMP_target) are replaced by
> dereference of some artificial pointer, so a becomes *a_tmp and &a becomes
> &*a_tmp, and that the runtime library during registration of the tables is
> told about the address of this artificial pointer.  During registration,
> I'd expect it would stick an entry for this range into the table, with some
> special flag or something similar, indicating that it is deferred mapping
> and where the offloading device pointer is.  During mapping, it would map it
> as any other not yet mapped object, but additionally would also set this
> device pointer to the device address of the mapped object.  We also need to
> ensure that when we drop the refcount of that mapping back to 0, we get it
> back to the state where it is described as a range with registered deferred
> mapping and where the device pointer is.

OK, got it, I'll try to implement this...
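
To make the indirection described above concrete, here is a minimal sketch in
plain C of what the rewrite of "a" into "*a_tmp" amounts to.  It only models
the idea on the host: the names a_host, a_linkptr, device_storage and the
sketch_* functions are hypothetical, and a real implementation would allocate
device memory in the runtime library instead of using a static stand-in.

```c
#include <assert.h>
#include <stddef.h>

/* Host copy of the "declare target link" variable.  */
int a_host = 1;

/* Artificial pointer that replaces direct uses of the variable in
   device code (assumed name).  NULL models "not currently mapped".  */
int *a_linkptr = NULL;

/* Stand-in for device memory; purely illustrative.  */
int device_storage;

/* Models map(to: a): copy the current host value, then store the
   device address into the artificial pointer.  */
void
sketch_map_to (void)
{
  device_storage = a_host;
  a_linkptr = &device_storage;
}

/* Models the refcount dropping back to 0: the entry returns to the
   "registered but deferred" state.  */
void
sketch_unmap (void)
{
  a_linkptr = NULL;
}

/* Body of "return a++;" after every use of a became *a_linkptr.  */
int
sketch_foo (void)
{
  assert (a_linkptr != NULL);	/* Unmapped use is unspecified.  */
  return (*a_linkptr)++;
}
```

Note that a fresh "to" mapping picks up the current host value (2 if the host
assigned it before the region), not the original initializer, which is the
observable difference from a plain "to" variable.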

> > > we actually replace the variables with pointers to variables, then need
> > > to somehow also mark those in the offloading tables, so that the library
> > 
> > I see 2 possible options: use the MSB of the size, or introduce the third field
> > for flags.
> 
> Well, it can be either recorded in the host variable tables (which contain
> address and size pair, right), or in corresponding offloading device table
> (which contains the pointer, something else?).

It contains a size too, which is checked in libgomp:
	  gomp_fatal ("Can't map target variables (size mismatch)");
Yes, we can remove this check, and use second field in device table for flags.

  -- Ilya


* Re: [gomp4.1] Handle new form of #pragma omp declare target
  2015-10-26 19:49     ` Ilya Verbin
@ 2015-10-26 19:55       ` Jakub Jelinek
  2015-11-16 15:41         ` [gomp4.5] Handle #pragma omp declare target link Ilya Verbin
  0 siblings, 1 reply; 48+ messages in thread
From: Jakub Jelinek @ 2015-10-26 19:55 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Thomas Schwinge, gcc-patches, Kirill Yukhin

On Mon, Oct 26, 2015 at 10:39:04PM +0300, Ilya Verbin wrote:
> > Without declare target link or to, you can't use the global variables
> > in orphaned accelerated routines (unless you e.g. take the address of the
> > mapped variable in the region and pass it around).
> > The to variables (non-deferred) are always mapped and are initialized with
> > the original initializer, refcount is infinity.  link (deferred) work more
> > like the normal mapping, referencing those vars when they aren't explicitly
> > (or implicitly) mapped is unspecified behavior, if it is e.g. mapped freshly
> > with to kind, it gets the current value of the host var rather than the
> > original one.  But, beyond the mapping the compiler needs to ensure that
> > all uses of the link global var (or perhaps just all uses of the link global
> > var outside of the target construct body where it is mapped, because you
> > could use there the pointer you got from GOMP_target) are replaced by
> > dereference of some artificial pointer, so a becomes *a_tmp and &a becomes
> > &*a_tmp, and that the runtime library during registration of the tables is
> > told about the address of this artificial pointer.  During registration,
> > I'd expect it would stick an entry for this range into the table, with some
> > special flag or something similar, indicating that it is deferred mapping
> > and where the offloading device pointer is.  During mapping, it would map it
> > as any other not yet mapped object, but additionally would also set this
> > device pointer to the device address of the mapped object.  We also need to
> > ensure that when we drop the refcount of that mapping back to 0, we get it
> > back to the state where it is described as a range with registered deferred
> > mapping and where the device pointer is.
> 
> OK, got it, I'll try to implement this...

Thanks.

> > > > we actually replace the variables with pointers to variables, then need
> > > > to somehow also mark those in the offloading tables, so that the library
> > > 
> > > I see 2 possible options: use the MSB of the size, or introduce the third field
> > > for flags.
> > 
> > Well, it can be either recorded in the host variable tables (which contain
> > address and size pair, right), or in corresponding offloading device table
> > (which contains the pointer, something else?).
> 
> It contains a size too, which is checked in libgomp:
> 	  gomp_fatal ("Can't map target variables (size mismatch)");
> Yes, we can remove this check, and use second field in device table for flags.

Yeah, or e.g. just use the MSB of that size: check that either the size is
the same (then it is target to) or it is MSB | size (then it is target link).
Objects larger than half of the address space aren't really supportable
anyway.
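
A minimal sketch of that encoding in C, just to illustrate the check.  The
SKETCH_LINK_BIT macro and the sketch_* functions are hypothetical names, not
anything in libgomp, and uintptr_t is assumed to be as wide as a table entry.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Most significant bit of a pointer-sized integer.  */
#define SKETCH_LINK_BIT \
  ((uintptr_t) 1 << (sizeof (uintptr_t) * CHAR_BIT - 1))

/* Encode the size stored in the device table; link vars get the MSB.  */
uintptr_t
sketch_encode_size (uintptr_t size, bool is_link)
{
  /* Objects using the MSB themselves are not supportable.  */
  assert ((size & SKETCH_LINK_BIT) == 0);
  return is_link ? (size | SKETCH_LINK_BIT) : size;
}

bool
sketch_is_link (uintptr_t encoded)
{
  return (encoded & SKETCH_LINK_BIT) != 0;
}

uintptr_t
sketch_decode_size (uintptr_t encoded)
{
  return encoded & ~SKETCH_LINK_BIT;
}

/* Compare a host table size against a device table size.
   Returns 0 for target to, 1 for target link, -1 for a mismatch.  */
int
sketch_classify (uintptr_t host_size, uintptr_t dev_size)
{
  if (dev_size == host_size)
    return 0;
  if (dev_size == (SKETCH_LINK_BIT | host_size))
    return 1;
  return -1;
}
```

With this, the existing size-mismatch check can stay: only the -1 case would
be fatal.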

	Jakub


* Re: [gomp4.1] Handle new form of #pragma omp declare target
  2015-07-17 13:43 [gomp4.1] Handle new form of #pragma omp declare target Jakub Jelinek
  2015-07-17 15:48 ` James Norris
  2015-10-26 18:45 ` Ilya Verbin
@ 2015-10-27 21:15 ` Ilya Verbin
  2015-10-30 17:48   ` Ilya Verbin
  2015-11-23 11:33 ` Thomas Schwinge
  3 siblings, 1 reply; 48+ messages in thread
From: Ilya Verbin @ 2015-10-27 21:15 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Kirill Yukhin

On Fri, Jul 17, 2015 at 15:05:59 +0200, Jakub Jelinek wrote:
> As the testcases show, #pragma omp declare target has now a new form (well,
> two; with some issues on it pending), where it is used just as a single
> declarative directive rather than a pair of them and allows marking
> vars and functions by name as "omp declare target" vars/functions (which the
> middle-end etc. already handles),

There is an issue: such variables are not added to the offloading tables,
because when varpool_node::get_create is called for the first time, the variable
doesn't yet have the "omp declare target" attribute, and when it's called for the
second time, it just returns the existing node.  Functions also aren't marked as
offloadable.  I tried to fix this by moving the code from
varpool_node::get_create to varpool_node::finalize_decl, but that helped only
for C and doesn't fix C++.  Therefore, I decided to iterate through all
functions and variables, as in the patch below.  But it doesn't work for static
vars declared inside functions, because they do not appear in symtab :(


diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 1a64d789..0ba04ef 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -511,16 +511,6 @@ cgraph_node::create (tree decl)
   gcc_assert (TREE_CODE (decl) == FUNCTION_DECL);
 
   node->decl = decl;
-
-  if ((flag_openacc || flag_openmp)
-      && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
-    {
-      node->offloadable = 1;
-#ifdef ENABLE_OFFLOADING
-      g->have_offload = true;
-#endif
-    }
-
   node->register_symbol ();
 
   if (DECL_CONTEXT (decl) && TREE_CODE (DECL_CONTEXT (decl)) == FUNCTION_DECL)
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 04a4d3f..9ac7b36 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -1016,6 +1016,25 @@ analyze_functions (bool first_time)
   symtab->state = CONSTRUCTION;
   input_location = UNKNOWN_LOCATION;
 
+  /* Process offloadable functions and variables.  */
+  if (first_time && (flag_openacc || flag_openmp))
+    FOR_EACH_SYMBOL (node)
+      if (lookup_attribute ("omp declare target", DECL_ATTRIBUTES (node->decl)))
+	{
+	  node->offloadable = 1;
+
+#ifdef ENABLE_OFFLOADING
+	  g->have_offload = true;
+
+	  if (TREE_CODE (node->decl) == VAR_DECL && !DECL_EXTERNAL (node->decl))
+	    {
+	      if (!in_lto_p)
+		vec_safe_push (offload_vars, node->decl);
+	      node->force_output = 1;
+	    }
+#endif
+	}
+
   /* Ugly, but the fixup can not happen at a time same body alias is created;
      C++ FE is confused about the COMDAT groups being right.  */
   if (symtab->cpp_implicit_aliases_done)
diff --git a/gcc/varpool.c b/gcc/varpool.c
index 7d11e20..077dd40 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -154,19 +154,6 @@ varpool_node::get_create (tree decl)
 
   node = varpool_node::create_empty ();
   node->decl = decl;
-
-  if ((flag_openacc || flag_openmp) && !DECL_EXTERNAL (decl)
-      && lookup_attribute ("omp declare target", DECL_ATTRIBUTES (decl)))
-    {
-      node->offloadable = 1;
-#ifdef ENABLE_OFFLOADING
-      g->have_offload = true;
-      if (!in_lto_p)
-	vec_safe_push (offload_vars, decl);
-      node->force_output = 1;
-#endif
-    }
-
   node->register_symbol ();
   return node;
 }
diff --git a/libgomp/testsuite/libgomp.c++/target-13.C b/libgomp/testsuite/libgomp.c++/target-13.C
index 376672d..5279ac0 100644
--- a/libgomp/testsuite/libgomp.c++/target-13.C
+++ b/libgomp/testsuite/libgomp.c++/target-13.C
@@ -1,11 +1,14 @@
 extern "C" void abort (void);
 
+int g;
+#pragma omp declare target (g)
+
 #pragma omp declare target
 int
 foo (void)
 {
   static int s;
-  return ++s;
+  return ++s + g;
 }
 #pragma omp end declare target
 
diff --git a/libgomp/testsuite/libgomp.c/target-28.c b/libgomp/testsuite/libgomp.c/target-28.c
index c9a2999..96e9e05 100644
--- a/libgomp/testsuite/libgomp.c/target-28.c
+++ b/libgomp/testsuite/libgomp.c/target-28.c
@@ -1,11 +1,14 @@
 extern void abort (void);
 
+int g;
+#pragma omp declare target (g)
+
 #pragma omp declare target
 int
 foo (void)
 {
   static int s;
-  return ++s;
+  return ++s + g;
 }
 #pragma omp end declare target
 
 
  -- Ilya


* Re: [gomp4.1] Handle new form of #pragma omp declare target
  2015-10-27 21:15 ` [gomp4.1] Handle new form of #pragma omp declare target Ilya Verbin
@ 2015-10-30 17:48   ` Ilya Verbin
  2015-10-30 19:23     ` Jakub Jelinek
  0 siblings, 1 reply; 48+ messages in thread
From: Ilya Verbin @ 2015-10-30 17:48 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Kirill Yukhin

On Wed, Oct 28, 2015 at 00:11:03 +0300, Ilya Verbin wrote:
> On Fri, Jul 17, 2015 at 15:05:59 +0200, Jakub Jelinek wrote:
> > As the testcases show, #pragma omp declare target has now a new form (well,
> > two; with some issues on it pending), where it is used just as a single
> > declarative directive rather than a pair of them and allows marking
> > vars and functions by name as "omp declare target" vars/functions (which the
> > middle-end etc. already handles),
> 
> There is an issue - such variables are not added to the offloading tables,
> because when varpool_node::get_create is called for the first time, the variable
> doesn't yet have "omp declare target" attribute, and when it's called for the
> second time, it just returns existing node.  Functions also aren't marked as
> offloadable.  I tried to fix this by moving the code from
> varpool_node::get_create to varpool_node::finalize_decl, but it helped only C,
> but doesn't fix C++.  Therefore, I decided to iterate through all functions and
> variables, like in the patch below.  But it doesn't work for static vars,
> declared inside functions, because they do not appear in symtab :(

Ping?  Where should I set node->offloadable for "omp declare target to (list)"
functions, global and static vars?

Thanks,
  -- Ilya


* Re: [gomp4.1] Handle new form of #pragma omp declare target
  2015-10-30 17:48   ` Ilya Verbin
@ 2015-10-30 19:23     ` Jakub Jelinek
  2015-11-02 16:54       ` Ilya Verbin
  0 siblings, 1 reply; 48+ messages in thread
From: Jakub Jelinek @ 2015-10-30 19:23 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: gcc-patches, Kirill Yukhin

On Fri, Oct 30, 2015 at 08:44:07PM +0300, Ilya Verbin wrote:
> On Wed, Oct 28, 2015 at 00:11:03 +0300, Ilya Verbin wrote:
> > On Fri, Jul 17, 2015 at 15:05:59 +0200, Jakub Jelinek wrote:
> > > As the testcases show, #pragma omp declare target has now a new form (well,
> > > two; with some issues on it pending), where it is used just as a single
> > > declarative directive rather than a pair of them and allows marking
> > > vars and functions by name as "omp declare target" vars/functions (which the
> > > middle-end etc. already handles),
> > 
> > There is an issue - such variables are not added to the offloading tables,
> > because when varpool_node::get_create is called for the first time, the variable
> > doesn't yet have "omp declare target" attribute, and when it's called for the
> > second time, it just returns existing node.  Functions also aren't marked as
> > offloadable.  I tried to fix this by moving the code from
> > varpool_node::get_create to varpool_node::finalize_decl, but it helped only C,
> > but doesn't fix C++.  Therefore, I decided to iterate through all functions and
> > variables, like in the patch below.  But it doesn't work for static vars,
> > declared inside functions, because they do not appear in symtab :(
> 
> Ping?  Where should I set node->offloadable for "omp declare target to (list)"
> functions, global and static vars?

Perhaps already somewhere in the FEs?  I mean, when the varpool node is
created after the decl has that attribute, it should already set offloadable
itself, so perhaps when adding the attribute, check whether the corresponding
varpool node already exists (but don't create it) and, if so, set offloadable?
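
Roughly, as a toy model in C (not the real symtab API; all sketch_* names and
the fixed-size pool are made up for illustration), the ordering problem and
the suggested fix look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_node;

/* Toy decl: has the attribute or not, plus its symtab node if any.  */
struct sketch_decl
{
  bool has_attr;
  struct sketch_node *node;
};

/* Toy symtab node with the offloadable flag.  */
struct sketch_node
{
  struct sketch_decl *decl;
  bool offloadable;
};

#define SKETCH_POOL_MAX 8
struct sketch_node sketch_pool[SKETCH_POOL_MAX];
size_t sketch_pool_used;

/* Lookup only; never creates a node.  */
struct sketch_node *
sketch_get (struct sketch_decl *d)
{
  return d->node;
}

/* Returns the existing node untouched, or creates one and samples the
   attribute at creation time.  This is where a late attribute is lost.  */
struct sketch_node *
sketch_get_create (struct sketch_decl *d)
{
  if (d->node != NULL)
    return d->node;
  assert (sketch_pool_used < SKETCH_POOL_MAX);
  struct sketch_node *n = &sketch_pool[sketch_pool_used++];
  n->decl = d;
  n->offloadable = d->has_attr;
  d->node = n;
  return n;
}

/* Suggested fix: when adding the attribute, update a node that already
   exists, but don't create one.  */
void
sketch_add_attr (struct sketch_decl *d)
{
  d->has_attr = true;
  struct sketch_node *n = sketch_get (d);
  if (n != NULL)
    n->offloadable = true;
}
```

Both orders then end up with offloadable set: node first and attribute later,
or attribute first and node later.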

	Jakub


* Re: [gomp4.1] Handle new form of #pragma omp declare target
  2015-10-30 19:23     ` Jakub Jelinek
@ 2015-11-02 16:54       ` Ilya Verbin
  2015-11-02 18:01         ` Jakub Jelinek
  0 siblings, 1 reply; 48+ messages in thread
From: Ilya Verbin @ 2015-11-02 16:54 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Kirill Yukhin

On Fri, Oct 30, 2015 at 20:12:25 +0100, Jakub Jelinek wrote:
> On Fri, Oct 30, 2015 at 08:44:07PM +0300, Ilya Verbin wrote:
> > On Wed, Oct 28, 2015 at 00:11:03 +0300, Ilya Verbin wrote:
> > > On Fri, Jul 17, 2015 at 15:05:59 +0200, Jakub Jelinek wrote:
> > > > As the testcases show, #pragma omp declare target has now a new form (well,
> > > > two; with some issues on it pending), where it is used just as a single
> > > > declarative directive rather than a pair of them and allows marking
> > > > vars and functions by name as "omp declare target" vars/functions (which the
> > > > middle-end etc. already handles),
> > > 
> > > There is an issue - such variables are not added to the offloading tables,
> > > because when varpool_node::get_create is called for the first time, the variable
> > > doesn't yet have "omp declare target" attribute, and when it's called for the
> > > second time, it just returns existing node.  Functions also aren't marked as
> > > offloadable.  I tried to fix this by moving the code from
> > > varpool_node::get_create to varpool_node::finalize_decl, but it helped only C,
> > > but doesn't fix C++.  Therefore, I decided to iterate through all functions and
> > > variables, like in the patch below.  But it doesn't work for static vars,
> > > declared inside functions, because they do not appear in symtab :(
> > 
> > Ping?  Where should I set node->offloadable for "omp declare target to (list)"
> > functions, global and static vars?
> 
> Perhaps already somewhere in the FEs?  I mean, when the varpool node is
> created after the decl has that attribute, it already should set offloadable
> itself, so perhaps when adding the attribute check if corresponding varpool
> node exists already (but don't create it) and if yes, set offloadable?

Here is the patch.
make check RUNTESTFLAGS=gomp.exp and check-target-libgomp passed.
OK for gomp-4_5-branch?


gcc/c/
	* c-parser.c: Include context.h.
	(c_parser_omp_declare_target): If decl has "omp declare target" or
	"omp declare target link" attribute, and cgraph or varpool node already
	exists, then set corresponding flags.
gcc/cp/
	* parser.c: Include context.h.
	(cp_parser_omp_declare_target): If decl has "omp declare target" or
	"omp declare target link" attribute, and cgraph or varpool node already
	exists, then set corresponding flags.
libgomp/
	* testsuite/libgomp.c++/target-13.C: Add global variable with "omp
	declare target (<list>)" directive, use it in foo.
	* testsuite/libgomp.c/target-28.c: Likewise.


diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index a169457..049417c 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gomp-constants.h"
 #include "c-family/c-indentation.h"
 #include "gimple-expr.h"
+#include "context.h"
 
 \f
 /* Initialization routine for this file.  */
@@ -15600,7 +15601,22 @@ c_parser_omp_declare_target (c_parser *parser)
 	  continue;
 	}
       if (!at1)
-	DECL_ATTRIBUTES (t) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+	{
+	  symtab_node *node = symtab_node::get (t);
+	  DECL_ATTRIBUTES (t) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+	  if (node != NULL)
+	    {
+	      node->offloadable = 1;
+#ifdef ENABLE_OFFLOADING
+	      g->have_offload = true;
+	      if (is_a <varpool_node *> (node))
+		{
+		  vec_safe_push (offload_vars, t);
+		  node->force_output = 1;
+		}
+#endif
+	    }
+	}
     }
 }
 
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index a374e6c..de77a4b 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -49,6 +49,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "omp-low.h"
 #include "gomp-constants.h"
 #include "c-family/c-indentation.h"
+#include "context.h"
 
 \f
 /* The lexer.  */
@@ -34773,7 +34774,22 @@ cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok)
 	  continue;
 	}
       if (!at1)
-	DECL_ATTRIBUTES (t) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+	{
+	  symtab_node *node = symtab_node::get (t);
+	  DECL_ATTRIBUTES (t) = tree_cons (id, NULL_TREE, DECL_ATTRIBUTES (t));
+	  if (node != NULL)
+	    {
+	      node->offloadable = 1;
+#ifdef ENABLE_OFFLOADING
+	      g->have_offload = true;
+	      if (is_a <varpool_node *> (node))
+		{
+		  vec_safe_push (offload_vars, t);
+		  node->force_output = 1;
+		}
+#endif
+	    }
+	}
     }
 }
 
diff --git a/libgomp/testsuite/libgomp.c++/target-13.C b/libgomp/testsuite/libgomp.c++/target-13.C
index 376672d..5279ac0 100644
--- a/libgomp/testsuite/libgomp.c++/target-13.C
+++ b/libgomp/testsuite/libgomp.c++/target-13.C
@@ -1,11 +1,14 @@
 extern "C" void abort (void);
 
+int g;
+#pragma omp declare target (g)
+
 #pragma omp declare target
 int
 foo (void)
 {
   static int s;
-  return ++s;
+  return ++s + g;
 }
 #pragma omp end declare target
 
diff --git a/libgomp/testsuite/libgomp.c/target-28.c b/libgomp/testsuite/libgomp.c/target-28.c
index c9a2999..96e9e05 100644
--- a/libgomp/testsuite/libgomp.c/target-28.c
+++ b/libgomp/testsuite/libgomp.c/target-28.c
@@ -1,11 +1,14 @@
 extern void abort (void);
 
+int g;
+#pragma omp declare target (g)
+
 #pragma omp declare target
 int
 foo (void)
 {
   static int s;
-  return ++s;
+  return ++s + g;
 }
 #pragma omp end declare target
 

  -- Ilya


* Re: [gomp4.1] Handle new form of #pragma omp declare target
  2015-11-02 16:54       ` Ilya Verbin
@ 2015-11-02 18:01         ` Jakub Jelinek
  0 siblings, 0 replies; 48+ messages in thread
From: Jakub Jelinek @ 2015-11-02 18:01 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: gcc-patches, Kirill Yukhin

On Mon, Nov 02, 2015 at 07:54:17PM +0300, Ilya Verbin wrote:
> Here is the patch.
> make check RUNTESTFLAGS=gomp.exp and check-target-libgomp passed.
> OK for gomp-4_5-branch?
> 
> 
> gcc/c/
> 	* c-parser.c: Include context.h.
> 	(c_parser_omp_declare_target): If decl has "omp declare target" or
> 	"omp declare target link" attribute, and cgraph or varpool node already
> 	exists, then set corresponding flags.
> gcc/cp/
> 	* parser.c: Include context.h.
> 	(cp_parser_omp_declare_target): If decl has "omp declare target" or
> 	"omp declare target link" attribute, and cgraph or varpool node already
> 	exists, then set corresponding flags.
> libgomp/
> 	* testsuite/libgomp.c++/target-13.C: Add global variable with "omp
> 	declare target (<list>)" directive, use it in foo.
> 	* testsuite/libgomp.c/target-28.c: Likewise.

Yes, thanks.

	Jakub


* [gomp4.5] Handle #pragma omp declare target link
  2015-10-26 19:55       ` Jakub Jelinek
@ 2015-11-16 15:41         ` Ilya Verbin
  2015-11-19 15:31           ` Jakub Jelinek
  0 siblings, 1 reply; 48+ messages in thread
From: Ilya Verbin @ 2015-11-16 15:41 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Kirill Yukhin

Hi!

On Mon, Oct 26, 2015 at 20:49:40 +0100, Jakub Jelinek wrote:
> On Mon, Oct 26, 2015 at 10:39:04PM +0300, Ilya Verbin wrote:
> > > Without declare target link or to, you can't use the global variables
> > > in orphaned accelerated routines (unless you e.g. take the address of the
> > > mapped variable in the region and pass it around).
> > > The to variables (non-deferred) are always mapped and are initialized with
> > > the original initializer, refcount is infinity.  link (deferred) work more
> > > like the normal mapping, referencing those vars when they aren't explicitly
> > > (or implicitly) mapped is unspecified behavior, if it is e.g. mapped freshly
> > > with to kind, it gets the current value of the host var rather than the
> > > original one.  But, beyond the mapping the compiler needs to ensure that
> > > all uses of the link global var (or perhaps just all uses of the link global
> > > var outside of the target construct body where it is mapped, because you
> > > could use there the pointer you got from GOMP_target) are replaced by
> > > dereference of some artificial pointer, so a becomes *a_tmp and &a becomes
> > > &*a_tmp, and that the runtime library during registration of the tables is
> > > told about the address of this artificial pointer.  During registration,
> > > I'd expect it would stick an entry for this range into the table, with some
> > > special flag or something similar, indicating that it is deferred mapping
> > > and where the offloading device pointer is.  During mapping, it would map it
> > > as any other not yet mapped object, but additionally would also set this
> > > device pointer to the device address of the mapped object.  We also need to
> > > ensure that when we drop the refcount of that mapping back to 0, we get it
> > > back to the state where it is described as a range with registered deferred
> > > mapping and where the device pointer is.
> > 
> > > OK, got it, I'll try to implement this...
> 
> Thanks.
> 
> > > > > we actually replace the variables with pointers to variables, then need
> > > > > to somehow also mark those in the offloading tables, so that the library
> > > > 
> > > > I see 2 possible options: use the MSB of the size, or introduce the third field
> > > > for flags.
> > > 
> > > Well, it can be either recorded in the host variable tables (which contain
> > > address and size pair, right), or in corresponding offloading device table
> > > (which contains the pointer, something else?).
> > 
> > It contains a size too, which is checked in libgomp:
> > 	  gomp_fatal ("Can't map target variables (size mismatch)");
> > Yes, we can remove this check, and use second field in device table for flags.
> 
> Yeah, or e.g. just use the MSB of that size: check that either the size is
> the same (then it is target to) or it is MSB | size (then it is target link).
> Objects larger than half of the address space aren't really supportable
> anyway.

Here is a WIP patch, not for check-in.  There are still many FIXMEs, which I am
going to resolve; however, the target-link-1.c testcase passes.
Is this approach correct?  Any comments on the FIXMEs?


diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 23d0107..58771c0 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -15895,7 +15895,10 @@ c_parser_omp_declare_target (c_parser *parser)
 	      g->have_offload = true;
 	      if (is_a <varpool_node *> (node))
 		{
-		  vec_safe_push (offload_vars, t);
+		  omp_offload_var var;
+		  var.decl = t;
+		  var.link_ptr_decl = NULL_TREE;
+		  vec_safe_push (offload_vars, var);
 		  node->force_output = 1;
 		}
 #endif
diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d1f4970..b890f6d 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -34999,7 +34999,10 @@ cp_parser_omp_declare_target (cp_parser *parser, cp_token *pragma_tok)
 	      g->have_offload = true;
 	      if (is_a <varpool_node *> (node))
 		{
-		  vec_safe_push (offload_vars, t);
+		  omp_offload_var var;
+		  var.decl = t;
+		  var.link_ptr_decl = NULL_TREE;
+		  vec_safe_push (offload_vars, var);
 		  node->force_output = 1;
 		}
 #endif
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 67a9024..878a9c5 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -1106,7 +1106,7 @@ output_offload_tables (void)
       streamer_write_enum (ob->main_stream, LTO_symtab_tags,
 			   LTO_symtab_last_tag, LTO_symtab_variable);
       lto_output_var_decl_index (ob->decl_state, ob->main_stream,
-				 (*offload_vars)[i]);
+				 (*offload_vars)[i].decl);
     }
 
   streamer_write_uhwi_stream (ob->main_stream, 0);
@@ -1902,7 +1902,10 @@ input_offload_tables (void)
 	      int decl_index = streamer_read_uhwi (ib);
 	      tree var_decl
 		= lto_file_decl_data_get_var_decl (file_data, decl_index);
-	      vec_safe_push (offload_vars, var_decl);
+	      omp_offload_var var;
+	      var.decl = var_decl;
+	      var.link_ptr_decl = NULL_TREE;
+	      vec_safe_push (offload_vars, var);
 	    }
 	  else
 	    fatal_error (input_location,
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index ee33551..5900f1a 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -373,7 +373,8 @@ unshare_and_remap (tree x, tree from, tree to)
 }
 
 /* Holds offload tables with decls.  */
-vec<tree, va_gc> *offload_funcs, *offload_vars;
+vec<tree, va_gc> *offload_funcs;
+vec<omp_offload_var, va_gc> *offload_vars;
 
 /* Convenience function for calling scan_omp_1_op on tree operands.  */
 
@@ -2009,7 +2010,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	  decl = OMP_CLAUSE_DECL (c);
 	  /* Global variables with "omp declare target" attribute
 	     don't need to be copied, the receiver side will use them
-	     directly.  */
+	     directly.  However, global variables with "omp declare target link"
+	     attribute need to be copied.  */
 	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 	      && DECL_P (decl)
 	      && ((OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FIRSTPRIVATE_POINTER
@@ -2017,7 +2019,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 		       != GOMP_MAP_FIRSTPRIVATE_REFERENCE))
 		  || TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
 	      && is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))
-	      && varpool_node::get_create (decl)->offloadable)
+	      && varpool_node::get_create (decl)->offloadable
+	      && !lookup_attribute ("omp declare target link",
+				    DECL_ATTRIBUTES (decl)))
 	    break;
 	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 	      && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER)
@@ -18331,23 +18335,50 @@ make_pass_omp_simd_clone (gcc::context *ctxt)
   return new pass_omp_simd_clone (ctxt);
 }
 
-/* Helper function for omp_finish_file routine.  Takes decls from V_DECLS and
-   adds their addresses and sizes to constructor-vector V_CTOR.  */
+/* Helper function for omp_finish_file routine.  Takes func decls from V_DECLS
+   and adds their addresses to constructor-vector V_CTOR.  */
 static void
-add_decls_addresses_to_decl_constructor (vec<tree, va_gc> *v_decls,
-					 vec<constructor_elt, va_gc> *v_ctor)
+add_funcs_to_decl_constructor (vec<tree, va_gc> *v_decls,
+			       vec<constructor_elt, va_gc> *v_ctor)
 {
   unsigned len = vec_safe_length (v_decls);
   for (unsigned i = 0; i < len; i++)
     {
       tree it = (*v_decls)[i];
-      bool is_function = TREE_CODE (it) != VAR_DECL;
-
       CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, build_fold_addr_expr (it));
-      if (!is_function)
-	CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE,
-				fold_convert (const_ptr_type_node,
-					      DECL_SIZE_UNIT (it)));
+    }
+}
+
+/* Helper function for omp_finish_file routine.  Takes var decls from V_DECLS
+   and adds their addresses and sizes to constructor-vector V_CTOR.  */
+static void
+add_vars_to_decl_constructor (vec<omp_offload_var, va_gc> *v_decls,
+			      vec<constructor_elt, va_gc> *v_ctor)
+{
+  unsigned len = vec_safe_length (v_decls);
+  for (unsigned i = 0; i < len; i++)
+    {
+      omp_offload_var var = (*v_decls)[i];
+      tree addr;
+      tree size = fold_convert (const_ptr_type_node, DECL_SIZE_UNIT (var.decl));
+
+      if (var.link_ptr_decl == NULL_TREE)
+	addr = build_fold_addr_expr (var.decl);
+      else
+	{
+	  /* For "omp declare target link" var use address of the pointer
+	     instead of address of the var.  */
+	  addr = build_fold_addr_expr (var.link_ptr_decl);
+	  /* Most significant bit of the size marks such vars.  */
+	  unsigned HOST_WIDE_INT isize = tree_to_uhwi (size);
+	  isize |= 1ULL << (int_size_in_bytes (const_ptr_type_node) * 8 - 1);
+	  size = wide_int_to_tree (const_ptr_type_node, isize);
+
+	  /* FIXME: Remove varpool node of var?  */
+	}
+
+      CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, addr);
+      CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, size);
     }
 }
 
@@ -18369,8 +18400,8 @@ omp_finish_file (void)
       vec_alloc (v_f, num_funcs);
       vec_alloc (v_v, num_vars * 2);
 
-      add_decls_addresses_to_decl_constructor (offload_funcs, v_f);
-      add_decls_addresses_to_decl_constructor (offload_vars, v_v);
+      add_funcs_to_decl_constructor (offload_funcs, v_f);
+      add_vars_to_decl_constructor (offload_vars, v_v);
 
       tree vars_decl_type = build_array_type_nelts (pointer_sized_int_node,
 						    num_vars * 2);
@@ -18412,7 +18443,7 @@ omp_finish_file (void)
 	}
       for (unsigned i = 0; i < num_vars; i++)
 	{
-	  tree it = (*offload_vars)[i];
+	  tree it = (*offload_vars)[i].decl;
 	  targetm.record_offload_symbol (it);
 	}
     }
@@ -19538,4 +19569,145 @@ make_pass_oacc_device_lower (gcc::context *ctxt)
   return new pass_oacc_device_lower (ctxt);
 }
 
+/* "omp declare target link" handling pass.  */
+
+namespace {
+
+const pass_data pass_data_omp_target_link =
+{
+  GIMPLE_PASS,			/* type */
+  "omptargetlink",		/* name */
+  OPTGROUP_NONE,		/* optinfo_flags */
+  TV_NONE,			/* tv_id */
+  PROP_ssa,			/* properties_required */
+  0,				/* properties_provided */
+  0,				/* properties_destroyed */
+  0,				/* todo_flags_start */
+  TODO_update_ssa,		/* todo_flags_finish */
+};
+
+class pass_omp_target_link : public gimple_opt_pass
+{
+public:
+  pass_omp_target_link (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_omp_target_link, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *fun)
+    {
+#ifdef ACCEL_COMPILER
+      /* FIXME: Replace globals in target regions too or not?  */
+      return lookup_attribute ("omp declare target",
+			       DECL_ATTRIBUTES (fun->decl));
+#else
+      (void) fun;
+      return false;
+#endif
+    }
+
+  virtual unsigned execute (function *);
+};
+
+unsigned
+pass_omp_target_link::execute (function *fun)
+{
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      gimple_stmt_iterator gsi;
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  unsigned i;
+	  gimple *stmt = gsi_stmt (gsi);
+	  for (i = 0; i < gimple_num_ops (stmt); i++)
+	    {
+	      tree op = gimple_op (stmt, i);
+	      tree var = NULL_TREE;
+
+	      if (!op)
+		continue;
+	      if (TREE_CODE (op) == VAR_DECL)
+		var = op;
+	      else if (TREE_CODE (op) == ADDR_EXPR)
+		{
+		  tree op1 = TREE_OPERAND (op, 0);
+		  if (TREE_CODE (op1) == VAR_DECL)
+		    var = op1;
+		}
+	      /* FIXME: Support arrays.  What else?  */
+
+	      if (var && lookup_attribute ("omp declare target link",
+					   DECL_ATTRIBUTES (var)))
+		{
+		  tree type = TREE_TYPE (var);
+		  tree ptype = build_pointer_type (type);
+
+		  /* Find var in offload table.  */
+		  omp_offload_var *table_entry = NULL;
+		  for (unsigned j = 0; j < vec_safe_length (offload_vars); j++)
+		    if ((*offload_vars)[j].decl == var)
+		      {
+			table_entry = &(*offload_vars)[j];
+			break;
+		      }
+		  gcc_assert (table_entry);
+
+		  /* Get or create artificial pointer for the var.  */
+		  tree ptr_decl;
+		  if (table_entry->link_ptr_decl != NULL_TREE)
+		    ptr_decl = table_entry->link_ptr_decl;
+		  else
+		    {
+		      /* FIXME: Create a new node instead of copying?
+			 Which info to preserve?  */
+		      ptr_decl = copy_node (var);
+		      TREE_TYPE (ptr_decl) = ptype;
+		      DECL_MODE (ptr_decl) = TYPE_MODE (ptype);
+		      DECL_SIZE (ptr_decl) = TYPE_SIZE (ptype);
+		      DECL_SIZE_UNIT (ptr_decl) = TYPE_SIZE_UNIT (ptype);
+		      DECL_ARTIFICIAL (ptr_decl) = 1;
+		      /* FIXME: Add new function clone_variable_name?
+			 clone_function_name adds dots into the name, which are
+			 bad for vars.  */
+		      DECL_NAME (ptr_decl)
+			= clone_function_name (var, "linkptr");
+		      SET_DECL_ASSEMBLER_NAME (ptr_decl, DECL_NAME (ptr_decl));
+		      SET_DECL_RTL (ptr_decl, NULL);
+		      varpool_node::finalize_decl (ptr_decl);
+		      table_entry->link_ptr_decl = ptr_decl;
+		    }
+
+		  /* Replace the use of var with dereference of ptr_decl.  */
+		  tree tmp_ssa = make_temp_ssa_name (ptype, NULL, "linkptr");
+		  gimple *new_stmt = gimple_build_assign (tmp_ssa, ptr_decl);
+		  gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+		  tree mem_ref = build_simple_mem_ref (tmp_ssa);
+
+		  if (TREE_CODE (op) == VAR_DECL)
+		    *gimple_op_ptr (stmt, i) = mem_ref;
+		  else if (TREE_CODE (op) == ADDR_EXPR)
+		    {
+		      tree op1 = TREE_OPERAND (op, 0);
+		      if (TREE_CODE (op1) == VAR_DECL)
+			TREE_OPERAND (op, 0) = mem_ref;
+		      recompute_tree_invariant_for_addr_expr (op);
+		    }
+		  update_stmt (stmt);
+		}
+	    }
+	}
+    }
+
+  return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_omp_target_link (gcc::context *ctxt)
+{
+  return new pass_omp_target_link (ctxt);
+}
+
 #include "gt-omp-low.h"
diff --git a/gcc/omp-low.h b/gcc/omp-low.h
index ee0f8ac..c6e4d5a 100644
--- a/gcc/omp-low.h
+++ b/gcc/omp-low.h
@@ -34,7 +34,16 @@ extern tree get_oacc_fn_attrib (tree);
 extern int get_oacc_ifn_dim_arg (const gimple *);
 extern int get_oacc_fn_dim_size (tree, int);
 
+struct omp_offload_var
+{
+  /* Declaration representing global variable.  */
+  tree decl;
+
+  /* Artificial pointer for "omp declare target link" variables.  */
+  tree link_ptr_decl;
+};
+
 extern GTY(()) vec<tree, va_gc> *offload_funcs;
-extern GTY(()) vec<tree, va_gc> *offload_vars;
+extern GTY(()) vec<omp_offload_var, va_gc> *offload_vars;
 
 #endif /* GCC_OMP_LOW_H */
diff --git a/gcc/passes.def b/gcc/passes.def
index c0ab6b9..b32a5e5 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -151,6 +151,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_fixup_cfg);
   NEXT_PASS (pass_lower_eh_dispatch);
   NEXT_PASS (pass_oacc_device_lower);
+  NEXT_PASS (pass_omp_target_link);
   NEXT_PASS (pass_all_optimizations);
   PUSH_INSERT_PASSES_WITHIN (pass_all_optimizations)
       NEXT_PASS (pass_remove_cgraph_callee_edges);
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 49e22a9..554f3d2 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -413,6 +413,7 @@ extern gimple_opt_pass *make_pass_lower_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_omp_target_link (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_oacc_device_lower (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_object_sizes (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_strlen (gcc::context *ctxt);
diff --git a/gcc/varpool.c b/gcc/varpool.c
index 478f365..ca8457d 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -156,7 +156,12 @@ varpool_node::get_create (tree decl)
 #ifdef ENABLE_OFFLOADING
       g->have_offload = true;
       if (!in_lto_p)
-	vec_safe_push (offload_vars, decl);
+	{
+	  omp_offload_var var;
+	  var.decl = decl;
+	  var.link_ptr_decl = NULL_TREE;
+	  vec_safe_push (offload_vars, var);
+	}
       node->force_output = 1;
 #endif
     }
diff --git a/libgomp/target.c b/libgomp/target.c
index ef22329..195be43 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -78,6 +78,17 @@ static int num_devices;
 /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
 static int num_devices_openmp;
 
+/* FIXME: Quick and dirty prototype of keeping correspondence between host
+   address of the object and target address of the artificial link pointer.
+   Move it to gomp_device_descr, or where?  */
+struct link_struct
+{
+  uintptr_t host_start;
+  uintptr_t tgt_link_ptr;
+};
+static struct link_struct links[100];
+static int link_num;
+
 /* Similar to gomp_realloc, but release register_lock before gomp_fatal.  */
 
 static void *
@@ -763,6 +774,21 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
     }
 
+  /* Set pointers to "omp declare target link" variables.  */
+  for (i = 0; i < mapnum; i++)
+    /* FIXME: Remove this ugly loop.  */
+    for (int j = 0; j < link_num; j++)
+      if (links[j].host_start == (uintptr_t) hostaddrs[i])
+	{
+	  cur_node.tgt_offset = gomp_map_val (tgt, hostaddrs, i);
+	  /* Set link pointer on target to the device address of the mapped
+	     object.  */
+	  devicep->host2dev_func (devicep->target_id,
+				  (void *) links[j].tgt_link_ptr,
+				  (void *) &cur_node.tgt_offset,
+				  sizeof (void *));
+	}
+
   /* If the variable from "omp target enter data" map-list was already mapped,
      tgt is not needed.  Otherwise tgt will be freed by gomp_unmap_vars or
      gomp_exit_data.  */
@@ -981,6 +1007,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 
   /* Insert host-target address mapping into splay tree.  */
   struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
+  /* FIXME: Do not allocate space for link vars.  */
   tgt->array = gomp_malloc ((num_funcs + num_vars) * sizeof (*tgt->array));
   tgt->refcount = REFCOUNT_INFINITY;
   tgt->tgt_start = 0;
@@ -1009,26 +1036,44 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   for (i = 0; i < num_vars; i++)
     {
       struct addr_pair *target_var = &target_table[num_funcs + i];
-      if (target_var->end - target_var->start
-	  != (uintptr_t) host_var_table[i * 2 + 1])
+      uintptr_t target_size = target_var->end - target_var->start;
+
+      /* Most significant bit of the size marks "omp declare target link"
+	 variables.  */
+      bool is_link = target_size & (1ULL << (sizeof (uintptr_t) * 8 - 1));
+
+      if (!is_link)
 	{
-	  gomp_mutex_unlock (&devicep->lock);
-	  if (is_register_lock)
-	    gomp_mutex_unlock (&register_lock);
-	  gomp_fatal ("Can't map target variables (size mismatch)");
-	}
+	  if ((uintptr_t) host_var_table[i * 2 + 1] != target_size)
+	    {
+	      gomp_mutex_unlock (&devicep->lock);
+	      if (is_register_lock)
+		gomp_mutex_unlock (&register_lock);
+	      gomp_fatal ("Can't map target variables (size mismatch)");
+	    }
 
-      splay_tree_key k = &array->key;
-      k->host_start = (uintptr_t) host_var_table[i * 2];
-      k->host_end = k->host_start + (uintptr_t) host_var_table[i * 2 + 1];
-      k->tgt = tgt;
-      k->tgt_offset = target_var->start;
-      k->refcount = REFCOUNT_INFINITY;
-      k->async_refcount = 0;
-      array->left = NULL;
-      array->right = NULL;
-      splay_tree_insert (&devicep->mem_map, array);
-      array++;
+	  splay_tree_key k = &array->key;
+	  k->host_start = (uintptr_t) host_var_table[i * 2];
+	  k->host_end = k->host_start + (uintptr_t) host_var_table[i * 2 + 1];
+	  k->tgt = tgt;
+	  k->tgt_offset = target_var->start;
+	  k->refcount = REFCOUNT_INFINITY;
+	  k->async_refcount = 0;
+	  array->left = NULL;
+	  array->right = NULL;
+	  splay_tree_insert (&devicep->mem_map, array);
+	  array++;
+	}
+      else
+	{
+	  /* Do not map "omp declare target link" variables, only keep target
+	     address of the artificial pointer.  */
+	  /* FIXME: Where to keep it?  */
+	  struct link_struct l;
+	  l.host_start = (uintptr_t) host_var_table[i * 2];
+	  l.tgt_link_ptr = target_var->start;
+	  links[link_num++] = l;
+	}
     }
 
   free (target_table);
diff --git a/libgomp/testsuite/libgomp.c/target-link-1.c b/libgomp/testsuite/libgomp.c/target-link-1.c
new file mode 100644
index 0000000..332bc14
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-link-1.c
@@ -0,0 +1,56 @@
+int a = 1, b = 1;
+double c = 1.0;
+long long d[27];
+#pragma omp declare target link (a) to (b) link (c, d)
+
+/* FIXME: When the function is inlined, it gets the wrong values.  */
+__attribute__((noinline, noclone)) int
+foo (void)
+{
+  return a++ + b++;
+}
+
+/* FIXME: When the function is inlined, it gets the wrong values.  */
+__attribute__((noinline, noclone)) int
+bar (void)
+{
+  int *p1 = &a;
+  int *p2 = &b;
+  c += 0.1;
+  d[10]++; /* FIXME: Support arrays in pass_omp_target_link::execute.  */
+  return *p1 + *p2;
+}
+
+#pragma omp declare target (foo, bar)
+
+int
+main ()
+{
+  int res;
+  a = b = 2;
+  #pragma omp target map (to: a, b, c, d) map (from: res)
+  {
+    a; c; d; /* FIXME: Do not remove map(a,c,d) during gimplification.  */
+    res = foo () + foo ();
+    res += bar ();
+  }
+
+  int shared_mem = 0;
+  #pragma omp target map (alloc: shared_mem)
+    shared_mem = 1;
+
+  if ((shared_mem && res != (2 + 2) + (3 + 3) + (4 + 4))
+      || (!shared_mem && res != (2 + 1) + (3 + 2) + (4 + 3)))
+    __builtin_abort ();
+
+  #pragma omp target map (to: a) map (from: res)
+  {
+    a; /* FIXME: Do not remove map(a) during gimplification.  */
+    res = foo ();
+  }
+
+  if ((shared_mem && res != 4 + 4) || (!shared_mem && res != 2 + 3))
+    __builtin_abort ();
+
+  return 0;
+}


  -- Ilya


* Re: [gomp4.5] Handle #pragma omp declare target link
  2015-11-16 15:41         ` [gomp4.5] Handle #pragma omp declare target link Ilya Verbin
@ 2015-11-19 15:31           ` Jakub Jelinek
  2015-11-27 16:51             ` Ilya Verbin
  0 siblings, 1 reply; 48+ messages in thread
From: Jakub Jelinek @ 2015-11-19 15:31 UTC (permalink / raw)
  To: Ilya Verbin, Jan Hubicka, Richard Biener; +Cc: gcc-patches, Kirill Yukhin

On Mon, Nov 16, 2015 at 06:40:43PM +0300, Ilya Verbin wrote:
> Here is WIP patch, not for check-in.  There are still many FIXMEs, which I am
> going to resolve, however target-link-1.c testcase pass.
> Is this approach correct?  Any comments on FIXMEs?
> 
> 
> diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
> index 23d0107..58771c0 100644
> --- a/gcc/c/c-parser.c
> +++ b/gcc/c/c-parser.c
> @@ -15895,7 +15895,10 @@ c_parser_omp_declare_target (c_parser *parser)
>  	      g->have_offload = true;
>  	      if (is_a <varpool_node *> (node))
>  		{
> -		  vec_safe_push (offload_vars, t);
> +		  omp_offload_var var;
> +		  var.decl = t;
> +		  var.link_ptr_decl = NULL_TREE;
> +		  vec_safe_push (offload_vars, var);
>  		  node->force_output = 1;
>  		}

Another possible approach would be to keep offload_vars as
vector of trees, and simply push 2 trees in each case.
Or not to change this at all, see below.

> @@ -2009,7 +2010,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
>  	  decl = OMP_CLAUSE_DECL (c);
>  	  /* Global variables with "omp declare target" attribute
>  	     don't need to be copied, the receiver side will use them
> -	     directly.  */
> +	     directly.  However, global variables with "omp declare target link"
> +	     attribute need to be copied.  */
>  	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
>  	      && DECL_P (decl)
>  	      && ((OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FIRSTPRIVATE_POINTER
> @@ -2017,7 +2019,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
>  		       != GOMP_MAP_FIRSTPRIVATE_REFERENCE))
>  		  || TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
>  	      && is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))
> -	      && varpool_node::get_create (decl)->offloadable)
> +	      && varpool_node::get_create (decl)->offloadable
> +	      && !lookup_attribute ("omp declare target link",
> +				    DECL_ATTRIBUTES (decl)))

I wonder if Honza/Richi wouldn't prefer to have this info also
in cgraph, instead of looking up the attribute in each case.

> +      if (var.link_ptr_decl == NULL_TREE)
> +	addr = build_fold_addr_expr (var.decl);
> +      else
> +	{
> +	  /* For "omp declare target link" var use address of the pointer
> +	     instead of address of the var.  */
> +	  addr = build_fold_addr_expr (var.link_ptr_decl);
> +	  /* Most significant bit of the size marks such vars.  */
> +	  unsigned HOST_WIDE_INT isize = tree_to_uhwi (size);
> +	  isize |= 1ULL << (int_size_in_bytes (const_ptr_type_node) * 8 - 1);
> +	  size = wide_int_to_tree (const_ptr_type_node, isize);
> +
> +	  /* FIXME: Remove varpool node of var?  */

There is varpool_node::remove (), but not sure if at this point all the
references are already gone.

> +class pass_omp_target_link : public gimple_opt_pass
> +{
> +public:
> +  pass_omp_target_link (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_omp_target_link, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *fun)
> +    {
> +#ifdef ACCEL_COMPILER
> +      /* FIXME: Replace globals in target regions too or not?  */
> +      return lookup_attribute ("omp declare target",
> +			       DECL_ATTRIBUTES (fun->decl));

Certainly in "omp declare target entrypoint" regions too.

> +unsigned
> +pass_omp_target_link::execute (function *fun)
> +{
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, fun)
> +    {
> +      gimple_stmt_iterator gsi;
> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +	{
> +	  unsigned i;
> +	  gimple *stmt = gsi_stmt (gsi);
> +	  for (i = 0; i < gimple_num_ops (stmt); i++)
> +	    {
> +	      tree op = gimple_op (stmt, i);
> +	      tree var = NULL_TREE;
> +
> +	      if (!op)
> +		continue;
> +	      if (TREE_CODE (op) == VAR_DECL)
> +		var = op;
> +	      else if (TREE_CODE (op) == ADDR_EXPR)
> +		{
> +		  tree op1 = TREE_OPERAND (op, 0);
> +		  if (TREE_CODE (op1) == VAR_DECL)
> +		    var = op1;
> +		}
> +	      /* FIXME: Support arrays.  What else?  */

We need to support all the references to the variables.
So, I think this approach is not right.

> +
> +	      if (var && lookup_attribute ("omp declare target link",
> +					   DECL_ATTRIBUTES (var)))
> +		{
> +		  tree type = TREE_TYPE (var);
> +		  tree ptype = build_pointer_type (type);
> +
> +		  /* Find var in offload table.  */
> +		  omp_offload_var *table_entry = NULL;
> +		  for (unsigned j = 0; j < vec_safe_length (offload_vars); j++)
> +		    if ((*offload_vars)[j].decl == var)
> +		      {
> +			table_entry = &(*offload_vars)[j];
> +			break;
> +		      }

Plus this would be terribly expensive if there are many variables in
offload_vars.
So, what I think should be done instead is that you first somewhere, perhaps
when streaming in the decls from LTO in ACCEL_COMPILER or so, create
the artificial link ptr variables for the "omp declare target link"
global vars and
  SET_DECL_VALUE_EXPR (var, build_simple_mem_ref (link_ptr_var));
  DECL_HAS_VALUE_EXPR_P (var) = 1;
and then in this pass just walk_gimple_stmt each stmt, with a
callback that would check for VAR_DECLs with DECL_HAS_VALUE_EXPR_P set
and in that case check if they are "omp declare target link", and if found
signal to the caller that the stmt needs to be regimplified, then just
gimple_regimplify_operands those stmts.
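
To make the intended effect concrete, here is a minimal host-side sketch in
plain C of what a function body would look like once every use of a link
variable has been replaced by its value expression.  The names a_linkptr,
set_link and foo_rewritten are made up for illustration; this is not GCC
code and not what the pass emits literally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical artificial link pointer for a variable `a'.  On the
   accelerator `a' itself would have no storage; only this pointer exists
   until a mapping fills it in.  */
static int *a_linkptr;

/* Models what the runtime does when `a' gets mapped: store the address of
   the freshly allocated device copy into the link pointer.  */
static void
set_link (int *device_copy)
{
  a_linkptr = device_copy;
}

/* Source form was `return a++;'.  With the value expression `*a_linkptr'
   substituted for `a', every access goes through the pointer.  */
static int
foo_rewritten (void)
{
  assert (a_linkptr != NULL);	/* The variable must have been mapped.  */
  return (*a_linkptr)++;
}
```

The point is only that, after regimplification, reads, writes and address-of
all reach the object through the pointer, so no per-operand special casing
is needed in the pass.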

> +		  gcc_assert (table_entry);
> +
> +		  /* Get or create artificial pointer for the var.  */
> +		  tree ptr_decl;
> +		  if (table_entry->link_ptr_decl != NULL_TREE)
> +		    ptr_decl = table_entry->link_ptr_decl;
> +		  else
> +		    {
> +		      /* FIXME: Create a new node instead of copying?
> +			 Which info to preserve?  */
> +		      ptr_decl = copy_node (var);

I think you want a new node instead of copying.  You don't really want to
copy anything, perhaps TREE_USED, and set DECL_NAME to something derived
from the original name.  Make the ptr DECL_ARTIFICIAL and perhaps
DECL_NAMELESS.

> diff --git a/libgomp/target.c b/libgomp/target.c
> index ef22329..195be43 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -78,6 +78,17 @@ static int num_devices;
>  /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
>  static int num_devices_openmp;
>  
> +/* FIXME: Quick and dirty prototype of keeping correspondence between host
> +   address of the object and target address of the artificial link pointer.
> +   Move it to gomp_device_descr, or where?  */
> +struct link_struct
> +{
> +  uintptr_t host_start;
> +  uintptr_t tgt_link_ptr;
> +};
> +static struct link_struct links[100];
> +static int link_num;

As for the representation, I think one possibility would be to say define
REFCOUNT_LINK (~(uintptr_t) 1)
and register at gomp_load_image_to_device time the link vars with that
refcount instead of REFCOUNT_INFINITY.  If k->refcount == REFCOUNT_LINK
then k->tgt_offset would be the pointer to the artificial pointer variable
instead of actual mapping; for say pointer lookup purposes
k->refcount == REFCOUNT_LINK would be treated as not mapped, and
gomp_map_vars if mapping something over that would simply temporarily
replace (remove the old splay tree key, add the new one) the REFCOUNT_LINK entry
with the new mapping (and store the pointer).  Then, for the event when the
new mapping's refcount drops to zero, we need to ensure that we re-add the
REFCOUNT_LINK entry.  For that we need to store the old splay_tree_key
somewhere.  Either we can add it to splay_tree_key_s, but then it will be
around unconditionally for all entries, and splay_tree_node right now is
nicely power of 2-ish - 8 pointers.  Or stick it somewhere in
struct target_mem_desc, say splay_tree_key *link; and if the tgt has tgt->array
allocated and any of the mappings were previously REFCOUNT_LINK, then you could
either allocate that link array with not_found_cnt elements, or allocate
together with tgt->array and just point it after the last entry in
tgt->array.  tgt->link[i] would be NULL if tgt->array[i] splay_tree_node_s
did not replace REFCOUNT_LINK when created, and the old REFCOUNT_LINK entry
otherwise.  In do_unmap or exit_data, before splay_tree_remove you'd
find the corresponding link entry (k should point to &k->tgt->array[X].key
for some X, so (splay_tree_node) k - k->tgt->array should be X), and thus
splay_tree_key linkk = NULL;
if (k->tgt->link)
  linkk = k->tgt->link[(splay_tree_node) k - k->tgt->array];
before
  splay_tree_remove (&devicep->mem_map, k);
should hopefully give you the splay_tree_key to insert again.
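
A self-contained sketch of that index recovery, using toy stand-ins for the
libgomp types (toy_key, toy_node, toy_tgt and saved_link_key are made-up
names, and the real structures have more fields), and assuming as in libgomp
that the key is the first member of the node:

```c
#include <assert.h>
#include <stddef.h>

struct toy_tgt;

/* Stands in for splay_tree_key_s; only the back pointer matters here.  */
struct toy_key
{
  struct toy_tgt *tgt;
};

/* Stands in for splay_tree_node_s.  The key must stay the first member so
   that a key pointer can be cast back to its enclosing node.  */
struct toy_node
{
  struct toy_key key;
  struct toy_node *left, *right;
};

/* Stands in for struct target_mem_desc with the proposed link array.  */
struct toy_tgt
{
  struct toy_node *array;
  struct toy_key **link;	/* NULL, or one slot per array element.  */
};

/* Return the REFCOUNT_LINK key that K replaced when it was created, or
   NULL if it replaced none.  */
static struct toy_key *
saved_link_key (struct toy_key *k)
{
  if (k->tgt->link == NULL)
    return NULL;
  ptrdiff_t x = (struct toy_node *) k - k->tgt->array;
  assert (x >= 0);
  return k->tgt->link[x];
}
```

Whatever saved_link_key returns (if non-NULL) is what would be inserted back
into the splay tree once the temporary mapping has been removed.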

	Jakub


* Re: [gomp4.1] Handle new form of #pragma omp declare target
  2015-07-17 13:43 [gomp4.1] Handle new form of #pragma omp declare target Jakub Jelinek
                   ` (2 preceding siblings ...)
  2015-10-27 21:15 ` [gomp4.1] Handle new form of #pragma omp declare target Ilya Verbin
@ 2015-11-23 11:33 ` Thomas Schwinge
  2015-11-23 11:41   ` Jakub Jelinek
  3 siblings, 1 reply; 48+ messages in thread
From: Thomas Schwinge @ 2015-11-23 11:33 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Ilya Verbin, James Norris


Hi Jakub!

On Fri, 17 Jul 2015 15:05:59 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> [...] "omp declare target link" [...]

> This patch only marks them with the new attribute, [...]

> --- gcc/c/c-parser.c.jj	2015-07-16 18:09:25.000000000 +0200
> +++ gcc/c/c-parser.c	2015-07-17 14:11:08.553694975 +0200

>  static void
>  c_parser_omp_declare_target (c_parser *parser)
>  {
> [...]
> +  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> +    {
> +      tree t = OMP_CLAUSE_DECL (c), id;
> +      tree at1 = lookup_attribute ("omp declare target", DECL_ATTRIBUTES (t));
> +      tree at2 = lookup_attribute ("omp declare target link",
> +				   DECL_ATTRIBUTES (t));
> +      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LINK)
> +	{
> +	  id = get_identifier ("omp declare target link");
> +	  std::swap (at1, at2);
> +	}
> +      else
> +	id = get_identifier ("omp declare target");

Is it intentional that you didn't add "omp declare target link" to
gcc/c-family/c-common.c:c_common_attribute_table, next to the existing
"omp declare target"?


Grüße
 Thomas



* Re: [gomp4.1] Handle new form of #pragma omp declare target
  2015-11-23 11:33 ` Thomas Schwinge
@ 2015-11-23 11:41   ` Jakub Jelinek
  0 siblings, 0 replies; 48+ messages in thread
From: Jakub Jelinek @ 2015-11-23 11:41 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches, Ilya Verbin, James Norris

On Mon, Nov 23, 2015 at 12:31:24PM +0100, Thomas Schwinge wrote:
> Hi Jakub!
> 
> On Fri, 17 Jul 2015 15:05:59 +0200, Jakub Jelinek <jakub@redhat.com> wrote:
> > [...] "omp declare target link" [...]
> 
> > This patch only marks them with the new attribute, [...]
> 
> > --- gcc/c/c-parser.c.jj	2015-07-16 18:09:25.000000000 +0200
> > +++ gcc/c/c-parser.c	2015-07-17 14:11:08.553694975 +0200
> 
> >  static void
> >  c_parser_omp_declare_target (c_parser *parser)
> >  {
> > [...]
> > +  for (tree c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> > +    {
> > +      tree t = OMP_CLAUSE_DECL (c), id;
> > +      tree at1 = lookup_attribute ("omp declare target", DECL_ATTRIBUTES (t));
> > +      tree at2 = lookup_attribute ("omp declare target link",
> > +				   DECL_ATTRIBUTES (t));
> > +      if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_LINK)
> > +	{
> > +	  id = get_identifier ("omp declare target link");
> > +	  std::swap (at1, at2);
> > +	}
> > +      else
> > +	id = get_identifier ("omp declare target");
> 
> Is it intentional that you didn't add "omp declare target link" to
> gcc/c-family/c-common.c:c_common_attribute_table, next to the existing
> "omp declare target"?

No.  But the link attribute support is still unfinished, Ilya is working on
the support.

	Jakub


* Re: [gomp4.5] Handle #pragma omp declare target link
  2015-11-19 15:31           ` Jakub Jelinek
@ 2015-11-27 16:51             ` Ilya Verbin
  2015-11-30 12:08               ` Jakub Jelinek
  0 siblings, 1 reply; 48+ messages in thread
From: Ilya Verbin @ 2015-11-27 16:51 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Biener; +Cc: Jan Hubicka, gcc-patches, Kirill Yukhin

On Thu, Nov 19, 2015 at 16:31:15 +0100, Jakub Jelinek wrote:
> On Mon, Nov 16, 2015 at 06:40:43PM +0300, Ilya Verbin wrote:
> > @@ -2009,7 +2010,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
> >  	  decl = OMP_CLAUSE_DECL (c);
> >  	  /* Global variables with "omp declare target" attribute
> >  	     don't need to be copied, the receiver side will use them
> > -	     directly.  */
> > +	     directly.  However, global variables with "omp declare target link"
> > +	     attribute need to be copied.  */
> >  	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
> >  	      && DECL_P (decl)
> >  	      && ((OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FIRSTPRIVATE_POINTER
> > @@ -2017,7 +2019,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
> >  		       != GOMP_MAP_FIRSTPRIVATE_REFERENCE))
> >  		  || TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
> >  	      && is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))
> > -	      && varpool_node::get_create (decl)->offloadable)
> > +	      && varpool_node::get_create (decl)->offloadable
> > +	      && !lookup_attribute ("omp declare target link",
> > +				    DECL_ATTRIBUTES (decl)))
> 
> I wonder if Honza/Richi wouldn't prefer to have this info also
> in cgraph, instead of looking up the attribute in each case.

So should I add a new flag into cgraph?
Also it is used in gimplify_adjust_omp_clauses.

> > +      if (var.link_ptr_decl == NULL_TREE)
> > +	addr = build_fold_addr_expr (var.decl);
> > +      else
> > +	{
> > +	  /* For "omp declare target link" var use address of the pointer
> > +	     instead of address of the var.  */
> > +	  addr = build_fold_addr_expr (var.link_ptr_decl);
> > +	  /* Most significant bit of the size marks such vars.  */
> > +	  unsigned HOST_WIDE_INT isize = tree_to_uhwi (size);
> > +	  isize |= 1ULL << (int_size_in_bytes (const_ptr_type_node) * 8 - 1);
> > +	  size = wide_int_to_tree (const_ptr_type_node, isize);
> > +
> > +	  /* FIXME: Remove varpool node of var?  */
> 
> There is varpool_node::remove (), but not sure if at this point all the
> references are already gone.

Actually removing varpool node here will not remove var from the target code, so
I've added a check in cgraphunit.c before assemble_decl ().

> > +class pass_omp_target_link : public gimple_opt_pass
> > +{
> > +public:
> > +  pass_omp_target_link (gcc::context *ctxt)
> > +    : gimple_opt_pass (pass_data_omp_target_link, ctxt)
> > +  {}
> > +
> > +  /* opt_pass methods: */
> > +  virtual bool gate (function *fun)
> > +    {
> > +#ifdef ACCEL_COMPILER
> > +      /* FIXME: Replace globals in target regions too or not?  */
> > +      return lookup_attribute ("omp declare target",
> > +			       DECL_ATTRIBUTES (fun->decl));
> 
> Certainly in "omp declare target entrypoint" regions too.

Done.

> > +unsigned
> > +pass_omp_target_link::execute (function *fun)
> > +{
> > +  basic_block bb;
> > +  FOR_EACH_BB_FN (bb, fun)
> > +    {
> > +      gimple_stmt_iterator gsi;
> > +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > +	{
> > +	  unsigned i;
> > +	  gimple *stmt = gsi_stmt (gsi);
> > +	  for (i = 0; i < gimple_num_ops (stmt); i++)
> > +	    {
> > +	      tree op = gimple_op (stmt, i);
> > +	      tree var = NULL_TREE;
> > +
> > +	      if (!op)
> > +		continue;
> > +	      if (TREE_CODE (op) == VAR_DECL)
> > +		var = op;
> > +	      else if (TREE_CODE (op) == ADDR_EXPR)
> > +		{
> > +		  tree op1 = TREE_OPERAND (op, 0);
> > +		  if (TREE_CODE (op1) == VAR_DECL)
> > +		    var = op1;
> > +		}
> > +	      /* FIXME: Support arrays.  What else?  */
> 
> We need to support all the references to the variables.
> So, I think this approach is not right.
> 
> > +
> > +	      if (var && lookup_attribute ("omp declare target link",
> > +					   DECL_ATTRIBUTES (var)))
> > +		{
> > +		  tree type = TREE_TYPE (var);
> > +		  tree ptype = build_pointer_type (type);
> > +
> > +		  /* Find var in offload table.  */
> > +		  omp_offload_var *table_entry = NULL;
> > +		  for (unsigned j = 0; j < vec_safe_length (offload_vars); j++)
> > +		    if ((*offload_vars)[j].decl == var)
> > +		      {
> > +			table_entry = &(*offload_vars)[j];
> > +			break;
> > +		      }
> 
> Plus this would be terribly expensive if there are many variables in
> offload_vars.
> So, what I think should be done instead is that you first somewhere, perhaps
> when streaming in the decls from LTO in ACCEL_COMPILER or so, create
> the artificial link ptr variables for the "omp declare target link"
> global vars and
>   SET_DECL_VALUE_EXPR (var, build_simple_mem_ref (link_ptr_var));
>   DECL_HAS_VALUE_EXPR_P (var) = 1;
> and then in this pass just walk_gimple_stmt each stmt, with a
> callback that would check for VAR_DECLs with DECL_HAS_VALUE_EXPR_P set
> and in that case check if they are "omp declare target link", and if found
> signal to the caller that the stmt needs to be regimplified, then just
> gimple_regimplify_operands those stmts.

Cool, it works :)  However I had to disable 2 checks in
varpool_node::assemble_decl for ACCEL_COMPILER.

> > +		  gcc_assert (table_entry);
> > +
> > +		  /* Get or create artificial pointer for the var.  */
> > +		  tree ptr_decl;
> > +		  if (table_entry->link_ptr_decl != NULL_TREE)
> > +		    ptr_decl = table_entry->link_ptr_decl;
> > +		  else
> > +		    {
> > +		      /* FIXME: Create a new node instead of copying?
> > +			 Which info to preserve?  */
> > +		      ptr_decl = copy_node (var);
> 
> I think you want a new node instead of copying.  You don't really want to
> copy anything, perhaps TREE_USED, and set DECL_NAME to something derived
> from the original name.  Make the ptr DECL_ARTIFICIAL and perhaps
> DECL_NAMELESS.

Done.

> > diff --git a/libgomp/target.c b/libgomp/target.c
> > index ef22329..195be43 100644
> > --- a/libgomp/target.c
> > +++ b/libgomp/target.c
> > @@ -78,6 +78,17 @@ static int num_devices;
> >  /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
> >  static int num_devices_openmp;
> >  
> > +/* FIXME: Quick and dirty prototype of keeping correspondence between host
> > +   address of the object and target address of the artificial link pointer.
> > +   Move it to gomp_device_descr, or where?  */
> > +struct link_struct
> > +{
> > +  uintptr_t host_start;
> > +  uintptr_t tgt_link_ptr;
> > +};
> > +static struct link_struct links[100];
> > +static int link_num;
> 
> As for the representation, I think one possibility would be to say define
> REFCOUNT_LINK (~(uintptr_t) 1)
> and register at gomp_load_image_to_device time the link vars with that
> refcount instead of REFCOUNT_INFINITY.  If k->refcount == REFCOUNT_LINK
> then k->tgt_offset would be the pointer to the artificial pointer variable
> instead of actual mapping; for say pointer lookup purposes
> k->refcount == REFCOUNT_LINK would be treated as not mapped, and
> gomp_map_vars if mapping something over that would simply temporarily
> replace (remove the old splay tree key, add the new one) the REFCOUNT_LINK entry
> with the new mapping (and store the pointer).  Then for the even when the
> new mapping's refcount drops to zero we need to ensure that we readd the
> REFCOUNT_LINK entry.  For that we need to store the old splay_tree_key
> somewhere.  Either we can add it to splay_tree_key_s, but then it will be
> around unconditionally for all entries, and splay_tree_node right now is
> nicely power of 2-ish - 8 pointers.  Or stick it somewhere in
> struct target_mem_desc, say splay_tree_key *link; and if the tgt has tgt->array
> allocated and any of the mappings were previously REFCOUNT_LINK, then you could
> either allocate that link array with not_found_cnt elements, or allocate
> together with tgt->array and just point it after the last entry in
> tgt->array.  tgt->link[i] would be NULL if tgt->array[i] splay_tree_node_s
> did not replace REFCOUNT_LINK when created, and the old REFCOUNT_LINK entry
> otherwise.  When do_unmap or exit_data, before splay_tree_remove you'd
> find corresponding link entry (k should point to &k->tgt->array[X].key
> for some X, so (splay_tree_node) k - k->tgt->array should be X and thus
> splay_tree_key linkk = NULL;
> if (k->tgt->link)
>   linkk = k->tgt->link[(splay_tree_node) k - k->tgt->array];
> before
>   splay_tree_remove (&devicep->mem_map, k);
> should hopefully give you the splay_tree_key to insert again.

I implemented the first approach, because the second seems more complicated.
Or should I implement the second?

make check-target-libgomp passed, bootstrap in progress.  Is it OK?


gcc/c-family/
	* c-common.c (c_common_attribute_table): Handle "omp declare target
	link" attribute.
gcc/
	* cgraphunit.c (output_in_order): Do not assemble "omp declare target
	link" variables in ACCEL_COMPILER.
	* gimplify.c (gimplify_adjust_omp_clauses): Do not remove mapping of
	"omp declare target link" variables.
	* lto/lto.c: Include stringpool.h and fold-const.h.
	(offload_handle_link_vars): New static function.
	(lto_main): Call offload_handle_link_vars.
	* omp-low.c (scan_sharing_clauses): Do not remove mapping of "omp
	declare target link" variables.
	(add_decls_addresses_to_decl_constructor): For "omp declare target link"
	variables output address of the artificial pointer instead of address of
	the variable.  Set most significant bit of the size to mark them.
	(pass_data_omp_target_link): New pass_data.
	(pass_omp_target_link): New class.
	(find_link_var_op): New static function.
	(make_pass_omp_target_link): New function.
	* passes.def: Add pass_omp_target_link.
	* tree-pass.h (make_pass_omp_target_link): Declare.
	* varpool.c (varpool_node::assemble_decl): Allow decls with VALUE_EXPR
	in ACCEL_COMPILER.
libgomp/
	* libgomp.h (REFCOUNT_LINK): Define.
	(struct splay_tree_key_s): Add link_key.
	* target.c (gomp_map_vars): Treat REFCOUNT_LINK objects as not mapped.
	Replace target address of the pointer with target address of newly
	mapped object in the splay tree.  Set link pointer on target to the
	device address of the mapped object.
	(gomp_unmap_vars): Restore target address of the pointer in the splay
	tree for REFCOUNT_LINK objects after unmapping.
	(gomp_load_image_to_device): Set refcount to REFCOUNT_LINK for "omp
	declare target link" objects.
	(gomp_exit_data): Restore target address of the pointer in the splay
	tree for REFCOUNT_LINK objects after unmapping.
	* testsuite/libgomp.c/target-link-1.c: New file.


diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index fe0a235..81defd6 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -822,6 +822,8 @@ const struct attribute_spec c_common_attribute_table[] =
 			      handle_simd_attribute, false },
   { "omp declare target",     0, 0, true, false, false,
 			      handle_omp_declare_target_attribute, false },
+  { "omp declare target link", 0, 0, true, false, false,
+			      handle_omp_declare_target_attribute, false },
   { "alloc_align",	      1, 1, false, true, true,
 			      handle_alloc_align_attribute, false },
   { "assume_aligned",	      1, 2, false, true, true,
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index f73d9a7..8bc70f0 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2204,6 +2204,13 @@ output_in_order (bool no_reorder)
 	  break;
 
 	case ORDER_VAR:
+#ifdef ACCEL_COMPILER
+	  /* Do not assemble "omp declare target link" vars.  */
+	  if (DECL_HAS_VALUE_EXPR_P (nodes[i].u.v->decl)
+	      && lookup_attribute ("omp declare target link",
+				   DECL_ATTRIBUTES (nodes[i].u.v->decl)))
+	    break;
+#endif
 	  nodes[i].u.v->assemble_decl ();
 	  break;
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index a3ed378..5a381da 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -7700,7 +7700,9 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, tree *list_p,
 	  n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
 	  if ((ctx->region_type & ORT_TARGET) != 0
 	      && !(n->value & GOVD_SEEN)
-	      && GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)) == 0)
+	      && GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)) == 0
+	      && !lookup_attribute ("omp declare target link",
+				    DECL_ATTRIBUTES (decl)))
 	    {
 	      remove = true;
 	      /* For struct element mapping, if struct is never referenced
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index 2661491..58f8a68 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -49,6 +49,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "ipa-utils.h"
 #include "gomp-constants.h"
+#include "stringpool.h"
+#include "fold-const.h"
 
 
 /* Number of parallel tasks to run, -1 if we want to use GNU Make jobserver.  */
@@ -3223,6 +3225,37 @@ lto_init (void)
 #endif
 }
 
+/* Create artificial pointers for "omp declare target link" vars.  */
+
+static void
+offload_handle_link_vars (void)
+{
+#ifdef ACCEL_COMPILER
+  varpool_node *var;
+  FOR_EACH_VARIABLE (var)
+    if (lookup_attribute ("omp declare target link",
+			  DECL_ATTRIBUTES (var->decl)))
+      {
+	tree type = build_pointer_type (TREE_TYPE (var->decl));
+	tree link_ptr_var = make_node (VAR_DECL);
+	TREE_TYPE (link_ptr_var) = type;
+	TREE_USED (link_ptr_var) = 1;
+	TREE_STATIC (link_ptr_var) = 1;
+	DECL_MODE (link_ptr_var) = TYPE_MODE (type);
+	DECL_SIZE (link_ptr_var) = TYPE_SIZE (type);
+	DECL_SIZE_UNIT (link_ptr_var) = TYPE_SIZE_UNIT (type);
+	DECL_ARTIFICIAL (link_ptr_var) = 1;
+	tree var_name = DECL_ASSEMBLER_NAME (var->decl);
+	char *new_name
+	  = ACONCAT ((IDENTIFIER_POINTER (var_name), "_linkptr", NULL));
+	DECL_NAME (link_ptr_var) = get_identifier (new_name);
+	SET_DECL_ASSEMBLER_NAME (link_ptr_var, DECL_NAME (link_ptr_var));
+	SET_DECL_VALUE_EXPR (var->decl, build_simple_mem_ref (link_ptr_var));
+	DECL_HAS_VALUE_EXPR_P (var->decl) = 1;
+      }
+#endif
+}
+
 
 /* Main entry point for the GIMPLE front end.  This front end has
    three main personalities:
@@ -3271,6 +3304,8 @@ lto_main (void)
 
   if (!seen_error ())
     {
+      offload_handle_link_vars ();
+
       /* If WPA is enabled analyze the whole call graph and create an
 	 optimization plan.  Otherwise, read in all the function
 	 bodies and continue with optimization.  */
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 0d4c6e5..423b2d1 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2006,7 +2006,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	  decl = OMP_CLAUSE_DECL (c);
 	  /* Global variables with "omp declare target" attribute
 	     don't need to be copied, the receiver side will use them
-	     directly.  */
+	     directly.  However, global variables with "omp declare target link"
+	     attribute need to be copied.  */
 	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 	      && DECL_P (decl)
 	      && ((OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FIRSTPRIVATE_POINTER
@@ -2014,7 +2015,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 		       != GOMP_MAP_FIRSTPRIVATE_REFERENCE))
 		  || TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
 	      && is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))
-	      && varpool_node::get_create (decl)->offloadable)
+	      && varpool_node::get_create (decl)->offloadable
+	      && !lookup_attribute ("omp declare target link",
+				    DECL_ATTRIBUTES (decl)))
 	    break;
 	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 	      && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER)
@@ -18480,13 +18483,35 @@ add_decls_addresses_to_decl_constructor (vec<tree, va_gc> *v_decls,
   for (unsigned i = 0; i < len; i++)
     {
       tree it = (*v_decls)[i];
-      bool is_function = TREE_CODE (it) != VAR_DECL;
+      bool is_var = TREE_CODE (it) == VAR_DECL;
+      bool is_link_var
+	= is_var && DECL_HAS_VALUE_EXPR_P (it)
+	  && lookup_attribute ("omp declare target link", DECL_ATTRIBUTES (it));
+
+      tree size = NULL_TREE;
+      if (is_var)
+	size = fold_convert (const_ptr_type_node, DECL_SIZE_UNIT (it));
+
+      tree addr;
+      if (!is_link_var)
+	addr = build_fold_addr_expr (it);
+      else
+	{
+	  tree value_expr = DECL_VALUE_EXPR (it);
+	  tree link_ptr_decl = TREE_OPERAND (value_expr, 0);
+	  varpool_node::finalize_decl (link_ptr_decl);
+	  /* For "omp declare target link" var use address of the pointer
+	     instead of address of the var.  */
+	  addr = build_fold_addr_expr (link_ptr_decl);
+	  /* Most significant bit of the size marks such vars.  */
+	  unsigned HOST_WIDE_INT isize = tree_to_uhwi (size);
+	  isize |= 1ULL << (int_size_in_bytes (const_ptr_type_node) * 8 - 1);
+	  size = wide_int_to_tree (const_ptr_type_node, isize);
+	}
 
-      CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, build_fold_addr_expr (it));
-      if (!is_function)
-	CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE,
-				fold_convert (const_ptr_type_node,
-					      DECL_SIZE_UNIT (it)));
+      CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, addr);
+      if (is_var)
+	CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, size);
     }
 }
 
@@ -19723,4 +19748,84 @@ make_pass_oacc_device_lower (gcc::context *ctxt)
   return new pass_oacc_device_lower (ctxt);
 }
 
+/* "omp declare target link" handling pass.  */
+
+namespace {
+
+const pass_data pass_data_omp_target_link =
+{
+  GIMPLE_PASS,			/* type */
+  "omptargetlink",		/* name */
+  OPTGROUP_NONE,		/* optinfo_flags */
+  TV_NONE,			/* tv_id */
+  PROP_ssa,			/* properties_required */
+  0,				/* properties_provided */
+  0,				/* properties_destroyed */
+  0,				/* todo_flags_start */
+  TODO_update_ssa,		/* todo_flags_finish */
+};
+
+class pass_omp_target_link : public gimple_opt_pass
+{
+public:
+  pass_omp_target_link (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_omp_target_link, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *fun)
+    {
+#ifdef ACCEL_COMPILER
+      tree attrs = DECL_ATTRIBUTES (fun->decl);
+      return lookup_attribute ("omp declare target", attrs)
+	     || lookup_attribute ("omp target entrypoint", attrs);
+#else
+      (void) fun;
+      return false;
+#endif
+    }
+
+  virtual unsigned execute (function *);
+};
+
+/* Callback for walk_gimple_stmt used to scan for link var operands.  */
+
+static tree
+find_link_var_op (tree *tp, int *walk_subtrees, void *)
+{
+  tree t = *tp;
+
+  if (TREE_CODE (t) == VAR_DECL && DECL_HAS_VALUE_EXPR_P (t)
+      && lookup_attribute ("omp declare target link", DECL_ATTRIBUTES (t)))
+    {
+      *walk_subtrees = 0;
+      return t;
+    }
+
+  return NULL_TREE;
+}
+
+unsigned
+pass_omp_target_link::execute (function *fun)
+{
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      gimple_stmt_iterator gsi;
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	if (walk_gimple_stmt (&gsi, NULL, find_link_var_op, NULL))
+	  gimple_regimplify_operands (gsi_stmt (gsi), &gsi);
+    }
+
+  return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_omp_target_link (gcc::context *ctxt)
+{
+  return new pass_omp_target_link (ctxt);
+}
+
 #include "gt-omp-low.h"
diff --git a/gcc/passes.def b/gcc/passes.def
index 1702778..46932b2 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -153,6 +153,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_fixup_cfg);
   NEXT_PASS (pass_lower_eh_dispatch);
   NEXT_PASS (pass_oacc_device_lower);
+  NEXT_PASS (pass_omp_target_link);
   NEXT_PASS (pass_all_optimizations);
   PUSH_INSERT_PASSES_WITHIN (pass_all_optimizations)
       NEXT_PASS (pass_remove_cgraph_callee_edges);
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index dcd2d5e..f6eabe6 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -415,6 +415,7 @@ extern gimple_opt_pass *make_pass_lower_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_omp_target_link (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_oacc_device_lower (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_object_sizes (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_strlen (gcc::context *ctxt);
diff --git a/gcc/varpool.c b/gcc/varpool.c
index 36f19a6..cbd1e05 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -561,17 +561,21 @@ varpool_node::assemble_decl (void)
      are not real variables, but just info for debugging and codegen.
      Unfortunately at the moment emutls is not updating varpool correctly
      after turning real vars into value_expr vars.  */
+#ifndef ACCEL_COMPILER
   if (DECL_HAS_VALUE_EXPR_P (decl)
       && !targetm.have_tls)
     return false;
+#endif
 
   /* Hard register vars do not need to be output.  */
   if (DECL_HARD_REGISTER (decl))
     return false;
 
+#ifndef ACCEL_COMPILER
   gcc_checking_assert (!TREE_ASM_WRITTEN (decl)
 		       && TREE_CODE (decl) == VAR_DECL
 		       && !DECL_HAS_VALUE_EXPR_P (decl));
+#endif
 
   if (!in_other_partition
       && !DECL_EXTERNAL (decl))
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index c467f97..ea63248 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -817,6 +817,9 @@ struct target_mem_desc {
 
 /* Special value for refcount - infinity.  */
 #define REFCOUNT_INFINITY (~(uintptr_t) 0)
+/* Special value for refcount - tgt_offset contains target address of the
+   artificial pointer to "omp declare target link" object.  */
+#define REFCOUNT_LINK (~(uintptr_t) 1)
 
 struct splay_tree_key_s {
   /* Address of the host object.  */
@@ -831,6 +834,8 @@ struct splay_tree_key_s {
   uintptr_t refcount;
   /* Asynchronous reference count.  */
   uintptr_t async_refcount;
+  /* Pointer to the original mapping of "omp declare target link" object.  */
+  splay_tree_key link_key;
 };
 
 /* The comparison function.  */
diff --git a/libgomp/target.c b/libgomp/target.c
index cf9d0e6..dcbcaaf 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -453,7 +453,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
       else
 	n = splay_tree_lookup (mem_map, &cur_node);
-      if (n)
+      if (n && n->refcount != REFCOUNT_LINK)
 	gomp_map_vars_existing (devicep, n, &cur_node, &tgt->list[i],
 				kind & typemask);
       else
@@ -617,11 +617,19 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	    else
 	      k->host_end = k->host_start + sizeof (void *);
 	    splay_tree_key n = splay_tree_lookup (mem_map, k);
-	    if (n)
+	    if (n && n->refcount != REFCOUNT_LINK)
 	      gomp_map_vars_existing (devicep, n, k, &tgt->list[i],
 				      kind & typemask);
 	    else
 	      {
+		k->link_key = NULL;
+		if (n && n->refcount == REFCOUNT_LINK)
+		  {
+		    /* Replace target address of the pointer with target address
+		       of mapped object in the splay tree.  */
+		    splay_tree_remove (mem_map, n);
+		    k->link_key = n;
+		  }
 		size_t align = (size_t) 1 << (kind >> rshift);
 		tgt->list[i].key = k;
 		k->tgt = tgt;
@@ -741,6 +749,16 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		    gomp_fatal ("%s: unhandled kind 0x%.2x", __FUNCTION__,
 				kind);
 		  }
+
+		if (k->link_key)
+		  {
+		    /* Set link pointer on target to the device address of the
+		       mapped object.  */
+		    void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset);
+		    devicep->host2dev_func (devicep->target_id,
+					    (void *) n->tgt_offset,
+					    &tgt_addr, sizeof (void *));
+		  }
 		array++;
 	      }
 	  }
@@ -866,6 +884,9 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
       if (do_unmap)
 	{
 	  splay_tree_remove (&devicep->mem_map, k);
+	  if (k->link_key)
+	    splay_tree_insert (&devicep->mem_map,
+			       (splay_tree_node) k->link_key);
 	  if (k->tgt->refcount > 1)
 	    k->tgt->refcount--;
 	  else
@@ -1005,13 +1026,18 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   for (i = 0; i < num_vars; i++)
     {
       struct addr_pair *target_var = &target_table[num_funcs + i];
-      if (target_var->end - target_var->start
-	  != (uintptr_t) host_var_table[i * 2 + 1])
+      uintptr_t target_size = target_var->end - target_var->start;
+
+      /* Most significant bit of the size marks "omp declare target link"
+	 variables.  */
+      bool is_link = target_size & (1ULL << (sizeof (uintptr_t) * 8 - 1));
+
+      if (!is_link && (uintptr_t) host_var_table[i * 2 + 1] != target_size)
 	{
 	  gomp_mutex_unlock (&devicep->lock);
 	  if (is_register_lock)
 	    gomp_mutex_unlock (&register_lock);
-	  gomp_fatal ("Can't map target variables (size mismatch)");
+	  gomp_fatal ("Cannot map target variables (size mismatch)");
 	}
 
       splay_tree_key k = &array->key;
@@ -1019,7 +1045,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
       k->host_end = k->host_start + (uintptr_t) host_var_table[i * 2 + 1];
       k->tgt = tgt;
       k->tgt_offset = target_var->start;
-      k->refcount = REFCOUNT_INFINITY;
+      k->refcount = is_link ? REFCOUNT_LINK : REFCOUNT_INFINITY;
       k->async_refcount = 0;
       array->left = NULL;
       array->right = NULL;
@@ -1632,6 +1658,9 @@ gomp_exit_data (struct gomp_device_descr *devicep, size_t mapnum,
 	  if (k->refcount == 0)
 	    {
 	      splay_tree_remove (&devicep->mem_map, k);
+	      if (k->link_key)
+		splay_tree_insert (&devicep->mem_map,
+				   (splay_tree_node) k->link_key);
 	      if (k->tgt->refcount > 1)
 		k->tgt->refcount--;
 	      else
diff --git a/libgomp/testsuite/libgomp.c/target-link-1.c b/libgomp/testsuite/libgomp.c/target-link-1.c
new file mode 100644
index 0000000..681677c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-link-1.c
@@ -0,0 +1,63 @@
+struct S { int s, t; };
+
+int a = 1, b = 1;
+double c[27];
+struct S d = { 8888, 8888 };
+#pragma omp declare target link (a) to (b) link (c, d)
+
+int
+foo (void)
+{
+  return a++ + b++;
+}
+
+int
+bar (int n)
+{
+  int *p1 = &a;
+  int *p2 = &b;
+  c[n] += 2.0;
+  d.s -= 2;
+  d.t -= 2;
+  return *p1 + *p2 + d.s + d.t;
+}
+
+#pragma omp declare target (foo, bar)
+
+int
+main ()
+{
+  a = b = 2;
+  d.s = 17;
+  d.t = 18;
+
+  int res, n = 10;
+  #pragma omp target map (to: a, b, c, d) map (from: res)
+  {
+    res = foo () + foo ();
+    c[n] = 3.0;
+    res += bar (n);
+  }
+
+  int shared_mem = 0;
+  #pragma omp target map (alloc: shared_mem)
+    shared_mem = 1;
+
+  if ((shared_mem && res != (2 + 2) + (3 + 3) + (4 + 4 + 15 + 16))
+      || (!shared_mem && res != (2 + 1) + (3 + 2) + (4 + 3 + 15 + 16)))
+    __builtin_abort ();
+
+  #pragma omp target enter data map (to: c)
+  #pragma omp target update from (c)
+  res = (int) (c[n] + 0.5);
+  if ((shared_mem && res != 5) || (!shared_mem && res != 0))
+    __builtin_abort ();
+
+  #pragma omp target map (to: a, b) map (from: res)
+    res = foo ();
+
+  if ((shared_mem && res != 4 + 4) || (!shared_mem && res != 2 + 3))
+    __builtin_abort ();
+
+  return 0;
+}


  -- Ilya

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [gomp4.5] Handle #pragma omp declare target link
  2015-11-27 16:51             ` Ilya Verbin
@ 2015-11-30 12:08               ` Jakub Jelinek
  2015-11-30 20:42                 ` Ilya Verbin
  0 siblings, 1 reply; 48+ messages in thread
From: Jakub Jelinek @ 2015-11-30 12:08 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: Richard Biener, Jan Hubicka, gcc-patches, Kirill Yukhin

On Fri, Nov 27, 2015 at 07:50:09PM +0300, Ilya Verbin wrote:
> On Thu, Nov 19, 2015 at 16:31:15 +0100, Jakub Jelinek wrote:
> > On Mon, Nov 16, 2015 at 06:40:43PM +0300, Ilya Verbin wrote:
> > > @@ -2009,7 +2010,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
> > >  	  decl = OMP_CLAUSE_DECL (c);
> > >  	  /* Global variables with "omp declare target" attribute
> > >  	     don't need to be copied, the receiver side will use them
> > > -	     directly.  */
> > > +	     directly.  However, global variables with "omp declare target link"
> > > +	     attribute need to be copied.  */
> > >  	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
> > >  	      && DECL_P (decl)
> > >  	      && ((OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FIRSTPRIVATE_POINTER
> > > @@ -2017,7 +2019,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
> > >  		       != GOMP_MAP_FIRSTPRIVATE_REFERENCE))
> > >  		  || TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
> > >  	      && is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))
> > > -	      && varpool_node::get_create (decl)->offloadable)
> > > +	      && varpool_node::get_create (decl)->offloadable
> > > +	      && !lookup_attribute ("omp declare target link",
> > > +				    DECL_ATTRIBUTES (decl)))
> > 
> > I wonder if Honza/Richi wouldn't prefer to have this info also
> > in cgraph, instead of looking up the attribute in each case.
> 
> So should I add a new flag into cgraph?
> Also it is used in gimplify_adjust_omp_clauses.

Richi said on IRC that lookup_attribute is ok, so let's keep it that way for
now.

> +	  /* Most significant bit of the size marks such vars.  */
> +	  unsigned HOST_WIDE_INT isize = tree_to_uhwi (size);
> +	  isize |= 1ULL << (int_size_in_bytes (const_ptr_type_node) * 8 - 1);

That supposedly should be BITS_PER_UNIT instead of 8.

> diff --git a/gcc/varpool.c b/gcc/varpool.c
> index 36f19a6..cbd1e05 100644
> --- a/gcc/varpool.c
> +++ b/gcc/varpool.c
> @@ -561,17 +561,21 @@ varpool_node::assemble_decl (void)
>       are not real variables, but just info for debugging and codegen.
>       Unfortunately at the moment emutls is not updating varpool correctly
>       after turning real vars into value_expr vars.  */
> +#ifndef ACCEL_COMPILER
>    if (DECL_HAS_VALUE_EXPR_P (decl)
>        && !targetm.have_tls)
>      return false;
> +#endif
>  
>    /* Hard register vars do not need to be output.  */
>    if (DECL_HARD_REGISTER (decl))
>      return false;
>  
> +#ifndef ACCEL_COMPILER
>    gcc_checking_assert (!TREE_ASM_WRITTEN (decl)
>  		       && TREE_CODE (decl) == VAR_DECL
>  		       && !DECL_HAS_VALUE_EXPR_P (decl));
> +#endif

This looks wrong, both of these clearly could affect anything with
DECL_HAS_VALUE_EXPR_P, not just the link vars.
So, if you need to handle the "omp declare target link" vars specially,
you should only handle those specially and nothing else.  And please try to
explain why.

> @@ -1005,13 +1026,18 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
>    for (i = 0; i < num_vars; i++)
>      {
>        struct addr_pair *target_var = &target_table[num_funcs + i];
> -      if (target_var->end - target_var->start
> -	  != (uintptr_t) host_var_table[i * 2 + 1])
> +      uintptr_t target_size = target_var->end - target_var->start;
> +
> +      /* Most significant bit of the size marks "omp declare target link"
> +	 variables.  */
> +      bool is_link = target_size & (1ULL << (sizeof (uintptr_t) * 8 - 1));

__CHAR_BIT__ here instead of 8?

> @@ -1019,7 +1045,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
>        k->host_end = k->host_start + (uintptr_t) host_var_table[i * 2 + 1];
>        k->tgt = tgt;
>        k->tgt_offset = target_var->start;
> -      k->refcount = REFCOUNT_INFINITY;
> +      k->refcount = is_link ? REFCOUNT_LINK : REFCOUNT_INFINITY;
>        k->async_refcount = 0;
>        array->left = NULL;
>        array->right = NULL;

Do we need to do anything in gomp_unload_image_from_device ?
I mean at least in questionable programs that for link vars don't decrement
the refcount of the var that replaced the link var to 0 first before
dlclosing the library.
At least host_var_table[j * 2 + 1] will have the MSB set, so we need to
handle it differently.  Perhaps for that case perform a lookup, and if we
get something which has link_key non-NULL, first perform as if there is
target exit data delete (var) on it?
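
To make the question concrete, here is a rough sketch of the replace and
restore behaviour with a one-slot toy map instead of the splay tree.  All
names (toy_entry, toy_slot, toy_map, toy_unmap, toy_force_delete) are made up
for illustration, and the forced delete is only one possible reading of the
suggestion above, not what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_REFCOUNT_LINK (~(uintptr_t) 1)

struct toy_entry
{
  uintptr_t refcount;
  uintptr_t tgt_addr;
  struct toy_entry *link_key;	/* Link entry this mapping replaced.  */
};

/* One-slot "map" standing in for the lookup of a single host variable.  */
struct toy_slot { struct toy_entry *cur; };

/* Map FRESH over the slot, remembering a replaced link entry.  */
static void
toy_map (struct toy_slot *slot, struct toy_entry *fresh, uintptr_t dev_addr)
{
  struct toy_entry *old = slot->cur;
  assert (old == NULL || old->refcount == TOY_REFCOUNT_LINK);
  fresh->refcount = 1;
  fresh->tgt_addr = dev_addr;
  fresh->link_key = old;
  slot->cur = fresh;
}

/* Drop one reference; at zero, put the saved link entry back.  */
static void
toy_unmap (struct toy_slot *slot)
{
  struct toy_entry *k = slot->cur;
  assert (k != NULL && k->refcount != TOY_REFCOUNT_LINK);
  if (--k->refcount == 0)
    slot->cur = k->link_key;
}

/* At unload time: if a replacement is still mapped, behave as if it had
   been deleted.  Returns 1 if something was restored, 0 otherwise.  */
static int
toy_force_delete (struct toy_slot *slot)
{
  struct toy_entry *k = slot->cur;
  if (k == NULL || k->link_key == NULL)
    return 0;
  k->refcount = 0;
  slot->cur = k->link_key;
  return 1;
}
```

After toy_force_delete the slot holds the original link entry again, so the
unload path can remove it the same way as any other image variable.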

	Jakub

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [gomp4.5] Handle #pragma omp declare target link
  2015-11-30 12:08               ` Jakub Jelinek
@ 2015-11-30 20:42                 ` Ilya Verbin
  2015-11-30 20:55                   ` Jakub Jelinek
  0 siblings, 1 reply; 48+ messages in thread
From: Ilya Verbin @ 2015-11-30 20:42 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Kirill Yukhin

On Mon, Nov 30, 2015 at 13:04:59 +0100, Jakub Jelinek wrote:
> On Fri, Nov 27, 2015 at 07:50:09PM +0300, Ilya Verbin wrote:
> > +	  /* Most significant bit of the size marks such vars.  */
> > +	  unsigned HOST_WIDE_INT isize = tree_to_uhwi (size);
> > +	  isize |= 1ULL << (int_size_in_bytes (const_ptr_type_node) * 8 - 1);
> 
> That supposedly should be BITS_PER_UNIT instead of 8.

Fixed.

> > diff --git a/gcc/varpool.c b/gcc/varpool.c
> > index 36f19a6..cbd1e05 100644
> > --- a/gcc/varpool.c
> > +++ b/gcc/varpool.c
> > @@ -561,17 +561,21 @@ varpool_node::assemble_decl (void)
> >       are not real variables, but just info for debugging and codegen.
> >       Unfortunately at the moment emutls is not updating varpool correctly
> >       after turning real vars into value_expr vars.  */
> > +#ifndef ACCEL_COMPILER
> >    if (DECL_HAS_VALUE_EXPR_P (decl)
> >        && !targetm.have_tls)
> >      return false;
> > +#endif
> >  
> >    /* Hard register vars do not need to be output.  */
> >    if (DECL_HARD_REGISTER (decl))
> >      return false;
> >  
> > +#ifndef ACCEL_COMPILER
> >    gcc_checking_assert (!TREE_ASM_WRITTEN (decl)
> >  		       && TREE_CODE (decl) == VAR_DECL
> >  		       && !DECL_HAS_VALUE_EXPR_P (decl));
> > +#endif
> 
> This looks wrong, both of these clearly could affect anything with
> DECL_HAS_VALUE_EXPR_P, not just the link vars.
> So, if you need to handle the "omp declare target link" vars specially,
> you should only handle those specially and nothing else.  And please try to
> explain why.

Actually these ifndefs are not needed, because assemble_decl will never be
called by the accel compiler for original link vars.  I've added a check into
output_in_order, but missed a second place where assemble_decl is called -
symbol_table::output_variables.  So, fixed now.

> > @@ -1005,13 +1026,18 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
> >    for (i = 0; i < num_vars; i++)
> >      {
> >        struct addr_pair *target_var = &target_table[num_funcs + i];
> > -      if (target_var->end - target_var->start
> > -	  != (uintptr_t) host_var_table[i * 2 + 1])
> > +      uintptr_t target_size = target_var->end - target_var->start;
> > +
> > +      /* Most significant bit of the size marks "omp declare target link"
> > +	 variables.  */
> > +      bool is_link = target_size & (1ULL << (sizeof (uintptr_t) * 8 - 1));
> 
> __CHAR_BIT__ here instead of 8?

Fixed.
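
For reference, the marking scheme on the libgomp side amounts to something
like the following standalone sketch.  The helper names (TOY_LINK_BIT,
toy_mark_link, toy_is_link, toy_real_size) are invented here, and CHAR_BIT
from <limits.h> is used as the portable spelling of __CHAR_BIT__.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Most significant bit of a pointer-sized integer.  */
#define TOY_LINK_BIT ((uintptr_t) 1 << (sizeof (uintptr_t) * CHAR_BIT - 1))

/* Tag SIZE as belonging to a link variable.  SIZE must not already use
   the top bit, i.e. the object is smaller than half the address space.  */
static uintptr_t
toy_mark_link (uintptr_t size)
{
  assert ((size & TOY_LINK_BIT) == 0);
  return size | TOY_LINK_BIT;
}

/* True if SIZE carries the link marker.  */
static bool
toy_is_link (uintptr_t size)
{
  return (size & TOY_LINK_BIT) != 0;
}

/* Strip the marker and return the plain size.  */
static uintptr_t
toy_real_size (uintptr_t size)
{
  return size & ~TOY_LINK_BIT;
}
```

Shifting a value of type uintptr_t rather than 1ULL keeps the width of the
mask tied to the type being tested.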

> > @@ -1019,7 +1045,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
> >        k->host_end = k->host_start + (uintptr_t) host_var_table[i * 2 + 1];
> >        k->tgt = tgt;
> >        k->tgt_offset = target_var->start;
> > -      k->refcount = REFCOUNT_INFINITY;
> > +      k->refcount = is_link ? REFCOUNT_LINK : REFCOUNT_INFINITY;
> >        k->async_refcount = 0;
> >        array->left = NULL;
> >        array->right = NULL;
> 
> Do we need to do anything in gomp_unload_image_from_device ?
> I mean at least in questionable programs that for link vars don't decrement
> the refcount of the var that replaced the link var to 0 first before
> dlclosing the library.
> At least host_var_table[j * 2 + 1] will have the MSB set, so we need to
> handle it differently.  Perhaps for that case perform a lookup, and if we
> get something which has link_key non-NULL, first perform as if there is
> target exit data delete (var) on it?

You're right, it doesn't deallocate memory on the device if a DSO leaves a
nonzero refcount.  And currently the host compiler doesn't set the MSB in
host_var_table, it's set only by the accel compiler.  But it's possible to do
splay_tree_lookup for each var to determine whether it is linked or not, like
in the patch below.
Or do you prefer to set the bit in the host compiler too?  It requires
lookup_attribute ("omp declare target link") for all vars in the table during
compilation, but allows doing splay_tree_lookup at run-time only for vars with
the MSB set in host_var_table.
Unfortunately, calling gomp_exit_data from gomp_unload_image_from_device works
only for a DSO, but it crashes when an executable leaves a nonzero refcount,
because the target device may be already uninitialized from the plugin's
__run_exit_handlers (and it is in case of intelmic), so gomp_exit_data cannot
run free_func.
Is it possible to add some atexit (...) to libgomp, which will set a
shutting_down flag, and just do nothing in gomp_unload_image_from_device if it
is set?
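
What I have in mind is roughly the following.  This is a toy sketch, not
libgomp code: toy_shutting_down, toy_register_shutdown_hook and
toy_unload_image are placeholder names, and the counter stands in for calls to
the plugin's free_func.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static bool toy_shutting_down;
static int toy_free_calls;

/* atexit handler: from now on the device may already be torn down.  */
static void
toy_mark_shutdown (void)
{
  toy_shutting_down = true;
}

/* Register the handler once, e.g. at library init.  Returns 0 on
   success, like atexit itself.  */
static int
toy_register_shutdown_hook (void)
{
  return atexit (toy_mark_shutdown);
}

/* Release N_LEFTOVER mappings that still have a nonzero refcount, unless
   the process is exiting.  */
static void
toy_unload_image (int n_leftover)
{
  assert (n_leftover >= 0);
  if (toy_shutting_down)
    return;
  for (int i = 0; i < n_leftover; i++)
    toy_free_calls++;
}
```

Since atexit handlers run in reverse order of registration, this would only
help if the libgomp handler ends up running before the plugin's own exit
handling, which I have not checked.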


diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 369574f..b73caa1 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -822,6 +822,8 @@ const struct attribute_spec c_common_attribute_table[] =
 			      handle_simd_attribute, false },
   { "omp declare target",     0, 0, true, false, false,
 			      handle_omp_declare_target_attribute, false },
+  { "omp declare target link", 0, 0, true, false, false,
+			      handle_omp_declare_target_attribute, false },
   { "alloc_align",	      1, 1, false, true, true,
 			      handle_alloc_align_attribute, false },
   { "assume_aligned",	      1, 2, false, true, true,
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index f73d9a7..8bc70f0 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2204,6 +2204,13 @@ output_in_order (bool no_reorder)
 	  break;
 
 	case ORDER_VAR:
+#ifdef ACCEL_COMPILER
+	  /* Do not assemble "omp declare target link" vars.  */
+	  if (DECL_HAS_VALUE_EXPR_P (nodes[i].u.v->decl)
+	      && lookup_attribute ("omp declare target link",
+				   DECL_ATTRIBUTES (nodes[i].u.v->decl)))
+	    break;
+#endif
 	  nodes[i].u.v->assemble_decl ();
 	  break;
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 7fff12f..040e7d8 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -7861,7 +7861,9 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 	  n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
 	  if ((ctx->region_type & ORT_TARGET) != 0
 	      && !(n->value & GOVD_SEEN)
-	      && GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)) == 0)
+	      && GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)) == 0
+	      && !lookup_attribute ("omp declare target link",
+				    DECL_ATTRIBUTES (decl)))
 	    {
 	      remove = true;
 	      /* For struct element mapping, if struct is never referenced
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index b1e2d6e..37aa197 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -49,6 +49,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "ipa-utils.h"
 #include "gomp-constants.h"
+#include "stringpool.h"
+#include "fold-const.h"
 
 
 /* Number of parallel tasks to run, -1 if we want to use GNU Make jobserver.  */
@@ -3218,6 +3220,37 @@ lto_init (void)
 #endif
 }
 
+/* Create artificial pointers for "omp declare target link" vars.  */
+
+static void
+offload_handle_link_vars (void)
+{
+#ifdef ACCEL_COMPILER
+  varpool_node *var;
+  FOR_EACH_VARIABLE (var)
+    if (lookup_attribute ("omp declare target link",
+			  DECL_ATTRIBUTES (var->decl)))
+      {
+	tree type = build_pointer_type (TREE_TYPE (var->decl));
+	tree link_ptr_var = make_node (VAR_DECL);
+	TREE_TYPE (link_ptr_var) = type;
+	TREE_USED (link_ptr_var) = 1;
+	TREE_STATIC (link_ptr_var) = 1;
+	DECL_MODE (link_ptr_var) = TYPE_MODE (type);
+	DECL_SIZE (link_ptr_var) = TYPE_SIZE (type);
+	DECL_SIZE_UNIT (link_ptr_var) = TYPE_SIZE_UNIT (type);
+	DECL_ARTIFICIAL (link_ptr_var) = 1;
+	tree var_name = DECL_ASSEMBLER_NAME (var->decl);
+	char *new_name
+	  = ACONCAT ((IDENTIFIER_POINTER (var_name), "_linkptr", NULL));
+	DECL_NAME (link_ptr_var) = get_identifier (new_name);
+	SET_DECL_ASSEMBLER_NAME (link_ptr_var, DECL_NAME (link_ptr_var));
+	SET_DECL_VALUE_EXPR (var->decl, build_simple_mem_ref (link_ptr_var));
+	DECL_HAS_VALUE_EXPR_P (var->decl) = 1;
+      }
+#endif
+}
+
 
 /* Main entry point for the GIMPLE front end.  This front end has
    three main personalities:
@@ -3266,6 +3299,8 @@ lto_main (void)
 
   if (!seen_error ())
     {
+      offload_handle_link_vars ();
+
       /* If WPA is enabled analyze the whole call graph and create an
 	 optimization plan.  Otherwise, read in all the function
 	 bodies and continue with optimization.  */
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index f17a828..d2f12dd 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2010,7 +2010,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 	  decl = OMP_CLAUSE_DECL (c);
 	  /* Global variables with "omp declare target" attribute
 	     don't need to be copied, the receiver side will use them
-	     directly.  */
+	     directly.  However, global variables with "omp declare target link"
+	     attribute need to be copied.  */
 	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 	      && DECL_P (decl)
 	      && ((OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FIRSTPRIVATE_POINTER
@@ -2018,7 +2019,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 		       != GOMP_MAP_FIRSTPRIVATE_REFERENCE))
 		  || TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
 	      && is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))
-	      && varpool_node::get_create (decl)->offloadable)
+	      && varpool_node::get_create (decl)->offloadable
+	      && !lookup_attribute ("omp declare target link",
+				    DECL_ATTRIBUTES (decl)))
 	    break;
 	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 	      && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER)
@@ -18485,13 +18488,36 @@ add_decls_addresses_to_decl_constructor (vec<tree, va_gc> *v_decls,
   for (unsigned i = 0; i < len; i++)
     {
       tree it = (*v_decls)[i];
-      bool is_function = TREE_CODE (it) != VAR_DECL;
+      bool is_var = TREE_CODE (it) == VAR_DECL;
+      bool is_link_var
+	= is_var && DECL_HAS_VALUE_EXPR_P (it)
+	  && lookup_attribute ("omp declare target link", DECL_ATTRIBUTES (it));
+
+      tree size = NULL_TREE;
+      if (is_var)
+	size = fold_convert (const_ptr_type_node, DECL_SIZE_UNIT (it));
+
+      tree addr;
+      if (!is_link_var)
+	addr = build_fold_addr_expr (it);
+      else
+	{
+	  tree value_expr = DECL_VALUE_EXPR (it);
+	  tree link_ptr_decl = TREE_OPERAND (value_expr, 0);
+	  varpool_node::finalize_decl (link_ptr_decl);
+	  /* For "omp declare target link" var use address of the pointer
+	     instead of address of the var.  */
+	  addr = build_fold_addr_expr (link_ptr_decl);
+	  /* Most significant bit of the size marks such vars.  */
+	  unsigned HOST_WIDE_INT isize = tree_to_uhwi (size);
+	  isize |= 1ULL << (int_size_in_bytes (const_ptr_type_node)
+			    * BITS_PER_UNIT - 1);
+	  size = wide_int_to_tree (const_ptr_type_node, isize);
+	}
 
-      CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, build_fold_addr_expr (it));
-      if (!is_function)
-	CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE,
-				fold_convert (const_ptr_type_node,
-					      DECL_SIZE_UNIT (it)));
+      CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, addr);
+      if (is_var)
+	CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, size);
     }
 }
 
@@ -19728,4 +19754,84 @@ make_pass_oacc_device_lower (gcc::context *ctxt)
   return new pass_oacc_device_lower (ctxt);
 }
 
+/* "omp declare target link" handling pass.  */
+
+namespace {
+
+const pass_data pass_data_omp_target_link =
+{
+  GIMPLE_PASS,			/* type */
+  "omptargetlink",		/* name */
+  OPTGROUP_NONE,		/* optinfo_flags */
+  TV_NONE,			/* tv_id */
+  PROP_ssa,			/* properties_required */
+  0,				/* properties_provided */
+  0,				/* properties_destroyed */
+  0,				/* todo_flags_start */
+  TODO_update_ssa,		/* todo_flags_finish */
+};
+
+class pass_omp_target_link : public gimple_opt_pass
+{
+public:
+  pass_omp_target_link (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_omp_target_link, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *fun)
+    {
+#ifdef ACCEL_COMPILER
+      tree attrs = DECL_ATTRIBUTES (fun->decl);
+      return lookup_attribute ("omp declare target", attrs)
+	     || lookup_attribute ("omp target entrypoint", attrs);
+#else
+      (void) fun;
+      return false;
+#endif
+    }
+
+  virtual unsigned execute (function *);
+};
+
+/* Callback for walk_gimple_stmt used to scan for link var operands.  */
+
+static tree
+find_link_var_op (tree *tp, int *walk_subtrees, void *)
+{
+  tree t = *tp;
+
+  if (TREE_CODE (t) == VAR_DECL && DECL_HAS_VALUE_EXPR_P (t)
+      && lookup_attribute ("omp declare target link", DECL_ATTRIBUTES (t)))
+    {
+      *walk_subtrees = 0;
+      return t;
+    }
+
+  return NULL_TREE;
+}
+
+unsigned
+pass_omp_target_link::execute (function *fun)
+{
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      gimple_stmt_iterator gsi;
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	if (walk_gimple_stmt (&gsi, NULL, find_link_var_op, NULL))
+	  gimple_regimplify_operands (gsi_stmt (gsi), &gsi);
+    }
+
+  return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_omp_target_link (gcc::context *ctxt)
+{
+  return new pass_omp_target_link (ctxt);
+}
+
 #include "gt-omp-low.h"
diff --git a/gcc/passes.def b/gcc/passes.def
index 28cb4c1..1f9b5d4 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -170,6 +170,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_fixup_cfg);
   NEXT_PASS (pass_lower_eh_dispatch);
   NEXT_PASS (pass_oacc_device_lower);
+  NEXT_PASS (pass_omp_target_link);
   NEXT_PASS (pass_all_optimizations);
   PUSH_INSERT_PASSES_WITHIN (pass_all_optimizations)
       NEXT_PASS (pass_remove_cgraph_callee_edges);
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 9704918..ce1f3eb 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -415,6 +415,7 @@ extern gimple_opt_pass *make_pass_lower_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_omp_target_link (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_oacc_device_lower (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_object_sizes (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_strlen (gcc::context *ctxt);
diff --git a/gcc/varpool.c b/gcc/varpool.c
index 36f19a6..f269826 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -738,6 +738,13 @@ symbol_table::output_variables (void)
       /* Handled in output_in_order.  */
       if (node->no_reorder)
 	continue;
+#ifdef ACCEL_COMPILER
+      /* Do not assemble "omp declare target link" vars.  */
+      if (DECL_HAS_VALUE_EXPR_P (node->decl)
+	  && lookup_attribute ("omp declare target link",
+			       DECL_ATTRIBUTES (node->decl)))
+	continue;
+#endif
       if (node->assemble_decl ())
         changed = true;
     }
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index c467f97..ea63248 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -817,6 +817,9 @@ struct target_mem_desc {
 
 /* Special value for refcount - infinity.  */
 #define REFCOUNT_INFINITY (~(uintptr_t) 0)
+/* Special value for refcount - tgt_offset contains target address of the
+   artificial pointer to "omp declare target link" object.  */
+#define REFCOUNT_LINK (~(uintptr_t) 1)
 
 struct splay_tree_key_s {
   /* Address of the host object.  */
@@ -831,6 +834,8 @@ struct splay_tree_key_s {
   uintptr_t refcount;
   /* Asynchronous reference count.  */
   uintptr_t async_refcount;
+  /* Pointer to the original mapping of "omp declare target link" object.  */
+  splay_tree_key link_key;
 };
 
 /* The comparison function.  */
diff --git a/libgomp/target.c b/libgomp/target.c
index cf9d0e6..bad78a0 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -453,7 +453,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
       else
 	n = splay_tree_lookup (mem_map, &cur_node);
-      if (n)
+      if (n && n->refcount != REFCOUNT_LINK)
 	gomp_map_vars_existing (devicep, n, &cur_node, &tgt->list[i],
 				kind & typemask);
       else
@@ -617,11 +617,19 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	    else
 	      k->host_end = k->host_start + sizeof (void *);
 	    splay_tree_key n = splay_tree_lookup (mem_map, k);
-	    if (n)
+	    if (n && n->refcount != REFCOUNT_LINK)
 	      gomp_map_vars_existing (devicep, n, k, &tgt->list[i],
 				      kind & typemask);
 	    else
 	      {
+		k->link_key = NULL;
+		if (n && n->refcount == REFCOUNT_LINK)
+		  {
+		    /* Replace target address of the pointer with target address
+		       of mapped object in the splay tree.  */
+		    splay_tree_remove (mem_map, n);
+		    k->link_key = n;
+		  }
 		size_t align = (size_t) 1 << (kind >> rshift);
 		tgt->list[i].key = k;
 		k->tgt = tgt;
@@ -741,6 +749,16 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		    gomp_fatal ("%s: unhandled kind 0x%.2x", __FUNCTION__,
 				kind);
 		  }
+
+		if (k->link_key)
+		  {
+		    /* Set link pointer on target to the device address of the
+		       mapped object.  */
+		    void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset);
+		    devicep->host2dev_func (devicep->target_id,
+					    (void *) n->tgt_offset,
+					    &tgt_addr, sizeof (void *));
+		  }
 		array++;
 	      }
 	  }
@@ -866,6 +884,9 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
       if (do_unmap)
 	{
 	  splay_tree_remove (&devicep->mem_map, k);
+	  if (k->link_key)
+	    splay_tree_insert (&devicep->mem_map,
+			       (splay_tree_node) k->link_key);
 	  if (k->tgt->refcount > 1)
 	    k->tgt->refcount--;
 	  else
@@ -937,6 +958,76 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum, void **hostaddrs,
   gomp_mutex_unlock (&devicep->lock);
 }
 
+static void
+gomp_exit_data (struct gomp_device_descr *devicep, size_t mapnum,
+		void **hostaddrs, size_t *sizes, unsigned short *kinds,
+		bool do_lock)
+{
+  const int typemask = 0xff;
+  size_t i;
+  if (do_lock)
+    gomp_mutex_lock (&devicep->lock);
+  for (i = 0; i < mapnum; i++)
+    {
+      struct splay_tree_key_s cur_node;
+      unsigned char kind = kinds[i] & typemask;
+      switch (kind)
+	{
+	case GOMP_MAP_FROM:
+	case GOMP_MAP_ALWAYS_FROM:
+	case GOMP_MAP_DELETE:
+	case GOMP_MAP_RELEASE:
+	case GOMP_MAP_ZERO_LEN_ARRAY_SECTION:
+	case GOMP_MAP_DELETE_ZERO_LEN_ARRAY_SECTION:
+	  cur_node.host_start = (uintptr_t) hostaddrs[i];
+	  cur_node.host_end = cur_node.host_start + sizes[i];
+	  splay_tree_key k = (kind == GOMP_MAP_DELETE_ZERO_LEN_ARRAY_SECTION
+			      || kind == GOMP_MAP_ZERO_LEN_ARRAY_SECTION)
+	    ? gomp_map_0len_lookup (&devicep->mem_map, &cur_node)
+	    : splay_tree_lookup (&devicep->mem_map, &cur_node);
+	  if (!k)
+	    continue;
+
+	  if (k->refcount > 0 && k->refcount != REFCOUNT_INFINITY)
+	    k->refcount--;
+	  if ((kind == GOMP_MAP_DELETE
+	       || kind == GOMP_MAP_DELETE_ZERO_LEN_ARRAY_SECTION)
+	      && k->refcount != REFCOUNT_INFINITY)
+	    k->refcount = 0;
+
+	  if ((kind == GOMP_MAP_FROM && k->refcount == 0)
+	      || kind == GOMP_MAP_ALWAYS_FROM)
+	    devicep->dev2host_func (devicep->target_id,
+				    (void *) cur_node.host_start,
+				    (void *) (k->tgt->tgt_start + k->tgt_offset
+					      + cur_node.host_start
+					      - k->host_start),
+				    cur_node.host_end - cur_node.host_start);
+	  if (k->refcount == 0)
+	    {
+	      splay_tree_remove (&devicep->mem_map, k);
+	      if (k->link_key)
+		splay_tree_insert (&devicep->mem_map,
+				   (splay_tree_node) k->link_key);
+	      if (k->tgt->refcount > 1)
+		k->tgt->refcount--;
+	      else
+		gomp_unmap_tgt (k->tgt);
+	    }
+
+	  break;
+	default:
+	  if (do_lock)
+	    gomp_mutex_unlock (&devicep->lock);
+	  gomp_fatal ("GOMP_target_enter_exit_data unhandled kind 0x%.2x",
+		      kind);
+	}
+    }
+
+  if (do_lock)
+    gomp_mutex_unlock (&devicep->lock);
+}
+
 /* Load image pointed by TARGET_DATA to the device, specified by DEVICEP.
    And insert to splay tree the mapping between addresses from HOST_TABLE and
    from loaded target image.  We rely in the host and device compiler
@@ -996,6 +1087,7 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
       k->tgt_offset = target_table[i].start;
       k->refcount = REFCOUNT_INFINITY;
       k->async_refcount = 0;
+      k->link_key = NULL;
       array->left = NULL;
       array->right = NULL;
       splay_tree_insert (&devicep->mem_map, array);
@@ -1005,13 +1097,19 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   for (i = 0; i < num_vars; i++)
     {
       struct addr_pair *target_var = &target_table[num_funcs + i];
-      if (target_var->end - target_var->start
-	  != (uintptr_t) host_var_table[i * 2 + 1])
+      uintptr_t target_size = target_var->end - target_var->start;
+
+      /* Most significant bit of the size marks "omp declare target link"
+	 variables.  */
+      bool is_link
+	= target_size & (1ULL << (sizeof (uintptr_t) * __CHAR_BIT__ - 1));
+
+      if (!is_link && (uintptr_t) host_var_table[i * 2 + 1] != target_size)
 	{
 	  gomp_mutex_unlock (&devicep->lock);
 	  if (is_register_lock)
 	    gomp_mutex_unlock (&register_lock);
-	  gomp_fatal ("Can't map target variables (size mismatch)");
+	  gomp_fatal ("Cannot map target variables (size mismatch)");
 	}
 
       splay_tree_key k = &array->key;
@@ -1019,8 +1117,9 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
       k->host_end = k->host_start + (uintptr_t) host_var_table[i * 2 + 1];
       k->tgt = tgt;
       k->tgt_offset = target_var->start;
-      k->refcount = REFCOUNT_INFINITY;
+      k->refcount = is_link ? REFCOUNT_LINK : REFCOUNT_INFINITY;
       k->async_refcount = 0;
+      k->link_key = NULL;
       array->left = NULL;
       array->right = NULL;
       splay_tree_insert (&devicep->mem_map, array);
@@ -1071,14 +1170,24 @@ gomp_unload_image_from_device (struct gomp_device_descr *devicep,
       splay_tree_remove (&devicep->mem_map, &k);
     }
 
+  bool has_link_var = false;
   for (j = 0; j < num_vars; j++)
     {
       k.host_start = (uintptr_t) host_var_table[j * 2];
       k.host_end = k.host_start + (uintptr_t) host_var_table[j * 2 + 1];
+      splay_tree_key n = splay_tree_lookup (&devicep->mem_map, &k);
+      if (n->link_key)
+	{
+	  n->link_key = NULL;
+	  has_link_var = true;
+	  unsigned short kind = GOMP_MAP_DELETE;
+	  gomp_exit_data (devicep, 1, &host_var_table[j * 2],
+			  (size_t *) &host_var_table[j * 2 + 1], &kind, false);
+	}
       splay_tree_remove (&devicep->mem_map, &k);
     }
 
-  if (node)
+  if (node && !has_link_var)
     {
       free (node->tgt);
       free (node);
@@ -1586,69 +1695,6 @@ GOMP_target_update_ext (int device, size_t mapnum, void **hostaddrs,
   gomp_update (devicep, mapnum, hostaddrs, sizes, kinds, true);
 }
 
-static void
-gomp_exit_data (struct gomp_device_descr *devicep, size_t mapnum,
-		void **hostaddrs, size_t *sizes, unsigned short *kinds)
-{
-  const int typemask = 0xff;
-  size_t i;
-  gomp_mutex_lock (&devicep->lock);
-  for (i = 0; i < mapnum; i++)
-    {
-      struct splay_tree_key_s cur_node;
-      unsigned char kind = kinds[i] & typemask;
-      switch (kind)
-	{
-	case GOMP_MAP_FROM:
-	case GOMP_MAP_ALWAYS_FROM:
-	case GOMP_MAP_DELETE:
-	case GOMP_MAP_RELEASE:
-	case GOMP_MAP_ZERO_LEN_ARRAY_SECTION:
-	case GOMP_MAP_DELETE_ZERO_LEN_ARRAY_SECTION:
-	  cur_node.host_start = (uintptr_t) hostaddrs[i];
-	  cur_node.host_end = cur_node.host_start + sizes[i];
-	  splay_tree_key k = (kind == GOMP_MAP_DELETE_ZERO_LEN_ARRAY_SECTION
-			      || kind == GOMP_MAP_ZERO_LEN_ARRAY_SECTION)
-	    ? gomp_map_0len_lookup (&devicep->mem_map, &cur_node)
-	    : splay_tree_lookup (&devicep->mem_map, &cur_node);
-	  if (!k)
-	    continue;
-
-	  if (k->refcount > 0 && k->refcount != REFCOUNT_INFINITY)
-	    k->refcount--;
-	  if ((kind == GOMP_MAP_DELETE
-	       || kind == GOMP_MAP_DELETE_ZERO_LEN_ARRAY_SECTION)
-	      && k->refcount != REFCOUNT_INFINITY)
-	    k->refcount = 0;
-
-	  if ((kind == GOMP_MAP_FROM && k->refcount == 0)
-	      || kind == GOMP_MAP_ALWAYS_FROM)
-	    devicep->dev2host_func (devicep->target_id,
-				    (void *) cur_node.host_start,
-				    (void *) (k->tgt->tgt_start + k->tgt_offset
-					      + cur_node.host_start
-					      - k->host_start),
-				    cur_node.host_end - cur_node.host_start);
-	  if (k->refcount == 0)
-	    {
-	      splay_tree_remove (&devicep->mem_map, k);
-	      if (k->tgt->refcount > 1)
-		k->tgt->refcount--;
-	      else
-		gomp_unmap_tgt (k->tgt);
-	    }
-
-	  break;
-	default:
-	  gomp_mutex_unlock (&devicep->lock);
-	  gomp_fatal ("GOMP_target_enter_exit_data unhandled kind 0x%.2x",
-		      kind);
-	}
-    }
-
-  gomp_mutex_unlock (&devicep->lock);
-}
-
 void
 GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs,
 			     size_t *sizes, unsigned short *kinds,
@@ -1718,7 +1764,7 @@ GOMP_target_enter_exit_data (int device, size_t mapnum, void **hostaddrs,
 	gomp_map_vars (devicep, 1, &hostaddrs[i], NULL, &sizes[i], &kinds[i],
 		       true, GOMP_MAP_VARS_ENTER_DATA);
   else
-    gomp_exit_data (devicep, mapnum, hostaddrs, sizes, kinds);
+    gomp_exit_data (devicep, mapnum, hostaddrs, sizes, kinds, true);
 }
 
 bool
@@ -1778,7 +1824,7 @@ gomp_target_task_fn (void *data)
 		       &ttask->kinds[i], true, GOMP_MAP_VARS_ENTER_DATA);
   else
     gomp_exit_data (devicep, ttask->mapnum, ttask->hostaddrs, ttask->sizes,
-		    ttask->kinds);
+		    ttask->kinds, true);
   return false;
 }
 
diff --git a/libgomp/testsuite/libgomp.c/target-link-1.c b/libgomp/testsuite/libgomp.c/target-link-1.c
new file mode 100644
index 0000000..681677c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-link-1.c
@@ -0,0 +1,63 @@
+struct S { int s, t; };
+
+int a = 1, b = 1;
+double c[27];
+struct S d = { 8888, 8888 };
+#pragma omp declare target link (a) to (b) link (c, d)
+
+int
+foo (void)
+{
+  return a++ + b++;
+}
+
+int
+bar (int n)
+{
+  int *p1 = &a;
+  int *p2 = &b;
+  c[n] += 2.0;
+  d.s -= 2;
+  d.t -= 2;
+  return *p1 + *p2 + d.s + d.t;
+}
+
+#pragma omp declare target (foo, bar)
+
+int
+main ()
+{
+  a = b = 2;
+  d.s = 17;
+  d.t = 18;
+
+  int res, n = 10;
+  #pragma omp target map (to: a, b, c, d) map (from: res)
+  {
+    res = foo () + foo ();
+    c[n] = 3.0;
+    res += bar (n);
+  }
+
+  int shared_mem = 0;
+  #pragma omp target map (alloc: shared_mem)
+    shared_mem = 1;
+
+  if ((shared_mem && res != (2 + 2) + (3 + 3) + (4 + 4 + 15 + 16))
+      || (!shared_mem && res != (2 + 1) + (3 + 2) + (4 + 3 + 15 + 16)))
+    __builtin_abort ();
+
+  #pragma omp target enter data map (to: c)
+  #pragma omp target update from (c)
+  res = (int) (c[n] + 0.5);
+  if ((shared_mem && res != 5) || (!shared_mem && res != 0))
+    __builtin_abort ();
+
+  #pragma omp target map (to: a, b) map (from: res)
+    res = foo ();
+
+  if ((shared_mem && res != 4 + 4) || (!shared_mem && res != 2 + 3))
+    __builtin_abort ();
+
+  return 0;
+}


  -- Ilya

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [gomp4.5] Handle #pragma omp declare target link
  2015-11-30 20:42                 ` Ilya Verbin
@ 2015-11-30 20:55                   ` Jakub Jelinek
  2015-11-30 21:38                     ` Ilya Verbin
  2015-12-14 17:18                     ` [gomp4.5] Handle #pragma omp declare target link Ilya Verbin
  0 siblings, 2 replies; 48+ messages in thread
From: Jakub Jelinek @ 2015-11-30 20:55 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: gcc-patches, Kirill Yukhin

On Mon, Nov 30, 2015 at 11:29:34PM +0300, Ilya Verbin wrote:
> > This looks wrong, both of these clearly could affect anything with
> > DECL_HAS_VALUE_EXPR_P, not just the link vars.
> > So, if you need to handle the "omp declare target link" vars specially,
> > you should only handle those specially and nothing else.  And please try to
> > explain why.
> 
> Actually these ifndefs are not needed, because assemble_decl will never be
> called by accel compiler for original link vars.  I've added a check into
> output_in_order, but missed a second place where assemble_decl is called -
> symbol_table::output_variables.  So, fixed now.

Great.

> > Do we need to do anything in gomp_unload_image_from_device ?
> > I mean at least in questionable programs that for link vars don't decrement
> > the refcount of the var that replaced the link var to 0 first before
> > dlclosing the library.
> > At least host_var_table[j * 2 + 1] will have the MSB set, so we need to
> > handle it differently.  Perhaps for that case perform a lookup, and if we
> > get something which has link_key non-NULL, first act as if there were a
> > target exit data delete (var) on it?
> 
> You're right, it doesn't deallocate memory on the device if DSO leaves nonzero
> refcount.  And currently host compiler doesn't set MSB in host_var_table, it's
> set only by accel compiler.  But it's possible to do splay_tree_lookup for each
> var to determine whether it is linked or not, like in the patch below.
> Or do you prefer to set the bit in host compiler too?  It requires
> lookup_attribute ("omp declare target link") for all vars in the table during
> compilation, but allows doing splay_tree_lookup at run-time only for vars with
> MSB set in host_var_table.
> Unfortunately, calling gomp_exit_data from gomp_unload_image_from_device works
> only for DSO, but it crashes when an executable leaves nonzero refcount, because
> target device may be already uninitialized from plugin's __run_exit_handlers
> (and it is in case of intelmic), so gomp_exit_data cannot run free_func.
> Is it possible to add some atexit (...) to libgomp, which will set shutting_down
> flag, and just do nothing in gomp_unload_image_from_device if it is set?

Sorry, I didn't mean you should call gomp_exit_data, what I meant was that
you perform the same action as delete (var) would do in that case.
Calling gomp_exit_data e.g. looks it up again etc.
Supposedly having the MSB in host table too is useful, so if you could
handle that, it would be nice.  And splay_tree_lookup only if the MSB is
set.
So,
    if (!host_data_has_msb_set)
      splay_tree_remove (&devicep->mem_map, &k);
    else
      {
        splay_tree_key n = splay_tree_lookup (&devicep->mem_map, &k);
        if (n->link_key)
	  {
	    n->refcount = 0;
	    n->link_key = NULL;
	    splay_tree_remove (&devicep->mem_map, n);
	    if (n->tgt->refcount > 1)
	      n->tgt->refcount--;
	    else
	      gomp_unmap_tgt (n->tgt);
	  }
	else
	  splay_tree_remove (&devicep->mem_map, n);
      }
or so.
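
For reference, the "most significant bit of the size marks a link var"
convention discussed above can be sketched in isolation as follows.  This is
only an illustration under assumptions: the names LINK_BIT, mark_link_size,
size_is_link and size_without_link_bit are made up for this sketch and are not
part of libgomp or of the patch.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not libgomp API.  The top bit of a pointer-sized
   size field is used as a flag, the remaining bits hold the real size.  */
#define LINK_BIT ((uintptr_t) 1 << (sizeof (uintptr_t) * CHAR_BIT - 1))

/* Tag SIZE as belonging to an "omp declare target link" variable.  */
static uintptr_t
mark_link_size (uintptr_t size)
{
  return size | LINK_BIT;
}

/* Return true if SIZE carries the link marker.  */
static bool
size_is_link (uintptr_t size)
{
  return (size & LINK_BIT) != 0;
}

/* Recover the real size by clearing the marker.  */
static uintptr_t
size_without_link_bit (uintptr_t size)
{
  return size & ~LINK_BIT;
}
```

A reader of the table would test the marker first and mask it off before
comparing or using the size; an unmarked entry keeps its size unchanged.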

	Jakub

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [gomp4.5] Handle #pragma omp declare target link
  2015-11-30 20:55                   ` Jakub Jelinek
@ 2015-11-30 21:38                     ` Ilya Verbin
  2015-12-01  8:18                       ` Jakub Jelinek
  2015-12-14 17:18                     ` [gomp4.5] Handle #pragma omp declare target link Ilya Verbin
  1 sibling, 1 reply; 48+ messages in thread
From: Ilya Verbin @ 2015-11-30 21:38 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Kirill Yukhin

On Mon, Nov 30, 2015 at 21:49:02 +0100, Jakub Jelinek wrote:
> On Mon, Nov 30, 2015 at 11:29:34PM +0300, Ilya Verbin wrote:
> > You're right, it doesn't deallocate memory on the device if DSO leaves nonzero
> > refcount.  And currently host compiler doesn't set MSB in host_var_table, it's
> > set only by accel compiler.  But it's possible to do splay_tree_lookup for each
> > var to determine whether it is linked or not, like in the patch below.
> > Or do you prefer to set the bit in host compiler too?  It requires
> > lookup_attribute ("omp declare target link") for all vars in the table during
> > compilation, but allows doing splay_tree_lookup at run-time only for vars with
> > MSB set in host_var_table.
> > Unfortunately, calling gomp_exit_data from gomp_unload_image_from_device works
> > only for DSO, but it crashes when an executable leaves nonzero refcount, because
> > target device may be already uninitialized from plugin's __run_exit_handlers
> > (and it is in case of intelmic), so gomp_exit_data cannot run free_func.
> > Is it possible to add some atexit (...) to libgomp, which will set shutting_down
> > flag, and just do nothing in gomp_unload_image_from_device if it is set?
> 
> Sorry, I didn't mean you should call gomp_exit_data, what I meant was that
> you perform the same action as delete (var) would do in that case.
> Calling gomp_exit_data e.g. looks it up again etc.
> Supposedly having the MSB in host table too is useful, so if you could
> handle that, it would be nice.  And splay_tree_lookup only if the MSB is
> set.
> So,
>     if (!host_data_has_msb_set)
>       splay_tree_remove (&devicep->mem_map, &k);
>     else
>       {
>         splay_tree_key n = splay_tree_lookup (&devicep->mem_map, &k);
>         if (n->link_key)
> 	  {
> 	    n->refcount = 0;
> 	    n->link_key = NULL;
> 	    splay_tree_remove (&devicep->mem_map, n);
> 	    if (n->tgt->refcount > 1)
> 	      n->tgt->refcount--;
> 	    else
> 	      gomp_unmap_tgt (n->tgt);
> 	  }
> 	else
> 	  splay_tree_remove (&devicep->mem_map, n);
>       }
> or so.

Ok, but it doesn't solve the issue with doing it for the executable, because
gomp_unmap_tgt (n->tgt) will want to run free_func on uninitialized device.

  -- Ilya

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [gomp4.5] Handle #pragma omp declare target link
  2015-11-30 21:38                     ` Ilya Verbin
@ 2015-12-01  8:18                       ` Jakub Jelinek
  2015-12-01  8:48                         ` Ilya Verbin
  0 siblings, 1 reply; 48+ messages in thread
From: Jakub Jelinek @ 2015-12-01  8:18 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: gcc-patches, Kirill Yukhin

On Mon, Nov 30, 2015 at 11:55:20PM +0300, Ilya Verbin wrote:
> Ok, but it doesn't solve the issue with doing it for the executable, because
> gomp_unmap_tgt (n->tgt) will want to run free_func on uninitialized device.

?? You mean that the
devicep->unload_image_func (devicep->target_id, version, target_data);
call deinitializes the device or something else (I mean, if there is some
other tgt, then it had to be initialized)?
If it is just that order, I wonder if you can't just move the
unload_image_func call after the splay_tree_remove loops (or even after the
node freeing call).

	Jakub

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [gomp4.5] Handle #pragma omp declare target link
  2015-12-01  8:18                       ` Jakub Jelinek
@ 2015-12-01  8:48                         ` Ilya Verbin
  2015-12-01 13:16                           ` Jakub Jelinek
  0 siblings, 1 reply; 48+ messages in thread
From: Ilya Verbin @ 2015-12-01  8:48 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Kirill Yukhin


> On 01 Dec 2015, at 11:18, Jakub Jelinek <jakub@redhat.com> wrote:
> 
>> On Mon, Nov 30, 2015 at 11:55:20PM +0300, Ilya Verbin wrote:
>> Ok, but it doesn't solve the issue with doing it for the executable, because
>> gomp_unmap_tgt (n->tgt) will want to run free_func on uninitialized device.
> 
> ?? You mean that the
> devicep->unload_image_func (devicep->target_id, version, target_data);
> call deinitializes the device or something else (I mean, if there is some
> other tgt, then it had to be initialized)?

No, I mean that it can be deinitialized from plugin's __run_exit_handlers (see my last mail with the patch).

  -- Ilya

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [gomp4.5] Handle #pragma omp declare target link
  2015-12-01  8:48                         ` Ilya Verbin
@ 2015-12-01 13:16                           ` Jakub Jelinek
  2015-12-01 17:30                             ` Ilya Verbin
  0 siblings, 1 reply; 48+ messages in thread
From: Jakub Jelinek @ 2015-12-01 13:16 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: gcc-patches, Kirill Yukhin

On Tue, Dec 01, 2015 at 11:48:51AM +0300, Ilya Verbin wrote:
> 
> > On 01 Dec 2015, at 11:18, Jakub Jelinek <jakub@redhat.com> wrote:
> > 
> >> On Mon, Nov 30, 2015 at 11:55:20PM +0300, Ilya Verbin wrote:
> >> Ok, but it doesn't solve the issue with doing it for the executable, because
> >> gomp_unmap_tgt (n->tgt) will want to run free_func on uninitialized device.
> > 
> > ?? You mean that the
> > devicep->unload_image_func (devicep->target_id, version, target_data);
> > call deinitializes the device or something else (I mean, if there is some
> > other tgt, then it had to be initialized)?
> 
> No, I mean that it can be deinitialized from plugin's __run_exit_handlers (see my last mail with the patch).

Then the bug is that you have too many atexit registered handlers that
perform some finalization, better would be to have a single one that
performs everything in order.

Anyway, the other option is in the atexit handlers (liboffloadmic and/or the
intelmic plugin) to set some flag and ignore free_func calls when the flag
is set or something like that.

Note library destructors can also use OpenMP code in them, similarly C++
dtors etc., so when you at some point finalize certain device, you should
arrange for newer events on the device to be ignored and new offloadings to
go to host fallback.
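
A minimal sketch of that idea, assuming a single "finalized" flag that is
checked at the point where a device is looked up.  Everything here
(struct device, devices, resolve, finalize_all, runs_on_device) is a
hypothetical stand-in for illustration, not the libgomp or plugin code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device descriptor.  */
struct device
{
  int id;
  bool initialized;
};

#define NUM_DEVICES 2

static struct device devices[NUM_DEVICES] = { { 0, true }, { 1, true } };

/* Set once the offloading runtime has been torn down.  */
static bool finalized;

/* Return the device for ID, or NULL if the caller must fall back to
   the host (runtime finalized, bad ID, or device not initialized).  */
static struct device *
resolve (int id)
{
  if (finalized || id < 0 || id >= NUM_DEVICES || !devices[id].initialized)
    return NULL;
  return &devices[id];
}

/* Single teardown routine: set the flag first, then finalize each
   device, so later requests are ignored rather than sent to a dead
   device.  */
static void
finalize_all (void)
{
  finalized = true;
  for (int i = 0; i < NUM_DEVICES; i++)
    devices[i].initialized = false;
}

/* Return true if a region for device ID would be offloaded, false if
   it would run as host fallback.  */
static bool
runs_on_device (int id)
{
  return resolve (id) != NULL;
}
```

With this shape, code running from a destructor after finalize_all has been
called simply gets NULL from resolve and takes the host path.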

	Jakub

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [gomp4.5] Handle #pragma omp declare target link
  2015-12-01 13:16                           ` Jakub Jelinek
@ 2015-12-01 17:30                             ` Ilya Verbin
  2015-12-01 19:05                               ` Jakub Jelinek
  0 siblings, 1 reply; 48+ messages in thread
From: Ilya Verbin @ 2015-12-01 17:30 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Kirill Yukhin

On Tue, Dec 01, 2015 at 14:15:59 +0100, Jakub Jelinek wrote:
> On Tue, Dec 01, 2015 at 11:48:51AM +0300, Ilya Verbin wrote:
> > > On 01 Dec 2015, at 11:18, Jakub Jelinek <jakub@redhat.com> wrote:
> > >> On Mon, Nov 30, 2015 at 11:55:20PM +0300, Ilya Verbin wrote:
> > >> Ok, but it doesn't solve the issue with doing it for the executable, because
> > >> gomp_unmap_tgt (n->tgt) will want to run free_func on uninitialized device.
> > > 
> > > ?? You mean that the
> > > devicep->unload_image_func (devicep->target_id, version, target_data);
> > > call deinitializes the device or something else (I mean, if there is some
> > > other tgt, then it had to be initialized)?
> > 
> > No, I mean that it can be deinitialized from plugin's __run_exit_handlers (see my last mail with the patch).
> 
> Then the bug is that you have too many atexit registered handlers that
> perform some finalization, better would be to have a single one that
> performs everything in order.
> 
> Anyway, the other option is in the atexit handlers (liboffloadmic and/or the
> intelmic plugin) to set some flag and ignore free_func calls when the flag
> is set or something like that.
> 
> Note library destructors can also use OpenMP code in them, similarly C++
> dtors etc., so when you at some point finalize certain device, you should
> arrange for newer events on the device to be ignored and new offloadings to
> go to host fallback.

So I guess the decision to do host fallback should be made in resolve_device,
rather than in plugins (in free_func and all others).  Is this patch OK?
make check-target-libgomp passes both using emul and hw, offloading from dlopened
libs also works fine.


libgomp/
	* target.c (finalized): New static variable.
	(resolve_device): Do nothing when finalized is true.
	(GOMP_offload_register_ver): Likewise.
	(GOMP_offload_unregister_ver): Likewise.
	(gomp_target_fini): New static function.
	(gomp_target_init): Call gomp_target_fini at exit.
liboffloadmic/
	* plugin/libgomp-plugin-intelmic.cpp (unregister_main_image): Remove.
	(register_main_image): Do not call unregister_main_image at exit.
	(GOMP_OFFLOAD_fini_device): Allow for OpenMP.  Unregister main image.


diff --git a/libgomp/target.c b/libgomp/target.c
index cf9d0e6..320178e 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -78,6 +78,10 @@ static int num_devices;
 /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
 static int num_devices_openmp;
 
+/* True when offloading runtime is finalized.  */
+static bool finalized;
+
+
 /* Similar to gomp_realloc, but release register_lock before gomp_fatal.  */
 
 static void *
@@ -108,6 +112,9 @@ gomp_get_num_devices (void)
 static struct gomp_device_descr *
 resolve_device (int device_id)
 {
+  if (finalized)
+    return NULL;
+
   if (device_id == GOMP_DEVICE_ICV)
     {
       struct gomp_task_icv *icv = gomp_icv (false);
@@ -1095,6 +1102,9 @@ GOMP_offload_register_ver (unsigned version, const void *host_table,
 {
   int i;
 
+  if (finalized)
+    return;
+
   if (GOMP_VERSION_LIB (version) > GOMP_VERSION)
     gomp_fatal ("Library too old for offload (version %u < %u)",
 		GOMP_VERSION, GOMP_VERSION_LIB (version));
@@ -1143,6 +1153,9 @@ GOMP_offload_unregister_ver (unsigned version, const void *host_table,
 {
   int i;
 
+  if (finalized)
+    return;
+
   gomp_mutex_lock (&register_lock);
 
   /* Unload image from all initialized devices.  */
@@ -2282,6 +2295,24 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   return 0;
 }
 
+/* This function finalizes the runtime needed for offloading and all initialized
+   devices.  */
+
+static void
+gomp_target_fini (void)
+{
+  finalized = true;
+
+  int i;
+  for (i = 0; i < num_devices; i++)
+    {
+      struct gomp_device_descr *devicep = &devices[i];
+      gomp_mutex_lock (&devicep->lock);
+      gomp_fini_device (devicep);
+      gomp_mutex_unlock (&devicep->lock);
+    }
+}
+
 /* This function initializes the runtime needed for offloading.
    It parses the list of offload targets and tries to load the plugins for
    these targets.  On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
@@ -2387,6 +2418,9 @@ gomp_target_init (void)
       if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
 	goacc_register (&devices[i]);
     }
+
+  if (atexit (gomp_target_fini) != 0)
+    gomp_fatal ("atexit failed");
 }
 
 #else /* PLUGIN_SUPPORT */
diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index f8c1725..68f7b2c 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -231,12 +231,6 @@ offload (const char *file, uint64_t line, int device, const char *name,
 }
 
 static void
-unregister_main_image ()
-{
-  __offload_unregister_image (&main_target_image);
-}
-
-static void
 register_main_image ()
 {
   /* Do not check the return value, because old versions of liboffloadmic did
@@ -246,12 +240,6 @@ register_main_image ()
   /* liboffloadmic will call GOMP_PLUGIN_target_task_completion when
      asynchronous task on target is completed.  */
   __offload_register_task_callback (GOMP_PLUGIN_target_task_completion);
-
-  if (atexit (unregister_main_image) != 0)
-    {
-      fprintf (stderr, "%s: atexit failed\n", __FILE__);
-      exit (1);
-    }
 }
 
 /* liboffloadmic loads and runs offload_target_main on all available devices
@@ -269,8 +257,9 @@ extern "C" void
 GOMP_OFFLOAD_fini_device (int device)
 {
   TRACE ("(device = %d)", device);
-  /* Unreachable for GOMP_OFFLOAD_CAP_OPENMP_400.  */
-  abort ();
+
+  /* liboffloadmic will finalize target processes on all available devices.  */
+  __offload_unregister_image (&main_target_image);
 }
 
 static void


  -- Ilya

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [gomp4.5] Handle #pragma omp declare target link
  2015-12-01 17:30                             ` Ilya Verbin
@ 2015-12-01 19:05                               ` Jakub Jelinek
  2015-12-08 14:46                                 ` Ilya Verbin
  0 siblings, 1 reply; 48+ messages in thread
From: Jakub Jelinek @ 2015-12-01 19:05 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: gcc-patches, Kirill Yukhin

On Tue, Dec 01, 2015 at 08:29:27PM +0300, Ilya Verbin wrote:
> libgomp/
> 	* target.c (finalized): New static variable.
> 	(resolve_device): Do nothing when finalized is true.
> 	(GOMP_offload_register_ver): Likewise.
> 	(GOMP_offload_unregister_ver): Likewise.
> 	(gomp_target_fini): New static function.
> 	(gomp_target_init): Call gomp_target_fini at exit.
> liboffloadmic/
> 	* plugin/libgomp-plugin-intelmic.cpp (unregister_main_image): Remove.
> 	(register_main_image): Do not call unregister_main_image at exit.
> 	(GOMP_OFFLOAD_fini_device): Allow for OpenMP.  Unregister main image.
> 
> diff --git a/libgomp/target.c b/libgomp/target.c
> index cf9d0e6..320178e 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -78,6 +78,10 @@ static int num_devices;
>  /* Number of GOMP_OFFLOAD_CAP_OPENMP_400 devices.  */
>  static int num_devices_openmp;
>  
> +/* True when offloading runtime is finalized.  */
> +static bool finalized;


> +
> +
>  /* Similar to gomp_realloc, but release register_lock before gomp_fatal.  */
>  
>  static void *
> @@ -108,6 +112,9 @@ gomp_get_num_devices (void)
>  static struct gomp_device_descr *
>  resolve_device (int device_id)
>  {
> +  if (finalized)
> +    return NULL;
> +

This is racy, tsan would tell you so.
Instead of a global var, I'd just change the devicep->is_initialized 
field from bool into a 3 state field (perhaps enum), with states
uninitialized, initialized, finalized, and then say in resolve_device,

  gomp_mutex_lock (&devices[device_id].lock);
  if (devices[device_id].state == GOMP_DEVICE_UNINITIALIZED)
    gomp_init_device (&devices[device_id]);
  else if (devices[device_id].state == GOMP_DEVICE_FINALIZED)
    {
      gomp_mutex_unlock (&devices[device_id].lock);
      return NULL;
    }
  gomp_mutex_unlock (&devices[device_id].lock);

Though, of course, that is incomplete, because resolve_device takes one
lock, gomp_get_target_fn_addr another one, gomp_map_vars yet another one.
So I think either we want to rewrite the locking, such that say
resolve_device returns a locked device and then you perform stuff on the
locked device (disadvantage is that gomp_map_vars will call gomp_malloc
with the lock held, which can take some time to allocate the memory),
or there needs to be the possibility that gomp_map_vars rechecks if the
device has not been finalized after taking the lock and returns to the
caller if the device has been finalized in between resolve_device and
gomp_map_vars.
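
The second option (recheck the state after re-taking the lock) could look
roughly like the sketch below.  This is only an illustration under stated
assumptions: it uses plain pthread mutexes, and struct dev, dev_resolve,
dev_map_vars and dev_finalize are hypothetical stand-ins, not the libgomp
definitions.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the libgomp types; not the real definitions.  */
enum dev_state { DEV_UNINITIALIZED, DEV_INITIALIZED, DEV_FINALIZED };

struct dev
{
  pthread_mutex_t lock;
  enum dev_state state;
};

/* Like resolve_device: initialize lazily, refuse a finalized device.  */
static struct dev *
dev_resolve (struct dev *d)
{
  pthread_mutex_lock (&d->lock);
  if (d->state == DEV_UNINITIALIZED)
    d->state = DEV_INITIALIZED;
  else if (d->state == DEV_FINALIZED)
    {
      pthread_mutex_unlock (&d->lock);
      return NULL;
    }
  pthread_mutex_unlock (&d->lock);
  return d;
}

/* Like gomp_map_vars: the lock was dropped after dev_resolve, so the state
   is rechecked once the lock is held again.  Returns false if the device
   was finalized in between, so that the caller can fall back to the host.  */
static bool
dev_map_vars (struct dev *d)
{
  pthread_mutex_lock (&d->lock);
  if (d->state == DEV_FINALIZED)
    {
      pthread_mutex_unlock (&d->lock);
      return false;
    }
  /* ... the mapping work would go here, under the lock ...  */
  pthread_mutex_unlock (&d->lock);
  return true;
}

/* Like an exit-time finalizer: only an initialized device is finalized.  */
static void
dev_finalize (struct dev *d)
{
  pthread_mutex_lock (&d->lock);
  if (d->state == DEV_INITIALIZED)
    d->state = DEV_FINALIZED;
  pthread_mutex_unlock (&d->lock);
}
```

The point of the sketch is that every entry point which takes the lock on
its own has to repeat the state check, because finalization can happen
between any two of them.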

	Jakub

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [gomp4.5] Handle #pragma omp declare target link
  2015-12-01 19:05                               ` Jakub Jelinek
@ 2015-12-08 14:46                                 ` Ilya Verbin
  2015-12-11 17:27                                   ` Jakub Jelinek
  0 siblings, 1 reply; 48+ messages in thread
From: Ilya Verbin @ 2015-12-08 14:46 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Kirill Yukhin, Thomas Schwinge

On Tue, Dec 01, 2015 at 20:05:04 +0100, Jakub Jelinek wrote:
> This is racy, tsan would tell you so.
> Instead of a global var, I'd just change the devicep->is_initialized 
> field from bool into a 3 state field (perhaps enum), with states
> uninitialized, initialized, finalized, and then say in resolve_device,
> 
>   gomp_mutex_lock (&devices[device_id].lock);
>   if (devices[device_id].state == GOMP_DEVICE_UNINITIALIZED)
>     gomp_init_device (&devices[device_id]);
>   else if (devices[device_id].state == GOMP_DEVICE_FINALIZED)
>     {
>       gomp_mutex_unlock (&devices[device_id].lock);
>       return NULL;
>     }
>   gomp_mutex_unlock (&devices[device_id].lock);
> 
> Though, of course, that is incomplete, because resolve_device takes one
> lock, gomp_get_target_fn_addr another one, gomp_map_vars yet another one.
> So I think either we want to rewrite the locking, such that say
> resolve_device returns a locked device and then you perform stuff on the
> locked device (disadvantage is that gomp_map_vars will call gomp_malloc
> with the lock held, which can take some time to allocate the memory),
> or there needs to be the possibility that gomp_map_vars rechecks if the
> device has not been finalized after taking the lock and returns to the
> caller if the device has been finalized in between resolve_device and
> gomp_map_vars.

This patch implements the second approach.  Is it OK?
Bootstrap and make check-target-libgomp passed.


libgomp/
	* libgomp.h (gomp_device_state): New enum.
	(struct gomp_device_descr): Replace is_initialized with state.
	(gomp_fini_device): Remove declaration.
	* oacc-host.c (host_dispatch): Use state instead of is_initialized.
	* oacc-init.c (acc_init_1): Use state instead of is_initialized.
	(acc_shutdown_1): Likewise.  Inline gomp_fini_device.
	(acc_set_device_type): Use state instead of is_initialized.
	(acc_set_device_num): Likewise.
	* target.c (resolve_device): Use state instead of is_initialized.
	Do not initialize finalized device.
	(gomp_map_vars): Do nothing if device is finalized.
	(gomp_unmap_vars): Likewise.
	(gomp_update): Likewise.
	(GOMP_offload_register_ver): Use state instead of is_initialized.
	(GOMP_offload_unregister_ver): Likewise.
	(gomp_init_device): Likewise.
	(gomp_unload_device): Likewise.
	(gomp_fini_device): Remove.
	(gomp_get_target_fn_addr): Do nothing if device is finalized.
	(GOMP_target): Go to host fallback if device is finalized.
	(GOMP_target_ext): Likewise.
	(gomp_exit_data): Do nothing if device is finalized.
	(gomp_target_task_fn): Go to host fallback if device is finalized.
	(gomp_target_fini): New static function.
	(gomp_target_init): Use state instead of is_initialized.
	Call gomp_target_fini at exit.
liboffloadmic/
	* plugin/libgomp-plugin-intelmic.cpp (unregister_main_image): Remove.
	(register_main_image): Do not call unregister_main_image at exit.
	(GOMP_OFFLOAD_fini_device): Allow for OpenMP.  Unregister main image.


diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index c467f97..9d9949f 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -888,6 +888,14 @@ typedef struct acc_dispatch_t
   } cuda;
 } acc_dispatch_t;
 
+/* Various state of the accelerator device.  */
+enum gomp_device_state
+{
+  GOMP_DEVICE_UNINITIALIZED,
+  GOMP_DEVICE_INITIALIZED,
+  GOMP_DEVICE_FINALIZED
+};
+
 /* This structure describes accelerator device.
    It contains name of the corresponding libgomp plugin, function handlers for
    interaction with the device, ID-number of the device, and information about
@@ -933,8 +941,10 @@ struct gomp_device_descr
   /* Mutex for the mutable data.  */
   gomp_mutex_t lock;
 
-  /* Set to true when device is initialized.  */
-  bool is_initialized;
+  /* Current state of the device.  OpenACC allows to move from INITIALIZED state
+     back to UNINITIALIZED state.  OpenMP allows only to move from INITIALIZED
+     to FINALIZED state (at program shutdown).  */
+  enum gomp_device_state state;
 
   /* OpenACC-specific data and functions.  */
   /* This is mutable because of its mutable data_environ and target_data
@@ -962,7 +972,6 @@ extern void gomp_copy_from_async (struct target_mem_desc *);
 extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 extern void gomp_init_device (struct gomp_device_descr *);
 extern void gomp_free_memmap (struct splay_tree_s *);
-extern void gomp_fini_device (struct gomp_device_descr *);
 extern void gomp_unload_device (struct gomp_device_descr *);
 
 /* work.c */
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 9874804..d289b38 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -222,7 +222,7 @@ static struct gomp_device_descr host_dispatch =
 
     .mem_map = { NULL },
     /* .lock initilized in goacc_host_init.  */
-    .is_initialized = false,
+    .state = GOMP_DEVICE_UNINITIALIZED,
 
     .openacc = {
       .data_environ = NULL,
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 9a9a0b0..c4f7b67 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -225,7 +225,7 @@ acc_init_1 (acc_device_t d)
   acc_dev = &base_dev[goacc_device_num];
 
   gomp_mutex_lock (&acc_dev->lock);
-  if (acc_dev->is_initialized)
+  if (acc_dev->state == GOMP_DEVICE_INITIALIZED)
     {
       gomp_mutex_unlock (&acc_dev->lock);
       gomp_fatal ("device already active");
@@ -306,10 +306,11 @@ acc_shutdown_1 (acc_device_t d)
     {
       struct gomp_device_descr *acc_dev = &base_dev[i];
       gomp_mutex_lock (&acc_dev->lock);
-      if (acc_dev->is_initialized)
+      if (acc_dev->state == GOMP_DEVICE_INITIALIZED)
         {
 	  devices_active = true;
-	  gomp_fini_device (acc_dev);
+	  acc_dev->fini_device_func (acc_dev->target_id);
+	  acc_dev->state = GOMP_DEVICE_UNINITIALIZED;
 	}
       gomp_mutex_unlock (&acc_dev->lock);
     }
@@ -506,7 +507,7 @@ acc_set_device_type (acc_device_t d)
   acc_dev = &base_dev[goacc_device_num];
 
   gomp_mutex_lock (&acc_dev->lock);
-  if (!acc_dev->is_initialized)
+  if (acc_dev->state == GOMP_DEVICE_UNINITIALIZED)
     gomp_init_device (acc_dev);
   gomp_mutex_unlock (&acc_dev->lock);
 
@@ -608,7 +609,7 @@ acc_set_device_num (int ord, acc_device_t d)
       acc_dev = &base_dev[ord];
 
       gomp_mutex_lock (&acc_dev->lock);
-      if (!acc_dev->is_initialized)
+      if (acc_dev->state == GOMP_DEVICE_UNINITIALIZED)
         gomp_init_device (acc_dev);
       gomp_mutex_unlock (&acc_dev->lock);
 
diff --git a/libgomp/target.c b/libgomp/target.c
index cf9d0e6..be96a9e 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -118,8 +118,13 @@ resolve_device (int device_id)
     return NULL;
 
   gomp_mutex_lock (&devices[device_id].lock);
-  if (!devices[device_id].is_initialized)
+  if (devices[device_id].state == GOMP_DEVICE_UNINITIALIZED)
     gomp_init_device (&devices[device_id]);
+  else if (devices[device_id].state == GOMP_DEVICE_FINALIZED)
+    {
+      gomp_mutex_unlock (&devices[device_id].lock);
+      return NULL;
+    }
   gomp_mutex_unlock (&devices[device_id].lock);
 
   return &devices[device_id];
@@ -356,6 +361,11 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
     }
 
   gomp_mutex_lock (&devicep->lock);
+  if (devicep->state == GOMP_DEVICE_FINALIZED)
+    {
+      gomp_mutex_unlock (&devicep->lock);
+      return NULL;
+    }
 
   for (i = 0; i < mapnum; i++)
     {
@@ -834,6 +844,11 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
     }
 
   gomp_mutex_lock (&devicep->lock);
+  if (devicep->state == GOMP_DEVICE_FINALIZED)
+    {
+      gomp_mutex_unlock (&devicep->lock);
+      return;
+    }
 
   size_t i;
   for (i = 0; i < tgt->list_count; i++)
@@ -896,6 +911,12 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum, void **hostaddrs,
     return;
 
   gomp_mutex_lock (&devicep->lock);
+  if (devicep->state == GOMP_DEVICE_FINALIZED)
+    {
+      gomp_mutex_unlock (&devicep->lock);
+      return;
+    }
+
   for (i = 0; i < mapnum; i++)
     if (sizes[i])
       {
@@ -1106,7 +1127,8 @@ GOMP_offload_register_ver (unsigned version, const void *host_table,
     {
       struct gomp_device_descr *devicep = &devices[i];
       gomp_mutex_lock (&devicep->lock);
-      if (devicep->type == target_type && devicep->is_initialized)
+      if (devicep->type == target_type
+	  && devicep->state == GOMP_DEVICE_INITIALIZED)
 	gomp_load_image_to_device (devicep, version,
 				   host_table, target_data, true);
       gomp_mutex_unlock (&devicep->lock);
@@ -1150,7 +1172,8 @@ GOMP_offload_unregister_ver (unsigned version, const void *host_table,
     {
       struct gomp_device_descr *devicep = &devices[i];
       gomp_mutex_lock (&devicep->lock);
-      if (devicep->type == target_type && devicep->is_initialized)
+      if (devicep->type == target_type
+	  && devicep->state == GOMP_DEVICE_INITIALIZED)
 	gomp_unload_image_from_device (devicep, version,
 				       host_table, target_data);
       gomp_mutex_unlock (&devicep->lock);
@@ -1193,13 +1216,13 @@ gomp_init_device (struct gomp_device_descr *devicep)
 				   false);
     }
 
-  devicep->is_initialized = true;
+  devicep->state = GOMP_DEVICE_INITIALIZED;
 }
 
 attribute_hidden void
 gomp_unload_device (struct gomp_device_descr *devicep)
 {
-  if (devicep->is_initialized)
+  if (devicep->state == GOMP_DEVICE_INITIALIZED)
     {
       unsigned i;
       
@@ -1231,18 +1254,6 @@ gomp_free_memmap (struct splay_tree_s *mem_map)
     }
 }
 
-/* This function de-initializes the target device, specified by DEVICEP.
-   DEVICEP must be locked on entry, and remains locked on return.  */
-
-attribute_hidden void
-gomp_fini_device (struct gomp_device_descr *devicep)
-{
-  if (devicep->is_initialized)
-    devicep->fini_device_func (devicep->target_id);
-
-  devicep->is_initialized = false;
-}
-
 /* Host fallback for GOMP_target{,_ext} routines.  */
 
 static void
@@ -1310,6 +1321,12 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
   else
     {
       gomp_mutex_lock (&devicep->lock);
+      if (devicep->state == GOMP_DEVICE_FINALIZED)
+	{
+	  gomp_mutex_unlock (&devicep->lock);
+	  return NULL;
+	}
+
       struct splay_tree_key_s k;
       k.host_start = (uintptr_t) host_fn;
       k.host_end = k.host_start + 1;
@@ -1339,12 +1356,12 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
 {
   struct gomp_device_descr *devicep = resolve_device (device);
 
+  void *fn_addr;
   if (devicep == NULL
-      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      || !(fn_addr = gomp_get_target_fn_addr (devicep, fn)))
     return gomp_target_fallback (fn, hostaddrs);
 
-  void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
-
   struct target_mem_desc *tgt_vars
     = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
 		     GOMP_MAP_VARS_TARGET);
@@ -1430,15 +1447,15 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
 	gomp_task_maybe_wait_for_dependencies (depend);
     }
 
+  void *fn_addr;
   if (devicep == NULL
-      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      || !(fn_addr = gomp_get_target_fn_addr (devicep, fn)))
     {
       gomp_target_fallback_firstprivate (fn, mapnum, hostaddrs, sizes, kinds);
       return;
     }
 
-  void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
-
   struct target_mem_desc *tgt_vars
     = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,
 		     GOMP_MAP_VARS_TARGET);
@@ -1593,6 +1610,12 @@ gomp_exit_data (struct gomp_device_descr *devicep, size_t mapnum,
   const int typemask = 0xff;
   size_t i;
   gomp_mutex_lock (&devicep->lock);
+  if (devicep->state == GOMP_DEVICE_FINALIZED)
+    {
+      gomp_mutex_unlock (&devicep->lock);
+      return;
+    }
+
   for (i = 0; i < mapnum; i++)
     {
       struct splay_tree_key_s cur_node;
@@ -1729,8 +1752,10 @@ gomp_target_task_fn (void *data)
 
   if (ttask->fn != NULL)
     {
+      void *fn_addr;
       if (devicep == NULL
-	  || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+	  || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+	  || !(fn_addr = gomp_get_target_fn_addr (devicep, ttask->fn)))
 	{
 	  ttask->state = GOMP_TARGET_TASK_FALLBACK;
 	  gomp_target_fallback_firstprivate (ttask->fn, ttask->mapnum,
@@ -1745,7 +1770,6 @@ gomp_target_task_fn (void *data)
 	  return false;
 	}
 
-      void *fn_addr = gomp_get_target_fn_addr (devicep, ttask->fn);
       ttask->tgt
 	= gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs, NULL,
 			 ttask->sizes, ttask->kinds, true,
@@ -2282,6 +2306,25 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   return 0;
 }
 
+/* This function finalizes all initialized devices.  */
+
+static void
+gomp_target_fini (void)
+{
+  int i;
+  for (i = 0; i < num_devices; i++)
+    {
+      struct gomp_device_descr *devicep = &devices[i];
+      gomp_mutex_lock (&devicep->lock);
+      if (devicep->state == GOMP_DEVICE_INITIALIZED)
+	{
+	  devicep->fini_device_func (devicep->target_id);
+	  devicep->state = GOMP_DEVICE_FINALIZED;
+	}
+      gomp_mutex_unlock (&devicep->lock);
+    }
+}
+
 /* This function initializes the runtime needed for offloading.
    It parses the list of offload targets and tries to load the plugins for
    these targets.  On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
@@ -2341,7 +2384,7 @@ gomp_target_init (void)
 		/* current_device.capabilities has already been set.  */
 		current_device.type = current_device.get_type_func ();
 		current_device.mem_map.root = NULL;
-		current_device.is_initialized = false;
+		current_device.state = GOMP_DEVICE_UNINITIALIZED;
 		current_device.openacc.data_environ = NULL;
 		for (i = 0; i < new_num_devices; i++)
 		  {
@@ -2387,6 +2430,9 @@ gomp_target_init (void)
       if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
 	goacc_register (&devices[i]);
     }
+
+  if (atexit (gomp_target_fini) != 0)
+    gomp_fatal ("atexit failed");
 }
 
 #else /* PLUGIN_SUPPORT */
diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index f8c1725..68f7b2c 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -231,12 +231,6 @@ offload (const char *file, uint64_t line, int device, const char *name,
 }
 
 static void
-unregister_main_image ()
-{
-  __offload_unregister_image (&main_target_image);
-}
-
-static void
 register_main_image ()
 {
   /* Do not check the return value, because old versions of liboffloadmic did
@@ -246,12 +240,6 @@ register_main_image ()
   /* liboffloadmic will call GOMP_PLUGIN_target_task_completion when
      asynchronous task on target is completed.  */
   __offload_register_task_callback (GOMP_PLUGIN_target_task_completion);
-
-  if (atexit (unregister_main_image) != 0)
-    {
-      fprintf (stderr, "%s: atexit failed\n", __FILE__);
-      exit (1);
-    }
 }
 
 /* liboffloadmic loads and runs offload_target_main on all available devices
@@ -269,8 +257,9 @@ extern "C" void
 GOMP_OFFLOAD_fini_device (int device)
 {
   TRACE ("(device = %d)", device);
-  /* Unreachable for GOMP_OFFLOAD_CAP_OPENMP_400.  */
-  abort ();
+
+  /* liboffloadmic will finalize target processes on all available devices.  */
+  __offload_unregister_image (&main_target_image);
 }
 
 static void


  -- Ilya

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [gomp4.5] Handle #pragma omp declare target link
  2015-12-08 14:46                                 ` Ilya Verbin
@ 2015-12-11 17:27                                   ` Jakub Jelinek
  2015-12-11 17:46                                     ` Ilya Verbin
  2015-12-14 16:48                                     ` Ilya Verbin
  0 siblings, 2 replies; 48+ messages in thread
From: Jakub Jelinek @ 2015-12-11 17:27 UTC (permalink / raw)
  To: Ilya Verbin, Thomas Schwinge; +Cc: gcc-patches, Kirill Yukhin, Thomas Schwinge

On Tue, Dec 08, 2015 at 05:45:59PM +0300, Ilya Verbin wrote:
> --- a/libgomp/oacc-init.c
> +++ b/libgomp/oacc-init.c
> @@ -306,10 +306,11 @@ acc_shutdown_1 (acc_device_t d)
>      {
>        struct gomp_device_descr *acc_dev = &base_dev[i];
>        gomp_mutex_lock (&acc_dev->lock);
> -      if (acc_dev->is_initialized)
> +      if (acc_dev->state == GOMP_DEVICE_INITIALIZED)
>          {
>  	  devices_active = true;
> -	  gomp_fini_device (acc_dev);
> +	  acc_dev->fini_device_func (acc_dev->target_id);
> +	  acc_dev->state = GOMP_DEVICE_UNINITIALIZED;
>  	}
>        gomp_mutex_unlock (&acc_dev->lock);
>      }

I'd bet you want to set state here to GOMP_DEVICE_FINALIZED too,
but I'd leave it to the OpenACC folks to do that incrementally
once they test it and/or decide what to do.

> @@ -356,6 +361,11 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
>      }
>  
>    gomp_mutex_lock (&devicep->lock);
> +  if (devicep->state == GOMP_DEVICE_FINALIZED)
> +    {
> +      gomp_mutex_unlock (&devicep->lock);

You need to free (tgt); here I think to avoid leaking memory.

> +      return NULL;
> +    }
>  
>    for (i = 0; i < mapnum; i++)
>      {
> @@ -834,6 +844,11 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
>      }
>  
>    gomp_mutex_lock (&devicep->lock);
> +  if (devicep->state == GOMP_DEVICE_FINALIZED)
> +    {
> +      gomp_mutex_unlock (&devicep->lock);
> +      return;

Supposedly you want at least free (tgt->array); free (tgt); here.
Plus the question is if the mappings shouldn't be removed from the splay tree
before that.

> +/* This function finalizes all initialized devices.  */
> +
> +static void
> +gomp_target_fini (void)
> +{
> +  int i;
> +  for (i = 0; i < num_devices; i++)
> +    {
> +      struct gomp_device_descr *devicep = &devices[i];
> +      gomp_mutex_lock (&devicep->lock);
> +      if (devicep->state == GOMP_DEVICE_INITIALIZED)
> +	{
> +	  devicep->fini_device_func (devicep->target_id);
> +	  devicep->state = GOMP_DEVICE_FINALIZED;
> +	}
> +      gomp_mutex_unlock (&devicep->lock);
> +    }
> +}

The question is what this will do if there are async target tasks still
running on some of the devices at this point (a forgotten #pragma omp taskwait
or similar if target nowait regions are started outside of a parallel region,
or exit inside of a parallel region, etc.).  But perhaps it can be handled
incrementally.
Also there is the question that the 
So I think the patch is ok with the above mentioned changes.

What is the state of the link clause implementation patch?  Does it depend
on this?

	Jakub

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [gomp4.5] Handle #pragma omp declare target link
  2015-12-11 17:27                                   ` Jakub Jelinek
@ 2015-12-11 17:46                                     ` Ilya Verbin
  2015-12-14 16:48                                     ` Ilya Verbin
  1 sibling, 0 replies; 48+ messages in thread
From: Ilya Verbin @ 2015-12-11 17:46 UTC (permalink / raw)
  To: Jakub Jelinek, Thomas Schwinge; +Cc: gcc-patches, Kirill Yukhin

On Fri, Dec 11, 2015 at 18:27:13 +0100, Jakub Jelinek wrote:
> On Tue, Dec 08, 2015 at 05:45:59PM +0300, Ilya Verbin wrote:
> > --- a/libgomp/oacc-init.c
> > +++ b/libgomp/oacc-init.c
> > @@ -306,10 +306,11 @@ acc_shutdown_1 (acc_device_t d)
> >      {
> >        struct gomp_device_descr *acc_dev = &base_dev[i];
> >        gomp_mutex_lock (&acc_dev->lock);
> > -      if (acc_dev->is_initialized)
> > +      if (acc_dev->state == GOMP_DEVICE_INITIALIZED)
> >          {
> >  	  devices_active = true;
> > -	  gomp_fini_device (acc_dev);
> > +	  acc_dev->fini_device_func (acc_dev->target_id);
> > +	  acc_dev->state = GOMP_DEVICE_UNINITIALIZED;
> >  	}
> >        gomp_mutex_unlock (&acc_dev->lock);
> >      }
> 
> I'd bet you want to set state here to GOMP_DEVICE_FINALIZED too,
> but I'd leave that to the OpenACC folks to do that incrementally
> once they test it and/or decide what to do.

libgomp/testsuite/libgomp.oacc-c-c++-common/lib-5.c contains a call to acc_init,
then acc_shutdown, and then acc_init again, so I guess that OpenACC allows the
device to be initialized again after acc_shutdown, whereas GOMP_DEVICE_FINALIZED
means that it is finalized for good.
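
The difference between the two teardown paths can be sketched as a small
state machine.  This is a simplified model with hypothetical names
(enum model_state, model_init, model_shutdown, model_finalize), not the
libgomp code; it only shows why acc_shutdown goes back to the uninitialized
state while exit-time finalization does not.

```c
#include <stdbool.h>

/* Hypothetical mirror of the three device states from the patch.  */
enum model_state { MODEL_UNINITIALIZED, MODEL_INITIALIZED, MODEL_FINALIZED };

/* Initialization succeeds only from the uninitialized state.  */
static bool
model_init (enum model_state *s)
{
  if (*s != MODEL_UNINITIALIZED)
    return false;
  *s = MODEL_INITIALIZED;
  return true;
}

/* acc_shutdown-style teardown: the device may be initialized again.  */
static void
model_shutdown (enum model_state *s)
{
  if (*s == MODEL_INITIALIZED)
    *s = MODEL_UNINITIALIZED;
}

/* Exit-time teardown: terminal, no way back to the initialized state.  */
static void
model_finalize (enum model_state *s)
{
  if (*s == MODEL_INITIALIZED)
    *s = MODEL_FINALIZED;
}
```

With this model the init, shutdown, init sequence from lib-5.c works, while
an init attempt after model_finalize is refused.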

> > @@ -356,6 +361,11 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
> >      }
> >  
> >    gomp_mutex_lock (&devicep->lock);
> > +  if (devicep->state == GOMP_DEVICE_FINALIZED)
> > +    {
> > +      gomp_mutex_unlock (&devicep->lock);
> 
> You need to free (tgt); here I think to avoid leaking memory.
> 
> > +      return NULL;
> > +    }
> >  
> >    for (i = 0; i < mapnum; i++)
> >      {
> > @@ -834,6 +844,11 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
> >      }
> >  
> >    gomp_mutex_lock (&devicep->lock);
> > +  if (devicep->state == GOMP_DEVICE_FINALIZED)
> > +    {
> > +      gomp_mutex_unlock (&devicep->lock);
> > +      return;
> 
> Supposedly you want at least free (tgt->array); free (tgt); here.
> Plus the question is if the mappings shouldn't be removed from the splay tree
> before that.
> 
> > +/* This function finalizes all initialized devices.  */
> > +
> > +static void
> > +gomp_target_fini (void)
> > +{
> > +  int i;
> > +  for (i = 0; i < num_devices; i++)
> > +    {
> > +      struct gomp_device_descr *devicep = &devices[i];
> > +      gomp_mutex_lock (&devicep->lock);
> > +      if (devicep->state == GOMP_DEVICE_INITIALIZED)
> > +	{
> > +	  devicep->fini_device_func (devicep->target_id);
> > +	  devicep->state = GOMP_DEVICE_FINALIZED;
> > +	}
> > +      gomp_mutex_unlock (&devicep->lock);
> > +    }
> > +}
> 
> The question is what will this do if there are async target tasks still
> running on some of the devices at this point (forgotten #pragma omp taskwait
> or similar if target nowait regions are started outside of parallel region,
> or exit inside of parallel, etc.  But perhaps it can be handled incrementally.
> Also there is the question that the 
> So I think the patch is ok with the above mentioned changes.
> 
> What is the state of the link clause implementation patch?  Does it depend
> on this?

It's ready, but it depends on this.  I will retest and resend the "link" patch
after checking in the "init/fini" patch.

  -- Ilya

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [gomp4.5] Handle #pragma omp declare target link
  2015-12-11 17:27                                   ` Jakub Jelinek
  2015-12-11 17:46                                     ` Ilya Verbin
@ 2015-12-14 16:48                                     ` Ilya Verbin
  2015-12-16 12:30                                       ` gomp_target_fini (was: [gomp4.5] Handle #pragma omp declare target link) Thomas Schwinge
  1 sibling, 1 reply; 48+ messages in thread
From: Ilya Verbin @ 2015-12-14 16:48 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Kirill Yukhin

On Fri, Dec 11, 2015 at 18:27:13 +0100, Jakub Jelinek wrote:
> On Tue, Dec 08, 2015 at 05:45:59PM +0300, Ilya Verbin wrote:
> > @@ -356,6 +361,11 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
> >      }
> >  
> >    gomp_mutex_lock (&devicep->lock);
> > +  if (devicep->state == GOMP_DEVICE_FINALIZED)
> > +    {
> > +      gomp_mutex_unlock (&devicep->lock);
> 
> You need to free (tgt); here I think to avoid leaking memory.

Done.

> > +      return NULL;
> > +    }
> >  
> >    for (i = 0; i < mapnum; i++)
> >      {
> > @@ -834,6 +844,11 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
> >      }
> >  
> >    gomp_mutex_lock (&devicep->lock);
> > +  if (devicep->state == GOMP_DEVICE_FINALIZED)
> > +    {
> > +      gomp_mutex_unlock (&devicep->lock);
> > +      return;
> 
> Supposedly you want at least free (tgt->array); free (tgt); here.

Done.

> Plus the question is if the mappings shouldn't be removed from the splay tree
> before that.

This code can be executed only at program shutdown, so I think that removing
the mappings from the splay tree isn't necessary here; it would only cost time.
Besides, at shutdown we do not remove vars that have a non-zero refcount.
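
For illustration, the early-return cleanup discussed above (free the
host-side bookkeeping, skip the device work, leave the splay tree alone) has
roughly the following shape.  This is a minimal sketch: struct mem_desc and
unmap_or_discard are hypothetical, and the real struct target_mem_desc has
more fields than the single one modelled here.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical cut-down descriptor; only the array member is modelled.  */
struct mem_desc
{
  void *array;
};

/* Early-exit path of an unmap-style routine.  If the device is already
   finalized, release the host-side allocations and report that no device
   work is to be done.  Otherwise leave TGT alone and return true, so that
   the caller goes on with the regular unmapping.  */
static bool
unmap_or_discard (struct mem_desc *tgt, bool device_finalized)
{
  if (device_finalized)
    {
      free (tgt->array);
      free (tgt);
      return false;
    }
  return true;
}
```

After a false return the descriptor must not be touched any more, since both
allocations have been released.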

> > +/* This function finalizes all initialized devices.  */
> > +
> > +static void
> > +gomp_target_fini (void)
> > +{
> > +  int i;
> > +  for (i = 0; i < num_devices; i++)
> > +    {
> > +      struct gomp_device_descr *devicep = &devices[i];
> > +      gomp_mutex_lock (&devicep->lock);
> > +      if (devicep->state == GOMP_DEVICE_INITIALIZED)
> > +	{
> > +	  devicep->fini_device_func (devicep->target_id);
> > +	  devicep->state = GOMP_DEVICE_FINALIZED;
> > +	}
> > +      gomp_mutex_unlock (&devicep->lock);
> > +    }
> > +}
> 
> The question is what will this do if there are async target tasks still
> running on some of the devices at this point (forgotten #pragma omp taskwait
> or similar if target nowait regions are started outside of parallel region,
> or exit inside of parallel, etc.  But perhaps it can be handled incrementally.
> Also there is the question that the 
> So I think the patch is ok with the above mentioned changes.

Here is what I've committed to trunk.


libgomp/
	* libgomp.h (gomp_device_state): New enum.
	(struct gomp_device_descr): Replace is_initialized with state.
	(gomp_fini_device): Remove declaration.
	* oacc-host.c (host_dispatch): Use state instead of is_initialized.
	* oacc-init.c (acc_init_1): Use state instead of is_initialized.
	(acc_shutdown_1): Likewise.  Inline gomp_fini_device.
	(acc_set_device_type): Use state instead of is_initialized.
	(acc_set_device_num): Likewise.
	* target.c (resolve_device): Use state instead of is_initialized.
	Do not initialize finalized device.
	(gomp_map_vars): Do nothing if device is finalized.
	(gomp_unmap_vars): Likewise.
	(gomp_update): Likewise.
	(GOMP_offload_register_ver): Use state instead of is_initialized.
	(GOMP_offload_unregister_ver): Likewise.
	(gomp_init_device): Likewise.
	(gomp_unload_device): Likewise.
	(gomp_fini_device): Remove.
	(gomp_get_target_fn_addr): Do nothing if device is finalized.
	(GOMP_target): Go to host fallback if device is finalized.
	(GOMP_target_ext): Likewise.
	(gomp_exit_data): Do nothing if device is finalized.
	(gomp_target_task_fn): Go to host fallback if device is finalized.
	(gomp_target_fini): New static function.
	(gomp_target_init): Use state instead of is_initialized.
	Call gomp_target_fini at exit.
liboffloadmic/
	* plugin/libgomp-plugin-intelmic.cpp (unregister_main_image): Remove.
	(register_main_image): Do not call unregister_main_image at exit.
	(GOMP_OFFLOAD_fini_device): Allow for OpenMP.  Unregister main image.
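
The device life cycle that this ChangeLog describes can be sketched roughly as
follows.  This is only an illustration, not the libgomp code: the types and
the functions resolve, run_region and finalize are hypothetical stand-ins, and
the per-device mutex that the real code takes around every state check is left
out.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the libgomp types; only the three state
   names mirror the patch.  */
enum device_state
{
  DEVICE_UNINITIALIZED,
  DEVICE_INITIALIZED,
  DEVICE_FINALIZED
};

struct device
{
  enum device_state state;
  int offloaded;		/* Number of regions run on the device.  */
};

/* Initialize the device lazily on first use; refuse a finalized one.  */
static struct device *
resolve (struct device *dev)
{
  if (dev->state == DEVICE_UNINITIALIZED)
    dev->state = DEVICE_INITIALIZED;
  else if (dev->state == DEVICE_FINALIZED)
    return NULL;
  return dev;
}

/* Return 1 if the region ran on the device, 0 if the caller has to use
   the host fallback.  */
static int
run_region (struct device *dev)
{
  struct device *d = resolve (dev);
  if (d == NULL)
    return 0;
  d->offloaded++;
  return 1;
}

/* What an atexit handler would do for each device: only an initialized
   device moves to the finalized state, and it never leaves it.  */
static void
finalize (struct device *dev)
{
  if (dev->state == DEVICE_INITIALIZED)
    dev->state = DEVICE_FINALIZED;
}
```

A target region started after finalize has run gets 0 back from run_region and
is expected to execute on the host instead.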


diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index c467f97..9d9949f 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -888,6 +888,14 @@ typedef struct acc_dispatch_t
   } cuda;
 } acc_dispatch_t;
 
+/* Various state of the accelerator device.  */
+enum gomp_device_state
+{
+  GOMP_DEVICE_UNINITIALIZED,
+  GOMP_DEVICE_INITIALIZED,
+  GOMP_DEVICE_FINALIZED
+};
+
 /* This structure describes accelerator device.
    It contains name of the corresponding libgomp plugin, function handlers for
    interaction with the device, ID-number of the device, and information about
@@ -933,8 +941,10 @@ struct gomp_device_descr
   /* Mutex for the mutable data.  */
   gomp_mutex_t lock;
 
-  /* Set to true when device is initialized.  */
-  bool is_initialized;
+  /* Current state of the device.  OpenACC allows to move from INITIALIZED state
+     back to UNINITIALIZED state.  OpenMP allows only to move from INITIALIZED
+     to FINALIZED state (at program shutdown).  */
+  enum gomp_device_state state;
 
   /* OpenACC-specific data and functions.  */
   /* This is mutable because of its mutable data_environ and target_data
@@ -962,7 +972,6 @@ extern void gomp_copy_from_async (struct target_mem_desc *);
 extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 extern void gomp_init_device (struct gomp_device_descr *);
 extern void gomp_free_memmap (struct splay_tree_s *);
-extern void gomp_fini_device (struct gomp_device_descr *);
 extern void gomp_unload_device (struct gomp_device_descr *);
 
 /* work.c */
diff --git a/libgomp/oacc-host.c b/libgomp/oacc-host.c
index 9874804..d289b38 100644
--- a/libgomp/oacc-host.c
+++ b/libgomp/oacc-host.c
@@ -222,7 +222,7 @@ static struct gomp_device_descr host_dispatch =
 
     .mem_map = { NULL },
     /* .lock initilized in goacc_host_init.  */
-    .is_initialized = false,
+    .state = GOMP_DEVICE_UNINITIALIZED,
 
     .openacc = {
       .data_environ = NULL,
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 9a9a0b0..c4f7b67 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -225,7 +225,7 @@ acc_init_1 (acc_device_t d)
   acc_dev = &base_dev[goacc_device_num];
 
   gomp_mutex_lock (&acc_dev->lock);
-  if (acc_dev->is_initialized)
+  if (acc_dev->state == GOMP_DEVICE_INITIALIZED)
     {
       gomp_mutex_unlock (&acc_dev->lock);
       gomp_fatal ("device already active");
@@ -306,10 +306,11 @@ acc_shutdown_1 (acc_device_t d)
     {
       struct gomp_device_descr *acc_dev = &base_dev[i];
       gomp_mutex_lock (&acc_dev->lock);
-      if (acc_dev->is_initialized)
+      if (acc_dev->state == GOMP_DEVICE_INITIALIZED)
         {
 	  devices_active = true;
-	  gomp_fini_device (acc_dev);
+	  acc_dev->fini_device_func (acc_dev->target_id);
+	  acc_dev->state = GOMP_DEVICE_UNINITIALIZED;
 	}
       gomp_mutex_unlock (&acc_dev->lock);
     }
@@ -506,7 +507,7 @@ acc_set_device_type (acc_device_t d)
   acc_dev = &base_dev[goacc_device_num];
 
   gomp_mutex_lock (&acc_dev->lock);
-  if (!acc_dev->is_initialized)
+  if (acc_dev->state == GOMP_DEVICE_UNINITIALIZED)
     gomp_init_device (acc_dev);
   gomp_mutex_unlock (&acc_dev->lock);
 
@@ -608,7 +609,7 @@ acc_set_device_num (int ord, acc_device_t d)
       acc_dev = &base_dev[ord];
 
       gomp_mutex_lock (&acc_dev->lock);
-      if (!acc_dev->is_initialized)
+      if (acc_dev->state == GOMP_DEVICE_UNINITIALIZED)
         gomp_init_device (acc_dev);
       gomp_mutex_unlock (&acc_dev->lock);
 
diff --git a/libgomp/target.c b/libgomp/target.c
index cf9d0e6..932b176 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -118,8 +118,13 @@ resolve_device (int device_id)
     return NULL;
 
   gomp_mutex_lock (&devices[device_id].lock);
-  if (!devices[device_id].is_initialized)
+  if (devices[device_id].state == GOMP_DEVICE_UNINITIALIZED)
     gomp_init_device (&devices[device_id]);
+  else if (devices[device_id].state == GOMP_DEVICE_FINALIZED)
+    {
+      gomp_mutex_unlock (&devices[device_id].lock);
+      return NULL;
+    }
   gomp_mutex_unlock (&devices[device_id].lock);
 
   return &devices[device_id];
@@ -356,6 +361,12 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
     }
 
   gomp_mutex_lock (&devicep->lock);
+  if (devicep->state == GOMP_DEVICE_FINALIZED)
+    {
+      gomp_mutex_unlock (&devicep->lock);
+      free (tgt);
+      return NULL;
+    }
 
   for (i = 0; i < mapnum; i++)
     {
@@ -834,6 +845,13 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
     }
 
   gomp_mutex_lock (&devicep->lock);
+  if (devicep->state == GOMP_DEVICE_FINALIZED)
+    {
+      gomp_mutex_unlock (&devicep->lock);
+      free (tgt->array);
+      free (tgt);
+      return;
+    }
 
   size_t i;
   for (i = 0; i < tgt->list_count; i++)
@@ -896,6 +914,12 @@ gomp_update (struct gomp_device_descr *devicep, size_t mapnum, void **hostaddrs,
     return;
 
   gomp_mutex_lock (&devicep->lock);
+  if (devicep->state == GOMP_DEVICE_FINALIZED)
+    {
+      gomp_mutex_unlock (&devicep->lock);
+      return;
+    }
+
   for (i = 0; i < mapnum; i++)
     if (sizes[i])
       {
@@ -1106,7 +1130,8 @@ GOMP_offload_register_ver (unsigned version, const void *host_table,
     {
       struct gomp_device_descr *devicep = &devices[i];
       gomp_mutex_lock (&devicep->lock);
-      if (devicep->type == target_type && devicep->is_initialized)
+      if (devicep->type == target_type
+	  && devicep->state == GOMP_DEVICE_INITIALIZED)
 	gomp_load_image_to_device (devicep, version,
 				   host_table, target_data, true);
       gomp_mutex_unlock (&devicep->lock);
@@ -1150,7 +1175,8 @@ GOMP_offload_unregister_ver (unsigned version, const void *host_table,
     {
       struct gomp_device_descr *devicep = &devices[i];
       gomp_mutex_lock (&devicep->lock);
-      if (devicep->type == target_type && devicep->is_initialized)
+      if (devicep->type == target_type
+	  && devicep->state == GOMP_DEVICE_INITIALIZED)
 	gomp_unload_image_from_device (devicep, version,
 				       host_table, target_data);
       gomp_mutex_unlock (&devicep->lock);
@@ -1193,13 +1219,13 @@ gomp_init_device (struct gomp_device_descr *devicep)
 				   false);
     }
 
-  devicep->is_initialized = true;
+  devicep->state = GOMP_DEVICE_INITIALIZED;
 }
 
 attribute_hidden void
 gomp_unload_device (struct gomp_device_descr *devicep)
 {
-  if (devicep->is_initialized)
+  if (devicep->state == GOMP_DEVICE_INITIALIZED)
     {
       unsigned i;
       
@@ -1231,18 +1257,6 @@ gomp_free_memmap (struct splay_tree_s *mem_map)
     }
 }
 
-/* This function de-initializes the target device, specified by DEVICEP.
-   DEVICEP must be locked on entry, and remains locked on return.  */
-
-attribute_hidden void
-gomp_fini_device (struct gomp_device_descr *devicep)
-{
-  if (devicep->is_initialized)
-    devicep->fini_device_func (devicep->target_id);
-
-  devicep->is_initialized = false;
-}
-
 /* Host fallback for GOMP_target{,_ext} routines.  */
 
 static void
@@ -1310,6 +1324,12 @@ gomp_get_target_fn_addr (struct gomp_device_descr *devicep,
   else
     {
       gomp_mutex_lock (&devicep->lock);
+      if (devicep->state == GOMP_DEVICE_FINALIZED)
+	{
+	  gomp_mutex_unlock (&devicep->lock);
+	  return NULL;
+	}
+
       struct splay_tree_key_s k;
       k.host_start = (uintptr_t) host_fn;
       k.host_end = k.host_start + 1;
@@ -1339,12 +1359,12 @@ GOMP_target (int device, void (*fn) (void *), const void *unused,
 {
   struct gomp_device_descr *devicep = resolve_device (device);
 
+  void *fn_addr;
   if (devicep == NULL
-      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      || !(fn_addr = gomp_get_target_fn_addr (devicep, fn)))
     return gomp_target_fallback (fn, hostaddrs);
 
-  void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
-
   struct target_mem_desc *tgt_vars
     = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, false,
 		     GOMP_MAP_VARS_TARGET);
@@ -1430,15 +1450,15 @@ GOMP_target_ext (int device, void (*fn) (void *), size_t mapnum,
 	gomp_task_maybe_wait_for_dependencies (depend);
     }
 
+  void *fn_addr;
   if (devicep == NULL
-      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+      || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+      || !(fn_addr = gomp_get_target_fn_addr (devicep, fn)))
     {
       gomp_target_fallback_firstprivate (fn, mapnum, hostaddrs, sizes, kinds);
       return;
     }
 
-  void *fn_addr = gomp_get_target_fn_addr (devicep, fn);
-
   struct target_mem_desc *tgt_vars
     = gomp_map_vars (devicep, mapnum, hostaddrs, NULL, sizes, kinds, true,
 		     GOMP_MAP_VARS_TARGET);
@@ -1593,6 +1613,12 @@ gomp_exit_data (struct gomp_device_descr *devicep, size_t mapnum,
   const int typemask = 0xff;
   size_t i;
   gomp_mutex_lock (&devicep->lock);
+  if (devicep->state == GOMP_DEVICE_FINALIZED)
+    {
+      gomp_mutex_unlock (&devicep->lock);
+      return;
+    }
+
   for (i = 0; i < mapnum; i++)
     {
       struct splay_tree_key_s cur_node;
@@ -1729,8 +1755,10 @@ gomp_target_task_fn (void *data)
 
   if (ttask->fn != NULL)
     {
+      void *fn_addr;
       if (devicep == NULL
-	  || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400))
+	  || !(devicep->capabilities & GOMP_OFFLOAD_CAP_OPENMP_400)
+	  || !(fn_addr = gomp_get_target_fn_addr (devicep, ttask->fn)))
 	{
 	  ttask->state = GOMP_TARGET_TASK_FALLBACK;
 	  gomp_target_fallback_firstprivate (ttask->fn, ttask->mapnum,
@@ -1745,7 +1773,6 @@ gomp_target_task_fn (void *data)
 	  return false;
 	}
 
-      void *fn_addr = gomp_get_target_fn_addr (devicep, ttask->fn);
       ttask->tgt
 	= gomp_map_vars (devicep, ttask->mapnum, ttask->hostaddrs, NULL,
 			 ttask->sizes, ttask->kinds, true,
@@ -2282,6 +2309,25 @@ gomp_load_plugin_for_device (struct gomp_device_descr *device,
   return 0;
 }
 
+/* This function finalizes all initialized devices.  */
+
+static void
+gomp_target_fini (void)
+{
+  int i;
+  for (i = 0; i < num_devices; i++)
+    {
+      struct gomp_device_descr *devicep = &devices[i];
+      gomp_mutex_lock (&devicep->lock);
+      if (devicep->state == GOMP_DEVICE_INITIALIZED)
+	{
+	  devicep->fini_device_func (devicep->target_id);
+	  devicep->state = GOMP_DEVICE_FINALIZED;
+	}
+      gomp_mutex_unlock (&devicep->lock);
+    }
+}
+
 /* This function initializes the runtime needed for offloading.
    It parses the list of offload targets and tries to load the plugins for
    these targets.  On return, the variables NUM_DEVICES and NUM_DEVICES_OPENMP
@@ -2341,7 +2387,7 @@ gomp_target_init (void)
 		/* current_device.capabilities has already been set.  */
 		current_device.type = current_device.get_type_func ();
 		current_device.mem_map.root = NULL;
-		current_device.is_initialized = false;
+		current_device.state = GOMP_DEVICE_UNINITIALIZED;
 		current_device.openacc.data_environ = NULL;
 		for (i = 0; i < new_num_devices; i++)
 		  {
@@ -2387,6 +2433,9 @@ gomp_target_init (void)
       if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
 	goacc_register (&devices[i]);
     }
+
+  if (atexit (gomp_target_fini) != 0)
+    gomp_fatal ("atexit failed");
 }
 
 #else /* PLUGIN_SUPPORT */
diff --git a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
index f8c1725..68f7b2c 100644
--- a/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
+++ b/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp
@@ -231,12 +231,6 @@ offload (const char *file, uint64_t line, int device, const char *name,
 }
 
 static void
-unregister_main_image ()
-{
-  __offload_unregister_image (&main_target_image);
-}
-
-static void
 register_main_image ()
 {
   /* Do not check the return value, because old versions of liboffloadmic did
@@ -246,12 +240,6 @@ register_main_image ()
   /* liboffloadmic will call GOMP_PLUGIN_target_task_completion when
      asynchronous task on target is completed.  */
   __offload_register_task_callback (GOMP_PLUGIN_target_task_completion);
-
-  if (atexit (unregister_main_image) != 0)
-    {
-      fprintf (stderr, "%s: atexit failed\n", __FILE__);
-      exit (1);
-    }
 }
 
 /* liboffloadmic loads and runs offload_target_main on all available devices
@@ -269,8 +257,9 @@ extern "C" void
 GOMP_OFFLOAD_fini_device (int device)
 {
   TRACE ("(device = %d)", device);
-  /* Unreachable for GOMP_OFFLOAD_CAP_OPENMP_400.  */
-  abort ();
+
+  /* liboffloadmic will finalize target processes on all available devices.  */
+  __offload_unregister_image (&main_target_image);
 }
 
 static void


  -- Ilya


* Re: [gomp4.5] Handle #pragma omp declare target link
  2015-11-30 20:55                   ` Jakub Jelinek
  2015-11-30 21:38                     ` Ilya Verbin
@ 2015-12-14 17:18                     ` Ilya Verbin
  2015-12-15  8:42                       ` Jakub Jelinek
                                         ` (2 more replies)
  1 sibling, 3 replies; 48+ messages in thread
From: Ilya Verbin @ 2015-12-14 17:18 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Kirill Yukhin

On Mon, Nov 30, 2015 at 21:49:02 +0100, Jakub Jelinek wrote:
> On Mon, Nov 30, 2015 at 11:29:34PM +0300, Ilya Verbin wrote:
> > > This looks wrong, both of these clearly could affect anything with
> > > DECL_HAS_VALUE_EXPR_P, not just the link vars.
> > > So, if you need to handle the "omp declare target link" vars specially,
> > > you should only handle those specially and nothing else.  And please try to
> > > explain why.
> > 
> > Actually these ifndefs are not needed, because assemble_decl will never be
> > called by the accel compiler for the original link vars.  I had added a check
> > to output_in_order, but missed the second place where assemble_decl is
> > called - symbol_table::output_variables.  So, fixed now.
> 
> Great.
> 
> > > Do we need to do anything in gomp_unload_image_from_device ?
> > > I mean at least in questionable programs that for link vars don't decrement
> > > the refcount of the var that replaced the link var to 0 first before
> > > dlclosing the library.
> > > At least host_var_table[j * 2 + 1] will have the MSB set, so we need to
> > > handle it differently.  Perhaps for that case perform a lookup, and if we
> > > get something which has link_map non-NULL, first perform as if there is
> > > target exit data delete (var) on it first?
> > 
> > You're right, it doesn't deallocate memory on the device if a DSO leaves a
> > nonzero refcount.  Currently the host compiler doesn't set the MSB in
> > host_var_table; it is set only by the accel compiler.  But it's possible to
> > do a splay_tree_lookup for each var to determine whether it is linked or not,
> > like in the patch below.
> > Or do you prefer to set the bit in the host compiler too?  That requires
> > lookup_attribute ("omp declare target link") for all vars in the table during
> > compilation, but allows doing splay_tree_lookup at run time only for vars with
> > the MSB set in host_var_table.
> > Unfortunately, calling gomp_exit_data from gomp_unload_image_from_device works
> > only for a DSO; it crashes when an executable leaves a nonzero refcount,
> > because the target device may already have been uninitialized from the
> > plugin's __run_exit_handlers (and it is in the case of intelmic), so
> > gomp_exit_data cannot run free_func.
> > Is it possible to add some atexit (...) to libgomp that sets a shutting_down
> > flag, and just do nothing in gomp_unload_image_from_device if it is set?
> 
> Sorry, I didn't mean you should call gomp_exit_data; what I meant was that
> you perform the same action as delete(var) would do in that case.
> Calling gomp_exit_data e.g. looks it up again etc.
> Supposedly having the MSB in the host table too is useful, so if you could
> handle that, it would be nice.  And do the splay_tree_lookup only if the MSB
> is set.
> So,
>     if (!host_data_has_msb_set)
>       splay_tree_remove (&devicep->mem_map, &k);
>     else
>       {
>         splay_tree_key n = splay_tree_lookup (&devicep->mem_map, &k);
>         if (n->link_key)
> 	  {
> 	    n->refcount = 0;
> 	    n->link_key = NULL;
> 	    splay_tree_remove (&devicep->mem_map, n);
> 	    if (n->tgt->refcount > 1)
> 	      n->tgt->refcount--;
> 	    else
> 	      gomp_unmap_tgt (n->tgt);
> 	  }
> 	else
> 	  splay_tree_remove (&devicep->mem_map, n);
>       }
> or so.

Here is an updated patch.  Now the MSB is set in both tables, and
gomp_unload_image_from_device is changed.  I've verified with a simple DSO
testcase that memory on the target is freed after dlclose.
Bootstrap and make check passed on x86_64-linux.
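
To illustrate the MSB marking mentioned above, here is a minimal sketch of how
a size entry could carry the link flag in its top bit.  The names LINK_BIT,
SIZE_MASK, mark_link, is_link_entry and entry_size are made up for this
example; the patch itself computes link_bit and size_mask inline in
gomp_load_image_to_device.

```c
#include <limits.h>
#include <stdint.h>

/* The top bit of a pointer-sized size entry flags a link variable;
   the remaining bits hold the real size.  */
#define LINK_BIT ((uintptr_t) 1 << (sizeof (uintptr_t) * CHAR_BIT - 1))
#define SIZE_MASK (~LINK_BIT)

/* Compiler side: tag the size of a link variable.  */
static uintptr_t
mark_link (uintptr_t size)
{
  return size | LINK_BIT;
}

/* Runtime side: test the flag.  */
static int
is_link_entry (uintptr_t entry)
{
  return (entry & LINK_BIT) != 0;
}

/* Runtime side: recover the real size.  */
static uintptr_t
entry_size (uintptr_t entry)
{
  return entry & SIZE_MASK;
}
```

Since both the host and the target table carry the bit, comparing the raw
entries still detects a size mismatch, and the runtime needs the extra lookup
only for entries where is_link_entry is true.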


gcc/c-family/
	* c-common.c (c_common_attribute_table): Handle "omp declare target
	link" attribute.
gcc/
	* cgraphunit.c (output_in_order): Do not assemble "omp declare target
	link" variables in ACCEL_COMPILER.
	* gimplify.c (gimplify_adjust_omp_clauses): Do not remove mapping of
	"omp declare target link" variables.
	* lto/lto.c: Include stringpool.h and fold-const.h.
	(offload_handle_link_vars): New static function.
	(lto_main): Call offload_handle_link_vars.
	* omp-low.c (scan_sharing_clauses): Do not remove mapping of "omp
	declare target link" variables.
	(add_decls_addresses_to_decl_constructor): For "omp declare target link"
	variables output address of the artificial pointer instead of address of
	the variable.  Set most significant bit of the size to mark them.
	(pass_data_omp_target_link): New pass_data.
	(pass_omp_target_link): New class.
	(find_link_var_op): New static function.
	(make_pass_omp_target_link): New function.
	* passes.def: Add pass_omp_target_link.
	* tree-pass.h (make_pass_omp_target_link): Declare.
	* varpool.c (symbol_table::output_variables): Do not assemble "omp
	declare target link" variables in ACCEL_COMPILER.
libgomp/
	* libgomp.h (REFCOUNT_LINK): Define.
	(struct splay_tree_key_s): Add link_key.
	* target.c (gomp_map_vars): Treat REFCOUNT_LINK objects as not mapped.
	Replace target address of the pointer with target address of newly
	mapped object in the splay tree.  Set link pointer on target to the
	device address of the mapped object.
	(gomp_unmap_vars): Restore target address of the pointer in the splay
	tree for REFCOUNT_LINK objects after unmapping.
	(gomp_load_image_to_device): Set refcount to REFCOUNT_LINK for "omp
	declare target link" objects.
	(gomp_unload_image_from_device): Replace j with i.  Force unmap of all
	"omp declare target link" objects, which were mapped for the image.
	(gomp_exit_data): Restore target address of the pointer in the splay
	tree for REFCOUNT_LINK objects after unmapping.
	* testsuite/libgomp.c/target-link-1.c: New file.
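
The pointer indirection behind the entries above can be mimicked on the host
as follows.  This is a rough simulation under stated assumptions, not the
libgomp implementation: malloc stands in for device memory, a_linkptr stands
in for the artificial "<var>_linkptr" pointer created by
offload_handle_link_vars, and all function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for the artificial pointer that replaces link variable "a"
   on the device.  It is NULL while "a" is not mapped.  */
static int *a_linkptr;

/* Device code reaches the variable only through the pointer.  */
static int
device_read_a (void)
{
  return *a_linkptr;
}

static void
device_increment_a (void)
{
  ++*a_linkptr;
}

/* Mapping: allocate "device" storage, copy the host value in and store
   the device address into the link pointer.  */
static void
map_link_var (const int *host_a)
{
  int *dev = malloc (sizeof *dev);
  if (dev == NULL)
    abort ();
  memcpy (dev, host_a, sizeof *dev);
  a_linkptr = dev;
}

/* Unmapping: copy the value back, release the storage and leave the
   link pointer unset again.  */
static void
unmap_link_var (int *host_a)
{
  memcpy (host_a, a_linkptr, sizeof *host_a);
  free (a_linkptr);
  a_linkptr = NULL;
}
```

In the real runtime the splay tree entry initially records the device address
of the pointer itself (refcount REFCOUNT_LINK) and is swapped for the mapped
object while the mapping exists; the sketch keeps only the pointer update.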


diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 9bc02fc..4250cdf 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -821,6 +821,8 @@ const struct attribute_spec c_common_attribute_table[] =
 			      handle_simd_attribute, false },
   { "omp declare target",     0, 0, true, false, false,
 			      handle_omp_declare_target_attribute, false },
+  { "omp declare target link", 0, 0, true, false, false,
+			      handle_omp_declare_target_attribute, false },
   { "alloc_align",	      1, 1, false, true, true,
 			      handle_alloc_align_attribute, false },
   { "assume_aligned",	      1, 2, false, true, true,
diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 3d86c36..8443cb0 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2210,6 +2210,13 @@ output_in_order (bool no_reorder)
 	  break;
 
 	case ORDER_VAR:
+#ifdef ACCEL_COMPILER
+	  /* Do not assemble "omp declare target link" vars.  */
+	  if (DECL_HAS_VALUE_EXPR_P (nodes[i].u.v->decl)
+	      && lookup_attribute ("omp declare target link",
+				   DECL_ATTRIBUTES (nodes[i].u.v->decl)))
+	    break;
+#endif
 	  nodes[i].u.v->assemble_decl ();
 	  break;
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 80c6bf2..438efba 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -7910,7 +7910,9 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
 	  n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
 	  if ((ctx->region_type & ORT_TARGET) != 0
 	      && !(n->value & GOVD_SEEN)
-	      && GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)) == 0)
+	      && GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)) == 0
+	      && !lookup_attribute ("omp declare target link",
+				    DECL_ATTRIBUTES (decl)))
 	    {
 	      remove = true;
 	      /* For struct element mapping, if struct is never referenced
diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
index fcf7caf..5fd50dc 100644
--- a/gcc/lto/lto.c
+++ b/gcc/lto/lto.c
@@ -50,6 +50,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-utils.h"
 #include "gomp-constants.h"
 #include "lto-symtab.h"
+#include "stringpool.h"
+#include "fold-const.h"
 
 
 /* Number of parallel tasks to run, -1 if we want to use GNU Make jobserver.  */
@@ -3226,6 +3228,37 @@ lto_init (void)
 #endif
 }
 
+/* Create artificial pointers for "omp declare target link" vars.  */
+
+static void
+offload_handle_link_vars (void)
+{
+#ifdef ACCEL_COMPILER
+  varpool_node *var;
+  FOR_EACH_VARIABLE (var)
+    if (lookup_attribute ("omp declare target link",
+			  DECL_ATTRIBUTES (var->decl)))
+      {
+	tree type = build_pointer_type (TREE_TYPE (var->decl));
+	tree link_ptr_var = make_node (VAR_DECL);
+	TREE_TYPE (link_ptr_var) = type;
+	TREE_USED (link_ptr_var) = 1;
+	TREE_STATIC (link_ptr_var) = 1;
+	DECL_MODE (link_ptr_var) = TYPE_MODE (type);
+	DECL_SIZE (link_ptr_var) = TYPE_SIZE (type);
+	DECL_SIZE_UNIT (link_ptr_var) = TYPE_SIZE_UNIT (type);
+	DECL_ARTIFICIAL (link_ptr_var) = 1;
+	tree var_name = DECL_ASSEMBLER_NAME (var->decl);
+	char *new_name
+	  = ACONCAT ((IDENTIFIER_POINTER (var_name), "_linkptr", NULL));
+	DECL_NAME (link_ptr_var) = get_identifier (new_name);
+	SET_DECL_ASSEMBLER_NAME (link_ptr_var, DECL_NAME (link_ptr_var));
+	SET_DECL_VALUE_EXPR (var->decl, build_simple_mem_ref (link_ptr_var));
+	DECL_HAS_VALUE_EXPR_P (var->decl) = 1;
+      }
+#endif
+}
+
 
 /* Main entry point for the GIMPLE front end.  This front end has
    three main personalities:
@@ -3274,6 +3307,8 @@ lto_main (void)
 
   if (!seen_error ())
     {
+      offload_handle_link_vars ();
+
       /* If WPA is enabled analyze the whole call graph and create an
 	 optimization plan.  Otherwise, read in all the function
 	 bodies and continue with optimization.  */
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 5643480..676b1df 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -2026,7 +2026,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 	  decl = OMP_CLAUSE_DECL (c);
 	  /* Global variables with "omp declare target" attribute
 	     don't need to be copied, the receiver side will use them
-	     directly.  */
+	     directly.  However, global variables with "omp declare target link"
+	     attribute need to be copied.  */
 	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 	      && DECL_P (decl)
 	      && ((OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FIRSTPRIVATE_POINTER
@@ -2034,7 +2035,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
 		       != GOMP_MAP_FIRSTPRIVATE_REFERENCE))
 		  || TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
 	      && is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))
-	      && varpool_node::get_create (decl)->offloadable)
+	      && varpool_node::get_create (decl)->offloadable
+	      && !lookup_attribute ("omp declare target link",
+				    DECL_ATTRIBUTES (decl)))
 	    break;
 	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
 	      && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER)
@@ -18588,13 +18591,45 @@ add_decls_addresses_to_decl_constructor (vec<tree, va_gc> *v_decls,
   for (unsigned i = 0; i < len; i++)
     {
       tree it = (*v_decls)[i];
-      bool is_function = TREE_CODE (it) != VAR_DECL;
+      bool is_var = TREE_CODE (it) == VAR_DECL;
+      bool is_link_var
+	= is_var
+#ifdef ACCEL_COMPILER
+	  && DECL_HAS_VALUE_EXPR_P (it)
+#endif
+	  && lookup_attribute ("omp declare target link", DECL_ATTRIBUTES (it));
 
-      CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, build_fold_addr_expr (it));
-      if (!is_function)
-	CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE,
-				fold_convert (const_ptr_type_node,
-					      DECL_SIZE_UNIT (it)));
+      tree size = NULL_TREE;
+      if (is_var)
+	size = fold_convert (const_ptr_type_node, DECL_SIZE_UNIT (it));
+
+      tree addr;
+      if (!is_link_var)
+	addr = build_fold_addr_expr (it);
+      else
+	{
+#ifdef ACCEL_COMPILER
+	  /* For "omp declare target link" vars add address of the pointer to
+	     the target table, instead of address of the var.  */
+	  tree value_expr = DECL_VALUE_EXPR (it);
+	  tree link_ptr_decl = TREE_OPERAND (value_expr, 0);
+	  varpool_node::finalize_decl (link_ptr_decl);
+	  addr = build_fold_addr_expr (link_ptr_decl);
+#else
+	  addr = build_fold_addr_expr (it);
+#endif
+
+	  /* Most significant bit of the size marks "omp declare target link"
+	     vars in host and target tables.  */
+	  unsigned HOST_WIDE_INT isize = tree_to_uhwi (size);
+	  isize |= 1ULL << (int_size_in_bytes (const_ptr_type_node)
+			    * BITS_PER_UNIT - 1);
+	  size = wide_int_to_tree (const_ptr_type_node, isize);
+	}
+
+      CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, addr);
+      if (is_var)
+	CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, size);
     }
 }
 
@@ -19831,4 +19866,84 @@ make_pass_oacc_device_lower (gcc::context *ctxt)
   return new pass_oacc_device_lower (ctxt);
 }
 
+/* "omp declare target link" handling pass.  */
+
+namespace {
+
+const pass_data pass_data_omp_target_link =
+{
+  GIMPLE_PASS,			/* type */
+  "omptargetlink",		/* name */
+  OPTGROUP_NONE,		/* optinfo_flags */
+  TV_NONE,			/* tv_id */
+  PROP_ssa,			/* properties_required */
+  0,				/* properties_provided */
+  0,				/* properties_destroyed */
+  0,				/* todo_flags_start */
+  TODO_update_ssa,		/* todo_flags_finish */
+};
+
+class pass_omp_target_link : public gimple_opt_pass
+{
+public:
+  pass_omp_target_link (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_omp_target_link, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *fun)
+    {
+#ifdef ACCEL_COMPILER
+      tree attrs = DECL_ATTRIBUTES (fun->decl);
+      return lookup_attribute ("omp declare target", attrs)
+	     || lookup_attribute ("omp target entrypoint", attrs);
+#else
+      (void) fun;
+      return false;
+#endif
+    }
+
+  virtual unsigned execute (function *);
+};
+
+/* Callback for walk_gimple_stmt used to scan for link var operands.  */
+
+static tree
+find_link_var_op (tree *tp, int *walk_subtrees, void *)
+{
+  tree t = *tp;
+
+  if (TREE_CODE (t) == VAR_DECL && DECL_HAS_VALUE_EXPR_P (t)
+      && lookup_attribute ("omp declare target link", DECL_ATTRIBUTES (t)))
+    {
+      *walk_subtrees = 0;
+      return t;
+    }
+
+  return NULL_TREE;
+}
+
+unsigned
+pass_omp_target_link::execute (function *fun)
+{
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      gimple_stmt_iterator gsi;
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	if (walk_gimple_stmt (&gsi, NULL, find_link_var_op, NULL))
+	  gimple_regimplify_operands (gsi_stmt (gsi), &gsi);
+    }
+
+  return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_omp_target_link (gcc::context *ctxt)
+{
+  return new pass_omp_target_link (ctxt);
+}
+
 #include "gt-omp-low.h"
diff --git a/gcc/passes.def b/gcc/passes.def
index 43ce3d5..c72b38b 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -170,6 +170,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_fixup_cfg);
   NEXT_PASS (pass_lower_eh_dispatch);
   NEXT_PASS (pass_oacc_device_lower);
+  NEXT_PASS (pass_omp_target_link);
   NEXT_PASS (pass_all_optimizations);
   PUSH_INSERT_PASSES_WITHIN (pass_all_optimizations)
       NEXT_PASS (pass_remove_cgraph_callee_edges);
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index e1cbce9..a13a865 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -417,6 +417,7 @@ extern gimple_opt_pass *make_pass_lower_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_omp_target_link (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_oacc_device_lower (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_object_sizes (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_strlen (gcc::context *ctxt);
diff --git a/gcc/varpool.c b/gcc/varpool.c
index 5e4fcbf..d0101a1 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -748,6 +748,13 @@ symbol_table::output_variables (void)
       /* Handled in output_in_order.  */
       if (node->no_reorder)
 	continue;
+#ifdef ACCEL_COMPILER
+      /* Do not assemble "omp declare target link" vars.  */
+      if (DECL_HAS_VALUE_EXPR_P (node->decl)
+	  && lookup_attribute ("omp declare target link",
+			       DECL_ATTRIBUTES (node->decl)))
+	continue;
+#endif
       if (node->assemble_decl ())
         changed = true;
     }
diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 9d9949f..73aa513 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -817,6 +817,9 @@ struct target_mem_desc {
 
 /* Special value for refcount - infinity.  */
 #define REFCOUNT_INFINITY (~(uintptr_t) 0)
+/* Special value for refcount - tgt_offset contains target address of the
+   artificial pointer to "omp declare target link" object.  */
+#define REFCOUNT_LINK (~(uintptr_t) 1)
 
 struct splay_tree_key_s {
   /* Address of the host object.  */
@@ -831,6 +834,8 @@ struct splay_tree_key_s {
   uintptr_t refcount;
   /* Asynchronous reference count.  */
   uintptr_t async_refcount;
+  /* Pointer to the original mapping of "omp declare target link" object.  */
+  splay_tree_key link_key;
 };
 
 /* The comparison function.  */
diff --git a/libgomp/target.c b/libgomp/target.c
index 932b176..1ab30f7 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -464,7 +464,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	}
       else
 	n = splay_tree_lookup (mem_map, &cur_node);
-      if (n)
+      if (n && n->refcount != REFCOUNT_LINK)
 	gomp_map_vars_existing (devicep, n, &cur_node, &tgt->list[i],
 				kind & typemask);
       else
@@ -628,11 +628,19 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 	    else
 	      k->host_end = k->host_start + sizeof (void *);
 	    splay_tree_key n = splay_tree_lookup (mem_map, k);
-	    if (n)
+	    if (n && n->refcount != REFCOUNT_LINK)
 	      gomp_map_vars_existing (devicep, n, k, &tgt->list[i],
 				      kind & typemask);
 	    else
 	      {
+		k->link_key = NULL;
+		if (n && n->refcount == REFCOUNT_LINK)
+		  {
+		    /* Replace target address of the pointer with target address
+		       of mapped object in the splay tree.  */
+		    splay_tree_remove (mem_map, n);
+		    k->link_key = n;
+		  }
 		size_t align = (size_t) 1 << (kind >> rshift);
 		tgt->list[i].key = k;
 		k->tgt = tgt;
@@ -752,6 +760,16 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
 		    gomp_fatal ("%s: unhandled kind 0x%.2x", __FUNCTION__,
 				kind);
 		  }
+
+		if (k->link_key)
+		  {
+		    /* Set link pointer on target to the device address of the
+		       mapped object.  */
+		    void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset);
+		    devicep->host2dev_func (devicep->target_id,
+					    (void *) n->tgt_offset,
+					    &tgt_addr, sizeof (void *));
+		  }
 		array++;
 	      }
 	  }
@@ -884,6 +902,9 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
       if (do_unmap)
 	{
 	  splay_tree_remove (&devicep->mem_map, k);
+	  if (k->link_key)
+	    splay_tree_insert (&devicep->mem_map,
+			       (splay_tree_node) k->link_key);
 	  if (k->tgt->refcount > 1)
 	    k->tgt->refcount--;
 	  else
@@ -1020,31 +1041,40 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
       k->tgt_offset = target_table[i].start;
       k->refcount = REFCOUNT_INFINITY;
       k->async_refcount = 0;
+      k->link_key = NULL;
       array->left = NULL;
       array->right = NULL;
       splay_tree_insert (&devicep->mem_map, array);
       array++;
     }
 
+  /* Most significant bit of the size in host and target tables marks
+     "omp declare target link" variables.  */
+  const uintptr_t link_bit = 1ULL << (sizeof (uintptr_t) * __CHAR_BIT__ - 1);
+  const uintptr_t size_mask = ~link_bit;
+
   for (i = 0; i < num_vars; i++)
     {
       struct addr_pair *target_var = &target_table[num_funcs + i];
-      if (target_var->end - target_var->start
-	  != (uintptr_t) host_var_table[i * 2 + 1])
+      uintptr_t target_size = target_var->end - target_var->start;
+
+      if ((uintptr_t) host_var_table[i * 2 + 1] != target_size)
 	{
 	  gomp_mutex_unlock (&devicep->lock);
 	  if (is_register_lock)
 	    gomp_mutex_unlock (&register_lock);
-	  gomp_fatal ("Can't map target variables (size mismatch)");
+	  gomp_fatal ("Cannot map target variables (size mismatch)");
 	}
 
       splay_tree_key k = &array->key;
       k->host_start = (uintptr_t) host_var_table[i * 2];
-      k->host_end = k->host_start + (uintptr_t) host_var_table[i * 2 + 1];
+      k->host_end
+	= k->host_start + (size_mask & (uintptr_t) host_var_table[i * 2 + 1]);
       k->tgt = tgt;
       k->tgt_offset = target_var->start;
-      k->refcount = REFCOUNT_INFINITY;
+      k->refcount = target_size & link_bit ? REFCOUNT_LINK : REFCOUNT_INFINITY;
       k->async_refcount = 0;
+      k->link_key = NULL;
       array->left = NULL;
       array->right = NULL;
       splay_tree_insert (&devicep->mem_map, array);
@@ -1072,7 +1102,6 @@ gomp_unload_image_from_device (struct gomp_device_descr *devicep,
   int num_funcs = host_funcs_end - host_func_table;
   int num_vars  = (host_vars_end - host_var_table) / 2;
 
-  unsigned j;
   struct splay_tree_key_s k;
   splay_tree_key node = NULL;
 
@@ -1088,21 +1117,46 @@ gomp_unload_image_from_device (struct gomp_device_descr *devicep,
   devicep->unload_image_func (devicep->target_id, version, target_data);
 
   /* Remove mappings from splay tree.  */
-  for (j = 0; j < num_funcs; j++)
+  int i;
+  for (i = 0; i < num_funcs; i++)
     {
-      k.host_start = (uintptr_t) host_func_table[j];
+      k.host_start = (uintptr_t) host_func_table[i];
       k.host_end = k.host_start + 1;
       splay_tree_remove (&devicep->mem_map, &k);
     }
 
-  for (j = 0; j < num_vars; j++)
+  /* Most significant bit of the size in host and target tables marks
+     "omp declare target link" variables.  */
+  const uintptr_t link_bit = 1ULL << (sizeof (uintptr_t) * __CHAR_BIT__ - 1);
+  const uintptr_t size_mask = ~link_bit;
+  bool is_tgt_unmapped = false;
+
+  for (i = 0; i < num_vars; i++)
     {
-      k.host_start = (uintptr_t) host_var_table[j * 2];
-      k.host_end = k.host_start + (uintptr_t) host_var_table[j * 2 + 1];
-      splay_tree_remove (&devicep->mem_map, &k);
+      k.host_start = (uintptr_t) host_var_table[i * 2];
+      k.host_end
+	= k.host_start + (size_mask & (uintptr_t) host_var_table[i * 2 + 1]);
+
+      if (!(link_bit & (uintptr_t) host_var_table[i * 2 + 1]))
+	splay_tree_remove (&devicep->mem_map, &k);
+      else
+	{
+	  splay_tree_key n = splay_tree_lookup (&devicep->mem_map, &k);
+	  splay_tree_remove (&devicep->mem_map, n);
+	  if (n->link_key)
+	    {
+	      if (n->tgt->refcount > 1)
+		n->tgt->refcount--;
+	      else
+		{
+		  is_tgt_unmapped = true;
+		  gomp_unmap_tgt (n->tgt);
+		}
+	    }
+	}
     }
 
-  if (node)
+  if (node && !is_tgt_unmapped)
     {
       free (node->tgt);
       free (node);
@@ -1658,6 +1712,9 @@ gomp_exit_data (struct gomp_device_descr *devicep, size_t mapnum,
 	  if (k->refcount == 0)
 	    {
 	      splay_tree_remove (&devicep->mem_map, k);
+	      if (k->link_key)
+		splay_tree_insert (&devicep->mem_map,
+				   (splay_tree_node) k->link_key);
 	      if (k->tgt->refcount > 1)
 		k->tgt->refcount--;
 	      else
diff --git a/libgomp/testsuite/libgomp.c/target-link-1.c b/libgomp/testsuite/libgomp.c/target-link-1.c
new file mode 100644
index 0000000..681677c
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c/target-link-1.c
@@ -0,0 +1,63 @@
+struct S { int s, t; };
+
+int a = 1, b = 1;
+double c[27];
+struct S d = { 8888, 8888 };
+#pragma omp declare target link (a) to (b) link (c, d)
+
+int
+foo (void)
+{
+  return a++ + b++;
+}
+
+int
+bar (int n)
+{
+  int *p1 = &a;
+  int *p2 = &b;
+  c[n] += 2.0;
+  d.s -= 2;
+  d.t -= 2;
+  return *p1 + *p2 + d.s + d.t;
+}
+
+#pragma omp declare target (foo, bar)
+
+int
+main ()
+{
+  a = b = 2;
+  d.s = 17;
+  d.t = 18;
+
+  int res, n = 10;
+  #pragma omp target map (to: a, b, c, d) map (from: res)
+  {
+    res = foo () + foo ();
+    c[n] = 3.0;
+    res += bar (n);
+  }
+
+  int shared_mem = 0;
+  #pragma omp target map (alloc: shared_mem)
+    shared_mem = 1;
+
+  if ((shared_mem && res != (2 + 2) + (3 + 3) + (4 + 4 + 15 + 16))
+      || (!shared_mem && res != (2 + 1) + (3 + 2) + (4 + 3 + 15 + 16)))
+    __builtin_abort ();
+
+  #pragma omp target enter data map (to: c)
+  #pragma omp target update from (c)
+  res = (int) (c[n] + 0.5);
+  if ((shared_mem && res != 5) || (!shared_mem && res != 0))
+    __builtin_abort ();
+
+  #pragma omp target map (to: a, b) map (from: res)
+    res = foo ();
+
+  if ((shared_mem && res != 4 + 4) || (!shared_mem && res != 2 + 3))
+    __builtin_abort ();
+
+  return 0;
+}
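
For readers following the table encoding in gomp_load_image_to_device and gomp_unload_image_from_device above, here is a minimal standalone sketch of how the most significant bit of a size entry can mark an "omp declare target link" variable.  The helper names (encode_var_size, size_entry_is_link, size_entry_bytes) are hypothetical and exist only for illustration; the patch open-codes the same masking with link_bit and size_mask.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Most significant bit of a uintptr_t; mirrors link_bit in the patch.  */
#define LINK_BIT ((uintptr_t) 1 << (sizeof (uintptr_t) * CHAR_BIT - 1))
/* Mask that recovers the real size; mirrors size_mask in the patch.  */
#define SIZE_MASK (~LINK_BIT)

/* Hypothetical helper: build a table entry from a byte size and a flag.
   The size itself must not already use the most significant bit.  */
static uintptr_t
encode_var_size (uintptr_t size, bool is_link)
{
  assert ((size & LINK_BIT) == 0);
  return is_link ? (size | LINK_BIT) : size;
}

/* Hypothetical helper: true if the entry marks a link variable.  */
static bool
size_entry_is_link (uintptr_t entry)
{
  return (entry & LINK_BIT) != 0;
}

/* Hypothetical helper: the size in bytes with the marker stripped.  */
static uintptr_t
size_entry_bytes (uintptr_t entry)
{
  return entry & SIZE_MASK;
}
```

Because the marker is set in both the host and the target table, the raw entries still compare equal in the size-mismatch check, while host_end has to be computed from the masked value.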


  -- Ilya


* Re: [gomp4.5] Handle #pragma omp declare target link
  2015-12-14 17:18                     ` [gomp4.5] Handle #pragma omp declare target link Ilya Verbin
@ 2015-12-15  8:42                       ` Jakub Jelinek
  2015-12-16 16:21                       ` Thomas Schwinge
  2019-06-26 16:23                       ` [gomp4.5] Handle #pragma omp declare target link Thomas Schwinge
  2 siblings, 0 replies; 48+ messages in thread
From: Jakub Jelinek @ 2015-12-15  8:42 UTC (permalink / raw)
  To: Ilya Verbin; +Cc: gcc-patches, Kirill Yukhin

On Mon, Dec 14, 2015 at 08:17:33PM +0300, Ilya Verbin wrote:
> Here is an updated patch.  Now the MSB is set in both tables, and
> gomp_unload_image_from_device is changed.  I've verified, using a simple DSO
> testcase, that memory on the target is freed after dlclose.
> Bootstrap and make check on x86_64-linux passed.
> 
> gcc/c-family/
> 	* c-common.c (c_common_attribute_table): Handle "omp declare target
> 	link" attribute.
> gcc/
> 	* cgraphunit.c (output_in_order): Do not assemble "omp declare target
> 	link" variables in ACCEL_COMPILER.
> 	* gimplify.c (gimplify_adjust_omp_clauses): Do not remove mapping of
> 	"omp declare target link" variables.
> 	* lto/lto.c: Include stringpool.h and fold-const.h.
> 	(offload_handle_link_vars): New static function.
> 	(lto_main): Call offload_handle_link_vars.

lto/ has its own ChangeLog file, so please move the entry there and remove
the lto/ prefix.

Ok with that change, thanks.

	Jakub


* gomp_target_fini (was: [gomp4.5] Handle #pragma omp declare target link)
  2015-12-14 16:48                                     ` Ilya Verbin
@ 2015-12-16 12:30                                       ` Thomas Schwinge
  2015-12-23 11:05                                         ` gomp_target_fini Thomas Schwinge
  2016-01-21 15:24                                         ` gomp_target_fini Bernd Schmidt
  0 siblings, 2 replies; 48+ messages in thread
From: Thomas Schwinge @ 2015-12-16 12:30 UTC (permalink / raw)
  To: Ilya Verbin, Jakub Jelinek, Chung-Lin Tang, James Norris
  Cc: gcc-patches, Kirill Yukhin


Hi!

On Mon, 14 Dec 2015 19:47:36 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> On Fri, Dec 11, 2015 at 18:27:13 +0100, Jakub Jelinek wrote:
> > On Tue, Dec 08, 2015 at 05:45:59PM +0300, Ilya Verbin wrote:
> > > +/* This function finalizes all initialized devices.  */
> > > +
> > > +static void
> > > +gomp_target_fini (void)
> > > +{
> > > +  [...]
> > 
> > > The question is what this will do if there are async target tasks still
> > > running on some of the devices at this point (a forgotten #pragma omp taskwait
> > > or similar if target nowait regions are started outside of a parallel region,
> > > or exit inside of parallel, etc.).  But perhaps it can be handled incrementally.
> > Also there is the question that the 
> > So I think the patch is ok with the above mentioned changes.
> 
> Here is what I've committed to trunk.

> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h
> @@ -888,6 +888,14 @@ typedef struct acc_dispatch_t
>    } cuda;
>  } acc_dispatch_t;
>  
> +/* Various state of the accelerator device.  */
> +enum gomp_device_state
> +{
> +  GOMP_DEVICE_UNINITIALIZED,
> +  GOMP_DEVICE_INITIALIZED,
> +  GOMP_DEVICE_FINALIZED
> +};
> +
>  /* This structure describes accelerator device.
>     It contains name of the corresponding libgomp plugin, function handlers for
>     interaction with the device, ID-number of the device, and information about
> @@ -933,8 +941,10 @@ struct gomp_device_descr
>    /* Mutex for the mutable data.  */
>    gomp_mutex_t lock;
>  
> -  /* Set to true when device is initialized.  */
> -  bool is_initialized;
> +  /* Current state of the device.  OpenACC allows to move from INITIALIZED state
> +     back to UNINITIALIZED state.  OpenMP allows only to move from INITIALIZED
> +     to FINALIZED state (at program shutdown).  */
> +  enum gomp_device_state state;

(ACK, but I assume we'll want to make sure that an OpenACC device is
never re-initialized if we're in/after the libgomp finalization phase.)
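
A minimal sketch of the state transitions described in the comment above, assuming the three states are the only ones involved.  The function device_transition_allowed is hypothetical and not part of libgomp; it only spells out that FINALIZED is terminal, which is the property the re-initialization concern depends on.

```c
#include <assert.h>
#include <stdbool.h>

/* Same enumerators as in the quoted libgomp.h hunk.  */
enum gomp_device_state
{
  GOMP_DEVICE_UNINITIALIZED,
  GOMP_DEVICE_INITIALIZED,
  GOMP_DEVICE_FINALIZED
};

/* Hypothetical helper: return true if moving a device from FROM to TO
   is consistent with the comment in struct gomp_device_descr.  */
static bool
device_transition_allowed (enum gomp_device_state from,
			   enum gomp_device_state to)
{
  assert (from == GOMP_DEVICE_UNINITIALIZED
	  || from == GOMP_DEVICE_INITIALIZED
	  || from == GOMP_DEVICE_FINALIZED);

  switch (from)
    {
    case GOMP_DEVICE_UNINITIALIZED:
      /* First use of the device.  */
      return to == GOMP_DEVICE_INITIALIZED;
    case GOMP_DEVICE_INITIALIZED:
      /* OpenACC shutdown goes back to UNINITIALIZED; program shutdown
	 goes to FINALIZED.  */
      return to == GOMP_DEVICE_UNINITIALIZED || to == GOMP_DEVICE_FINALIZED;
    case GOMP_DEVICE_FINALIZED:
      /* Terminal state: no re-initialization after finalization.  */
      return false;
    }
  return false;
}
```

With such a check, an attempt to go from GOMP_DEVICE_FINALIZED back to GOMP_DEVICE_INITIALIZED is rejected, whereas the OpenACC path from INITIALIZED to UNINITIALIZED and back remains possible.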


The issue mentioned above: "exit inside of parallel" is actually a
problem for nvptx offloading: the libgomp.oacc-c-c++-common/abort-1.c,
libgomp.oacc-c-c++-common/abort-3.c, and libgomp.oacc-fortran/abort-1.f90
test cases now run into annoying "WARNING: program timed out".  Here is
what's happening, as I understand it: in
libgomp/plugin/plugin-nvptx.c:nvptx_exec, the cuStreamSynchronize call
returns CUDA_ERROR_LAUNCH_FAILED, upon which we call GOMP_PLUGIN_fatal.

> --- a/libgomp/target.c
> +++ b/libgomp/target.c

> +/* This function finalizes all initialized devices.  */
> +
> +static void
> +gomp_target_fini (void)
> +{
> +  int i;
> +  for (i = 0; i < num_devices; i++)
> +    {
> +      struct gomp_device_descr *devicep = &devices[i];
> +      gomp_mutex_lock (&devicep->lock);
> +      if (devicep->state == GOMP_DEVICE_INITIALIZED)
> +	{
> +	  devicep->fini_device_func (devicep->target_id);
> +	  devicep->state = GOMP_DEVICE_FINALIZED;
> +	}
> +      gomp_mutex_unlock (&devicep->lock);
> +    }
> +}

> @@ -2387,6 +2433,9 @@ gomp_target_init (void)
>        if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
>  	goacc_register (&devices[i]);
>      }
> +
> +  if (atexit (gomp_target_fini) != 0)
> +    gomp_fatal ("atexit failed");
>  }

Now, with the above change installed, GOMP_PLUGIN_fatal will trigger the
atexit handler, gomp_target_fini, which, with the device lock held, will
call back into the plugin, GOMP_OFFLOAD_fini_device, which will try to
clean up.

Because of the earlier CUDA_ERROR_LAUNCH_FAILED, the associated CUDA
context is now in an inconsistent state, see
<https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TYPES.html>:

    CUDA_ERROR_LAUNCH_FAILED = 719
        An exception occurred on the device while executing a
        kernel. Common causes include dereferencing an invalid device
        pointer and accessing out of bounds shared memory. The context
        cannot be used, so it must be destroyed (and a new one should be
        created). All existing device memory allocations from this
        context are invalid and must be reconstructed if the program is
        to continue using CUDA.

Thus, any cuMemFreeHost invocations that are run during clean-up will now
also/still return CUDA_ERROR_LAUNCH_FAILED, so we'll again call
GOMP_PLUGIN_fatal.  That in turn triggers the same or another
(GOMP_offload_unregister_ver) atexit handler, which then deadlocks
trying to lock the device, which is still locked.

(Jim, I wonder: after the first CUDA_ERROR_LAUNCH_FAILED and similar
errors, should we destroy the context right away, or toggle a boolean
flag to mark it as unusable, and use that as an indication to avoid the
follow-on failures of cuMemFreeHost just described above, for example?)
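
The "toggle a boolean flag" alternative could look roughly like the following sketch.  Everything here is hypothetical: struct fake_device, enum fake_result, note_result and cleanup_free are stand-ins invented for illustration and do not correspond to the nvptx plugin's actual data structures or to any CUDA driver call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result codes standing in for driver API results.  */
enum fake_result { FAKE_SUCCESS, FAKE_LAUNCH_FAILED };

/* Hypothetical per-device bookkeeping.  */
struct fake_device
{
  bool context_unusable;	/* Set once a launch failure was seen.  */
  int frees_issued;		/* Number of free calls actually made.  */
};

/* Record the outcome of a driver call; a launch failure poisons the
   context for all later operations.  */
static void
note_result (struct fake_device *dev, enum fake_result res)
{
  assert (dev != NULL);
  if (res == FAKE_LAUNCH_FAILED)
    dev->context_unusable = true;
}

/* Clean-up step: skip the free entirely if the context is known to be
   unusable, so that no follow-on failure (and no second fatal error)
   is produced.  Return true if a free was issued.  */
static bool
cleanup_free (struct fake_device *dev)
{
  assert (dev != NULL);
  if (dev->context_unusable)
    return false;
  dev->frees_issued++;
  return true;
}
```

The point of the flag is only to keep clean-up from calling into a context that can no longer succeed; whether the context should instead be destroyed right away is the open question.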

<http://pubs.opengroup.org/onlinepubs/9699919799/functions/atexit.html>
tells us:

    Since the behavior is undefined if the exit() function is called more
    than once, portable applications calling atexit() must ensure that the
    exit() function is not called at normal process termination when all
    functions registered by the atexit() function are called.

... which we're violating here, at least in the nvptx plugin.  I have not
analyzed the intermic one.
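
To make the termination behaviour concrete, here is a small POSIX sketch showing that a process terminating via exit runs its atexit handlers, whereas one terminating via _exit does not.  It does not reproduce the libgomp deadlock; cleanup_handler is a hypothetical stand-in for a handler such as gomp_target_fini, and handler_ran_on_termination is an illustrative helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of the pipe used by the child to report that the handler ran.  */
static int report_fd = -1;

/* Hypothetical clean-up handler: report by writing a single byte.  */
static void
cleanup_handler (void)
{
  char c = 'x';
  if (write (report_fd, &c, 1) != 1)
    _exit (2);
}

/* Fork a child that registers cleanup_handler and then terminates with
   exit (USE_UNDERSCORE_EXIT == 0) or _exit (USE_UNDERSCORE_EXIT == 1).
   Return 1 if the handler ran in the child, 0 if not, -1 on error.  */
static int
handler_ran_on_termination (int use_underscore_exit)
{
  assert (use_underscore_exit == 0 || use_underscore_exit == 1);

  int fds[2];
  if (pipe (fds) != 0)
    return -1;

  pid_t pid = fork ();
  if (pid < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }

  if (pid == 0)
    {
      /* Child: register the handler, then terminate.  */
      close (fds[0]);
      report_fd = fds[1];
      if (atexit (cleanup_handler) != 0)
	_exit (3);
      if (use_underscore_exit)
	_exit (EXIT_FAILURE);
      exit (EXIT_FAILURE);
    }

  /* Parent: a byte arrives only if the handler ran; otherwise read
     returns 0 once the child's write end is closed.  */
  close (fds[1]);
  char c;
  ssize_t n = read (fds[0], &c, 1);
  close (fds[0]);

  int status;
  if (waitpid (pid, &status, 0) < 0)
    return -1;

  return n == 1 ? 1 : (n == 0 ? 0 : -1);
}
```

Calling handler_ran_on_termination (0) exercises the exit path, and handler_ran_on_termination (1) the _exit path; the latter also skips any other clean-up the handlers would normally do, which is the trade-off to weigh for a fatal-error routine.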

As it happens, Chung-Lin has been working in that area:
<http://news.gmane.org/find-root.php?message_id=%3C55DF1452.9050501%40codesourcery.com%3E>,
which he recently re-posted:
<http://news.gmane.org/find-root.php?message_id=%3C566EE49A.3050403%40codesourcery.com%3E>,
<http://news.gmane.org/find-root.php?message_id=%3C566EC310.8000403%40codesourcery.com%3E>,
<http://news.gmane.org/find-root.php?message_id=%3C566EC324.9050505%40codesourcery.com%3E>.
I have not analyzed whether his changes would completely resolve the
problem just described, but at least conceptually they seem to be a step
in the right direction?  (Jakub?)

Now, to resolve the immediate problem, what is the right thing for us to
do?  Is the following simple change OK, or is there a reason to still run
atexit handlers if terminating under error conditions?

commit b1733e8f9df6ae7d6828e2194df1b314772701c5
Author: Thomas Schwinge <thomas@codesourcery.com>
Date:   Wed Dec 16 13:10:39 2015 +0100

    Avoid deadlocks in libgomp due to competing atexit handlers
    
    	libgomp/
    	* error.c (gomp_vfatal): Call _exit instead of exit.
---
 libgomp/error.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git libgomp/error.c libgomp/error.c
index 094c24a..1ef7491 100644
--- libgomp/error.c
+++ libgomp/error.c
@@ -34,6 +34,7 @@
 #include <stdarg.h>
 #include <stdio.h>
 #include <stdlib.h>
+#include <unistd.h>
 
 
 #undef gomp_vdebug
@@ -77,7 +78,7 @@ void
 gomp_vfatal (const char *fmt, va_list list)
 {
   gomp_verror (fmt, list);
-  exit (EXIT_FAILURE);
+  _exit (EXIT_FAILURE);
 }
 
 void


Grüße
 Thomas



* Re: [gomp4.5] Handle #pragma omp declare target link
  2015-12-14 17:18                     ` [gomp4.5] Handle #pragma omp declare target link Ilya Verbin
  2015-12-15  8:42                       ` Jakub Jelinek
@ 2015-12-16 16:21                       ` Thomas Schwinge
  2016-01-07 18:57                         ` [gomp4] Fix use of declare'd vars by routine procedures James Norris
  2019-06-26 16:23                       ` [gomp4.5] Handle #pragma omp declare target link Thomas Schwinge
  2 siblings, 1 reply; 48+ messages in thread
From: Thomas Schwinge @ 2015-12-16 16:21 UTC (permalink / raw)
  To: Ilya Verbin, Jakub Jelinek, James Norris; +Cc: gcc-patches, Kirill Yukhin


Hi!

On Mon, 14 Dec 2015 20:17:33 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> [updated patch]

This regresses libgomp.oacc-c-c++-common/declare-4.c compilation for
nvptx offloading:

    spawn [...]/build-gcc/gcc/xgcc -B[...]/build-gcc/gcc/ [...]/source-gcc/libgomp/testsuite/libgomp.oacc-c/../libgomp.oacc-c-c++-common/declare-4.c -B[...]/build-gcc/x86_64-pc-linux-gnu/./libgomp/ -B[...]/build-gcc/x86_64-pc-linux-gnu/./libgomp/.libs -I[...]/build-gcc/x86_64-pc-linux-gnu/./libgomp -I[...]/source-gcc/libgomp/testsuite/../../include -I[...]/source-gcc/libgomp/testsuite/.. -fmessage-length=0 -fno-diagnostics-show-caret -fdiagnostics-color=never -B/libexec/gcc/x86_64-pc-linux-gnu/6.0.0 -B/bin -B[...]/build-gcc/gcc/accel/x86_64-intelmicemul-linux-gnu/fake_install/libexec/gcc/x86_64-pc-linux-gnu/6.0.0 -B[...]/build-gcc/gcc/accel/x86_64-intelmicemul-linux-gnu/fake_install/bin -fopenacc -I[...]/source-gcc/libgomp/testsuite/libgomp.oacc-c-c++-common -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none -O2 -L[...]/build-gcc/x86_64-pc-linux-gnu/./libgomp/.libs -lm -o ./declare-4.exe
    ptxas /tmp/ccLXqNjE.o, line 50; error   : State space mismatch between instruction and address in instruction 'ld'
    ptxas /tmp/ccLXqNjE.o, line 50; error   : Unknown symbol 'b_linkptr'
    ptxas /tmp/ccLXqNjE.o, line 50; error   : Label expected for forward reference of 'b_linkptr'
    ptxas fatal   : Ptx assembly aborted due to errors
    nvptx-as: ptxas returned 255 exit status
    mkoffload: fatal error: [...]/build-gcc/gcc/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
    compilation terminated.

That "b_linkptr" symbol is not declared/referenced in the test case
itself, libgomp/testsuite/libgomp.oacc-c-c++-common/declare-4.c:

    /* { dg-do run  { target openacc_nvidia_accel_selected } } */
    
    #include <stdlib.h>
    #include <openacc.h>
    
    float b;
    #pragma acc declare link (b)
    
    #pragma acc routine
    int
    func (int a)
    {
      b = a + 1;
    
      return b;
    }
    
    int
    main (int argc, char **argv)
    {
      float a;
    
      a = 2.0;
    
    #pragma acc parallel copy (a)
      {
        b = a;
        a = 1.0;
        a = a + b;
      }
    
      if (a != 3.0)
        abort ();
    
      a = func (a);
    
      if (a != 4.0)
        abort ();
    
      return 0;
    }

..., but I see that the "b_linkptr" identifier is generated for "b" in
the new gcc/lto/lto.c:offload_handle_link_vars based on whether attribute
"omp declare target link" is set, so maybe we fail to set that one as
appropriate?  Jim, as the main author of the OpenACC declare
implementation, would you please have a look?  I have not yet studied in
detail the thread, starting at
<http://news.gmane.org/find-root.php?message_id=%3C20150717130559.GI1780%40tucnak.redhat.com%3E>,
that resulted in the trunk r231655 commit:

> gcc/c-family/
> 	* c-common.c (c_common_attribute_table): Handle "omp declare target
> 	link" attribute.
> gcc/
> 	* cgraphunit.c (output_in_order): Do not assemble "omp declare target
> 	link" variables in ACCEL_COMPILER.
> 	* gimplify.c (gimplify_adjust_omp_clauses): Do not remove mapping of
> 	"omp declare target link" variables.
> 	* lto/lto.c: Include stringpool.h and fold-const.h.
> 	(offload_handle_link_vars): New static function.
> 	(lto_main): Call offload_handle_link_vars.
> 	* omp-low.c (scan_sharing_clauses): Do not remove mapping of "omp
> 	declare target link" variables.
> 	(add_decls_addresses_to_decl_constructor): For "omp declare target link"
> 	variables output address of the artificial pointer instead of address of
> 	the variable.  Set most significant bit of the size to mark them.
> 	(pass_data_omp_target_link): New pass_data.
> 	(pass_omp_target_link): New class.
> 	(find_link_var_op): New static function.
> 	(make_pass_omp_target_link): New function.
> 	* passes.def: Add pass_omp_target_link.
> 	* tree-pass.h (make_pass_omp_target_link): Declare.
> 	* varpool.c (symbol_table::output_variables): Do not assemble "omp
> 	declare target link" variables in ACCEL_COMPILER.
> libgomp/
> 	* libgomp.h (REFCOUNT_LINK): Define.
> 	(struct splay_tree_key_s): Add link_key.
> 	* target.c (gomp_map_vars): Treat REFCOUNT_LINK objects as not mapped.
> 	Replace target address of the pointer with target address of newly
> 	mapped object in the splay tree.  Set link pointer on target to the
> 	device address of the mapped object.
> 	(gomp_unmap_vars): Restore target address of the pointer in the splay
> 	tree for REFCOUNT_LINK objects after unmapping.
> 	(gomp_load_image_to_device): Set refcount to REFCOUNT_LINK for "omp
> 	declare target link" objects.
> 	(gomp_unload_image_from_device): Replace j with i.  Force unmap of all
> 	"omp declare target link" objects, which were mapped for the image.
> 	(gomp_exit_data): Restore target address of the pointer in the splay
> 	tree for REFCOUNT_LINK objects after unmapping.
> 	* testsuite/libgomp.c/target-link-1.c: New file.
> 
> 
> diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
> index 9bc02fc..4250cdf 100644
> --- a/gcc/c-family/c-common.c
> +++ b/gcc/c-family/c-common.c
> @@ -821,6 +821,8 @@ const struct attribute_spec c_common_attribute_table[] =
>  			      handle_simd_attribute, false },
>    { "omp declare target",     0, 0, true, false, false,
>  			      handle_omp_declare_target_attribute, false },
> +  { "omp declare target link", 0, 0, true, false, false,
> +			      handle_omp_declare_target_attribute, false },
>    { "alloc_align",	      1, 1, false, true, true,
>  			      handle_alloc_align_attribute, false },
>    { "assume_aligned",	      1, 2, false, true, true,
> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
> index 3d86c36..8443cb0 100644
> --- a/gcc/cgraphunit.c
> +++ b/gcc/cgraphunit.c
> @@ -2210,6 +2210,13 @@ output_in_order (bool no_reorder)
>  	  break;
>  
>  	case ORDER_VAR:
> +#ifdef ACCEL_COMPILER
> +	  /* Do not assemble "omp declare target link" vars.  */
> +	  if (DECL_HAS_VALUE_EXPR_P (nodes[i].u.v->decl)
> +	      && lookup_attribute ("omp declare target link",
> +				   DECL_ATTRIBUTES (nodes[i].u.v->decl)))
> +	    break;
> +#endif
>  	  nodes[i].u.v->assemble_decl ();
>  	  break;
>  
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index 80c6bf2..438efba 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -7910,7 +7910,9 @@ gimplify_adjust_omp_clauses (gimple_seq *pre_p, gimple_seq body, tree *list_p,
>  	  n = splay_tree_lookup (ctx->variables, (splay_tree_key) decl);
>  	  if ((ctx->region_type & ORT_TARGET) != 0
>  	      && !(n->value & GOVD_SEEN)
> -	      && GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)) == 0)
> +	      && GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)) == 0
> +	      && !lookup_attribute ("omp declare target link",
> +				    DECL_ATTRIBUTES (decl)))
>  	    {
>  	      remove = true;
>  	      /* For struct element mapping, if struct is never referenced
> diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c
> index fcf7caf..5fd50dc 100644
> --- a/gcc/lto/lto.c
> +++ b/gcc/lto/lto.c
> @@ -50,6 +50,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "ipa-utils.h"
>  #include "gomp-constants.h"
>  #include "lto-symtab.h"
> +#include "stringpool.h"
> +#include "fold-const.h"
>  
>  
>  /* Number of parallel tasks to run, -1 if we want to use GNU Make jobserver.  */
> @@ -3226,6 +3228,37 @@ lto_init (void)
>  #endif
>  }
>  
> +/* Create artificial pointers for "omp declare target link" vars.  */
> +
> +static void
> +offload_handle_link_vars (void)
> +{
> +#ifdef ACCEL_COMPILER
> +  varpool_node *var;
> +  FOR_EACH_VARIABLE (var)
> +    if (lookup_attribute ("omp declare target link",
> +			  DECL_ATTRIBUTES (var->decl)))
> +      {
> +	tree type = build_pointer_type (TREE_TYPE (var->decl));
> +	tree link_ptr_var = make_node (VAR_DECL);
> +	TREE_TYPE (link_ptr_var) = type;
> +	TREE_USED (link_ptr_var) = 1;
> +	TREE_STATIC (link_ptr_var) = 1;
> +	DECL_MODE (link_ptr_var) = TYPE_MODE (type);
> +	DECL_SIZE (link_ptr_var) = TYPE_SIZE (type);
> +	DECL_SIZE_UNIT (link_ptr_var) = TYPE_SIZE_UNIT (type);
> +	DECL_ARTIFICIAL (link_ptr_var) = 1;
> +	tree var_name = DECL_ASSEMBLER_NAME (var->decl);
> +	char *new_name
> +	  = ACONCAT ((IDENTIFIER_POINTER (var_name), "_linkptr", NULL));
> +	DECL_NAME (link_ptr_var) = get_identifier (new_name);
> +	SET_DECL_ASSEMBLER_NAME (link_ptr_var, DECL_NAME (link_ptr_var));
> +	SET_DECL_VALUE_EXPR (var->decl, build_simple_mem_ref (link_ptr_var));
> +	DECL_HAS_VALUE_EXPR_P (var->decl) = 1;
> +      }
> +#endif
> +}
> +
>  
>  /* Main entry point for the GIMPLE front end.  This front end has
>     three main personalities:
> @@ -3274,6 +3307,8 @@ lto_main (void)
>  
>    if (!seen_error ())
>      {
> +      offload_handle_link_vars ();
> +
>        /* If WPA is enabled analyze the whole call graph and create an
>  	 optimization plan.  Otherwise, read in all the function
>  	 bodies and continue with optimization.  */
> diff --git a/gcc/omp-low.c b/gcc/omp-low.c
> index 5643480..676b1df 100644
> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -2026,7 +2026,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
>  	  decl = OMP_CLAUSE_DECL (c);
>  	  /* Global variables with "omp declare target" attribute
>  	     don't need to be copied, the receiver side will use them
> -	     directly.  */
> +	     directly.  However, global variables with "omp declare target link"
> +	     attribute need to be copied.  */
>  	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
>  	      && DECL_P (decl)
>  	      && ((OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FIRSTPRIVATE_POINTER
> @@ -2034,7 +2035,9 @@ scan_sharing_clauses (tree clauses, omp_context *ctx,
>  		       != GOMP_MAP_FIRSTPRIVATE_REFERENCE))
>  		  || TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
>  	      && is_global_var (maybe_lookup_decl_in_outer_ctx (decl, ctx))
> -	      && varpool_node::get_create (decl)->offloadable)
> +	      && varpool_node::get_create (decl)->offloadable
> +	      && !lookup_attribute ("omp declare target link",
> +				    DECL_ATTRIBUTES (decl)))
>  	    break;
>  	  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
>  	      && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER)
> @@ -18588,13 +18591,45 @@ add_decls_addresses_to_decl_constructor (vec<tree, va_gc> *v_decls,
>    for (unsigned i = 0; i < len; i++)
>      {
>        tree it = (*v_decls)[i];
> -      bool is_function = TREE_CODE (it) != VAR_DECL;
> +      bool is_var = TREE_CODE (it) == VAR_DECL;
> +      bool is_link_var
> +	= is_var
> +#ifdef ACCEL_COMPILER
> +	  && DECL_HAS_VALUE_EXPR_P (it)
> +#endif
> +	  && lookup_attribute ("omp declare target link", DECL_ATTRIBUTES (it));
>  
> -      CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, build_fold_addr_expr (it));
> -      if (!is_function)
> -	CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE,
> -				fold_convert (const_ptr_type_node,
> -					      DECL_SIZE_UNIT (it)));
> +      tree size = NULL_TREE;
> +      if (is_var)
> +	size = fold_convert (const_ptr_type_node, DECL_SIZE_UNIT (it));
> +
> +      tree addr;
> +      if (!is_link_var)
> +	addr = build_fold_addr_expr (it);
> +      else
> +	{
> +#ifdef ACCEL_COMPILER
> +	  /* For "omp declare target link" vars add address of the pointer to
> +	     the target table, instead of address of the var.  */
> +	  tree value_expr = DECL_VALUE_EXPR (it);
> +	  tree link_ptr_decl = TREE_OPERAND (value_expr, 0);
> +	  varpool_node::finalize_decl (link_ptr_decl);
> +	  addr = build_fold_addr_expr (link_ptr_decl);
> +#else
> +	  addr = build_fold_addr_expr (it);
> +#endif
> +
> +	  /* Most significant bit of the size marks "omp declare target link"
> +	     vars in host and target tables.  */
> +	  unsigned HOST_WIDE_INT isize = tree_to_uhwi (size);
> +	  isize |= 1ULL << (int_size_in_bytes (const_ptr_type_node)
> +			    * BITS_PER_UNIT - 1);
> +	  size = wide_int_to_tree (const_ptr_type_node, isize);
> +	}
> +
> +      CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, addr);
> +      if (is_var)
> +	CONSTRUCTOR_APPEND_ELT (v_ctor, NULL_TREE, size);
>      }
>  }
>  
> @@ -19831,4 +19866,84 @@ make_pass_oacc_device_lower (gcc::context *ctxt)
>    return new pass_oacc_device_lower (ctxt);
>  }
>  
> +/* "omp declare target link" handling pass.  */
> +
> +namespace {
> +
> +const pass_data pass_data_omp_target_link =
> +{
> +  GIMPLE_PASS,			/* type */
> +  "omptargetlink",		/* name */
> +  OPTGROUP_NONE,		/* optinfo_flags */
> +  TV_NONE,			/* tv_id */
> +  PROP_ssa,			/* properties_required */
> +  0,				/* properties_provided */
> +  0,				/* properties_destroyed */
> +  0,				/* todo_flags_start */
> +  TODO_update_ssa,		/* todo_flags_finish */
> +};
> +
> +class pass_omp_target_link : public gimple_opt_pass
> +{
> +public:
> +  pass_omp_target_link (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_omp_target_link, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *fun)
> +    {
> +#ifdef ACCEL_COMPILER
> +      tree attrs = DECL_ATTRIBUTES (fun->decl);
> +      return lookup_attribute ("omp declare target", attrs)
> +	     || lookup_attribute ("omp target entrypoint", attrs);
> +#else
> +      (void) fun;
> +      return false;
> +#endif
> +    }
> +
> +  virtual unsigned execute (function *);
> +};
> +
> +/* Callback for walk_gimple_stmt used to scan for link var operands.  */
> +
> +static tree
> +find_link_var_op (tree *tp, int *walk_subtrees, void *)
> +{
> +  tree t = *tp;
> +
> +  if (TREE_CODE (t) == VAR_DECL && DECL_HAS_VALUE_EXPR_P (t)
> +      && lookup_attribute ("omp declare target link", DECL_ATTRIBUTES (t)))
> +    {
> +      *walk_subtrees = 0;
> +      return t;
> +    }
> +
> +  return NULL_TREE;
> +}
> +
> +unsigned
> +pass_omp_target_link::execute (function *fun)
> +{
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, fun)
> +    {
> +      gimple_stmt_iterator gsi;
> +      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +	if (walk_gimple_stmt (&gsi, NULL, find_link_var_op, NULL))
> +	  gimple_regimplify_operands (gsi_stmt (gsi), &gsi);
> +    }
> +
> +  return 0;
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_omp_target_link (gcc::context *ctxt)
> +{
> +  return new pass_omp_target_link (ctxt);
> +}
> +
>  #include "gt-omp-low.h"
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 43ce3d5..c72b38b 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -170,6 +170,7 @@ along with GCC; see the file COPYING3.  If not see
>    NEXT_PASS (pass_fixup_cfg);
>    NEXT_PASS (pass_lower_eh_dispatch);
>    NEXT_PASS (pass_oacc_device_lower);
> +  NEXT_PASS (pass_omp_target_link);
>    NEXT_PASS (pass_all_optimizations);
>    PUSH_INSERT_PASSES_WITHIN (pass_all_optimizations)
>        NEXT_PASS (pass_remove_cgraph_callee_edges);
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index e1cbce9..a13a865 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -417,6 +417,7 @@ extern gimple_opt_pass *make_pass_lower_omp (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_diagnose_omp_blocks (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_expand_omp (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_expand_omp_ssa (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_omp_target_link (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_oacc_device_lower (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_object_sizes (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_strlen (gcc::context *ctxt);
> diff --git a/gcc/varpool.c b/gcc/varpool.c
> index 5e4fcbf..d0101a1 100644
> --- a/gcc/varpool.c
> +++ b/gcc/varpool.c
> @@ -748,6 +748,13 @@ symbol_table::output_variables (void)
>        /* Handled in output_in_order.  */
>        if (node->no_reorder)
>  	continue;
> +#ifdef ACCEL_COMPILER
> +      /* Do not assemble "omp declare target link" vars.  */
> +      if (DECL_HAS_VALUE_EXPR_P (node->decl)
> +	  && lookup_attribute ("omp declare target link",
> +			       DECL_ATTRIBUTES (node->decl)))
> +	continue;
> +#endif
>        if (node->assemble_decl ())
>          changed = true;
>      }
> diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
> index 9d9949f..73aa513 100644
> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h
> @@ -817,6 +817,9 @@ struct target_mem_desc {
>  
>  /* Special value for refcount - infinity.  */
>  #define REFCOUNT_INFINITY (~(uintptr_t) 0)
> +/* Special value for refcount - tgt_offset contains target address of the
> +   artificial pointer to "omp declare target link" object.  */
> +#define REFCOUNT_LINK (~(uintptr_t) 1)
>  
>  struct splay_tree_key_s {
>    /* Address of the host object.  */
> @@ -831,6 +834,8 @@ struct splay_tree_key_s {
>    uintptr_t refcount;
>    /* Asynchronous reference count.  */
>    uintptr_t async_refcount;
> +  /* Pointer to the original mapping of "omp declare target link" object.  */
> +  splay_tree_key link_key;
>  };
>  
>  /* The comparison function.  */
> diff --git a/libgomp/target.c b/libgomp/target.c
> index 932b176..1ab30f7 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -464,7 +464,7 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
>  	}
>        else
>  	n = splay_tree_lookup (mem_map, &cur_node);
> -      if (n)
> +      if (n && n->refcount != REFCOUNT_LINK)
>  	gomp_map_vars_existing (devicep, n, &cur_node, &tgt->list[i],
>  				kind & typemask);
>        else
> @@ -628,11 +628,19 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
>  	    else
>  	      k->host_end = k->host_start + sizeof (void *);
>  	    splay_tree_key n = splay_tree_lookup (mem_map, k);
> -	    if (n)
> +	    if (n && n->refcount != REFCOUNT_LINK)
>  	      gomp_map_vars_existing (devicep, n, k, &tgt->list[i],
>  				      kind & typemask);
>  	    else
>  	      {
> +		k->link_key = NULL;
> +		if (n && n->refcount == REFCOUNT_LINK)
> +		  {
> +		    /* Replace target address of the pointer with target address
> +		       of mapped object in the splay tree.  */
> +		    splay_tree_remove (mem_map, n);
> +		    k->link_key = n;
> +		  }
>  		size_t align = (size_t) 1 << (kind >> rshift);
>  		tgt->list[i].key = k;
>  		k->tgt = tgt;
> @@ -752,6 +760,16 @@ gomp_map_vars (struct gomp_device_descr *devicep, size_t mapnum,
>  		    gomp_fatal ("%s: unhandled kind 0x%.2x", __FUNCTION__,
>  				kind);
>  		  }
> +
> +		if (k->link_key)
> +		  {
> +		    /* Set link pointer on target to the device address of the
> +		       mapped object.  */
> +		    void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset);
> +		    devicep->host2dev_func (devicep->target_id,
> +					    (void *) n->tgt_offset,
> +					    &tgt_addr, sizeof (void *));
> +		  }
>  		array++;
>  	      }
>  	  }
> @@ -884,6 +902,9 @@ gomp_unmap_vars (struct target_mem_desc *tgt, bool do_copyfrom)
>        if (do_unmap)
>  	{
>  	  splay_tree_remove (&devicep->mem_map, k);
> +	  if (k->link_key)
> +	    splay_tree_insert (&devicep->mem_map,
> +			       (splay_tree_node) k->link_key);
>  	  if (k->tgt->refcount > 1)
>  	    k->tgt->refcount--;
>  	  else
> @@ -1020,31 +1041,40 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
>        k->tgt_offset = target_table[i].start;
>        k->refcount = REFCOUNT_INFINITY;
>        k->async_refcount = 0;
> +      k->link_key = NULL;
>        array->left = NULL;
>        array->right = NULL;
>        splay_tree_insert (&devicep->mem_map, array);
>        array++;
>      }
>  
> +  /* Most significant bit of the size in host and target tables marks
> +     "omp declare target link" variables.  */
> +  const uintptr_t link_bit = 1ULL << (sizeof (uintptr_t) * __CHAR_BIT__ - 1);
> +  const uintptr_t size_mask = ~link_bit;
> +
>    for (i = 0; i < num_vars; i++)
>      {
>        struct addr_pair *target_var = &target_table[num_funcs + i];
> -      if (target_var->end - target_var->start
> -	  != (uintptr_t) host_var_table[i * 2 + 1])
> +      uintptr_t target_size = target_var->end - target_var->start;
> +
> +      if ((uintptr_t) host_var_table[i * 2 + 1] != target_size)
>  	{
>  	  gomp_mutex_unlock (&devicep->lock);
>  	  if (is_register_lock)
>  	    gomp_mutex_unlock (&register_lock);
> -	  gomp_fatal ("Can't map target variables (size mismatch)");
> +	  gomp_fatal ("Cannot map target variables (size mismatch)");
>  	}
>  
>        splay_tree_key k = &array->key;
>        k->host_start = (uintptr_t) host_var_table[i * 2];
> -      k->host_end = k->host_start + (uintptr_t) host_var_table[i * 2 + 1];
> +      k->host_end
> +	= k->host_start + (size_mask & (uintptr_t) host_var_table[i * 2 + 1]);
>        k->tgt = tgt;
>        k->tgt_offset = target_var->start;
> -      k->refcount = REFCOUNT_INFINITY;
> +      k->refcount = target_size & link_bit ? REFCOUNT_LINK : REFCOUNT_INFINITY;
>        k->async_refcount = 0;
> +      k->link_key = NULL;
>        array->left = NULL;
>        array->right = NULL;
>        splay_tree_insert (&devicep->mem_map, array);
> @@ -1072,7 +1102,6 @@ gomp_unload_image_from_device (struct gomp_device_descr *devicep,
>    int num_funcs = host_funcs_end - host_func_table;
>    int num_vars  = (host_vars_end - host_var_table) / 2;
>  
> -  unsigned j;
>    struct splay_tree_key_s k;
>    splay_tree_key node = NULL;
>  
> @@ -1088,21 +1117,46 @@ gomp_unload_image_from_device (struct gomp_device_descr *devicep,
>    devicep->unload_image_func (devicep->target_id, version, target_data);
>  
>    /* Remove mappings from splay tree.  */
> -  for (j = 0; j < num_funcs; j++)
> +  int i;
> +  for (i = 0; i < num_funcs; i++)
>      {
> -      k.host_start = (uintptr_t) host_func_table[j];
> +      k.host_start = (uintptr_t) host_func_table[i];
>        k.host_end = k.host_start + 1;
>        splay_tree_remove (&devicep->mem_map, &k);
>      }
>  
> -  for (j = 0; j < num_vars; j++)
> +  /* Most significant bit of the size in host and target tables marks
> +     "omp declare target link" variables.  */
> +  const uintptr_t link_bit = 1ULL << (sizeof (uintptr_t) * __CHAR_BIT__ - 1);
> +  const uintptr_t size_mask = ~link_bit;
> +  bool is_tgt_unmapped = false;
> +
> +  for (i = 0; i < num_vars; i++)
>      {
> -      k.host_start = (uintptr_t) host_var_table[j * 2];
> -      k.host_end = k.host_start + (uintptr_t) host_var_table[j * 2 + 1];
> -      splay_tree_remove (&devicep->mem_map, &k);
> +      k.host_start = (uintptr_t) host_var_table[i * 2];
> +      k.host_end
> +	= k.host_start + (size_mask & (uintptr_t) host_var_table[i * 2 + 1]);
> +
> +      if (!(link_bit & (uintptr_t) host_var_table[i * 2 + 1]))
> +	splay_tree_remove (&devicep->mem_map, &k);
> +      else
> +	{
> +	  splay_tree_key n = splay_tree_lookup (&devicep->mem_map, &k);
> +	  splay_tree_remove (&devicep->mem_map, n);
> +	  if (n->link_key)
> +	    {
> +	      if (n->tgt->refcount > 1)
> +		n->tgt->refcount--;
> +	      else
> +		{
> +		  is_tgt_unmapped = true;
> +		  gomp_unmap_tgt (n->tgt);
> +		}
> +	    }
> +	}
>      }
>  
> -  if (node)
> +  if (node && !is_tgt_unmapped)
>      {
>        free (node->tgt);
>        free (node);
> @@ -1658,6 +1712,9 @@ gomp_exit_data (struct gomp_device_descr *devicep, size_t mapnum,
>  	  if (k->refcount == 0)
>  	    {
>  	      splay_tree_remove (&devicep->mem_map, k);
> +	      if (k->link_key)
> +		splay_tree_insert (&devicep->mem_map,
> +				   (splay_tree_node) k->link_key);
>  	      if (k->tgt->refcount > 1)
>  		k->tgt->refcount--;
>  	      else
> diff --git a/libgomp/testsuite/libgomp.c/target-link-1.c b/libgomp/testsuite/libgomp.c/target-link-1.c
> new file mode 100644
> index 0000000..681677c
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c/target-link-1.c
> @@ -0,0 +1,63 @@
> +struct S { int s, t; };
> +
> +int a = 1, b = 1;
> +double c[27];
> +struct S d = { 8888, 8888 };
> +#pragma omp declare target link (a) to (b) link (c, d)
> +
> +int
> +foo (void)
> +{
> +  return a++ + b++;
> +}
> +
> +int
> +bar (int n)
> +{
> +  int *p1 = &a;
> +  int *p2 = &b;
> +  c[n] += 2.0;
> +  d.s -= 2;
> +  d.t -= 2;
> +  return *p1 + *p2 + d.s + d.t;
> +}
> +
> +#pragma omp declare target (foo, bar)
> +
> +int
> +main ()
> +{
> +  a = b = 2;
> +  d.s = 17;
> +  d.t = 18;
> +
> +  int res, n = 10;
> +  #pragma omp target map (to: a, b, c, d) map (from: res)
> +  {
> +    res = foo () + foo ();
> +    c[n] = 3.0;
> +    res += bar (n);
> +  }
> +
> +  int shared_mem = 0;
> +  #pragma omp target map (alloc: shared_mem)
> +    shared_mem = 1;
> +
> +  if ((shared_mem && res != (2 + 2) + (3 + 3) + (4 + 4 + 15 + 16))
> +      || (!shared_mem && res != (2 + 1) + (3 + 2) + (4 + 3 + 15 + 16)))
> +    __builtin_abort ();
> +
> +  #pragma omp target enter data map (to: c)
> +  #pragma omp target update from (c)
> +  res = (int) (c[n] + 0.5);
> +  if ((shared_mem && res != 5) || (!shared_mem && res != 0))
> +    __builtin_abort ();
> +
> +  #pragma omp target map (to: a, b) map (from: res)
> +    res = foo ();
> +
> +  if ((shared_mem && res != 4 + 4) || (!shared_mem && res != 2 + 3))
> +    __builtin_abort ();
> +
> +  return 0;
> +}
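
The size-field encoding that gomp_load_image_to_device and
gomp_unload_image_from_device rely on in the patch quoted above can be
sketched in isolation.  This is only an illustration, not the libgomp code:
the helper names (encode_size, entry_is_link, entry_size) are hypothetical,
and CHAR_BIT from <limits.h> stands in for the __CHAR_BIT__ built-in macro
that the patch uses.

```c
#include <limits.h>
#include <stdint.h>

/* Most significant bit of a uintptr_t.  The patch uses this bit of the
   size entry to mark "omp declare target link" variables.  */
#define LINK_BIT ((uintptr_t) 1 << (sizeof (uintptr_t) * CHAR_BIT - 1))
/* All remaining bits hold the real object size.  */
#define SIZE_MASK (~LINK_BIT)

/* Hypothetical helper: build a table entry from a size and a link flag.  */
uintptr_t
encode_size (uintptr_t size, int is_link)
{
  return is_link ? (size | LINK_BIT) : size;
}

/* Hypothetical helper: nonzero if the entry is marked as a link variable.  */
int
entry_is_link (uintptr_t entry)
{
  return (entry & LINK_BIT) != 0;
}

/* Hypothetical helper: the object size with the marker bit removed.  */
uintptr_t
entry_size (uintptr_t entry)
{
  return entry & SIZE_MASK;
}
```

For example, encode_size (sizeof (double) * 27, 1) describes an object like
c[27] from the test case as a link variable, and entry_size recovers the
plain byte count for computing host_end.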


Regards
 Thomas


* Re: gomp_target_fini
  2015-12-16 12:30                                       ` gomp_target_fini (was: [gomp4.5] Handle #pragma omp declare target link) Thomas Schwinge
@ 2015-12-23 11:05                                         ` Thomas Schwinge
  2016-01-11 10:40                                           ` gomp_target_fini Thomas Schwinge
  2016-01-21 15:24                                         ` gomp_target_fini Bernd Schmidt
  1 sibling, 1 reply; 48+ messages in thread
From: Thomas Schwinge @ 2015-12-23 11:05 UTC (permalink / raw)
  To: gcc-patches, Ilya Verbin, Jakub Jelinek
  Cc: Kirill Yukhin, Chung-Lin Tang, James Norris

[-- Attachment #1: Type: text/plain, Size: 7418 bytes --]

Hi!

Ping.

On Wed, 16 Dec 2015 13:30:21 +0100, I wrote:
> On Mon, 14 Dec 2015 19:47:36 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> > On Fri, Dec 11, 2015 at 18:27:13 +0100, Jakub Jelinek wrote:
> > > On Tue, Dec 08, 2015 at 05:45:59PM +0300, Ilya Verbin wrote:
> > > > +/* This function finalizes all initialized devices.  */
> > > > +
> > > > +static void
> > > > +gomp_target_fini (void)
> > > > +{
> > > > +  [...]
> > > 
> > > The question is what will this do if there are async target tasks still
> > > running on some of the devices at this point (forgotten #pragma omp taskwait
> > > or similar if target nowait regions are started outside of parallel region,
> > > > or exit inside of parallel, etc.).  But perhaps it can be handled incrementally.
> > > Also there is the question that the 
> > > So I think the patch is ok with the above mentioned changes.
> > 
> > Here is what I've committed to trunk.
> 
> > --- a/libgomp/libgomp.h
> > +++ b/libgomp/libgomp.h
> > @@ -888,6 +888,14 @@ typedef struct acc_dispatch_t
> >    } cuda;
> >  } acc_dispatch_t;
> >  
> > +/* Various state of the accelerator device.  */
> > +enum gomp_device_state
> > +{
> > +  GOMP_DEVICE_UNINITIALIZED,
> > +  GOMP_DEVICE_INITIALIZED,
> > +  GOMP_DEVICE_FINALIZED
> > +};
> > +
> >  /* This structure describes accelerator device.
> >     It contains name of the corresponding libgomp plugin, function handlers for
> >     interaction with the device, ID-number of the device, and information about
> > @@ -933,8 +941,10 @@ struct gomp_device_descr
> >    /* Mutex for the mutable data.  */
> >    gomp_mutex_t lock;
> >  
> > -  /* Set to true when device is initialized.  */
> > -  bool is_initialized;
> > +  /* Current state of the device.  OpenACC allows to move from INITIALIZED state
> > +     back to UNINITIALIZED state.  OpenMP allows only to move from INITIALIZED
> > +     to FINALIZED state (at program shutdown).  */
> > +  enum gomp_device_state state;
> 
> (ACK, but I assume we'll want to make sure that an OpenACC device is
> never re-initialized if we're in/after the libgomp finalization phase.)
> 
> 
> The issue mentioned above: "exit inside of parallel" is actually a
> problem for nvptx offloading: the libgomp.oacc-c-c++-common/abort-1.c,
> libgomp.oacc-c-c++-common/abort-3.c, and libgomp.oacc-fortran/abort-1.f90
> test cases now run into annoying "WARNING: program timed out".  Here is
> what's happening, as I understand it: in
> libgomp/plugin/plugin-nvptx.c:nvptx_exec, the cuStreamSynchronize call
> returns CUDA_ERROR_LAUNCH_FAILED, upon which we call GOMP_PLUGIN_fatal.
> 
> > --- a/libgomp/target.c
> > +++ b/libgomp/target.c
> 
> > +/* This function finalizes all initialized devices.  */
> > +
> > +static void
> > +gomp_target_fini (void)
> > +{
> > +  int i;
> > +  for (i = 0; i < num_devices; i++)
> > +    {
> > +      struct gomp_device_descr *devicep = &devices[i];
> > +      gomp_mutex_lock (&devicep->lock);
> > +      if (devicep->state == GOMP_DEVICE_INITIALIZED)
> > +	{
> > +	  devicep->fini_device_func (devicep->target_id);
> > +	  devicep->state = GOMP_DEVICE_FINALIZED;
> > +	}
> > +      gomp_mutex_unlock (&devicep->lock);
> > +    }
> > +}
> 
> > @@ -2387,6 +2433,9 @@ gomp_target_init (void)
> >        if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
> >  	goacc_register (&devices[i]);
> >      }
> > +
> > +  if (atexit (gomp_target_fini) != 0)
> > +    gomp_fatal ("atexit failed");
> >  }
> 
> Now, with the above change installed, GOMP_PLUGIN_fatal will trigger the
> atexit handler, gomp_target_fini, which, with the device lock held, will
> call back into the plugin, GOMP_OFFLOAD_fini_device, which will try to
> clean up.
> 
> Because of the earlier CUDA_ERROR_LAUNCH_FAILED, the associated CUDA
> context is now in an inconsistent state, see
> <https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TYPES.html>:
> 
>     CUDA_ERROR_LAUNCH_FAILED = 719
>         An exception occurred on the device while executing a
>         kernel. Common causes include dereferencing an invalid device
>         pointer and accessing out of bounds shared memory. The context
>         cannot be used, so it must be destroyed (and a new one should be
>         created). All existing device memory allocations from this
>         context are invalid and must be reconstructed if the program is
>         to continue using CUDA.
> 
> Thus, any cuMemFreeHost invocations that are run during clean-up will now
> also/still return CUDA_ERROR_LAUNCH_FAILED, due to which we'll again call
> GOMP_PLUGIN_fatal, which again will trigger the same or another
> (GOMP_offload_unregister_ver) atexit handler, which will then deadlock
> trying to lock the device again, which is still locked.
> 
> (Jim, I wonder: after the first CUDA_ERROR_LAUNCH_FAILED and similar
> errors, should we destroy the context right away, or toggle a boolean
> flag to mark it as unusable, and use that as an indication to avoid the
> follow-on failures of cuMemFreeHost just described above, for example?)
> 
> <http://pubs.opengroup.org/onlinepubs/9699919799/functions/atexit.html>
> tells us:
> 
>     Since the behavior is undefined if the exit() function is called more
>     than once, portable applications calling atexit() must ensure that the
>     exit() function is not called at normal process termination when all
>     functions registered by the atexit() function are called.
> 
> ... which we're violating here, at least in the nvptx plugin.  I have not
> analyzed the intermic one.
> 
> As it happens, Chung-Lin has been working in that area:
> <http://news.gmane.org/find-root.php?message_id=%3C55DF1452.9050501%40codesourcery.com%3E>,
> which he recently re-posted:
> <http://news.gmane.org/find-root.php?message_id=%3C566EE49A.3050403%40codesourcery.com%3E>,
> <http://news.gmane.org/find-root.php?message_id=%3C566EC310.8000403%40codesourcery.com%3E>,
> <http://news.gmane.org/find-root.php?message_id=%3C566EC324.9050505%40codesourcery.com%3E>.
> I have not analyzed whether his changes would completely resolve the
> problem just described, but at least conceptually they seem to be a step
> in the right direction?  (Jakub?)
> 
> Now, to resolve the immediate problem, what is the right thing for us to
> do?  Is the following simple change OK, or is there a reason to still run
> atexit handlers if terminating under error conditions?
> 
> commit b1733e8f9df6ae7d6828e2194df1b314772701c5
> Author: Thomas Schwinge <thomas@codesourcery.com>
> Date:   Wed Dec 16 13:10:39 2015 +0100
> 
>     Avoid deadlocks in libgomp due to competing atexit handlers
>     
>     	libgomp/
>     	* error.c (gomp_vfatal): Call _exit instead of exit.
> ---
>  libgomp/error.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git libgomp/error.c libgomp/error.c
> index 094c24a..1ef7491 100644
> --- libgomp/error.c
> +++ libgomp/error.c
> @@ -34,6 +34,7 @@
>  #include <stdarg.h>
>  #include <stdio.h>
>  #include <stdlib.h>
> +#include <unistd.h>
>  
>  
>  #undef gomp_vdebug
> @@ -77,7 +78,7 @@ void
>  gomp_vfatal (const char *fmt, va_list list)
>  {
>    gomp_verror (fmt, list);
> -  exit (EXIT_FAILURE);
> +  _exit (EXIT_FAILURE);
>  }
>  
>  void
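
The difference between exit and _exit that the quoted patch depends on can
be shown with a small POSIX sketch.  It does not reproduce the deadlock
itself; it only demonstrates that handlers registered with atexit run on
exit but are skipped on _exit.  The names cleanup_handler and
handler_ran_in_child are hypothetical, and cleanup_handler merely stands in
for a handler such as gomp_target_fini.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of the pipe the child uses to report that its handler ran.  */
static int report_fd = -1;

/* Stand-in for a clean-up handler such as gomp_target_fini.  */
static void
cleanup_handler (void)
{
  const char mark = 'x';
  if (write (report_fd, &mark, 1) < 0)
    _exit (2);
}

/* Fork a child that registers cleanup_handler and then terminates with
   _exit (if USE_UNDERSCORE_EXIT is nonzero) or with exit.  Return 1 if the
   handler ran in the child, 0 if it did not, -1 on error.  */
int
handler_ran_in_child (int use_underscore_exit)
{
  int fds[2];
  if (pipe (fds) != 0)
    return -1;

  pid_t pid = fork ();
  if (pid < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }

  if (pid == 0)
    {
      /* Child: register the handler, then terminate one way or the other.  */
      close (fds[0]);
      report_fd = fds[1];
      if (atexit (cleanup_handler) != 0)
	_exit (3);
      if (use_underscore_exit)
	_exit (EXIT_FAILURE);	/* Skips atexit handlers.  */
      exit (EXIT_FAILURE);	/* Runs atexit handlers.  */
    }

  /* Parent: one byte arrives only if the handler ran; otherwise read sees
     end-of-file once the child has terminated.  */
  close (fds[1]);
  char buf;
  ssize_t n = read (fds[0], &buf, 1);
  close (fds[0]);

  int status;
  if (waitpid (pid, &status, 0) < 0)
    return -1;
  return n == 1 ? 1 : 0;
}
```

Calling handler_ran_in_child (0) exercises the exit path and
handler_ran_in_child (1) the _exit path.  Note that _exit also skips
flushing of stdio buffers, so a fatal-error routine would have to emit its
message before terminating, as gomp_verror does in the patch.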


Regards
 Thomas


* [gomp4] Fix use of declare'd vars by routine procedures.
@ 2016-01-07 18:57                         ` James Norris
  2016-01-11 11:55                           ` Thomas Schwinge
  0 siblings, 1 reply; 48+ messages in thread
From: James Norris @ 2016-01-07 18:57 UTC (permalink / raw)
  To: GCC Patches

[-- Attachment #1: Type: text/plain, Size: 397 bytes --]

Hi!

The checking of variables declared by OpenACC declare directives
used within an OpenACC routine procedure was not being done correctly.
This fix adds the checking required and also adds to the testing.

This fix resolves the issue pointed out by Cesar with declare-4.c
(https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00339.html).

This patch has been applied to gomp-4_0-branch.

Thanks,
Jim


[-- Attachment #2: declare.patch --]
[-- Type: text/x-patch, Size: 6278 bytes --]

Index: gcc/c/ChangeLog.gomp
===================================================================
--- gcc/c/ChangeLog.gomp	(revision 232138)
+++ gcc/c/ChangeLog.gomp	(working copy)
@@ -1,3 +1,8 @@
+2016-01-07  James Norris  <jnorris@codesourcery.com>
+
+	* c-parser.c (c_finish_oacc_routine): Add new attribute.
+	* c-typeck.c (build_external_ref): Add usage check.
+
 2015-12-08  Thomas Schwinge  <thomas@codesourcery.com>
 
 	* c-parser.c (c_parser_oacc_clause_bind, c_parser_oacc_routine)
Index: gcc/c/c-parser.c
===================================================================
--- gcc/c/c-parser.c	(revision 232138)
+++ gcc/c/c-parser.c	(working copy)
@@ -14115,6 +14115,10 @@
   /* Also add an "omp declare target" attribute, with clauses.  */
   DECL_ATTRIBUTES (fndecl) = tree_cons (get_identifier ("omp declare target"),
 					clauses, DECL_ATTRIBUTES (fndecl));
+
+  DECL_ATTRIBUTES (fndecl)
+    = tree_cons (get_identifier ("oacc routine"),
+		 clauses, DECL_ATTRIBUTES (fndecl));
 }
 
 /* OpenACC 2.0:
Index: gcc/c/c-typeck.c
===================================================================
--- gcc/c/c-typeck.c	(revision 232138)
+++ gcc/c/c-typeck.c	(working copy)
@@ -2664,6 +2664,26 @@
   tree ref;
   tree decl = lookup_name (id);
 
+  if (decl
+      && decl != error_mark_node
+      && current_function_decl
+      && TREE_CODE (decl) == VAR_DECL
+      && is_global_var (decl)
+      && lookup_attribute ("oacc routine",
+			   DECL_ATTRIBUTES (current_function_decl)))
+    {
+      if (lookup_attribute ("omp declare target link",
+			    DECL_ATTRIBUTES (decl))
+	  || ((!lookup_attribute ("omp declare target",
+				  DECL_ATTRIBUTES (decl))
+	       && ((TREE_STATIC (decl) && !DECL_EXTERNAL (decl))
+		   || (!TREE_STATIC (decl) && DECL_EXTERNAL (decl))))))
+	{
+	  error_at (loc, "invalid use in %<routine%> function");
+	  return error_mark_node;
+	}
+    }
+
   /* In Objective-C, an instance variable (ivar) may be preferred to
      whatever lookup_name() found.  */
   decl = objc_lookup_ivar (decl, id);
Index: gcc/cp/ChangeLog.gomp
===================================================================
--- gcc/cp/ChangeLog.gomp	(revision 232138)
+++ gcc/cp/ChangeLog.gomp	(working copy)
@@ -1,3 +1,8 @@
+2016-01-07  James Norris  <jnorris@codesourcery.com>
+
+	* parser.c (cp_finalize_oacc_routine): Add new attribute.
+	* semantics.c (finish_id_expression): Add usage check.
+
 2016-01-07  Cesar Philippidis  <cesar@codesourcery.com>
 
 	* cp-tree.h (bind_decls_match): Declare.
Index: gcc/cp/parser.c
===================================================================
--- gcc/cp/parser.c	(revision 232138)
+++ gcc/cp/parser.c	(working copy)
@@ -36732,6 +36732,10 @@
       DECL_ATTRIBUTES (fndecl)
 	= tree_cons (get_identifier ("omp declare target"),
 		     clauses, DECL_ATTRIBUTES (fndecl));
+
+      DECL_ATTRIBUTES (fndecl)
+	= tree_cons (get_identifier ("oacc routine"),
+		     NULL_TREE, DECL_ATTRIBUTES (fndecl));
     }
 }
 
Index: gcc/cp/semantics.c
===================================================================
--- gcc/cp/semantics.c	(revision 232138)
+++ gcc/cp/semantics.c	(working copy)
@@ -3700,6 +3700,25 @@
 
 	  decl = convert_from_reference (decl);
 	}
+
+      if (decl != error_mark_node
+	  && current_function_decl
+	  && TREE_CODE (decl) == VAR_DECL
+	  && is_global_var (decl)
+	  && lookup_attribute ("oacc routine",
+			       DECL_ATTRIBUTES (current_function_decl)))
+	{
+	  if (lookup_attribute ("omp declare target link",
+				DECL_ATTRIBUTES (decl))
+	      || ((!lookup_attribute ("omp declare target",
+				  DECL_ATTRIBUTES (decl))
+		   && ((TREE_STATIC (decl) && !DECL_EXTERNAL (decl))
+			|| (!TREE_STATIC (decl) && DECL_EXTERNAL (decl))))))
+	    {
+	      *error_msg = "invalid use in %<routine%> function";
+	      return error_mark_node;
+	    }
+	}
     }
 
   return cp_expr (decl, location);
Index: gcc/testsuite/ChangeLog.gomp
===================================================================
--- gcc/testsuite/ChangeLog.gomp	(revision 232138)
+++ gcc/testsuite/ChangeLog.gomp	(working copy)
@@ -1,3 +1,7 @@
+2016-01-07  James Norris  <jnorris@codesourcery.com>
+
+	* c-c++-common/goacc/routine-5.c: Additional tests.
+
 2016-01-07  Cesar Philippidis  <cesar@codesourcery.com>
 
 	* g++.dg/goacc/routine-2.C: Add more coverage.
Index: gcc/testsuite/c-c++-common/goacc/routine-5.c
===================================================================
--- gcc/testsuite/c-c++-common/goacc/routine-5.c	(revision 232138)
+++ gcc/testsuite/c-c++-common/goacc/routine-5.c	(working copy)
@@ -59,3 +59,49 @@
 #pragma acc routine (Foo) gang // { dg-error "must be applied before definition" }
 
 #pragma acc routine (Baz) // { dg-error "not been declared" }
+
+float vb1;
+
+#pragma acc routine
+int
+func1 (int a)
+{
+  vb1 = a + 1; /* { dg-error "invalid use in" } */
+
+  return vb1; /* { dg-error "invalid use in" } */
+}
+
+#pragma acc routine
+int
+func2 (int a)
+{
+  static int vb2;
+
+  vb2 = a + 1; /* { dg-error "invalid use in" } */
+
+  return vb2; /* { dg-error "invalid use in" } */
+}
+
+float vb3;
+#pragma acc declare link (vb3)
+
+#pragma acc routine
+int
+func3 (int a)
+{
+  vb3 = a + 1; /* { dg-error "invalid use in" } */
+
+  return vb3; /* { dg-error "invalid use in" } */
+}
+
+float vb4;
+#pragma acc declare create (vb4)
+
+#pragma acc routine
+int
+func4 (int a)
+{
+  vb4 = a + 1;
+
+  return vb4;
+}
Index: libgomp/ChangeLog.gomp
===================================================================
--- libgomp/ChangeLog.gomp	(revision 232138)
+++ libgomp/ChangeLog.gomp	(working copy)
@@ -1,3 +1,7 @@
+2016-01-07  James Norris  <jnorris@codesourcery.com>
+
+	* testsuite/libgomp.oacc-c-c++-common/declare-4.c: Fix test.
+
 2016-01-06  Cesar Philippidis  <cesar@codesourcery.com>
 
 	* testsuite/libgomp.oacc-fortran/pr68813.f90: New test.
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/declare-4.c
===================================================================
--- libgomp/testsuite/libgomp.oacc-c-c++-common/declare-4.c	(revision 232138)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/declare-4.c	(working copy)
@@ -4,7 +4,7 @@
 #include <openacc.h>
 
 float b;
-#pragma acc declare link (b)
+#pragma acc declare create (b)
 
 #pragma acc routine
 int
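
The condition that the patch adds to build_external_ref and
finish_id_expression can be restated as a plain predicate.  This is a
sketch under assumptions, not front-end code: struct var_info and
invalid_in_routine are hypothetical, and each field stands in for the
corresponding query in the patch (is_global_var, the two lookup_attribute
calls, TREE_STATIC and DECL_EXTERNAL).

```c
#include <stdbool.h>

/* Hypothetical summary of the properties the check looks at.  */
struct var_info
{
  bool is_global;		/* is_global_var (decl) */
  bool has_declare_target;	/* "omp declare target" attribute */
  bool has_declare_target_link;	/* "omp declare target link" attribute */
  bool is_static;		/* TREE_STATIC (decl) */
  bool is_external;		/* DECL_EXTERNAL (decl) */
};

/* Return true if referencing V inside a function carrying the
   "oacc routine" attribute would be rejected by the check.  */
bool
invalid_in_routine (const struct var_info *v)
{
  if (!v->is_global)
    return false;
  /* Link variables are always rejected.  */
  if (v->has_declare_target_link)
    return true;
  /* Otherwise reject variables lacking "omp declare target" when exactly
     one of static storage and external linkage holds.  */
  return !v->has_declare_target && (v->is_static != v->is_external);
}
```

Under this reading, vb1 and vb2 from routine-5.c are rejected because they
have static storage and no attribute, vb3 is rejected because of the link
attribute, and vb4 is accepted on the assumption that "declare create" sets
the "omp declare target" attribute, which is what the test's expectations
imply.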


* Re: gomp_target_fini
  2015-12-23 11:05                                         ` gomp_target_fini Thomas Schwinge
@ 2016-01-11 10:40                                           ` Thomas Schwinge
  2016-01-21  6:17                                             ` gomp_target_fini Thomas Schwinge
  0 siblings, 1 reply; 48+ messages in thread
From: Thomas Schwinge @ 2016-01-11 10:40 UTC (permalink / raw)
  To: gcc-patches, Ilya Verbin, Jakub Jelinek
  Cc: Kirill Yukhin, Chung-Lin Tang, James Norris

[-- Attachment #1: Type: text/plain, Size: 7833 bytes --]

Hi!

Ping.

On Wed, 23 Dec 2015 12:05:32 +0100, I wrote:
> Ping.
> 
> On Wed, 16 Dec 2015 13:30:21 +0100, I wrote:
> > On Mon, 14 Dec 2015 19:47:36 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> > > On Fri, Dec 11, 2015 at 18:27:13 +0100, Jakub Jelinek wrote:
> > > > On Tue, Dec 08, 2015 at 05:45:59PM +0300, Ilya Verbin wrote:
> > > > > +/* This function finalizes all initialized devices.  */
> > > > > +
> > > > > +static void
> > > > > +gomp_target_fini (void)
> > > > > +{
> > > > > +  [...]
> > > > 
> > > > The question is what will this do if there are async target tasks still
> > > > running on some of the devices at this point (forgotten #pragma omp taskwait
> > > > or similar if target nowait regions are started outside of parallel region,
> > > > > or exit inside of parallel, etc.).  But perhaps it can be handled incrementally.
> > > > Also there is the question that the 
> > > > So I think the patch is ok with the above mentioned changes.
> > > 
> > > Here is what I've committed to trunk.
> > 
> > > --- a/libgomp/libgomp.h
> > > +++ b/libgomp/libgomp.h
> > > @@ -888,6 +888,14 @@ typedef struct acc_dispatch_t
> > >    } cuda;
> > >  } acc_dispatch_t;
> > >  
> > > +/* Various state of the accelerator device.  */
> > > +enum gomp_device_state
> > > +{
> > > +  GOMP_DEVICE_UNINITIALIZED,
> > > +  GOMP_DEVICE_INITIALIZED,
> > > +  GOMP_DEVICE_FINALIZED
> > > +};
> > > +
> > >  /* This structure describes accelerator device.
> > >     It contains name of the corresponding libgomp plugin, function handlers for
> > >     interaction with the device, ID-number of the device, and information about
> > > @@ -933,8 +941,10 @@ struct gomp_device_descr
> > >    /* Mutex for the mutable data.  */
> > >    gomp_mutex_t lock;
> > >  
> > > -  /* Set to true when device is initialized.  */
> > > -  bool is_initialized;
> > > +  /* Current state of the device.  OpenACC allows to move from INITIALIZED state
> > > +     back to UNINITIALIZED state.  OpenMP allows only to move from INITIALIZED
> > > +     to FINALIZED state (at program shutdown).  */
> > > +  enum gomp_device_state state;
> > 
> > (ACK, but I assume we'll want to make sure that an OpenACC device is
> > never re-initialized if we're in/after the libgomp finalization phase.)
> > 
> > 
> > The issue mentioned above: "exit inside of parallel" is actually a
> > problem for nvptx offloading: the libgomp.oacc-c-c++-common/abort-1.c,
> > libgomp.oacc-c-c++-common/abort-3.c, and libgomp.oacc-fortran/abort-1.f90
> > test cases now run into annoying "WARNING: program timed out".  Here is
> > what's happening, as I understand it: in
> > libgomp/plugin/plugin-nvptx.c:nvptx_exec, the cuStreamSynchronize call
> > returns CUDA_ERROR_LAUNCH_FAILED, upon which we call GOMP_PLUGIN_fatal.
> > 
> > > --- a/libgomp/target.c
> > > +++ b/libgomp/target.c
> > 
> > > +/* This function finalizes all initialized devices.  */
> > > +
> > > +static void
> > > +gomp_target_fini (void)
> > > +{
> > > +  int i;
> > > +  for (i = 0; i < num_devices; i++)
> > > +    {
> > > +      struct gomp_device_descr *devicep = &devices[i];
> > > +      gomp_mutex_lock (&devicep->lock);
> > > +      if (devicep->state == GOMP_DEVICE_INITIALIZED)
> > > +	{
> > > +	  devicep->fini_device_func (devicep->target_id);
> > > +	  devicep->state = GOMP_DEVICE_FINALIZED;
> > > +	}
> > > +      gomp_mutex_unlock (&devicep->lock);
> > > +    }
> > > +}
> > 
> > > @@ -2387,6 +2433,9 @@ gomp_target_init (void)
> > >        if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
> > >  	goacc_register (&devices[i]);
> > >      }
> > > +
> > > +  if (atexit (gomp_target_fini) != 0)
> > > +    gomp_fatal ("atexit failed");
> > >  }
> > 
> > Now, with the above change installed, GOMP_PLUGIN_fatal will trigger the
> > atexit handler, gomp_target_fini, which, with the device lock held, will
> > call back into the plugin, GOMP_OFFLOAD_fini_device, which will try to
> > clean up.
> > 
> > Because of the earlier CUDA_ERROR_LAUNCH_FAILED, the associated CUDA
> > context is now in an inconsistent state, see
> > <https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TYPES.html>:
> > 
> >     CUDA_ERROR_LAUNCH_FAILED = 719
> >         An exception occurred on the device while executing a
> >         kernel. Common causes include dereferencing an invalid device
> >         pointer and accessing out of bounds shared memory. The context
> >         cannot be used, so it must be destroyed (and a new one should be
> >         created). All existing device memory allocations from this
> >         context are invalid and must be reconstructed if the program is
> >         to continue using CUDA.
> > 
> > Thus, any cuMemFreeHost invocations that are run during clean-up will now
> > also/still return CUDA_ERROR_LAUNCH_FAILED, due to which we'll again call
> > GOMP_PLUGIN_fatal, which again will trigger the same or another
> > (GOMP_offload_unregister_ver) atexit handler, which will then deadlock
> > trying to lock the device again, which is still locked.
> > 
> > (Jim, I wonder: after the first CUDA_ERROR_LAUNCH_FAILED and similar
> > errors, should we destroy the context right away, or toggle a boolean
> > flag to mark it as unusable, and use that as an indication to avoid the
> > follow-on failures of cuMemFreeHost just described above, for example?)
> > 
> > <http://pubs.opengroup.org/onlinepubs/9699919799/functions/atexit.html>
> > tells us:
> > 
> >     Since the behavior is undefined if the exit() function is called more
> >     than once, portable applications calling atexit() must ensure that the
> >     exit() function is not called at normal process termination when all
> >     functions registered by the atexit() function are called.
> > 
> > ... which we're violating here, at least in the nvptx plugin.  I have not
> > analyzed the intermic one.
> > 
> > As it happens, Chung-Lin has been working in that area:
> > <http://news.gmane.org/find-root.php?message_id=%3C55DF1452.9050501%40codesourcery.com%3E>,
> > which he recently re-posted:
> > <http://news.gmane.org/find-root.php?message_id=%3C566EE49A.3050403%40codesourcery.com%3E>,
> > <http://news.gmane.org/find-root.php?message_id=%3C566EC310.8000403%40codesourcery.com%3E>,
> > <http://news.gmane.org/find-root.php?message_id=%3C566EC324.9050505%40codesourcery.com%3E>.
> > I have not analyzed whether his changes would completely resolve the
> > problem just described, but at least conceptually they seem to be a step
> > in the right direction?  (Jakub?)
> > 
> > Now, to resolve the immediate problem, what is the right thing for us to
> > do?  Is the following simple change OK, or is there a reason to still run
> > atexit handlers if terminating under error conditions?
> > 
> > commit b1733e8f9df6ae7d6828e2194df1b314772701c5
> > Author: Thomas Schwinge <thomas@codesourcery.com>
> > Date:   Wed Dec 16 13:10:39 2015 +0100
> > 
> >     Avoid deadlocks in libgomp due to competing atexit handlers
> >     
> >     	libgomp/
> >     	* error.c (gomp_vfatal): Call _exit instead of exit.
> > ---
> >  libgomp/error.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git libgomp/error.c libgomp/error.c
> > index 094c24a..1ef7491 100644
> > --- libgomp/error.c
> > +++ libgomp/error.c
> > @@ -34,6 +34,7 @@
> >  #include <stdarg.h>
> >  #include <stdio.h>
> >  #include <stdlib.h>
> > +#include <unistd.h>
> >  
> >  
> >  #undef gomp_vdebug
> > @@ -77,7 +78,7 @@ void
> >  gomp_vfatal (const char *fmt, va_list list)
> >  {
> >    gomp_verror (fmt, list);
> > -  exit (EXIT_FAILURE);
> > +  _exit (EXIT_FAILURE);
> >  }
> >  
> >  void


Regards
 Thomas

* Re: [gomp4] Fix use of declare'd vars by routine procedures.
  2016-01-07 18:57                         ` [gomp4] Fix use of declare'd vars by routine procedures James Norris
@ 2016-01-11 11:55                           ` Thomas Schwinge
  2016-01-11 15:38                             ` James Norris
  0 siblings, 1 reply; 48+ messages in thread
From: Thomas Schwinge @ 2016-01-11 11:55 UTC (permalink / raw)
  To: James Norris, GCC Patches; +Cc: Jakub Jelinek

Hi!

On Thu, 7 Jan 2016 12:57:32 -0600, James Norris <jnorris@codesourcery.com> wrote:
> The checking of variables declared by OpenACC declare directives
> used within an OpenACC routine procedure was not being done correctly.
> This fix adds the checking required and also adds to the testing.
> 
> This fix resolves the issue pointed out by Cesar with declare-4.c
> (https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00339.html).
> 
> This patch has been applied to gomp-4_0-branch.

Such a patch needs to go into trunk; see my report in
<http://news.gmane.org/find-root.php?message_id=%3C87mvtapdp0.fsf%40kepler.schwinge.homeip.net%3E>.

> --- gcc/c/c-parser.c	(revision 232138)
> +++ gcc/c/c-parser.c	(working copy)
> @@ -14115,6 +14115,10 @@
>    /* Also add an "omp declare target" attribute, with clauses.  */
>    DECL_ATTRIBUTES (fndecl) = tree_cons (get_identifier ("omp declare target"),
>  					clauses, DECL_ATTRIBUTES (fndecl));
> +
> +  DECL_ATTRIBUTES (fndecl)
> +    = tree_cons (get_identifier ("oacc routine"),
> +		 clauses, DECL_ATTRIBUTES (fndecl));
>  }

Yuck, another attribute...  (..., and it's not listed/documented in
gcc/c-family/c-common.c:c_common_attribute_table.)

You store clauses in the "oacc routine" attribute here, but they are unused
as far as I can tell.

Given that we have the clauses available from the "omp declare target"
attribute (subject to change to switching to a generic "omp clauses"
attribute as suggested in
<http://news.gmane.org/find-root.php?message_id=%3C87twns3ebs.fsf%40hertz.schwinge.homeip.net%3E>),
could we maybe just look up some specific clause instead of detecting the
presence of this extra attribute?  (Jakub, any preference?)  Anyway, have
we verified that the desired behavior:

> --- gcc/c/c-typeck.c	(revision 232138)
> +++ gcc/c/c-typeck.c	(working copy)
> @@ -2664,6 +2664,26 @@
>    tree ref;
>    tree decl = lookup_name (id);
>  
> +  if (decl
> +      && decl != error_mark_node
> +      && current_function_decl
> +      && TREE_CODE (decl) == VAR_DECL
> +      && is_global_var (decl)
> +      && lookup_attribute ("oacc routine",
> +			   DECL_ATTRIBUTES (current_function_decl)))
> +    {
> +      if (lookup_attribute ("omp declare target link",
> +			    DECL_ATTRIBUTES (decl))
> +	  || ((!lookup_attribute ("omp declare target",
> +				  DECL_ATTRIBUTES (decl))
> +	       && ((TREE_STATIC (decl) && !DECL_EXTERNAL (decl))
> +		   || (!TREE_STATIC (decl) && DECL_EXTERNAL (decl))))))
> +	{
> +	  error_at (loc, "invalid use in %<routine%> function");
> +	  return error_mark_node;
> +	}
> +    }

..., isn't applicable to OpenMP as well (thus, no "oacc routine"
attribute conditional is needed here)?
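
As a reading aid, the inner condition of the hunk above can be modelled
outside the compiler.  The sketch below is only a boolean model: `struct
var_model`, `rejected_in_routine` and `check_model` are hypothetical names,
not GCC code, and the four fields merely stand in for the two attribute
lookups plus TREE_STATIC and DECL_EXTERNAL.  It shows that the
static/external part of the test is an exclusive-or, and that a variable
carrying the "omp declare target link" attribute is rejected regardless of
its storage.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the four facts the hunk queries on a VAR_DECL;
   this is not a GCC data structure.  */
struct var_model
{
  bool declare_target;       /* "omp declare target" attribute present */
  bool declare_target_link;  /* "omp declare target link" attribute present */
  bool tree_static;          /* models TREE_STATIC (decl) */
  bool decl_external;        /* models DECL_EXTERNAL (decl) */
};

/* Mirrors the inner condition of the hunk: true means the use is rejected.  */
bool
rejected_in_routine (const struct var_model *v)
{
  if (v->declare_target_link)
    return true;
  /* (static && !external) || (!static && external) is an exclusive-or.  */
  return !v->declare_target && (v->tree_static != v->decl_external);
}

/* Small self-check of the model.  */
void
check_model (void)
{
  struct var_model linked = { false, true, true, false };
  struct var_model created = { true, false, true, false };
  struct var_model plain_extern = { false, false, false, true };

  assert (rejected_in_routine (&linked));
  assert (!rejected_in_routine (&created));
  assert (rejected_in_routine (&plain_extern));
}
```

Under this model, a global that has neither attribute is rejected whenever
exactly one of the two storage flags is set, which is the case the "oacc
routine" conditional restricts to OpenACC routines.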

> --- gcc/cp/parser.c	(revision 232138)
> +++ gcc/cp/parser.c	(working copy)
> @@ -36732,6 +36732,10 @@
>        DECL_ATTRIBUTES (fndecl)
>  	= tree_cons (get_identifier ("omp declare target"),
>  		     clauses, DECL_ATTRIBUTES (fndecl));
> +
> +      DECL_ATTRIBUTES (fndecl)
> +	= tree_cons (get_identifier ("oacc routine"),
> +		     NULL_TREE, DECL_ATTRIBUTES (fndecl));
>      }

You don't store clauses in the "oacc routine" attribute here, unlike in the
C front end.

> --- gcc/cp/semantics.c	(revision 232138)
> +++ gcc/cp/semantics.c	(working copy)
> @@ -3700,6 +3700,25 @@
>  
>  	  decl = convert_from_reference (decl);
>  	}
> +
> +      if (decl != error_mark_node
> +	  && current_function_decl
> +	  && TREE_CODE (decl) == VAR_DECL
> +	  && is_global_var (decl)
> +	  && lookup_attribute ("oacc routine",
> +			       DECL_ATTRIBUTES (current_function_decl)))
> +	{
> +	  if (lookup_attribute ("omp declare target link",
> +				DECL_ATTRIBUTES (decl))
> +	      || ((!lookup_attribute ("omp declare target",
> +				  DECL_ATTRIBUTES (decl))
> +		   && ((TREE_STATIC (decl) && !DECL_EXTERNAL (decl))
> +			|| (!TREE_STATIC (decl) && DECL_EXTERNAL (decl))))))
> +	    {
> +	      *error_msg = "invalid use in %<routine%> function";
> +	      return error_mark_node;
> +	    }
> +	}

Likewise.

No equivalent change is needed for Fortran?

> --- libgomp/testsuite/libgomp.oacc-c-c++-common/declare-4.c	(revision 232138)
> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/declare-4.c	(working copy)
> @@ -4,7 +4,7 @@
>  #include <openacc.h>
>  
>  float b;
> -#pragma acc declare link (b)
> +#pragma acc declare create (b)
>  
>  #pragma acc routine
>  int

I have not verified the details, but a very similar fix was required to
get rid of a number of regressions:

    @@ -2637,18 +2637,18 @@ PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 350)
    PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 356)
    PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 358)
    PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 360)
    [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 371)
    [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 378)
    [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 386)
    [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 395)
    [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 402)
    [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 409)
    [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 416)
    PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 42)
    [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 423)
    [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 430)
    [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 432)
    [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 434)
    PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 47)
    PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 52)
    PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 57)
    @@ -2667,7 +2667,7 @@ PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 93)
    PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 95)
    PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 97)
    PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 99)
    [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c (test for excess errors)

Same for C++.

Committed to gomp-4_0-branch in r232219:

commit 1ac87f2b59dd03cb305ec94a7c6b5657dbb54e66
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Jan 11 11:51:57 2016 +0000

    Resolve c-c++-common/goacc-gomp/nesting-fail-1.c regressions
    
    	gcc/testsuite/
    	* c-c++-common/goacc-gomp/nesting-fail-1.c: Add OpenACC declare
    	directive for "i".
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@232219 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog.gomp                           | 5 +++++
 gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c | 1 +
 2 files changed, 6 insertions(+)

diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp
index 607ca8e..2db11df 100644
--- gcc/testsuite/ChangeLog.gomp
+++ gcc/testsuite/ChangeLog.gomp
@@ -1,3 +1,8 @@
+2016-01-11  Thomas Schwinge  <thomas@codesourcery.com>
+
+	* c-c++-common/goacc-gomp/nesting-fail-1.c: Add OpenACC declare
+	directive for "i".
+
 2016-01-07  James Norris  <jnorris@codesourcery.com>
 
 	* c-c++-common/goacc/routine-5.c: Additional tests.
diff --git gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
index 8e0f217..9011fcf 100644
--- gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
+++ gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
@@ -1,4 +1,5 @@
 extern int i;
+#pragma acc declare create(i)
 
 void
 f_omp (void)


Regards
 Thomas

* Re: [gomp4] Fix use of declare'd vars by routine procedures.
  2016-01-11 11:55                           ` Thomas Schwinge
@ 2016-01-11 15:38                             ` James Norris
  0 siblings, 0 replies; 48+ messages in thread
From: James Norris @ 2016-01-11 15:38 UTC (permalink / raw)
  To: Thomas Schwinge, James Norris, GCC Patches; +Cc: Jakub Jelinek

Hi!

On 01/11/2016 05:55 AM, Thomas Schwinge wrote:
> Hi!
>
> On Thu, 7 Jan 2016 12:57:32 -0600, James Norris <jnorris@codesourcery.com> wrote:
>> The checking of variables declared by OpenACC declare directives
>> used within an OpenACC routine procedure was not being done correctly.
>> This fix adds the checking required and also adds to the testing.
>>
>> This fix resolves the issue pointed out by Cesar with declare-4.c
>> (https://gcc.gnu.org/ml/gcc-patches/2016-01/msg00339.html).
>>
>> This patch has been applied to gomp-4_0-branch.
>
> Such a patch needs to go into trunk; see my report in
> <http://news.gmane.org/find-root.php?message_id=%3C87mvtapdp0.fsf%40kepler.schwinge.homeip.net%3E>.
>
>> --- gcc/c/c-parser.c	(revision 232138)
>> +++ gcc/c/c-parser.c	(working copy)
>> @@ -14115,6 +14115,10 @@
>>     /* Also add an "omp declare target" attribute, with clauses.  */
>>     DECL_ATTRIBUTES (fndecl) = tree_cons (get_identifier ("omp declare target"),
>>   					clauses, DECL_ATTRIBUTES (fndecl));
>> +
>> +  DECL_ATTRIBUTES (fndecl)
>> +    = tree_cons (get_identifier ("oacc routine"),
>> +		 clauses, DECL_ATTRIBUTES (fndecl));
>>   }
>
> Yuck, another attribute...  (..., and it's not listed/documented in
> gcc/c-family/c-common.c:c_common_attribute_table.)

Yuck is right. In the interim I've found a function, get_oacc_fn_attrib in
omp-low.c, that seems to suit my needs, so I'll revise the patch to use
that instead.

> You store clauses in the "oacc routine" here, but it's unused as far as I
> can tell.
>
> Given that we have the clauses available from the "omp declare target"
> attribute (subject to change to switching to a generic "omp clauses"
> attribute as suggested in
> <http://news.gmane.org/find-root.php?message_id=%3C87twns3ebs.fsf%40hertz.schwinge.homeip.net%3E>),
> could we maybe just look up some specific clause instead of detecting the
> presence of this extra attribute?  (Jakub, any preference?)  Anyway, have
> we verified that the desired behavior:
>
>> --- gcc/c/c-typeck.c	(revision 232138)
>> +++ gcc/c/c-typeck.c	(working copy)
>> @@ -2664,6 +2664,26 @@
>>     tree ref;
>>     tree decl = lookup_name (id);
>>
>> +  if (decl
>> +      && decl != error_mark_node
>> +      && current_function_decl
>> +      && TREE_CODE (decl) == VAR_DECL
>> +      && is_global_var (decl)
>> +      && lookup_attribute ("oacc routine",
>> +			   DECL_ATTRIBUTES (current_function_decl)))
>> +    {
>> +      if (lookup_attribute ("omp declare target link",
>> +			    DECL_ATTRIBUTES (decl))
>> +	  || ((!lookup_attribute ("omp declare target",
>> +				  DECL_ATTRIBUTES (decl))
>> +	       && ((TREE_STATIC (decl) && !DECL_EXTERNAL (decl))
>> +		   || (!TREE_STATIC (decl) && DECL_EXTERNAL (decl))))))
>> +	{
>> +	  error_at (loc, "invalid use in %<routine%> function");
>> +	  return error_mark_node;
>> +	}
>> +    }
>
> ..., isn't applicable to OpenMP as well (thus, no "oacc routine"
> attribute conditional is needed here)?

This appears to be a situation where OpenMP and OpenACC diverge,
so I need to discriminate between OpenACC routine and OpenMP
declare target.

>
>> --- gcc/cp/parser.c	(revision 232138)
>> +++ gcc/cp/parser.c	(working copy)
>> @@ -36732,6 +36732,10 @@
>>         DECL_ATTRIBUTES (fndecl)
>>   	= tree_cons (get_identifier ("omp declare target"),
>>   		     clauses, DECL_ATTRIBUTES (fndecl));
>> +
>> +      DECL_ATTRIBUTES (fndecl)
>> +	= tree_cons (get_identifier ("oacc routine"),
>> +		     NULL_TREE, DECL_ATTRIBUTES (fndecl));
>>       }
>
> You don't store clauses in the "oacc routine" here.
>
>> --- gcc/cp/semantics.c	(revision 232138)
>> +++ gcc/cp/semantics.c	(working copy)
>> @@ -3700,6 +3700,25 @@
>>
>>   	  decl = convert_from_reference (decl);
>>   	}
>> +
>> +      if (decl != error_mark_node
>> +	  && current_function_decl
>> +	  && TREE_CODE (decl) == VAR_DECL
>> +	  && is_global_var (decl)
>> +	  && lookup_attribute ("oacc routine",
>> +			       DECL_ATTRIBUTES (current_function_decl)))
>> +	{
>> +	  if (lookup_attribute ("omp declare target link",
>> +				DECL_ATTRIBUTES (decl))
>> +	      || ((!lookup_attribute ("omp declare target",
>> +				  DECL_ATTRIBUTES (decl))
>> +		   && ((TREE_STATIC (decl) && !DECL_EXTERNAL (decl))
>> +			|| (!TREE_STATIC (decl) && DECL_EXTERNAL (decl))))))
>> +	    {
>> +	      *error_msg = "invalid use in %<routine%> function";
>> +	      return error_mark_node;
>> +	    }
>> +	}
>
> Likewise.
>
> No equivalent change is needed for Fortran?

No. C/C++ statics and externs don't appear in Fortran.

>
>> --- libgomp/testsuite/libgomp.oacc-c-c++-common/declare-4.c	(revision 232138)
>> +++ libgomp/testsuite/libgomp.oacc-c-c++-common/declare-4.c	(working copy)
>> @@ -4,7 +4,7 @@
>>   #include <openacc.h>
>>
>>   float b;
>> -#pragma acc declare link (b)
>> +#pragma acc declare create (b)
>>
>>   #pragma acc routine
>>   int
>
> I have not verified the details, but a very similar fix was required to
> get rid of a number of regressions:
>
>      @@ -2637,18 +2637,18 @@ PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 350)
>      PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 356)
>      PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 358)
>      PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 360)
>      [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 371)
>      [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 378)
>      [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 386)
>      [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 395)
>      [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 402)
>      [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 409)
>      [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 416)
>      PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 42)
>      [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 423)
>      [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 430)
>      [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 432)
>      [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 434)
>      PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 47)
>      PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 52)
>      PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 57)
>      @@ -2667,7 +2667,7 @@ PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 93)
>      PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 95)
>      PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 97)
>      PASS: c-c++-common/goacc-gomp/nesting-fail-1.c  (test for errors, line 99)
>      [-PASS:-]{+FAIL:+} c-c++-common/goacc-gomp/nesting-fail-1.c (test for excess errors)
>
> Same for C++.
>
> Committed to gomp-4_0-branch in r232219:
>
> commit 1ac87f2b59dd03cb305ec94a7c6b5657dbb54e66
> Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> Date:   Mon Jan 11 11:51:57 2016 +0000
>
>      Resolve c-c++-common/goacc-gomp/nesting-fail-1.c regressions
>
>      	gcc/testsuite/
>      	* c-c++-common/goacc-gomp/nesting-fail-1.c: Add OpenACC declare
>      	directive for "i".
>
>      git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@232219 138bc75d-0d04-0410-961f-82ee72b054a4
> ---
>   gcc/testsuite/ChangeLog.gomp                           | 5 +++++
>   gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c | 1 +
>   2 files changed, 6 insertions(+)
>
> diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp
> index 607ca8e..2db11df 100644
> --- gcc/testsuite/ChangeLog.gomp
> +++ gcc/testsuite/ChangeLog.gomp
> @@ -1,3 +1,8 @@
> +2016-01-11  Thomas Schwinge  <thomas@codesourcery.com>
> +
> +	* c-c++-common/goacc-gomp/nesting-fail-1.c: Add OpenACC declare
> +	directive for "i".
> +
>   2016-01-07  James Norris  <jnorris@codesourcery.com>
>
>   	* c-c++-common/goacc/routine-5.c: Additional tests.
> diff --git gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
> index 8e0f217..9011fcf 100644
> --- gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
> +++ gcc/testsuite/c-c++-common/goacc-gomp/nesting-fail-1.c
> @@ -1,4 +1,5 @@
>   extern int i;
> +#pragma acc declare create(i)
>
>   void
>   f_omp (void)
>
>
> Regards
>   Thomas
>

Thank you, thank you,
Jim

* Re: gomp_target_fini
  2016-01-11 10:40                                           ` gomp_target_fini Thomas Schwinge
@ 2016-01-21  6:17                                             ` Thomas Schwinge
  0 siblings, 0 replies; 48+ messages in thread
From: Thomas Schwinge @ 2016-01-21  6:17 UTC (permalink / raw)
  To: gcc-patches, Ilya Verbin, Jakub Jelinek
  Cc: Kirill Yukhin, Chung-Lin Tang, James Norris

Hi!

Ping.

On Mon, 11 Jan 2016 11:39:46 +0100, I wrote:
> Ping.
> 
> On Wed, 23 Dec 2015 12:05:32 +0100, I wrote:
> > Ping.
> > 
> > On Wed, 16 Dec 2015 13:30:21 +0100, I wrote:
> > > On Mon, 14 Dec 2015 19:47:36 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> > > > On Fri, Dec 11, 2015 at 18:27:13 +0100, Jakub Jelinek wrote:
> > > > > On Tue, Dec 08, 2015 at 05:45:59PM +0300, Ilya Verbin wrote:
> > > > > > +/* This function finalizes all initialized devices.  */
> > > > > > +
> > > > > > +static void
> > > > > > +gomp_target_fini (void)
> > > > > > +{
> > > > > > +  [...]
> > > > > 
> > > > > The question is what will this do if there are async target tasks still
> > > > > running on some of the devices at this point (forgotten #pragma omp taskwait
> > > > > or similar if target nowait regions are started outside of parallel region,
> > > > > or exit inside of parallel, etc.  But perhaps it can be handled incrementally.
> > > > > Also there is the question that the 
> > > > > So I think the patch is ok with the above mentioned changes.
> > > > 
> > > > Here is what I've committed to trunk.
> > > 
> > > > --- a/libgomp/libgomp.h
> > > > +++ b/libgomp/libgomp.h
> > > > @@ -888,6 +888,14 @@ typedef struct acc_dispatch_t
> > > >    } cuda;
> > > >  } acc_dispatch_t;
> > > >  
> > > > +/* Various state of the accelerator device.  */
> > > > +enum gomp_device_state
> > > > +{
> > > > +  GOMP_DEVICE_UNINITIALIZED,
> > > > +  GOMP_DEVICE_INITIALIZED,
> > > > +  GOMP_DEVICE_FINALIZED
> > > > +};
> > > > +
> > > >  /* This structure describes accelerator device.
> > > >     It contains name of the corresponding libgomp plugin, function handlers for
> > > >     interaction with the device, ID-number of the device, and information about
> > > > @@ -933,8 +941,10 @@ struct gomp_device_descr
> > > >    /* Mutex for the mutable data.  */
> > > >    gomp_mutex_t lock;
> > > >  
> > > > -  /* Set to true when device is initialized.  */
> > > > -  bool is_initialized;
> > > > +  /* Current state of the device.  OpenACC allows to move from INITIALIZED state
> > > > +     back to UNINITIALIZED state.  OpenMP allows only to move from INITIALIZED
> > > > +     to FINALIZED state (at program shutdown).  */
> > > > +  enum gomp_device_state state;
> > > 
> > > (ACK, but I assume we'll want to make sure that an OpenACC device is
> > > never re-initialized if we're in/after the libgomp finalization phase.)
> > > 
> > > 
> > > The issue mentioned above: "exit inside of parallel" is actually a
> > > problem for nvptx offloading: the libgomp.oacc-c-c++-common/abort-1.c,
> > > libgomp.oacc-c-c++-common/abort-3.c, and libgomp.oacc-fortran/abort-1.f90
> > > test cases now run into annoying "WARNING: program timed out".  Here is
> > > what's happening, as I understand it: in
> > > libgomp/plugin/plugin-nvptx.c:nvptx_exec, the cuStreamSynchronize call
> > > returns CUDA_ERROR_LAUNCH_FAILED, upon which we call GOMP_PLUGIN_fatal.
> > > 
> > > > --- a/libgomp/target.c
> > > > +++ b/libgomp/target.c
> > > 
> > > > +/* This function finalizes all initialized devices.  */
> > > > +
> > > > +static void
> > > > +gomp_target_fini (void)
> > > > +{
> > > > +  int i;
> > > > +  for (i = 0; i < num_devices; i++)
> > > > +    {
> > > > +      struct gomp_device_descr *devicep = &devices[i];
> > > > +      gomp_mutex_lock (&devicep->lock);
> > > > +      if (devicep->state == GOMP_DEVICE_INITIALIZED)
> > > > +	{
> > > > +	  devicep->fini_device_func (devicep->target_id);
> > > > +	  devicep->state = GOMP_DEVICE_FINALIZED;
> > > > +	}
> > > > +      gomp_mutex_unlock (&devicep->lock);
> > > > +    }
> > > > +}
> > > 
> > > > @@ -2387,6 +2433,9 @@ gomp_target_init (void)
> > > >        if (devices[i].capabilities & GOMP_OFFLOAD_CAP_OPENACC_200)
> > > >  	goacc_register (&devices[i]);
> > > >      }
> > > > +
> > > > +  if (atexit (gomp_target_fini) != 0)
> > > > +    gomp_fatal ("atexit failed");
> > > >  }
> > > 
> > > Now, with the above change installed, GOMP_PLUGIN_fatal will trigger the
> > > atexit handler, gomp_target_fini, which, with the device lock held, will
> > > call back into the plugin, GOMP_OFFLOAD_fini_device, which will try to
> > > clean up.
> > > 
> > > Because of the earlier CUDA_ERROR_LAUNCH_FAILED, the associated CUDA
> > > context is now in an inconsistent state, see
> > > <https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__TYPES.html>:
> > > 
> > >     CUDA_ERROR_LAUNCH_FAILED = 719
> > >         An exception occurred on the device while executing a
> > >         kernel. Common causes include dereferencing an invalid device
> > >         pointer and accessing out of bounds shared memory. The context
> > >         cannot be used, so it must be destroyed (and a new one should be
> > >         created). All existing device memory allocations from this
> > >         context are invalid and must be reconstructed if the program is
> > >         to continue using CUDA.
> > > 
> > > Thus, any cuMemFreeHost invocations that are run during clean-up will now
> > > also/still return CUDA_ERROR_LAUNCH_FAILED, due to which we'll again call
> > > GOMP_PLUGIN_fatal, which again will trigger the same or another
> > > (GOMP_offload_unregister_ver) atexit handler, which will then deadlock
> > > trying to lock the device again, which is still locked.
> > > 
> > > (Jim, I wonder: after the first CUDA_ERROR_LAUNCH_FAILED and similar
> > > errors, should we destroy the context right away, or toggle a boolean
> > > flag to mark it as unusable, and use that as an indication to avoid the
> > > follow-on failures of cuMemFreeHost just described above, for example?)
> > > 
> > > <http://pubs.opengroup.org/onlinepubs/9699919799/functions/atexit.html>
> > > tells us:
> > > 
> > >     Since the behavior is undefined if the exit() function is called more
> > >     than once, portable applications calling atexit() must ensure that the
> > >     exit() function is not called at normal process termination when all
> > >     functions registered by the atexit() function are called.
> > > 
> > > ... which we're violating here, at least in the nvptx plugin.  I have not
> > > analyzed the intermic one.
> > > 
> > > As it happens, Chung-Lin has been working in that area:
> > > <http://news.gmane.org/find-root.php?message_id=%3C55DF1452.9050501%40codesourcery.com%3E>,
> > > which he recently re-posted:
> > > <http://news.gmane.org/find-root.php?message_id=%3C566EE49A.3050403%40codesourcery.com%3E>,
> > > <http://news.gmane.org/find-root.php?message_id=%3C566EC310.8000403%40codesourcery.com%3E>,
> > > <http://news.gmane.org/find-root.php?message_id=%3C566EC324.9050505%40codesourcery.com%3E>.
> > > I have not analyzed whether his changes would completely resolve the
> > > problem just described, but at least conceptually they seem to be a step
> > > > in the right direction?  (Jakub?)
> > > 
> > > Now, to resolve the immediate problem, what is the right thing for us to
> > > do?  Is the following simple change OK, or is there a reason to still run
> > > atexit handlers if terminating under error conditions?
> > > 
> > > commit b1733e8f9df6ae7d6828e2194df1b314772701c5
> > > Author: Thomas Schwinge <thomas@codesourcery.com>
> > > Date:   Wed Dec 16 13:10:39 2015 +0100
> > > 
> > >     Avoid deadlocks in libgomp due to competing atexit handlers
> > >     
> > >     	libgomp/
> > >     	* error.c (gomp_vfatal): Call _exit instead of exit.
> > > ---
> > >  libgomp/error.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git libgomp/error.c libgomp/error.c
> > > index 094c24a..1ef7491 100644
> > > --- libgomp/error.c
> > > +++ libgomp/error.c
> > > @@ -34,6 +34,7 @@
> > >  #include <stdarg.h>
> > >  #include <stdio.h>
> > >  #include <stdlib.h>
> > > +#include <unistd.h>
> > >  
> > >  
> > >  #undef gomp_vdebug
> > > @@ -77,7 +78,7 @@ void
> > >  gomp_vfatal (const char *fmt, va_list list)
> > >  {
> > >    gomp_verror (fmt, list);
> > > -  exit (EXIT_FAILURE);
> > > +  _exit (EXIT_FAILURE);
> > >  }
> > >  
> > >  void


Regards
 Thomas

* Re: gomp_target_fini
  2015-12-16 12:30                                       ` gomp_target_fini (was: [gomp4.5] Handle #pragma omp declare target link) Thomas Schwinge
  2015-12-23 11:05                                         ` gomp_target_fini Thomas Schwinge
@ 2016-01-21 15:24                                         ` Bernd Schmidt
  2016-01-22 10:16                                           ` gomp_target_fini Jakub Jelinek
  1 sibling, 1 reply; 48+ messages in thread
From: Bernd Schmidt @ 2016-01-21 15:24 UTC (permalink / raw)
  To: Thomas Schwinge, Ilya Verbin, Jakub Jelinek, Chung-Lin Tang,
	James Norris
  Cc: gcc-patches, Kirill Yukhin

Thomas, I've mentioned this issue before - there is sometimes just too 
much irrelevant stuff to wade through in your patch submissions, and it 
discourages review. The discussion of the actual problem begins more 
than halfway through your multi-page mail. Please try to be more concise.

On 12/16/2015 01:30 PM, Thomas Schwinge wrote:
> Now, with the above change installed, GOMP_PLUGIN_fatal will trigger the
> atexit handler, gomp_target_fini, which, with the device lock held, will
> call back into the plugin, GOMP_OFFLOAD_fini_device, which will try to
> clean up.
>
> Because of the earlier CUDA_ERROR_LAUNCH_FAILED, the associated CUDA
> context is now in an inconsistent state

> Thus, any cuMemFreeHost invocations that are run during clean-up will now
> also/still return CUDA_ERROR_LAUNCH_FAILED, due to which we'll again call
> GOMP_PLUGIN_fatal, which again will trigger the same or another
> (GOMP_offload_unregister_ver) atexit handler, which will then deadlock
> trying to lock the device again, which is still locked.

>      	libgomp/
>      	* error.c (gomp_vfatal): Call _exit instead of exit.

It seems unfortunate to disable the atexit handlers for everything for 
what seems purely an nvptx problem.

What exactly happens if you don't register the cleanups with atexit in 
the first place? Or maybe you can query for CUDA_ERROR_LAUNCH_FAILED in 
the cleanup functions?
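
A minimal sketch of the second suggestion, with no CUDA calls at all: the
cleanup routine consults a recorded launch-failure status and skips the
frees that would only fail again.  Everything here is hypothetical
(`struct fake_device`, `fake_note_launch_failure`, `fake_fini_device`); it
is not the layout or API of the nvptx plugin, just the shape of the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device bookkeeping; not the real nvptx plugin state.  */
struct fake_device
{
  bool launch_failed;    /* set once a kernel launch has failed */
  size_t host_buffers;   /* stand-in for buffers the cleanup would free */
};

/* Record that the context can no longer be used.  */
void
fake_note_launch_failure (struct fake_device *dev)
{
  assert (dev != NULL);
  dev->launch_failed = true;
}

/* Cleanup entry point.  Returns the number of buffers released.  */
size_t
fake_fini_device (struct fake_device *dev)
{
  assert (dev != NULL);

  /* After a launch failure every free would report the same error and
     re-enter the fatal path, so skip the device-side cleanup entirely.  */
  if (dev->launch_failed)
    return 0;

  size_t released = dev->host_buffers;
  dev->host_buffers = 0;
  return released;
}
```

Whether skipping the frees is acceptable for a process that is about to
terminate anyway is the open question; the sketch only shows where such a
query would sit.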


Bernd

* Re: gomp_target_fini
  2016-01-21 15:24                                         ` gomp_target_fini Bernd Schmidt
@ 2016-01-22 10:16                                           ` Jakub Jelinek
  2016-01-25 18:23                                             ` gomp_target_fini Mike Stump
  2016-04-19 14:01                                             ` gomp_target_fini Thomas Schwinge
  0 siblings, 2 replies; 48+ messages in thread
From: Jakub Jelinek @ 2016-01-22 10:16 UTC (permalink / raw)
  To: Bernd Schmidt
  Cc: Thomas Schwinge, Ilya Verbin, Chung-Lin Tang, James Norris,
	gcc-patches, Kirill Yukhin

On Thu, Jan 21, 2016 at 04:24:46PM +0100, Bernd Schmidt wrote:
> Thomas, I've mentioned this issue before - there is sometimes just too much
> irrelevant stuff to wade through in your patch submissions, and it
> discourages review. The discussion of the actual problem begins more than
> halfway through your multi-page mail. Please try to be more concise.
> 
> On 12/16/2015 01:30 PM, Thomas Schwinge wrote:
> >Now, with the above change installed, GOMP_PLUGIN_fatal will trigger the
> >atexit handler, gomp_target_fini, which, with the device lock held, will
> >call back into the plugin, GOMP_OFFLOAD_fini_device, which will try to
> >clean up.
> >
> >Because of the earlier CUDA_ERROR_LAUNCH_FAILED, the associated CUDA
> >context is now in an inconsistent state
> 
> >Thus, any cuMemFreeHost invocations that are run during clean-up will now
> >also/still return CUDA_ERROR_LAUNCH_FAILED, due to which we'll again call
> >GOMP_PLUGIN_fatal, which again will trigger the same or another
> >(GOMP_offload_unregister_ver) atexit handler, which will then deadlock
> >trying to lock the device again, which is still locked.
> 
> >     	libgomp/
> >     	* error.c (gomp_vfatal): Call _exit instead of exit.
> 
> It seems unfortunate to disable the atexit handlers for everything for what
> seems purely an nvptx problem.
> 
> What exactly happens if you don't register the cleanups with atexit in the
> first place? Or maybe you can query for CUDA_ERROR_LAUNCH_FAILED in the
> cleanup functions?

I agree, _exit is just wrong; there could be important atexit hooks from the
application.  You can set some flag telling the libgomp or nvptx plugin atexit
hooks to do nothing, or to do things differently.  But bypassing all atexit
handlers is risky.

	Jakub

* Re: gomp_target_fini
  2016-01-22 10:16                                           ` gomp_target_fini Jakub Jelinek
@ 2016-01-25 18:23                                             ` Mike Stump
  2016-04-19 14:01                                             ` gomp_target_fini Thomas Schwinge
  1 sibling, 0 replies; 48+ messages in thread
From: Mike Stump @ 2016-01-25 18:23 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Bernd Schmidt, Thomas Schwinge, Ilya Verbin, Chung-Lin Tang,
	James Norris, gcc-patches, Kirill Yukhin

On Jan 22, 2016, at 2:16 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Jan 21, 2016 at 04:24:46PM +0100, Bernd Schmidt wrote:
>> Thomas, I've mentioned this issue before - there is sometimes just too much
>> irrelevant stuff to wade through in your patch submissions, and it
>> discourages review. The discussion of the actual problem begins more than
>> halfway through your multi-page mail. Please try to be more concise.
>> 
>> On 12/16/2015 01:30 PM, Thomas Schwinge wrote:
>>> Now, with the above change installed, GOMP_PLUGIN_fatal will trigger the
>>> atexit handler, gomp_target_fini, which, with the device lock held, will
>>> call back into the plugin, GOMP_OFFLOAD_fini_device, which will try to
>>> clean up.
>>> 
>>> Because of the earlier CUDA_ERROR_LAUNCH_FAILED, the associated CUDA
>>> context is now in an inconsistent state
>> 
>>> Thus, any cuMemFreeHost invocations that are run during clean-up will now
>>> also/still return CUDA_ERROR_LAUNCH_FAILED, due to which we'll again call
>>> GOMP_PLUGIN_fatal, which again will trigger the same or another
>>> (GOMP_offload_unregister_ver) atexit handler, which will then deadlock
>>> trying to lock the device again, which is still locked.
>> 
>>>    	libgomp/
>>>    	* error.c (gomp_vfatal): Call _exit instead of exit.
>> 
>> It seems unfortunate to disable the atexit handlers for everything for what
>> seems purely an nvptx problem.
>> 
>> What exactly happens if you don't register the cleanups with atexit in the
>> first place? Or maybe you can query for CUDA_ERROR_LAUNCH_FAILED in the
>> cleanup functions?
> 
> I agree, _exit is just wrong, there could be important atexit hooks from the
> application.  You can set some flag that the libgomp or nvptx plugin atexit
> hooks should not do anything, or should do things differently.  But
> bypassing all atexit handlers is risky.

I’d use the phrase "is wrong" instead.

Just create a semaphore that says that init was fully done: set it at the
end of init, test it at the beginning of the cleanup, and reset it any time
you want to cancel the cleanup.  Think of it as an is_valid predicate: any
operation that needs it to be valid can query it first, and fail otherwise.
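
The "is_valid" idea can be sketched roughly as follows.  This is only an
illustration under stated assumptions: device_valid, device_init,
device_mark_broken, device_cleanup and cleanup_runs are hypothetical names
made up for the example, not existing libgomp or plugin symbols, and a C11
atomic flag stands in for the semaphore.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical validity flag; static storage, so it starts out false.  */
static atomic_bool device_valid;

/* Hypothetical counter, only here to make the effect observable.  */
static int cleanup_runs;

/* End of initialization: from now on cleanup is meaningful.  */
static void
device_init (void)
{
  atomic_store (&device_valid, true);
}

/* Called on an unrecoverable device error: cancel any later cleanup.  */
static void
device_mark_broken (void)
{
  atomic_store (&device_valid, false);
}

/* Cleanup hook: test the flag first and do nothing if it is not set.
   Clearing it in the same step keeps a reentrant call from running the
   cleanup a second time.  */
static void
device_cleanup (void)
{
  if (!atomic_exchange (&device_valid, false))
    return;
  cleanup_runs++;   /* stands in for the real resource release */
}
```

With something like this, device_cleanup could stay registered via atexit:
after device_mark_broken has been called it simply returns, and the
application's own atexit hooks still run.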

* Re: gomp_target_fini
  2016-01-22 10:16                                           ` gomp_target_fini Jakub Jelinek
  2016-01-25 18:23                                             ` gomp_target_fini Mike Stump
@ 2016-04-19 14:01                                             ` Thomas Schwinge
  2016-04-19 14:04                                               ` gomp_target_fini Jakub Jelinek
  2016-04-19 15:23                                               ` gomp_target_fini Alexander Monakov
  1 sibling, 2 replies; 48+ messages in thread
From: Thomas Schwinge @ 2016-04-19 14:01 UTC (permalink / raw)
  To: Jakub Jelinek, Bernd Schmidt
  Cc: Ilya Verbin, Chung-Lin Tang, James Norris, gcc-patches,
	Kirill Yukhin, Alexander Monakov

Hi!

On Fri, 22 Jan 2016 11:16:07 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Jan 21, 2016 at 04:24:46PM +0100, Bernd Schmidt wrote:
> > On 12/16/2015 01:30 PM, Thomas Schwinge wrote:
> > >Now, with the above change installed, GOMP_PLUGIN_fatal will trigger the
> > >atexit handler, gomp_target_fini, which, with the device lock held, will
> > >call back into the plugin, GOMP_OFFLOAD_fini_device, which will try to
> > >clean up.
> > >
> > >Because of the earlier CUDA_ERROR_LAUNCH_FAILED, the associated CUDA
> > >context is now in an inconsistent state
> > 
> > >Thus, any cuMemFreeHost invocations that are run during clean-up will now
> > >also/still return CUDA_ERROR_LAUNCH_FAILED, due to which we'll again call
> > >GOMP_PLUGIN_fatal, which again will trigger the same or another
> > >(GOMP_offload_unregister_ver) atexit handler, which will then deadlock
> > >trying to lock the device again, which is still locked.

(... causing "WARNING: program timed out" for the affected libgomp test
cases, as well as deadlocks for any such user code, too.)

> > >     	libgomp/
> > >     	* error.c (gomp_vfatal): Call _exit instead of exit.
> > 
> > It seems unfortunate to disable the atexit handlers for everything for what
> > seems purely an nvptx problem.  [...]

> I agree, _exit is just wrong, there could be important atexit hooks from the
> application.  You can set some flag that the libgomp or nvptx plugin atexit
> hooks should not do anything, or should do things differently.  But
> bypassing all atexit handlers is risky.

Well, I certainly had done at least some thinking before proposing this:
we're talking about the libgomp "fatal exit" function, called when
something has gone very wrong, and we're about to terminate the process,
because there's no hope to recover.  In this situation/consideration it
didn't seem important to me to have atexit handlers called.  Just like
these are also not called when we run into a SIGSEGV, or the kernel kills
the process for other reasons.  So I'm not completely convinced by your
assessment that calling "_exit is just wrong".  Anyway, I can certainly
accept that my understanding of the seriousness of a libgomp "fatal exit"
has been too pessimistic, and that we can do better than my proposed
_exit solution.
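
For reference, the difference under discussion can be observed with a small
POSIX-only sketch.  The names report_fd, on_exit_handler and handler_runs
are hypothetical and exist only for this illustration; the child process
registers an atexit handler and then terminates either via exit or via
_exit, and the parent checks whether the handler got to run.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write end of the pipe used by the child to report back (hypothetical).  */
static int report_fd = -1;

/* atexit handler: send one byte so the parent can tell that it ran.  */
static void
on_exit_handler (void)
{
  char c = 'x';
  ssize_t n = write (report_fd, &c, 1);
  (void) n;
}

/* Returns 1 if the child's atexit handler ran, 0 if it did not, and -1 if
   the pipe or the child process could not be created.  If BYPASS is
   nonzero the child terminates with _exit, otherwise with exit.  */
static int
handler_runs (int bypass)
{
  int fds[2];
  if (pipe (fds) != 0)
    return -1;

  pid_t pid = fork ();
  if (pid < 0)
    {
      close (fds[0]);
      close (fds[1]);
      return -1;
    }

  if (pid == 0)
    {
      /* Child: register the handler, then terminate one way or the other.  */
      close (fds[0]);
      report_fd = fds[1];
      atexit (on_exit_handler);
      if (bypass)
        _exit (1);
      exit (1);
    }

  /* Parent: a byte means the handler ran, end-of-file means it did not.  */
  close (fds[1]);
  char c;
  ssize_t n = read (fds[0], &c, 1);
  close (fds[0]);
  waitpid (pid, NULL, 0);
  return n == 1 ? 1 : 0;
}
```

Here handler_runs (0) is expected to report that the handler ran, while
handler_runs (1) is expected to report that it was skipped; the latter is
what would happen to application-registered hooks, too.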

Two other solutions have been proposed in the past months: Chung-Lin's
patches with subject: "Adjust offload plugin interface for avoiding
deadlock on exit", later: "Resolve libgomp plugin deadlock on exit",
later: "Resolve deadlock on plugin exit" (still pending review/approval),
and Alexander's much smaller patch with subject: "libgomp plugin: make
cuMemFreeHost error non-fatal",
<http://news.gmane.org/find-root.php?message_id=%3C1458323327-9908-4-git-send-email-amonakov%40ispras.ru%3E>.
(Both of which I have not reviewed in detail.)  Assuming that Chung-Lin's
patches are considered too invasive for gcc-6-branch, can we at least get
Alexander's patch committed to gcc-6-branch as well as on trunk, please?

commit d86a582bd9c21451dc888695ee6ecef37b5fb6ac
Author: Alexander Monakov <amonakov@ispras.ru>
Date:   Fri Mar 11 15:31:33 2016 +0300

    libgomp plugin: make cuMemFreeHost error non-fatal
    
    Unlike cuMemFree and other resource-releasing functions called on exit,
    cuMemFreeHost appears to re-report errors encountered in kernel launch.
    This leads to a deadlock after GOMP_PLUGIN_fatal is reentered.
    
    While the behavior on libgomp side is suboptimal (there's no need to
    call resource-releasing functions if we're about to destroy the CUDA
    context anyway), this behavior on cuMemFreeHost part is not useful
    and just makes error "recovery" harder.  This was reported to NVIDIA
    (bug ref. 1737876), but we can work around it by simply reporting the
    error without making it fatal.
    
    	* plugin/plugin-nvptx.c (map_fini): Make cuMemFreeHost error non-fatal.
---
 libgomp/ChangeLog.gomp-nvptx  | 4 ++++
 libgomp/plugin/plugin-nvptx.c | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git libgomp/ChangeLog.gomp-nvptx libgomp/ChangeLog.gomp-nvptx
index 7eefe0b..6bd9e5e 100644
--- libgomp/ChangeLog.gomp-nvptx
+++ libgomp/ChangeLog.gomp-nvptx
@@ -1,3 +1,7 @@
+2016-03-11  Alexander Monakov  <amonakov@ispras.ru>
+
+	* plugin/plugin-nvptx.c (map_fini): Make cuMemFreeHost error non-fatal.
+
 2016-03-04  Alexander Monakov  <amonakov@ispras.ru>
 
 	* config/nvptx/bar.c: Remove wrong invocation of
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index adf57b1..4e44242 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -135,7 +135,7 @@ map_fini (struct ptx_stream *s)
 
   r = cuMemFreeHost (s->h);
   if (r != CUDA_SUCCESS)
-    GOMP_PLUGIN_fatal ("cuMemFreeHost error: %s", cuda_error (r));
+    GOMP_PLUGIN_error ("cuMemFreeHost error: %s", cuda_error (r));
 }
 
 static void


Grüße
 Thomas

* Re: gomp_target_fini
  2016-04-19 14:01                                             ` gomp_target_fini Thomas Schwinge
@ 2016-04-19 14:04                                               ` Jakub Jelinek
  2016-04-21 13:43                                                 ` gomp_target_fini Alexander Monakov
  2016-04-19 15:23                                               ` gomp_target_fini Alexander Monakov
  1 sibling, 1 reply; 48+ messages in thread
From: Jakub Jelinek @ 2016-04-19 14:04 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Bernd Schmidt, Ilya Verbin, Chung-Lin Tang, James Norris,
	gcc-patches, Kirill Yukhin, Alexander Monakov

On Tue, Apr 19, 2016 at 04:01:06PM +0200, Thomas Schwinge wrote:
> Two other solutions have been proposed in the past months: Chung-Lin's
> patches with subject: "Adjust offload plugin interface for avoiding
> deadlock on exit", later: "Resolve libgomp plugin deadlock on exit",
> later: "Resolve deadlock on plugin exit" (still pending review/approval),
> and Alexander's much smaller patch with subject: "libgomp plugin: make
> cuMemFreeHost error non-fatal",
> <http://news.gmane.org/find-root.php?message_id=%3C1458323327-9908-4-git-send-email-amonakov%40ispras.ru%3E>.
> (Both of which I have not reviewed in detail.)  Assuming that Chung-Lin's
> patches are considered too invasive for gcc-6-branch, can we at least get
> Alexander's patch committed to gcc-6-branch as well as on trunk, please?

Yeah, Alex' patch is IMHO fine, even for gcc-6-branch.

> --- libgomp/ChangeLog.gomp-nvptx
> +++ libgomp/ChangeLog.gomp-nvptx
> @@ -1,3 +1,7 @@
> +2016-03-11  Alexander Monakov  <amonakov@ispras.ru>
> +
> +	* plugin/plugin-nvptx.c (map_fini): Make cuMemFreeHost error non-fatal.
> +
>  2016-03-04  Alexander Monakov  <amonakov@ispras.ru>
>  
>  	* config/nvptx/bar.c: Remove wrong invocation of
> diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
> index adf57b1..4e44242 100644
> --- libgomp/plugin/plugin-nvptx.c
> +++ libgomp/plugin/plugin-nvptx.c
> @@ -135,7 +135,7 @@ map_fini (struct ptx_stream *s)
>  
>    r = cuMemFreeHost (s->h);
>    if (r != CUDA_SUCCESS)
> -    GOMP_PLUGIN_fatal ("cuMemFreeHost error: %s", cuda_error (r));
> +    GOMP_PLUGIN_error ("cuMemFreeHost error: %s", cuda_error (r));
>  }
>  
>  static void

	Jakub

* Re: gomp_target_fini
  2016-04-19 14:01                                             ` gomp_target_fini Thomas Schwinge
  2016-04-19 14:04                                               ` gomp_target_fini Jakub Jelinek
@ 2016-04-19 15:23                                               ` Alexander Monakov
  1 sibling, 0 replies; 48+ messages in thread
From: Alexander Monakov @ 2016-04-19 15:23 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Jakub Jelinek, Bernd Schmidt, Ilya Verbin, Chung-Lin Tang,
	James Norris, gcc-patches, Kirill Yukhin

On Tue, 19 Apr 2016, Thomas Schwinge wrote:
> Well, I certainly had done at least some thinking before proposing this:
> we're talking about the libgomp "fatal exit" function, called when
> something has gone very wrong, and we're about to terminate the process,
> because there's no hope to recover.

By the way, this relates to something I wanted to bring up for a while now.

The OpenMP spec does not talk about error conditions arising in well-formed
programs due to resource exhaustion (OOM, in particular).  My understanding
is that an implementation always has a "way out": if e.g. it fails to allocate
memory required for a thread, it could run with reduced parallelism.
Ultimately the implementation can "fail gracefully" all the way back to
running the program sequentially.

Offloading makes that unclear due to how host fallbacks for target regions are
observable (which I don't understand, and I hope we get a chance to discuss
it), but is the above understanding generally correct?  Today libgomp is
clearly "trigger happy" to crash the process when something goes slightly
wrong, but was graceful failure ever considered as a design [non-]goal?

In that light, can a general policy of avoiding aborting the program be in
place, and should plugin authors work towards introducing fallback paths
instead of [over-]using GOMP_PLUGIN_fatal?
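
To make the question concrete, a fallback path might look roughly like the
sketch below.  Everything in it is an assumption for illustration:
plugin_run, host_run and run_region are hypothetical names, not the libgomp
plugin interface, and the sketch deliberately ignores data that is already
mapped to the device, which is exactly the part that makes host fallback
observable.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical plugin hook: report failure to the caller instead of
   aborting the process.  DEVICE_HEALTHY simulates the device state.  */
static bool
plugin_run (int device_healthy, int x, int *out)
{
  if (!device_healthy)
    {
      fprintf (stderr, "device launch failed; caller may fall back\n");
      return false;
    }
  *out = x * 2;   /* stands in for the offloaded computation */
  return true;
}

/* Host version of the same region.  */
static int
host_run (int x)
{
  return x * 2;
}

/* Try the device first; on failure run the region on the host.
   *RAN_ON_HOST tells the caller which path was taken.  */
static int
run_region (int device_healthy, int x, bool *ran_on_host)
{
  int result;
  if (plugin_run (device_healthy, x, &result))
    {
      *ran_on_host = false;
      return result;
    }
  *ran_on_host = true;
  return host_run (x);
}
```

The design choice shown is only that the plugin reports the error and the
caller decides what to do with it; whether that is acceptable for target
regions is the open question above.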

Thanks.
Alexander

* Re: gomp_target_fini
  2016-04-19 14:04                                               ` gomp_target_fini Jakub Jelinek
@ 2016-04-21 13:43                                                 ` Alexander Monakov
  2016-04-21 15:38                                                   ` gomp_target_fini Thomas Schwinge
  0 siblings, 1 reply; 48+ messages in thread
From: Alexander Monakov @ 2016-04-21 13:43 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Thomas Schwinge, Bernd Schmidt, Ilya Verbin, Chung-Lin Tang,
	James Norris, gcc-patches, Kirill Yukhin

On Tue, 19 Apr 2016, Jakub Jelinek wrote:
> On Tue, Apr 19, 2016 at 04:01:06PM +0200, Thomas Schwinge wrote:
> > Two other solutions have been proposed in the past months: Chung-Lin's
> > patches with subject: "Adjust offload plugin interface for avoiding
> > deadlock on exit", later: "Resolve libgomp plugin deadlock on exit",
> > later: "Resolve deadlock on plugin exit" (still pending review/approval),
> > and Alexander's much smaller patch with subject: "libgomp plugin: make
> > cuMemFreeHost error non-fatal",
> > <http://news.gmane.org/find-root.php?message_id=%3C1458323327-9908-4-git-send-email-amonakov%40ispras.ru%3E>.
> > (Both of which I have not reviewed in detail.)  Assuming that Chung-Lin's
> > patches are considered too invasive for gcc-6-branch, can we at least get
> > Alexander's patch committed to gcc-6-branch as well as on trunk, please?
> 
> Yeah, Alex' patch is IMHO fine, even for gcc-6-branch.

Applied to both.

Thanks.
Alexander

* Re: gomp_target_fini
  2016-04-21 13:43                                                 ` gomp_target_fini Alexander Monakov
@ 2016-04-21 15:38                                                   ` Thomas Schwinge
  0 siblings, 0 replies; 48+ messages in thread
From: Thomas Schwinge @ 2016-04-21 15:38 UTC (permalink / raw)
  To: gcc-patches
  Cc: Alexander Monakov, Jakub Jelinek, Bernd Schmidt, Ilya Verbin,
	Chung-Lin Tang, James Norris, gcc-patches, Kirill Yukhin

Hi!

On Thu, 21 Apr 2016 16:43:22 +0300, Alexander Monakov <amonakov@ispras.ru> wrote:
> On Tue, 19 Apr 2016, Jakub Jelinek wrote:
> > On Tue, Apr 19, 2016 at 04:01:06PM +0200, Thomas Schwinge wrote:
> > > [...] Alexander's much smaller patch with subject: "libgomp plugin: make
> > > cuMemFreeHost error non-fatal",
> > > <http://news.gmane.org/find-root.php?message_id=%3C1458323327-9908-4-git-send-email-amonakov%40ispras.ru%3E>.

> > Yeah, Alex' patch is IMHO fine, even for gcc-6-branch.
> 
> Applied to both.

Thanks!

Backported to gomp-4_0-branch in r235345:

commit 7e774a1bb94e2c5f17765342a59c6cb25e76c943
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Thu Apr 21 15:35:57 2016 +0000

    libgomp nvptx plugin: make cuMemFreeHost error non-fatal
    
    Backport trunk r235339:
    
    	libgomp/
    	2016-04-21  Alexander Monakov  <amonakov@ispras.ru>
    
    	* plugin/plugin-nvptx.c (map_fini): Make cuMemFreeHost error
    	non-fatal.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@235345 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libgomp/ChangeLog.gomp        | 8 ++++++++
 libgomp/plugin/plugin-nvptx.c | 2 +-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp
index 1c99026..23a8eef 100644
--- libgomp/ChangeLog.gomp
+++ libgomp/ChangeLog.gomp
@@ -1,3 +1,11 @@
+2016-04-21  Thomas Schwinge  <thomas@codesourcery.com>
+
+	Backport trunk r235339:
+	2016-04-21  Alexander Monakov  <amonakov@ispras.ru>
+
+	* plugin/plugin-nvptx.c (map_fini): Make cuMemFreeHost error
+	non-fatal.
+
 2016-04-08  Thomas Schwinge  <thomas@codesourcery.com>
 
 	PR testsuite/70579
diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index eea74d4..6b674c0 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -128,7 +128,7 @@ map_fini (struct ptx_stream *s)
 
   r = cuMemFreeHost (s->h);
   if (r != CUDA_SUCCESS)
-    GOMP_PLUGIN_fatal ("cuMemFreeHost error: %s", cuda_error (r));
+    GOMP_PLUGIN_error ("cuMemFreeHost error: %s", cuda_error (r));
 }
 
 static void


Grüße
 Thomas

* Re: [gomp4.5] Handle #pragma omp declare target link
  2015-12-14 17:18                     ` [gomp4.5] Handle #pragma omp declare target link Ilya Verbin
  2015-12-15  8:42                       ` Jakub Jelinek
  2015-12-16 16:21                       ` Thomas Schwinge
@ 2019-06-26 16:23                       ` Thomas Schwinge
  2 siblings, 0 replies; 48+ messages in thread
From: Thomas Schwinge @ 2019-06-26 16:23 UTC (permalink / raw)
  To: Ilya Verbin, Jakub Jelinek, Tom de Vries, Alexander Monakov
  Cc: gcc-patches, Kirill Yukhin

Hi!

On Mon, 14 Dec 2015 20:17:33 +0300, Ilya Verbin <iverbin@gmail.com> wrote:
> Here is an updated patch [for "#pragma omp declare target link"]

..., that got committed long ago (trunk r231655), with additional changes
later on.

As was later filed in PR81689 ("libgomp.c/target-link-1.c fails for nvptx:
#pragma omp target link not implemented"), the test case added here fails
for nvptx.  Curious: has anybody ever looked into what's going on/wrong?


Grüße
 Thomas


> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c/target-link-1.c
> @@ -0,0 +1,63 @@
> +struct S { int s, t; };
> +
> +int a = 1, b = 1;
> +double c[27];
> +struct S d = { 8888, 8888 };
> +#pragma omp declare target link (a) to (b) link (c, d)
> +
> +int
> +foo (void)
> +{
> +  return a++ + b++;
> +}
> +
> +int
> +bar (int n)
> +{
> +  int *p1 = &a;
> +  int *p2 = &b;
> +  c[n] += 2.0;
> +  d.s -= 2;
> +  d.t -= 2;
> +  return *p1 + *p2 + d.s + d.t;
> +}
> +
> +#pragma omp declare target (foo, bar)
> +
> +int
> +main ()
> +{
> +  a = b = 2;
> +  d.s = 17;
> +  d.t = 18;
> +
> +  int res, n = 10;
> +  #pragma omp target map (to: a, b, c, d) map (from: res)
> +  {
> +    res = foo () + foo ();
> +    c[n] = 3.0;
> +    res += bar (n);
> +  }
> +
> +  int shared_mem = 0;
> +  #pragma omp target map (alloc: shared_mem)
> +    shared_mem = 1;
> +
> +  if ((shared_mem && res != (2 + 2) + (3 + 3) + (4 + 4 + 15 + 16))
> +      || (!shared_mem && res != (2 + 1) + (3 + 2) + (4 + 3 + 15 + 16)))
> +    __builtin_abort ();
> +
> +  #pragma omp target enter data map (to: c)
> +  #pragma omp target update from (c)
> +  res = (int) (c[n] + 0.5);
> +  if ((shared_mem && res != 5) || (!shared_mem && res != 0))
> +    __builtin_abort ();
> +
> +  #pragma omp target map (to: a, b) map (from: res)
> +    res = foo ();
> +
> +  if ((shared_mem && res != 4 + 4) || (!shared_mem && res != 2 + 3))
> +    __builtin_abort ();
> +
> +  return 0;
> +}


end of thread, other threads:[~2019-06-26 16:23 UTC | newest]

Thread overview: 48+ messages
2015-07-17 13:43 [gomp4.1] Handle new form of #pragma omp declare target Jakub Jelinek
2015-07-17 15:48 ` James Norris
2015-10-26 18:45 ` Ilya Verbin
2015-10-26 19:11   ` Jakub Jelinek
2015-10-26 19:49     ` Ilya Verbin
2015-10-26 19:55       ` Jakub Jelinek
2015-11-16 15:41         ` [gomp4.5] Handle #pragma omp declare target link Ilya Verbin
2015-11-19 15:31           ` Jakub Jelinek
2015-11-27 16:51             ` Ilya Verbin
2015-11-30 12:08               ` Jakub Jelinek
2015-11-30 20:42                 ` Ilya Verbin
2015-11-30 20:55                   ` Jakub Jelinek
2015-11-30 21:38                     ` Ilya Verbin
2015-12-01  8:18                       ` Jakub Jelinek
2015-12-01  8:48                         ` Ilya Verbin
2015-12-01 13:16                           ` Jakub Jelinek
2015-12-01 17:30                             ` Ilya Verbin
2015-12-01 19:05                               ` Jakub Jelinek
2015-12-08 14:46                                 ` Ilya Verbin
2015-12-11 17:27                                   ` Jakub Jelinek
2015-12-11 17:46                                     ` Ilya Verbin
2015-12-14 16:48                                     ` Ilya Verbin
2015-12-16 12:30                                       ` gomp_target_fini (was: [gomp4.5] Handle #pragma omp declare target link) Thomas Schwinge
2015-12-23 11:05                                         ` gomp_target_fini Thomas Schwinge
2016-01-11 10:40                                           ` gomp_target_fini Thomas Schwinge
2016-01-21  6:17                                             ` gomp_target_fini Thomas Schwinge
2016-01-21 15:24                                         ` gomp_target_fini Bernd Schmidt
2016-01-22 10:16                                           ` gomp_target_fini Jakub Jelinek
2016-01-25 18:23                                             ` gomp_target_fini Mike Stump
2016-04-19 14:01                                             ` gomp_target_fini Thomas Schwinge
2016-04-19 14:04                                               ` gomp_target_fini Jakub Jelinek
2016-04-21 13:43                                                 ` gomp_target_fini Alexander Monakov
2016-04-21 15:38                                                   ` gomp_target_fini Thomas Schwinge
2016-04-19 15:23                                               ` gomp_target_fini Alexander Monakov
2015-12-14 17:18                     ` [gomp4.5] Handle #pragma omp declare target link Ilya Verbin
2015-12-15  8:42                       ` Jakub Jelinek
2015-12-16 16:21                       ` Thomas Schwinge
2016-01-07 18:57                         ` [gomp4] Fix use of declare'd vars by routine procedures James Norris
2016-01-11 11:55                           ` Thomas Schwinge
2016-01-11 15:38                             ` James Norris
2019-06-26 16:23                       ` [gomp4.5] Handle #pragma omp declare target link Thomas Schwinge
2015-10-27 21:15 ` [gomp4.1] Handle new form of #pragma omp declare target Ilya Verbin
2015-10-30 17:48   ` Ilya Verbin
2015-10-30 19:23     ` Jakub Jelinek
2015-11-02 16:54       ` Ilya Verbin
2015-11-02 18:01         ` Jakub Jelinek
2015-11-23 11:33 ` Thomas Schwinge
2015-11-23 11:41   ` Jakub Jelinek
