public inbox for fortran@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH,openacc] check for compatible loop parallelism with acc routine calls
@ 2016-06-16  3:12 Cesar Philippidis
  2016-06-17 14:42 ` Jakub Jelinek
  2019-02-28 21:12 ` [PR72741] Encode OpenACC 'routine' directive inside Fortran module files Thomas Schwinge
  0 siblings, 2 replies; 14+ messages in thread
From: Cesar Philippidis @ 2016-06-16  3:12 UTC (permalink / raw)
  To: gcc-patches, Fortran List, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 2424 bytes --]

This patch addresses the following problems with acc routines:

 * incorrectly permitting 'acc seq' loops to call gang, worker and
   vector routines

 * lto-wrapper errors when a function or subroutine isn't marked as
   'acc routine'

The solution to the first problem is straightforward. It only required a
small change to oacc_loop_fixed_partitions. The solution to the second
problem is more involved, since it required changes to the fortran FE,
gimplifier, the behavior of flag_generate_offload, and libgomp.

Starting with the the fortran changes, this patch updates the way that
the fortran FE handles the 'acc routine' attribute in modules. Before,
it only recorded that a function was marked as an acc routine. With this
patch, it now records the level of parallelism the routine has. This is
necessary for the middle end to validate compatible parallelism between
the loop calling the routine and the routine itself.

The second set of changes involves teaching the gimplifier to error when
it detects a function call to an non-acc routines inside an OpenACC
offloaded region. Actually, I relaxed non-acc routines by excluding
calls to builtin functions, including those prefixed with _gfortran_.
Nvptx does have a newlib c library, and it also has a subset of
libgfortran. Still, this solution is probably not optimal.

Next, I had to modify the openacc header files in libgomp to mark
acc_on_device as an acc routine. Unfortunately, this meant that I had to
build the opeancc.mod module for gfortran with -fopenacc. But doing
that, caused caused gcc to stream offloaded code to the openacc.o object
file. So, I've updated the behavior of flag_generate_offload such that
minus one indicates that the user specified -foffload=disable, and that
will prevent gcc from streaming offloaded lto code. The alternative was
to hack libtool to build libgomp with -foffload=disable.

Is this patch OK for trunk?

There are still a couple of other quirks with routines we'll need to
address with a follow up patch. Namely, passing scalar dummy arguments
causes to subroutines trips up the nvptx worker and vector state
propagator if the actual argument is a local variable. That's because
the nvptx state propagator only forwards the pointer to the worker and
vector threads, and not the actual variable itself. Consequently, those
pointers dereference garbage. This is a problem with pass-by-reference
in general.

Cesar


[-- Attachment #2: acc-subroutines-20160615.diff --]
[-- Type: text/x-patch, Size: 24095 bytes --]

2016-06-15  Cesar Philippidis  <cesar@codesourcery.com>

	gcc/
	* cgraphunit.c (ipa_passes): Only stream offloaded code when
	flag_generate_offload is positive.
	(symbol_table::compile): Likewise.
	* common.opt (flag_generate_offload): Update comment on its usage.
	* gimplify.c (gimplify_call_expr): Verify that function calls inside
	OpenACC offloaded regions are 'acc routines'.
	* ipa-inline-analysis.c (inline_generate_summary): Update the usage of
	flag_generate_offload.
	* lto-streamer.c (gate_lto_out): Likewise.
	* omp-low.c (oacc_loop_fixed_partitions): Consider SEQ loop when
	validing loop parallelism restrictions.
	* opts.c (common_handle_option): Set x_flag_generate_offload to minus
	one with -foffload=disable.
	* passes.c (ipa_write_summaries): Update usage of flag_generate_offload.
	* toplev.c (compile_file): Likewise.
	* tree.c (free_lang_data):  Likewise.

	gcc/fortran/
	* gfortran.h (enum oacc_function): New enum.
	* module.c (oacc_function): New DECIO_MIO_NAME.
	(mio_symbol_attribute): Handle oacc_function attributes.
	* openmp.c (gfc_oacc_routine_dims): Use enum oacc_function to capture
	acc routine geometry.
	(gfc_match_oacc_routine): Update call to gfc_oacc_routine_dims.
	* symbol.c (oacc_function_types): New const mstring.
	* trans-decl.c (add_attributes_to_decl): Update handling of
	oacc_function.

	gcc/testsuite/
	* c-c++-common/goacc/routine-3.c: Add test coverage for seq loops.
	* c-c++-common/goacc/routine-6.c: New test.
	* gfortran.dg/goacc/routine-7.f90: New test.
	* gfortran.dg/goacc/routine-8.f90: New test.

	libgomp/
	* Makefile.am (openacc.lo): New target.
	(openacc.mod): Build with -fopenacc -foffload=disable.
	* Makefile.in: Regenerate.
	* openacc.f90 (function_on_device_h): Make 'acc routine seq'.
	* openacc.h (acc_on_device): Likewise.
	* openacc_lib.h (acc_on_device): Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-4.c: Filter out warning.
	* testsuite/libgomp.oacc-fortran/routine-7.f90: Update test case to
	properly utilize acc parallelism.

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 4bfcad7..5dd211c 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2292,12 +2292,12 @@ ipa_passes (void)
     }
 
   /* Some targets need to handle LTO assembler output specially.  */
-  if (flag_generate_lto || flag_generate_offload)
+  if (flag_generate_lto || flag_generate_offload > 0)
     targetm.asm_out.lto_start ();
 
   if (!in_lto_p)
     {
-      if (g->have_offload)
+      if (g->have_offload && flag_generate_offload > 0)
 	{
 	  section_name_prefix = OFFLOAD_SECTION_NAME_PREFIX;
 	  lto_stream_offload_p = true;
@@ -2312,7 +2312,7 @@ ipa_passes (void)
 	}
     }
 
-  if (flag_generate_lto || flag_generate_offload)
+  if (flag_generate_lto || flag_generate_offload > 0)
     targetm.asm_out.lto_end ();
 
   if (!flag_ltrans && (in_lto_p || !flag_lto || flag_fat_lto_objects))
@@ -2393,11 +2393,11 @@ symbol_table::compile (void)
   state = IPA;
 
   /* Offloading requires LTO infrastructure.  */
-  if (!in_lto_p && g->have_offload)
+  if (!in_lto_p && g->have_offload && flag_generate_offload >= 0)
     flag_generate_offload = 1;
 
   /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE.  */
-  if (flag_generate_lto || flag_generate_offload)
+  if (flag_generate_lto || flag_generate_offload > 0)
     lto_streamer_hooks_init ();
 
   /* Don't run the IPA passes if there was any error or sorry messages.  */
diff --git a/gcc/common.opt b/gcc/common.opt
index f0d7196..9560e08 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -73,7 +73,8 @@ int *param_values
 Variable
 int flag_generate_lto
 
-; Nonzero if we should write GIMPLE bytecode for offload compilation.
+; Positive if we should write GIMPLE bytecode for offload compilation.
+; Negative if the user explicitly passed -foffload=disable.
 Variable
 int flag_generate_offload = 0
 
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 0bb71cb..fac94ca 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -303,6 +303,15 @@ enum save_state
 { SAVE_NONE = 0, SAVE_EXPLICIT, SAVE_IMPLICIT
 };
 
+/* Flags to keep track of ACC routine states.  */
+enum oacc_function
+{ OACC_FUNCTION_NONE = 0,
+  OACC_FUNCTION_SEQ,
+  OACC_FUNCTION_GANG,
+  OACC_FUNCTION_WORKER,
+  OACC_FUNCTION_VECTOR
+};
+
 /* Strings for all symbol attributes.  We use these for dumping the
    parse tree, in error messages, and also when reading and writing
    modules.  In symbol.c.  */
@@ -312,6 +321,7 @@ extern const mstring intents[];
 extern const mstring access_types[];
 extern const mstring ifsrc_types[];
 extern const mstring save_status[];
+extern const mstring oacc_function_types[];
 
 /* Enumeration of all the generic intrinsic functions.  Used by the
    backend for identification of a function.  */
@@ -862,7 +872,7 @@ typedef struct
   unsigned oacc_declare_link:1;
 
   /* This is an OpenACC acclerator function at level N - 1  */
-  unsigned oacc_function:3;
+  ENUM_BITFIELD (oacc_function) oacc_function:3;
 
   /* Attributes set by compiler extensions (!GCC$ ATTRIBUTES).  */
   unsigned ext_attr:EXT_ATTR_NUM;
diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 6d3860e..e3ed2a0 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -2095,6 +2095,7 @@ DECL_MIO_NAME (procedure_type)
 DECL_MIO_NAME (ref_type)
 DECL_MIO_NAME (sym_flavor)
 DECL_MIO_NAME (sym_intent)
+DECL_MIO_NAME (oacc_function)
 #undef DECL_MIO_NAME
 
 /* Symbol attributes are stored in list with the first three elements
@@ -2116,6 +2117,8 @@ mio_symbol_attribute (symbol_attribute *attr)
   attr->proc = MIO_NAME (procedure_type) (attr->proc, procedures);
   attr->if_source = MIO_NAME (ifsrc) (attr->if_source, ifsrc_types);
   attr->save = MIO_NAME (save_state) (attr->save, save_status);
+  attr->oacc_function = MIO_NAME (oacc_function) (attr->oacc_function,
+						  oacc_function_types);
 
   ext_attr = attr->ext_attr;
   mio_integer ((int *) &ext_attr);
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 2c92794..96fc2fd 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1716,21 +1716,31 @@ gfc_match_oacc_cache (void)
 
 /* Determine the loop level for a routine.   */
 
-static int
+static oacc_function
 gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
 {
   int level = -1;
+  oacc_function ret = OACC_FUNCTION_SEQ;
 
   if (clauses)
     {
       unsigned mask = 0;
 
       if (clauses->gang)
-	level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
+	{
+	  level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
+	  ret = OACC_FUNCTION_GANG;
+	}
       if (clauses->worker)
-	level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
+	{
+	  level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
+	  ret = OACC_FUNCTION_WORKER;
+	}
       if (clauses->vector)
-	level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
+	{
+	  level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
+	  ret = OACC_FUNCTION_VECTOR;
+	}
       if (clauses->seq)
 	level = GOMP_DIM_MAX, mask |= GOMP_DIM_MASK (level);
 
@@ -1741,7 +1751,7 @@ gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
   if (level < 0)
     level = GOMP_DIM_MAX;
 
-  return level;
+  return ret;
 }
 
 match
@@ -1834,7 +1844,7 @@ gfc_match_oacc_routine (void)
 				       &old_loc))
 	goto cleanup;
       gfc_current_ns->proc_name->attr.oacc_function
-	= gfc_oacc_routine_dims (c) + 1;
+	= gfc_oacc_routine_dims (c);
     }
 
   if (n)
diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index 0ee7dec..b1dd32b 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -87,6 +87,15 @@ const mstring save_status[] =
     minit ("IMPLICIT-SAVE", SAVE_IMPLICIT),
 };
 
+const mstring oacc_function_types[] =
+{
+  minit ("NONE", OACC_FUNCTION_NONE),
+  minit ("OACC_FUNCTION_SEQ", OACC_FUNCTION_SEQ),
+  minit ("OACC_FUNCTION_GANG", OACC_FUNCTION_GANG),
+  minit ("OACC_FUNCTION_WORKER", OACC_FUNCTION_WORKER),
+  minit ("OACC_FUNCTION_VECTOR", OACC_FUNCTION_VECTOR)
+};
+
 /* This is to make sure the backend generates setup code in the correct
    order.  */
 
diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index 2f5e434..0b8d638 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -1331,7 +1331,22 @@ add_attributes_to_decl (symbol_attribute sym_attr, tree list)
     {
       tree dims = NULL_TREE;
       int ix;
-      int level = sym_attr.oacc_function - 1;
+      int level = GOMP_DIM_MAX;
+
+      switch (sym_attr.oacc_function)
+	{
+	case OACC_FUNCTION_GANG:
+	  level = GOMP_DIM_GANG;
+	  break;
+	case OACC_FUNCTION_WORKER:
+	  level = GOMP_DIM_WORKER;
+	  break;
+	case OACC_FUNCTION_VECTOR:
+	  level = GOMP_DIM_VECTOR;
+	  break;
+	case OACC_FUNCTION_SEQ:
+	default:;
+	}
 
       for (ix = GOMP_DIM_MAX; ix--;)
 	dims = tree_cons (build_int_cst (boolean_type_node, ix >= level),
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index ae8b4fc..6a9ab3c 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -2697,6 +2697,26 @@ gimplify_call_expr (tree *expr_p, gimple_seq *pre_p, bool want_value)
     CALL_EXPR_FN (*expr_p) = build1 (NOP_EXPR, fnptrtype,
 				     CALL_EXPR_FN (*expr_p));
 
+  /* Check if this function is being called from inside an OpenACC
+     offloaded region.  If so, verify that this function has been
+     declared as an 'acc routine'.  Defer loop parallelism geometry
+     checking until oacc_device_lower.  */
+
+  const char *name = fndecl == NULL_TREE ? ""
+    : IDENTIFIER_POINTER (DECL_NAME (fndecl));
+  gimplify_omp_ctx *ctx = gimplify_omp_ctxp;
+  if (ctx != NULL && (ctx->region_type == ORT_ACC
+		      || ctx->region_type == ORT_ACC_PARALLEL
+		      || ctx->region_type == ORT_ACC_KERNELS)
+      && !is_builtin_fn (fndecl)
+      && !(lang_GNU_Fortran () && strncmp (name, "_gfortran_", 10) == 0)
+      && get_oacc_fn_attrib (fndecl) == NULL_TREE)
+    {
+      error ("%qE is not an %<acc routine%>", fndecl);
+      CALL_EXPR_FN (*expr_p) = NULL_TREE;
+      return GS_OK;
+    }
+
   return ret;
 }
 
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index 5d67218..c37a5ab 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -4179,7 +4179,8 @@ inline_generate_summary (void)
 
   /* When not optimizing, do not bother to analyze.  Inlining is still done
      because edge redirection needs to happen there.  */
-  if (!optimize && !flag_generate_lto && !flag_generate_offload && !flag_wpa)
+  if (!optimize && !flag_generate_lto && flag_generate_offload <= 0
+      && !flag_wpa)
     return;
 
   if (!inline_summaries)
diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c
index bfde1fe..e1848be 100644
--- a/gcc/lto-streamer.c
+++ b/gcc/lto-streamer.c
@@ -309,7 +309,7 @@ lto_streamer_init (void)
 bool
 gate_lto_out (void)
 {
-  return ((flag_generate_lto || flag_generate_offload || in_lto_p)
+  return ((flag_generate_lto || flag_generate_offload > 0 || in_lto_p)
 	  /* Don't bother doing anything if the program has errors.  */
 	  && !seen_error ());
 }
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 22e5909..7824048 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -19420,7 +19420,8 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask)
     {
       unsigned outermost = this_mask & -this_mask;
 
-      if (outermost && outermost <= outer_mask)
+      if ((outermost && outermost <= outer_mask)
+	  || (this_mask && (loop->parent->flags & OLF_SEQ)))
 	{
 	  if (noisy)
 	    {
diff --git a/gcc/opts.c b/gcc/opts.c
index e80331f..eec91f8 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -1930,6 +1930,7 @@ common_handle_option (struct gcc_options *opts,
 		&& (p[7] == ',' || p[7] == '\0'))
 	      {
 		opts->x_flag_disable_hsa = true;
+		opts->x_flag_generate_offload = -1;
 		break;
 	      }
 
diff --git a/gcc/passes.c b/gcc/passes.c
index 0565cfa..9c1b902 100644
--- a/gcc/passes.c
+++ b/gcc/passes.c
@@ -2531,7 +2531,7 @@ ipa_write_summaries (void)
   struct cgraph_node *node;
   struct cgraph_node **order;
 
-  if ((!flag_generate_lto && !flag_generate_offload) || seen_error ())
+  if ((!flag_generate_lto && flag_generate_offload <= 0) || seen_error ())
     return;
 
   select_what_to_stream ();
diff --git a/gcc/testsuite/c-c++-common/goacc/routine-3.c b/gcc/testsuite/c-c++-common/goacc/routine-3.c
index b322d26..fabae1f 100644
--- a/gcc/testsuite/c-c++-common/goacc/routine-3.c
+++ b/gcc/testsuite/c-c++-common/goacc/routine-3.c
@@ -49,7 +49,7 @@ main ()
   int red = 0;
 #pragma acc parallel copy (red)
   {
-    /* Independent/seq loop tests.  */
+    /* Independent loop tests.  */
 #pragma acc loop reduction (+:red) // { dg-warning "insufficient partitioning" }
     for (int i = 0; i < 10; i++)
       red += gang ();
@@ -62,6 +62,19 @@ main ()
     for (int i = 0; i < 10; i++)
       red += vector ();
 
+    /* Seq loop tests.  */
+#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += gang (); /* { dg-error "incorrectly nested" } */
+
+#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += worker (); /* { dg-error "incorrectly nested" } */
+
+#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += vector (); /* { dg-error "incorrectly nested" } */
+    
     /* Gang routine tests.  */
 #pragma acc loop gang reduction (+:red)  /* { dg-message "containing loop" } */
     for (int i = 0; i < 10; i++)
diff --git a/gcc/testsuite/c-c++-common/goacc/routine-6.c b/gcc/testsuite/c-c++-common/goacc/routine-6.c
new file mode 100644
index 0000000..e95954b
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/routine-6.c
@@ -0,0 +1,26 @@
+/* Test calls to non-routines.  */
+
+int
+sum (int a, int b)
+{
+  return a + b;
+}
+
+int
+main ()
+{
+  int c = 0, i;
+
+#pragma acc parallel loop reduction(+:c)
+  for (i = 0; i < 100; i++)
+    c += sum (i, i); /* { dg-error "'sum' is not an 'acc routine'" } */
+
+  /* Built-in functions are permitted.  */
+#pragma acc parallel
+  {
+    if (c < 0)
+      __builtin_abort ();
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-7.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-7.f90
new file mode 100644
index 0000000..27b08b5
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-7.f90
@@ -0,0 +1,25 @@
+! Test calls to non-acc routines.
+
+program test
+  implicit none
+  integer c, i
+
+  c = 0
+
+  !$acc parallel loop reduction(+:c)
+  do i = 0, 100
+     c = c + sum (i, i) ! { dg-error "'sum' is not an 'acc routine'" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel
+  if (c .le. 0) call abort
+  !$acc end parallel
+  
+contains
+  integer function sum(a, b)
+    integer a, b
+    sum = a + b
+  end function sum
+  
+end program test
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-8.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-8.f90
new file mode 100644
index 0000000..d2cb51a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-8.f90
@@ -0,0 +1,122 @@
+! Check routine calls with insufficient parallelism.
+
+! { dg-do compile }
+! { dg-additional-options "-cpp -O0" }
+
+#define M 8
+#define N 32
+
+program main
+  integer :: i
+  integer :: a(N)
+  integer :: b(M * N)
+
+  do i = 1, N
+    a(i) = 0
+  end do
+
+  !$acc parallel copy (a)
+  !$acc loop seq
+    do i = 1, N
+      call seq (a)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne.N) call abort
+  end do
+
+  !$acc parallel copy (a)
+  !$acc loop seq ! { dg-message "containing loop here" }
+    do i = 1, N 
+      call gang (a) ! { dg-error "incorrectly nested OpenACC loop parallelism" }
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. (N + (N * (-1 * i)))) call abort
+  end do
+
+  do i = 1, N
+    b(i) = i
+  end do
+
+  !$acc parallel copy (b)
+  !$acc loop seq ! { dg-message "containing loop here" }
+    do i = 1, N
+      call worker (b) ! { dg-error "incorrectly nested OpenACC loop parallelism" }
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. N + i) call abort
+  end do
+
+  do i = 1, N
+    a(i) = i
+  end do
+
+  !$acc parallel copy (a)
+  !$acc loop seq ! { dg-message "containing loop here" }
+    do i = 1, N
+      call vector (a) ! { dg-error "incorrectly nested OpenACC loop parallelism" }
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 0) call abort
+  end do
+
+contains
+
+subroutine vector (a)
+  !$acc routine vector
+  integer, intent (inout) :: a(N)
+  integer :: i
+
+  !$acc loop vector
+  do i = 1, N
+    a(i) = a(i) - a(i) 
+  end do
+
+end subroutine vector
+
+subroutine worker (b)
+  !$acc routine worker
+  integer, intent (inout) :: b(M*N)
+  integer :: i, j
+
+  !$acc loop worker
+  do i = 1, N
+  !$acc loop vector
+    do j = 1, M
+      b(j + ((i - 1) * M)) = b(j + ((i - 1) * M)) + 1
+    end do
+  end do
+
+end subroutine worker
+
+subroutine gang (a)
+  !$acc routine gang
+  integer, intent (inout) :: a(N)
+  integer :: i
+
+  !$acc loop gang
+  do i = 1, N
+    a(i) = a(i) - i 
+  end do
+
+end subroutine gang
+
+subroutine seq (a)
+  !$acc routine seq
+  integer, intent (inout) :: a(M)
+  integer :: i
+
+  do i = 1, N
+    a(i) = a(i) + 1
+  end do
+
+end subroutine seq
+
+end program main
diff --git a/gcc/toplev.c b/gcc/toplev.c
index f51d2cb..47d8e2e 100644
--- a/gcc/toplev.c
+++ b/gcc/toplev.c
@@ -556,7 +556,7 @@ compile_file (void)
      We used to emit an undefined reference here, but this produces
      link errors if an object file with IL is stored into a shared
      library without invoking lto1.  */
-  if (flag_generate_lto || flag_generate_offload)
+  if (flag_generate_lto || flag_generate_offload > 0)
     {
 #if defined ASM_OUTPUT_ALIGNED_DECL_COMMON
       ASM_OUTPUT_ALIGNED_DECL_COMMON (asm_out_file, NULL_TREE,
diff --git a/gcc/tree.c b/gcc/tree.c
index fd0e692..e6712c2 100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -6000,7 +6000,7 @@ free_lang_data (void)
 
   /* If we are the LTO frontend we have freed lang-specific data already.  */
   if (in_lto_p
-      || (!flag_generate_lto && !flag_generate_offload))
+      || (!flag_generate_lto && flag_generate_offload <= 0))
     return 0;
 
   /* Allocate and assign alias sets to the standard integer types
diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index a3e1c2b..085478b 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -87,8 +87,10 @@ omp_lib_kinds.mod: omp_lib.mod
 	:
 openacc_kinds.mod: openacc.mod
 	:
-openacc.mod: openacc.lo
-	:
+openacc.lo: openacc.f90
+	$(LTFCCOMPILE) -fopenacc -foffload=disable -c -o $@ $^
+openacc.mod: openacc.f90
+	$(FC) $(FCFLAGS) -fopenacc -foffload=disable -c $<
 %.mod: %.f90
 	$(FC) $(FCFLAGS) -fsyntax-only $<
 fortran.lo: libgomp_f.h
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 88c8517..baf0f8d 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -1286,8 +1286,10 @@ omp_lib_kinds.mod: omp_lib.mod
 	:
 openacc_kinds.mod: openacc.mod
 	:
-openacc.mod: openacc.lo
-	:
+openacc.lo: openacc.f90
+	$(LTFCCOMPILE) -fopenacc -foffload=disable -c -o $@ $^
+openacc.mod: openacc.f90
+	$(FC) $(FCFLAGS) -fopenacc -foffload=disable -c $<
 %.mod: %.f90
 	$(FC) $(FCFLAGS) -fsyntax-only $<
 fortran.lo: libgomp_f.h
diff --git a/libgomp/openacc.f90 b/libgomp/openacc.f90
index 4b71489..98ba493 100644
--- a/libgomp/openacc.f90
+++ b/libgomp/openacc.f90
@@ -128,6 +128,7 @@ module openacc_internal
 
     function acc_on_device_h (d)
       import
+      !$acc routine seq
       integer (acc_device_kind) d
       logical acc_on_device_h
     end function
@@ -719,6 +720,7 @@ end subroutine
 function acc_on_device_h (d)
   use openacc_internal, only: acc_on_device_l
   use openacc_kinds
+  !$acc routine seq
   integer (acc_device_kind) d
   logical acc_on_device_h
   if (acc_on_device_l (d) .eq. 1) then
diff --git a/libgomp/openacc.h b/libgomp/openacc.h
index 7ea8794..094db50 100644
--- a/libgomp/openacc.h
+++ b/libgomp/openacc.h
@@ -83,6 +83,9 @@ void acc_shutdown (acc_device_t) __GOACC_NOTHROW;
 #ifdef __cplusplus
 int acc_on_device (int __arg) __GOACC_NOTHROW;
 #else
+#ifdef _OPENACC
+#pragma acc routine seq
+#endif
 int acc_on_device (acc_device_t __arg) __GOACC_NOTHROW;
 #endif
 void *acc_malloc (size_t) __GOACC_NOTHROW;
diff --git a/libgomp/openacc_lib.h b/libgomp/openacc_lib.h
index a3f94d7..d627857 100644
--- a/libgomp/openacc_lib.h
+++ b/libgomp/openacc_lib.h
@@ -142,6 +142,7 @@
       interface acc_on_device
         function acc_on_device_h (devicetype)
           import acc_device_kind
+!$acc routine seq
           logical acc_on_device_h
           integer (acc_device_kind) devicetype
         end function
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c
index d6ff44d..02b1f15 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c
@@ -78,7 +78,7 @@ main(int argc, char **argv)
 
 #pragma acc parallel copy (a[0:N])
   {
-#pragma acc loop seq
+#pragma acc loop /* { dg-warning "insufficient partitioning" } */
     for (i = 0; i < N; i++)
       gang (&a[0]);
   }
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/routine-7.f90 b/libgomp/testsuite/libgomp.oacc-fortran/routine-7.f90
index 200188e..27cda44 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/routine-7.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/routine-7.f90
@@ -1,121 +1,101 @@
+! Test acc routines.
 
 ! { dg-do run }
-! { dg-additional-options "-cpp" }
 
-#define M 8
-#define N 32
+module size
+  integer, parameter :: N = 32
+end module size
 
 program main
+  use size
+  implicit none
+
   integer :: i
   integer :: a(N)
-  integer :: b(M * N)
 
-  do i = 1, N
-    a(i) = 0
-  end do
-
-  !$acc parallel copy (a)
-  !$acc loop seq
-    do i = 1, N
-      call seq (a)
-    end do
+  !$acc parallel
+  call seq (a)
   !$acc end parallel
 
   do i = 1, N
-    if (a(i) .ne.N) call abort
+    if (a(i) .ne. 4) call abort
   end do
 
-  !$acc parallel copy (a)
-  !$acc loop seq
-    do i = 1, N 
-      call gang (a)
-    end do
+  !$acc parallel
+  call gang (a)
   !$acc end parallel
 
   do i = 1, N
-    if (a(i) .ne. (N + (N * (-1 * i)))) call abort
+    if (a(i) .ne. 3) call abort
   end do
 
-  do i = 1, N
-    b(i) = i
-  end do
-
-  !$acc parallel copy (b)
-  !$acc loop seq
-    do i = 1, N
-      call worker (b)
-    end do
+  !$acc parallel
+  call worker (a)
   !$acc end parallel
 
   do i = 1, N
-    if (b(i) .ne. N + i) call abort
-  end do
-
-  do i = 1, N
-    a(i) = i
+    if (a(i) .ne. 2) call abort
   end do
 
-  !$acc parallel copy (a)
-  !$acc loop seq
-    do i = 1, N
-      call vector (a)
-    end do
+  !$acc parallel
+  call vector (a)
   !$acc end parallel
 
   do i = 1, N
-    if (a(i) .ne. 0) call abort
+    if (a(i) .ne. 1) call abort
   end do
 
 contains
 
 subroutine vector (a)
+  use size
+  implicit none
   !$acc routine vector
   integer, intent (inout) :: a(N)
   integer :: i
 
   !$acc loop vector
   do i = 1, N
-    a(i) = a(i) - a(i) 
+    a(i) = 1
   end do
-
 end subroutine vector
 
-subroutine worker (b)
+subroutine worker (a)
+  use size
+  implicit none
   !$acc routine worker
-  integer, intent (inout) :: b(M*N)
-  integer :: i, j
+  integer, intent (inout) :: a(N)
+  integer :: i
 
   !$acc loop worker
   do i = 1, N
-  !$acc loop vector
-    do j = 1, M
-      b(j + ((i - 1) * M)) = b(j + ((i - 1) * M)) + 1
-    end do
+    a(i) = 2
   end do
-
 end subroutine worker
 
 subroutine gang (a)
+  use size
+  implicit none
   !$acc routine gang
   integer, intent (inout) :: a(N)
   integer :: i
 
   !$acc loop gang
   do i = 1, N
-    a(i) = a(i) - i 
+    a(i) = 3
   end do
-
 end subroutine gang
 
 subroutine seq (a)
+  use size
+  implicit none
   !$acc routine seq
-  integer, intent (inout) :: a(M)
+  integer, intent (inout) :: a(N)
   integer :: i
 
   do i = 1, N
-    a(i) = a(i) + 1
+    a(i) = 4
   end do
-
 end subroutine seq
 
 end program main

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls
  2016-06-16  3:12 [PATCH,openacc] check for compatible loop parallelism with acc routine calls Cesar Philippidis
@ 2016-06-17 14:42 ` Jakub Jelinek
  2016-06-23 16:05   ` Cesar Philippidis
  2019-02-28 21:12 ` [PR72741] Encode OpenACC 'routine' directive inside Fortran module files Thomas Schwinge
  1 sibling, 1 reply; 14+ messages in thread
From: Jakub Jelinek @ 2016-06-17 14:42 UTC (permalink / raw)
  To: Cesar Philippidis; +Cc: gcc-patches, Fortran List

On Wed, Jun 15, 2016 at 08:12:15PM -0700, Cesar Philippidis wrote:
> The second set of changes involves teaching the gimplifier to error when
> it detects a function call to an non-acc routines inside an OpenACC
> offloaded region. Actually, I relaxed non-acc routines by excluding
> calls to builtin functions, including those prefixed with _gfortran_.
> Nvptx does have a newlib c library, and it also has a subset of
> libgfortran. Still, this solution is probably not optimal.

I don't really like that, hardcoding prefixes or whatever is available
(you have quite some subset of libc, libm etc. available too) in the
compiler looks very hackish.  What is wrong with complaining during
linking of the offloaded code?

> Next, I had to modify the openacc header files in libgomp to mark
> acc_on_device as an acc routine. Unfortunately, this meant that I had to
> build the opeancc.mod module for gfortran with -fopenacc. But doing
> that, caused caused gcc to stream offloaded code to the openacc.o object
> file. So, I've updated the behavior of flag_generate_offload such that
> minus one indicates that the user specified -foffload=disable, and that
> will prevent gcc from streaming offloaded lto code. The alternative was
> to hack libtool to build libgomp with -foffload=disable.

This also looks wrong.  I'd say the right thing is when loading modules
that have OpenACC bits set in it (and also OpenMP bits, I admit I haven't
handled this well) into CU with the corresponding flags unset (-fopenacc,
-fopenmp, -fopenmp-simd here, depending on which bit it is), then
IMHO the module loading code should just ignore it, pretend it wasn't there.
Similarly e.g. to how lto1 with -g0 should ignore debug statements that
could be in the LTO inputs.

	Jakub

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls
  2016-06-17 14:42 ` Jakub Jelinek
@ 2016-06-23 16:05   ` Cesar Philippidis
  2016-06-29 14:11     ` Thomas Schwinge
  0 siblings, 1 reply; 14+ messages in thread
From: Cesar Philippidis @ 2016-06-23 16:05 UTC (permalink / raw)
  To: Jakub Jelinek, Thomas Schwinge; +Cc: gcc-patches, Fortran List

[-- Attachment #1: Type: text/plain, Size: 3248 bytes --]

On 06/17/2016 07:42 AM, Jakub Jelinek wrote:
> On Wed, Jun 15, 2016 at 08:12:15PM -0700, Cesar Philippidis wrote:
>> The second set of changes involves teaching the gimplifier to error when
>> it detects a function call to an non-acc routines inside an OpenACC
>> offloaded region. Actually, I relaxed non-acc routines by excluding
>> calls to builtin functions, including those prefixed with _gfortran_.
>> Nvptx does have a newlib c library, and it also has a subset of
>> libgfortran. Still, this solution is probably not optimal.
> 
> I don't really like that, hardcoding prefixes or whatever is available
> (you have quite some subset of libc, libm etc. available too) in the
> compiler looks very hackish.  What is wrong with complaining during
> linking of the offloaded code?

Wouldn't the error get reported multiple times then, i.e. once per
target? Then again, maybe this error could have been restrained to the
host compiler.

Anyway, this patch now reduces that error to a warning. Furthermore,
that warning is being thrown in lower_omp_1 instead of
gimplify_call_expr because the latter is called multiple times and that
causes duplicate warnings. The only bit of fallout I had with this
change was with the fortran FE's usage of BUILT_IN_EXPECT in
gfc_{un}likely. Since these are generated implicitly by the FE, I just
added an oacc_function attribute to those calls when flag_openacc is set.

>> Next, I had to modify the openacc header files in libgomp to mark
>> acc_on_device as an acc routine. Unfortunately, this meant that I had to
>> build the opeancc.mod module for gfortran with -fopenacc. But doing
>> that, caused caused gcc to stream offloaded code to the openacc.o object
>> file. So, I've updated the behavior of flag_generate_offload such that
>> minus one indicates that the user specified -foffload=disable, and that
>> will prevent gcc from streaming offloaded lto code. The alternative was
>> to hack libtool to build libgomp with -foffload=disable.
> 
> This also looks wrong.  I'd say the right thing is when loading modules
> that have OpenACC bits set in it (and also OpenMP bits, I admit I haven't
> handled this well) into CU with the corresponding flags unset (-fopenacc,
> -fopenmp, -fopenmp-simd here, depending on which bit it is), then
> IMHO the module loading code should just ignore it, pretend it wasn't there.
> Similarly e.g. to how lto1 with -g0 should ignore debug statements that
> could be in the LTO inputs.

This required two changes. First, I had to teach lto-cgraph.c how to
report an error rather then fail an assert when partitions are missing
decls. Second, I taught the lto wrapper how to stream offloaded code on
the absence of -fopen*. The only kink with this approach is that I had
to build libgomp/openacc.f90 with -frandom-seed=1 to prevent lto related
bootstrap failures.

By the way, Thomas, I've added

 #pragma acc routine(__builtin_acc_on_device) seq

to openacc.h. Is this OK, or should I just modify the various
libgomp.oacc-c-c++-common/loop* tests to use that pragma directly? Or
another option is to have the compiler add that attribute directly. I
don't think we're really expecting the end user to use
__builtin_acc_on_device directly since this is a gcc-ism.

Cesar

[-- Attachment #2: acc-subroutines-20160623.diff --]
[-- Type: text/x-patch, Size: 51988 bytes --]

2016-06-23  Cesar Philippidis  <cesar@codesourcery.com>

	gcc/
	* lto-cgraph.c (input_overwrite_node): Error on missing symbols.
	(input_varpool_node): Likewise.
	* lto-wrapper.c (compile_images_for_offload_targets): Don't stream
	offloaded images without -fopenacc, -fopenmp or -fopenmp-simd.
	(run_gcc): Set flag_openacc, flag_openmp, and flag_openmp_simd.
	* omp-low.c (lower_omp_1): Emit a warning when calling a function
	that doesn't have an oacc_function attribute from an OpenACC offloaded
	region.
	(oacc_loop_fixed_partitions): Consider SEQ loops when checking
	parallelism.

	gcc/fortran/
	* gfortran.h (enum oacc_function): New enum.
	(oacc_function_types): Declare.
	(symbol_attribute): Add oacc_function field.
	(gfc_intrinsic_sym): Likewise.
	(add_omp_offloading_attributes): Declare.
	* intrinsic.c (add_sym): Initialize oacc_fuction to zero.
	(gfc_intrinsic_sub_interface): Set attr.oacc_function as to
	OACC_FUNCTION_SEQ in the resolved symbol when appropriate.
	* module.c (oacc_function): New DECL_MIO_NAME.
	(mio_symbol_attribute): Set attr->oacc_function.
	* openmp.c (gfc_oacc_routine_dims): Change return type to oacc_function.
	(gfc_match_oacc_routine): Permit named 'acc routine' directives on
	intrinsic procedures.  Update call to gfc_oacc_routine_dims.
	* symbol.c (oacc_function_types): Define.
	* trans-decl.c (add_omp_offloading_attributes): New function.
	(add_attributes_to_decl): Use it.
	* trans.c (gfc_unlikely): Mark calls BUILT_IN_EXPECT as 'acc routines'
	with flag_openacc is set.
	(gfc_likely): Likewise.

	gcc/testsuite/
	* c-c++-common/goacc/kernels-1.c: Add warnings to calls to
	__builtin_abort.
	* c-c++-common/goacc/parallel-1.c: Likewise.
	* c-c++-common/goacc/routine-3.c: Add coverage for acc seq loops.
	* c-c++-common/goacc/routine-6.c: New test.
	* gfortran.dg/goacc/fixed-1.f: Mark abort as an 'acc routine'.
	* gfortran.dg/goacc/routine-7.f90: New test.
	* gfortran.dg/goacc/routine-8.f90: New test.

	libgomp/
	* Makefile.am (openacc.lo): New rule.
	(openacc.mod): Build with -fopenacc -frandom-seed=1.
	* Makefile.in: Regenerate.
	* openacc.f90 (acc_on_device_h): Mark as 'acc routine seq'.
	(acc_on_device_l): Likewise.
	* openacc.h (acc_on_device): Mark as 'acc routine seq'.
	(__builtin_acc_on_device): New declaration. Mark as 'acc routine seq'.
	* openacc_lib.h (acc_on_device_h): Mark as 'acc routine seq'.
	* testsuite/libgomp.oacc-c-c++-common/abort-1.c: Apply 'acc routine
	seq' on abort.
	* testsuite/libgomp.oacc-c-c++-common/abort-1.c: Add pragma 'acc
	routine(abort) seq'.
	* testsuite/libgomp.oacc-c-c++-common/abort-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/abort-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/abort-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/abort-5.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c:
	Add -fno-exceptions to dg-additional-options.
	* testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c:
	Likewise.
	* testsuite/libgomp.oacc-c-c++-common/data-clauses.h: Add pragma 'acc
	routine(__builtin_abort) seq'.
	* testsuite/libgomp.oacc-c-c++-common/if-1.c: Add pragma 'acc
	routine(abort) seq'.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-1.c: Include openacc.h.
	pass acc_device_nvidia to __builtin_acc_on_device.
	* testsuite/libgomp.oacc-c-c++-common/loop-g-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-v-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-4.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-g-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-v-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-w-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/vector-type-1.c: Add pragma
	'acc routine(__builtin_abort) seq'.
	* testsuite/libgomp.oacc-fortran/abort-1.f90: Add directive 'acc
	routine(abort) seq'.
	* testsuite/libgomp.oacc-fortran/abort-2.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
	* testsuite/libgomp.oacc-fortran/nested-function-1.f90: Likewise.
	* testsuite/libgomp.oacc-fortran/routine-7.f90: Update test to be
	thread safe.

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 0bb71cb..bf46931 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -303,6 +303,15 @@ enum save_state
 { SAVE_NONE = 0, SAVE_EXPLICIT, SAVE_IMPLICIT
 };
 
+/* Flags to keep track of ACC routine states.  */
+enum oacc_function
+{ OACC_FUNCTION_NONE = 0,
+  OACC_FUNCTION_SEQ,
+  OACC_FUNCTION_GANG,
+  OACC_FUNCTION_WORKER,
+  OACC_FUNCTION_VECTOR
+};
+
 /* Strings for all symbol attributes.  We use these for dumping the
    parse tree, in error messages, and also when reading and writing
    modules.  In symbol.c.  */
@@ -312,6 +321,7 @@ extern const mstring intents[];
 extern const mstring access_types[];
 extern const mstring ifsrc_types[];
 extern const mstring save_status[];
+extern const mstring oacc_function_types[];
 
 /* Enumeration of all the generic intrinsic functions.  Used by the
    backend for identification of a function.  */
@@ -862,7 +872,7 @@ typedef struct
   unsigned oacc_declare_link:1;
 
   /* This is an OpenACC acclerator function at level N - 1  */
-  unsigned oacc_function:3;
+  ENUM_BITFIELD (oacc_function) oacc_function:3;
 
   /* Attributes set by compiler extensions (!GCC$ ATTRIBUTES).  */
   unsigned ext_attr:EXT_ATTR_NUM;
@@ -1956,7 +1966,7 @@ typedef struct gfc_intrinsic_sym
   gfc_typespec ts;
   unsigned elemental:1, inquiry:1, transformational:1, pure:1,
     generic:1, specific:1, actual_ok:1, noreturn:1, conversion:1,
-    from_module:1, vararg:1;
+    from_module:1, vararg:1, oacc_function:1;
 
   int standard;
 
@@ -3299,5 +3309,8 @@ bool gfc_is_reallocatable_lhs (gfc_expr *);
 /* trans-decl.c */
 
 void finish_oacc_declare (gfc_namespace *, gfc_symbol *, bool);
+tree add_omp_offloading_attributes (unsigned omp_declare_target,
+				    enum oacc_function, tree list);
+
 
 #endif /* GCC_GFORTRAN_H  */
diff --git a/gcc/fortran/intrinsic.c b/gcc/fortran/intrinsic.c
index 1d7503d..7b8935b 100644
--- a/gcc/fortran/intrinsic.c
+++ b/gcc/fortran/intrinsic.c
@@ -354,6 +354,7 @@ add_sym (const char *name, gfc_isym_id id, enum klass cl, int actual_ok, bt type
       next_sym->generic = 0;
       next_sym->conversion = 0;
       next_sym->id = id;
+      next_sym->oacc_function = 0;
       break;
 
     default:
@@ -4583,6 +4584,8 @@ gfc_intrinsic_sub_interface (gfc_code *c, int error_flag)
     {
       c->resolved_sym = gfc_get_intrinsic_sub_symbol (isym->lib_name);
       c->resolved_sym->attr.elemental = isym->elemental;
+      if (isym->oacc_function)
+	c->resolved_sym->attr.oacc_function = OACC_FUNCTION_SEQ;
     }
 
   if (gfc_do_concurrent_flag && !isym->pure)
diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 6d3860e..e3ed2a0 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -2095,6 +2095,7 @@ DECL_MIO_NAME (procedure_type)
 DECL_MIO_NAME (ref_type)
 DECL_MIO_NAME (sym_flavor)
 DECL_MIO_NAME (sym_intent)
+DECL_MIO_NAME (oacc_function)
 #undef DECL_MIO_NAME
 
 /* Symbol attributes are stored in list with the first three elements
@@ -2116,6 +2117,8 @@ mio_symbol_attribute (symbol_attribute *attr)
   attr->proc = MIO_NAME (procedure_type) (attr->proc, procedures);
   attr->if_source = MIO_NAME (ifsrc) (attr->if_source, ifsrc_types);
   attr->save = MIO_NAME (save_state) (attr->save, save_status);
+  attr->oacc_function = MIO_NAME (oacc_function) (attr->oacc_function,
+						  oacc_function_types);
 
   ext_attr = attr->ext_attr;
   mio_integer ((int *) &ext_attr);
diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index f514866..a8446fe 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -1664,21 +1664,31 @@ gfc_match_oacc_cache (void)
 
 /* Determine the loop level for a routine.   */
 
-static int
+static oacc_function
 gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
 {
   int level = -1;
+  oacc_function ret = OACC_FUNCTION_SEQ;
 
   if (clauses)
     {
       unsigned mask = 0;
 
       if (clauses->gang)
-	level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
+	{
+	  level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
+	  ret = OACC_FUNCTION_GANG;
+	}
       if (clauses->worker)
-	level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
+	{
+	  level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
+	  ret = OACC_FUNCTION_WORKER;
+	}
       if (clauses->vector)
-	level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
+	{
+	  level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
+	  ret = OACC_FUNCTION_VECTOR;
+	}
       if (clauses->seq)
 	level = GOMP_DIM_MAX, mask |= GOMP_DIM_MASK (level);
 
@@ -1689,7 +1699,7 @@ gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
   if (level < 0)
     level = GOMP_DIM_MAX;
 
-  return level;
+  return ret;
 }
 
 match
@@ -1700,6 +1710,7 @@ gfc_match_oacc_routine (void)
   match m;
   gfc_omp_clauses *c = NULL;
   gfc_oacc_routine_name *n = NULL;
+  gfc_intrinsic_sym *isym = NULL;
 
   old_loc = gfc_current_locus;
 
@@ -1717,12 +1728,16 @@ gfc_match_oacc_routine (void)
   if (m == MATCH_YES)
     {
       char buffer[GFC_MAX_SYMBOL_LEN + 1];
-      gfc_symtree *st;
+      gfc_symtree *st = NULL;
 
       m = gfc_match_name (buffer);
       if (m == MATCH_YES)
 	{
-	  st = gfc_find_symtree (gfc_current_ns->sym_root, buffer);
+	  /* Intrinsic functions don't have symtrees yet.  Defer marking
+	     as oacc_functions.  */
+	  if ((isym = gfc_find_function (buffer)) == NULL
+	      && (isym = gfc_find_subroutine (buffer)) == NULL)
+	    st = gfc_find_symtree (gfc_current_ns->sym_root, buffer);
 	  if (st)
 	    {
 	      sym = st->n.sym;
@@ -1730,7 +1745,7 @@ gfc_match_oacc_routine (void)
 	        sym = NULL;
 	    }
 
-	  if (st == NULL
+	  if ((st == NULL && isym == NULL)
 	      || (sym
 		  && !sym->attr.external
 		  && !sym->attr.function
@@ -1764,7 +1779,9 @@ gfc_match_oacc_routine (void)
 	  != MATCH_YES))
     return MATCH_ERROR;
 
-  if (sym != NULL)
+  if (isym != NULL)
+    isym->oacc_function = 1;
+  else if (sym != NULL)
     {
       n = gfc_get_oacc_routine_name ();
       n->sym = sym;
@@ -1782,7 +1799,7 @@ gfc_match_oacc_routine (void)
 				       &old_loc))
 	goto cleanup;
       gfc_current_ns->proc_name->attr.oacc_function
-	= gfc_oacc_routine_dims (c) + 1;
+	= gfc_oacc_routine_dims (c);
     }
 
   if (n)
diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index 0ee7dec..b1dd32b 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -87,6 +87,15 @@ const mstring save_status[] =
     minit ("IMPLICIT-SAVE", SAVE_IMPLICIT),
 };
 
+const mstring oacc_function_types[] =
+{
+  minit ("NONE", OACC_FUNCTION_NONE),
+  minit ("OACC_FUNCTION_SEQ", OACC_FUNCTION_SEQ),
+  minit ("OACC_FUNCTION_GANG", OACC_FUNCTION_GANG),
+  minit ("OACC_FUNCTION_WORKER", OACC_FUNCTION_WORKER),
+  minit ("OACC_FUNCTION_VECTOR", OACC_FUNCTION_VECTOR)
+};
+
 /* This is to make sure the backend generates setup code in the correct
    order.  */
 
diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index 2f5e434..84fd4ee 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -1308,30 +1308,34 @@ gfc_add_assign_aux_vars (gfc_symbol * sym)
 }
 
 
-static tree
-add_attributes_to_decl (symbol_attribute sym_attr, tree list)
+tree
+add_omp_offloading_attributes (unsigned omp_declare_target,
+			       enum oacc_function acc_routine, tree list)
 {
-  unsigned id;
-  tree attr;
-
-  for (id = 0; id < EXT_ATTR_NUM; id++)
-    if (sym_attr.ext_attr & (1 << id))
-      {
-	attr = build_tree_list (
-		 get_identifier (ext_attr_list[id].middle_end_name),
-				 NULL_TREE);
-	list = chainon (list, attr);
-      }
-
-  if (sym_attr.omp_declare_target)
+  if (omp_declare_target)
     list = tree_cons (get_identifier ("omp declare target"),
 		      NULL_TREE, list);
 
-  if (sym_attr.oacc_function)
+  if (acc_routine)
     {
       tree dims = NULL_TREE;
       int ix;
-      int level = sym_attr.oacc_function - 1;
+      int level = GOMP_DIM_MAX;
+
+      switch (acc_routine)
+	{
+	case OACC_FUNCTION_GANG:
+	  level = GOMP_DIM_GANG;
+	  break;
+	case OACC_FUNCTION_WORKER:
+	  level = GOMP_DIM_WORKER;
+	  break;
+	case OACC_FUNCTION_VECTOR:
+	  level = GOMP_DIM_VECTOR;
+	  break;
+	case OACC_FUNCTION_SEQ:
+	default:;
+	}
 
       for (ix = GOMP_DIM_MAX; ix--;)
 	dims = tree_cons (build_int_cst (boolean_type_node, ix >= level),
@@ -1344,6 +1348,27 @@ add_attributes_to_decl (symbol_attribute sym_attr, tree list)
   return list;
 }
 
+static tree
+add_attributes_to_decl (symbol_attribute sym_attr, tree list)
+{
+  unsigned id;
+  tree attr;
+
+  for (id = 0; id < EXT_ATTR_NUM; id++)
+    if (sym_attr.ext_attr & (1 << id))
+      {
+	attr = build_tree_list (
+		 get_identifier (ext_attr_list[id].middle_end_name),
+				 NULL_TREE);
+	list = chainon (list, attr);
+      }
+
+  list = add_omp_offloading_attributes (sym_attr.omp_declare_target,
+					sym_attr.oacc_function, list);
+
+  return list;
+}
+
 
 static void build_function_decl (gfc_symbol * sym, bool global);
 
diff --git a/gcc/fortran/trans.c b/gcc/fortran/trans.c
index 28d1341..94eb16d 100644
--- a/gcc/fortran/trans.c
+++ b/gcc/fortran/trans.c
@@ -33,6 +33,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans-array.h"
 #include "trans-types.h"
 #include "trans-const.h"
+#include "attribs.h"
 
 /* Naming convention for backend interface code:
 
@@ -2121,11 +2122,18 @@ gfc_unlikely (tree cond, enum br_predictor predictor)
 
   if (optimize)
     {
+      tree fndecl = builtin_decl_explicit (BUILT_IN_EXPECT);
+      tree attributes = NULL_TREE;
+
+      /* Mark calls to BUILT_IN_EXPECT as 'ACC ROUTINE SEQ'.  */
+      if (flag_openacc)
+	attributes = add_omp_offloading_attributes (1, OACC_FUNCTION_SEQ,
+						    attributes);
+
+      decl_attributes (&fndecl, attributes, 0);
       cond = fold_convert (long_integer_type_node, cond);
       tmp = build_zero_cst (long_integer_type_node);
-      cond = build_call_expr_loc (input_location,
-				  builtin_decl_explicit (BUILT_IN_EXPECT),
-				  3, cond, tmp,
+      cond = build_call_expr_loc (input_location, fndecl, 3, cond, tmp,
 				  build_int_cst (integer_type_node,
 						 predictor));
     }
@@ -2143,11 +2151,17 @@ gfc_likely (tree cond, enum br_predictor predictor)
 
   if (optimize)
     {
+      tree fndecl = builtin_decl_explicit (BUILT_IN_EXPECT);
+      tree attributes = NULL_TREE;
+
+      /* Mark calls to BUILT_IN_EXPECT as 'ACC ROUTINE SEQ'.  */
+      if (flag_openacc)
+	attributes = add_omp_offloading_attributes (1, OACC_FUNCTION_SEQ,
+						    attributes);
+      decl_attributes (&fndecl, attributes, 0);
       cond = fold_convert (long_integer_type_node, cond);
       tmp = build_one_cst (long_integer_type_node);
-      cond = build_call_expr_loc (input_location,
-				  builtin_decl_explicit (BUILT_IN_EXPECT),
-				  3, cond, tmp,
+      cond = build_call_expr_loc (input_location, fndecl, 3, cond, tmp,
 				  build_int_cst (integer_type_node,
 						 predictor));
     }
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 5cef2ba..552ea6b 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -1201,9 +1201,11 @@ input_overwrite_node (struct lto_file_decl_data *file_data,
 				     LDPR_NUM_KNOWN);
   node->instrumentation_clone = bp_unpack_value (bp, 1);
   node->split_part = bp_unpack_value (bp, 1);
-  gcc_assert (flag_ltrans
-	      || (!node->in_other_partition
-		  && !node->used_from_other_partition));
+
+  int success = flag_ltrans || (!node->in_other_partition
+				&& !node->used_from_other_partition);
+  if (!success)
+    error ("Missing %<%s%>", node->name ());
 }
 
 /* Return string alias is alias of.  */
@@ -1416,9 +1418,11 @@ input_varpool_node (struct lto_file_decl_data *file_data,
     node->set_section_for_node (section);
   node->resolution = streamer_read_enum (ib, ld_plugin_symbol_resolution,
 					        LDPR_NUM_KNOWN);
-  gcc_assert (flag_ltrans
-	      || (!node->in_other_partition
-		  && !node->used_from_other_partition));
+
+  int success = flag_ltrans || (!node->in_other_partition
+				&& !node->used_from_other_partition);
+  if (!success)
+    error ("Missing %<%s%>", node->name ());
 
   return node;
 }
diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c
index f240812..84b8ad1 100644
--- a/gcc/lto-wrapper.c
+++ b/gcc/lto-wrapper.c
@@ -785,6 +785,8 @@ compile_images_for_offload_targets (unsigned in_argc, char *in_argv[],
 				    struct cl_decoded_option *linker_opts,
 				    unsigned int linker_opt_count)
 {
+  if (!flag_openacc && !flag_openmp && !flag_openmp_simd)
+    return;
   char **names = NULL;
   const char *target_names = getenv (OFFLOAD_TARGET_NAMES_ENV);
   if (!target_names)
@@ -1082,6 +1084,18 @@ run_gcc (unsigned argc, char *argv[])
 	  lto_mode = LTO_MODE_WHOPR;
 	  break;
 
+	case OPT_fopenacc:
+	  flag_openacc = true;
+	  break;
+
+	case OPT_fopenmp:
+	  flag_openmp = true;
+	  break;
+
+	case OPT_fopenmp_simd:
+	  flag_openmp_simd = true;
+	  break;
+
 	default:
 	  break;
 	}
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 22e5909..13e30a6 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -17114,6 +17114,28 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	  default:
 	    break;
 	  }
+      /* Warn if a non-'acc routine' function is called from an OpenACC
+	 offloaded region.  */
+      if (fndecl)
+	{
+	  omp_context *octx = ctx;
+	  bool is_oacc_offloaded = false;
+
+	  /* Check if the current function is an 'acc routine'.  */
+	  if (get_oacc_fn_attrib (current_function_decl) != NULL_TREE)
+	    is_oacc_offloaded = true;
+
+	  while (!is_oacc_offloaded && octx)
+	    {
+	      if (is_oacc_parallel (octx) || is_oacc_kernels (octx))
+		is_oacc_offloaded = true;
+	      octx = octx->outer;
+	    }
+
+	  if (is_oacc_offloaded && get_oacc_fn_attrib (fndecl) == NULL_TREE)
+	    warning_at (gimple_location (call_stmt), 0,
+			"%qE is not an %<acc routine%>", fndecl);
+	}
       /* FALLTHRU */
     default:
       if ((ctx || task_shared_vars)
@@ -19420,7 +19442,8 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask)
     {
       unsigned outermost = this_mask & -this_mask;
 
-      if (outermost && outermost <= outer_mask)
+      if ((outermost && outermost <= outer_mask)
+	  || (this_mask && (loop->parent->flags & OLF_SEQ)))
 	{
 	  if (noisy)
 	    {
diff --git a/gcc/testsuite/c-c++-common/goacc/kernels-1.c b/gcc/testsuite/c-c++-common/goacc/kernels-1.c
index 4fcf86e..7afa8c9 100644
--- a/gcc/testsuite/c-c++-common/goacc/kernels-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/kernels-1.c
@@ -23,7 +23,7 @@ int
 kernels_noreturn (void)
 {
 #pragma acc kernels
-  __builtin_abort ();
+  __builtin_abort (); /* { dg-warning "'__builtin_abort' is not an 'acc routine'" } */
 
   return 0;
 }
diff --git a/gcc/testsuite/c-c++-common/goacc/parallel-1.c b/gcc/testsuite/c-c++-common/goacc/parallel-1.c
index 6c6cc88..3e070e1 100644
--- a/gcc/testsuite/c-c++-common/goacc/parallel-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/parallel-1.c
@@ -23,7 +23,7 @@ int
 parallel_noreturn (void)
 {
 #pragma acc parallel
-  __builtin_abort ();
+  __builtin_abort (); /* { dg-warning "'__builtin_abort' is not an 'acc routine'" } */
 
   return 0;
 }
diff --git a/gcc/testsuite/c-c++-common/goacc/routine-3.c b/gcc/testsuite/c-c++-common/goacc/routine-3.c
index b322d26..fabae1f 100644
--- a/gcc/testsuite/c-c++-common/goacc/routine-3.c
+++ b/gcc/testsuite/c-c++-common/goacc/routine-3.c
@@ -49,7 +49,7 @@ main ()
   int red = 0;
 #pragma acc parallel copy (red)
   {
-    /* Independent/seq loop tests.  */
+    /* Independent loop tests.  */
 #pragma acc loop reduction (+:red) // { dg-warning "insufficient partitioning" }
     for (int i = 0; i < 10; i++)
       red += gang ();
@@ -62,6 +62,19 @@ main ()
     for (int i = 0; i < 10; i++)
       red += vector ();
 
+    /* Seq loop tests.  */
+#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += gang (); /* { dg-error "incorrectly nested" } */
+
+#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += worker (); /* { dg-error "incorrectly nested" } */
+
+#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
+    for (int i = 0; i < 10; i++)
+      red += vector (); /* { dg-error "incorrectly nested" } */
+    
     /* Gang routine tests.  */
 #pragma acc loop gang reduction (+:red)  /* { dg-message "containing loop" } */
     for (int i = 0; i < 10; i++)
diff --git a/gcc/testsuite/c-c++-common/goacc/routine-6.c b/gcc/testsuite/c-c++-common/goacc/routine-6.c
new file mode 100644
index 0000000..fddb5e0
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/routine-6.c
@@ -0,0 +1,26 @@
+/* Test calls to non-routines.  */
+
+int
+sum (int a, int b)
+{
+  return a + b;
+}
+
+int
+main ()
+{
+  int c = 0, i;
+
+#pragma acc parallel loop reduction(+:c)
+  for (i = 0; i < 100; i++)
+    c += sum (i, i); /* { dg-warning "'sum' is not an 'acc routine'" } */
+
+  /* Built-in functions are permitted.  */
+#pragma acc parallel
+  {
+    if (c < 0)
+      __builtin_abort (); /* { dg-warning "'__builtin_abort' is not an 'acc routine'" } */
+  }
+
+  return 0;
+}
diff --git a/gcc/testsuite/gfortran.dg/goacc/fixed-1.f b/gcc/testsuite/gfortran.dg/goacc/fixed-1.f
index 6a454190..0c0fb98 100644
--- a/gcc/testsuite/gfortran.dg/goacc/fixed-1.f
+++ b/gcc/testsuite/gfortran.dg/goacc/fixed-1.f
@@ -1,3 +1,5 @@
+!$ACC ROUTINE(ABORT) SEQ
+
       INTEGER :: ARGC
       ARGC = COMMAND_ARGUMENT_COUNT ()
 
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-7.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-7.f90
new file mode 100644
index 0000000..76b08eb
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-7.f90
@@ -0,0 +1,25 @@
+! Test calls to non-acc routines.
+
+program test
+  implicit none
+  integer c, i
+
+  c = 0
+
+  !$acc parallel loop reduction(+:c)
+  do i = 0, 100
+     c = c + sum (i, i) ! { dg-warning "'sum' is not an 'acc routine'" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel
+  if (c .le. 0) call abort ! { dg-warning "is not an 'acc routine'" }
+  !$acc end parallel
+  
+contains
+  integer function sum(a, b)
+    integer a, b
+    sum = a + b
+  end function sum
+  
+end program test
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-8.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-8.f90
new file mode 100644
index 0000000..d2cb51a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-8.f90
@@ -0,0 +1,122 @@
+! Check routine calls with insufficient parallelism.
+
+! { dg-do compile }
+! { dg-additional-options "-cpp -O0" }
+
+#define M 8
+#define N 32
+
+program main
+  integer :: i
+  integer :: a(N)
+  integer :: b(M * N)
+
+  do i = 1, N
+    a(i) = 0
+  end do
+
+  !$acc parallel copy (a)
+  !$acc loop seq
+    do i = 1, N
+      call seq (a)
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne.N) call abort
+  end do
+
+  !$acc parallel copy (a)
+  !$acc loop seq ! { dg-message "containing loop here" }
+    do i = 1, N 
+      call gang (a) ! { dg-error "incorrectly nested OpenACC loop parallelism" }
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. (N + (N * (-1 * i)))) call abort
+  end do
+
+  do i = 1, N
+    b(i) = i
+  end do
+
+  !$acc parallel copy (b)
+  !$acc loop seq ! { dg-message "containing loop here" }
+    do i = 1, N
+      call worker (b) ! { dg-error "incorrectly nested OpenACC loop parallelism" }
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (b(i) .ne. N + i) call abort
+  end do
+
+  do i = 1, N
+    a(i) = i
+  end do
+
+  !$acc parallel copy (a)
+  !$acc loop seq ! { dg-message "containing loop here" }
+    do i = 1, N
+      call vector (a) ! { dg-error "incorrectly nested OpenACC loop parallelism" }
+    end do
+  !$acc end parallel
+
+  do i = 1, N
+    if (a(i) .ne. 0) call abort
+  end do
+
+contains
+
+subroutine vector (a)
+  !$acc routine vector
+  integer, intent (inout) :: a(N)
+  integer :: i
+
+  !$acc loop vector
+  do i = 1, N
+    a(i) = a(i) - a(i) 
+  end do
+
+end subroutine vector
+
+subroutine worker (b)
+  !$acc routine worker
+  integer, intent (inout) :: b(M*N)
+  integer :: i, j
+
+  !$acc loop worker
+  do i = 1, N
+  !$acc loop vector
+    do j = 1, M
+      b(j + ((i - 1) * M)) = b(j + ((i - 1) * M)) + 1
+    end do
+  end do
+
+end subroutine worker
+
+subroutine gang (a)
+  !$acc routine gang
+  integer, intent (inout) :: a(N)
+  integer :: i
+
+  !$acc loop gang
+  do i = 1, N
+    a(i) = a(i) - i 
+  end do
+
+end subroutine gang
+
+subroutine seq (a)
+  !$acc routine seq
+  integer, intent (inout) :: a(M)
+  integer :: i
+
+  do i = 1, N
+    a(i) = a(i) + 1
+  end do
+
+end subroutine seq
+
+end program main
diff --git a/libgomp/Makefile.am b/libgomp/Makefile.am
index a3e1c2b..c5e7614 100644
--- a/libgomp/Makefile.am
+++ b/libgomp/Makefile.am
@@ -87,8 +87,10 @@ omp_lib_kinds.mod: omp_lib.mod
 	:
 openacc_kinds.mod: openacc.mod
 	:
-openacc.mod: openacc.lo
-	:
+openacc.lo: openacc.f90
+	$(LTFCCOMPILE) -fopenacc -frandom-seed=1 -c -o $@ $^
+openacc.mod: openacc.f90
+	$(FC) $(FCFLAGS) -fopenacc -frandom-seed=1 -c $<
 %.mod: %.f90
 	$(FC) $(FCFLAGS) -fsyntax-only $<
 fortran.lo: libgomp_f.h
diff --git a/libgomp/Makefile.in b/libgomp/Makefile.in
index 88c8517..999409a4 100644
--- a/libgomp/Makefile.in
+++ b/libgomp/Makefile.in
@@ -1286,8 +1286,10 @@ omp_lib_kinds.mod: omp_lib.mod
 	:
 openacc_kinds.mod: openacc.mod
 	:
-openacc.mod: openacc.lo
-	:
+openacc.lo: openacc.f90
+	$(LTFCCOMPILE) -fopenacc -frandom-seed=1 -c -o $@ $^
+openacc.mod: openacc.f90
+	$(FC) $(FCFLAGS) -fopenacc -frandom-seed=1 -c $<
 %.mod: %.f90
 	$(FC) $(FCFLAGS) -fsyntax-only $<
 fortran.lo: libgomp_f.h
diff --git a/libgomp/openacc.f90 b/libgomp/openacc.f90
index 4b71489..98ba493 100644
--- a/libgomp/openacc.f90
+++ b/libgomp/openacc.f90
@@ -128,6 +128,7 @@ module openacc_internal
 
     function acc_on_device_h (d)
       import
+      !$acc routine seq
       integer (acc_device_kind) d
       logical acc_on_device_h
     end function
@@ -719,6 +720,7 @@ end subroutine
 function acc_on_device_h (d)
   use openacc_internal, only: acc_on_device_l
   use openacc_kinds
+  !$acc routine seq
   integer (acc_device_kind) d
   logical acc_on_device_h
   if (acc_on_device_l (d) .eq. 1) then
diff --git a/libgomp/openacc.h b/libgomp/openacc.h
index 7ea8794..e98985c 100644
--- a/libgomp/openacc.h
+++ b/libgomp/openacc.h
@@ -83,6 +83,9 @@ void acc_shutdown (acc_device_t) __GOACC_NOTHROW;
 #ifdef __cplusplus
 int acc_on_device (int __arg) __GOACC_NOTHROW;
 #else
+#ifdef _OPENACC
+#pragma acc routine seq
+#endif
 int acc_on_device (acc_device_t __arg) __GOACC_NOTHROW;
 #endif
 void *acc_malloc (size_t) __GOACC_NOTHROW;
@@ -128,4 +131,8 @@ inline int acc_on_device (acc_device_t __arg) __GOACC_NOTHROW
 }
 #endif
 
+#ifdef _OPENACC
+#pragma acc routine(__builtin_acc_on_device) seq
+#endif
+
 #endif /* _OPENACC_H */
diff --git a/libgomp/openacc_lib.h b/libgomp/openacc_lib.h
index a3f94d7..d627857 100644
--- a/libgomp/openacc_lib.h
+++ b/libgomp/openacc_lib.h
@@ -142,6 +142,7 @@
       interface acc_on_device
         function acc_on_device_h (devicetype)
           import acc_device_kind
+!$acc routine seq
           logical acc_on_device_h
           integer (acc_device_kind) devicetype
         end function
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/abort-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/abort-1.c
index 296708f..bc4eab3 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/abort-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/abort-1.c
@@ -2,6 +2,7 @@
 
 #include <stdio.h>
 #include <stdlib.h>
+#pragma acc routine(abort) seq
 
 int
 main (void)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/abort-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/abort-2.c
index debb81e..20076cd 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/abort-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/abort-2.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 
 #include <stdlib.h>
+#pragma acc routine(abort) seq
 
 int
 main (int argc, char **argv)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c
index bca425e..e6fc72f 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/abort-3.c
@@ -2,6 +2,7 @@
 
 #include <stdio.h>
 #include <stdlib.h>
+#pragma acc routine(abort) seq
 
 int
 main (void)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c
index c29ca3f..53a069a 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/abort-4.c
@@ -1,6 +1,7 @@
 /* { dg-do run } */
 
 #include <stdlib.h>
+#pragma acc routine(abort) seq
 
 int
 main (int argc, char **argv)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/abort-5.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/abort-5.c
index 314f04a..c38576e 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/abort-5.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/abort-5.c
@@ -2,6 +2,7 @@
 /* { dg-additional-options "-flto" { target lto } } */
 
 #include <stdlib.h>
+#pragma acc routine(abort) seq
 
 int
 main (int argc, char **argv)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c
index 8112745..a214329 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/acc_on_device-1.c
@@ -4,6 +4,7 @@
 
 #include <stdlib.h>
 #include <openacc.h>
+#pragma acc routine(abort) seq
 
 int
 main (int argc, char *argv[])
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
index 2cd98bd..83d8e56 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
@@ -1,4 +1,4 @@
 /* { dg-do run { target lto } } */
-/* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
+/* { dg-additional-options "-fipa-pta -flto -flto-partition=max -fno-exceptions" } */
 
 #include "data-clauses-kernels.c"
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
index f7f2d1c..a3934cb 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
@@ -1,2 +1,4 @@
+/* { dg-additional-options "-fno-exceptions" }  */
+
 #define CONSTRUCT kernels
 #include "data-clauses.h"
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
index ddcf4e3..6d24b3d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
@@ -1,4 +1,4 @@
 /* { dg-do run { target lto } } */
-/* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
+/* { dg-additional-options "-fipa-pta -flto -flto-partition=max -fno-exceptions" } */
 
 #include "data-clauses-parallel.c"
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
index e734b2f..02f1e88 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
@@ -1,2 +1,4 @@
+/* { dg-additional-options "-fno-exceptions" }  */
+
 #define CONSTRUCT parallel
 #include "data-clauses.h"
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h
index d557bef..5e7eb14 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses.h
@@ -1,3 +1,5 @@
+#pragma acc routine(__builtin_abort) seq
+
 int i;
 
 int main(void)
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
index 5398905..81aec7e 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/if-1.c
@@ -2,6 +2,8 @@
 #include <stdlib.h>
 #include <stdbool.h>
 
+#pragma acc routine(abort) seq
+
 #define N   32
 
 int
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-1.c
index 7bff6cd..21f8bc1 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-1.c
@@ -3,6 +3,7 @@
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
 
 #include <stdio.h>
+#include <openacc.h>
 
 #define N (32*32*32+17)
 int main ()
@@ -20,7 +21,7 @@ int main ()
 #pragma acc loop gang
     for (unsigned ix = 0; ix < N; ix++)
       {
-	if (__builtin_acc_on_device (5))
+	if (__builtin_acc_on_device (acc_device_nvidia))
 	  {
 	    int g = 0, w = 0, v = 0;
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-2.c
index 92b82a0..72c3bde 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-g-2.c
@@ -3,6 +3,7 @@
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
 
 #include <stdio.h>
+#include <openacc.h>
 
 #define N (32*32*32+17)
 int main ()
@@ -20,7 +21,7 @@ int main ()
 #pragma acc loop gang (static:1)
     for (unsigned ix = 0; ix < N; ix++)
       {
-	if (__builtin_acc_on_device (5))
+	if (__builtin_acc_on_device (acc_device_nvidia))
 	  {
 	    int g = 0, w = 0, v = 0;
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c
index 42b612a..364f058 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c
@@ -3,6 +3,7 @@
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
 
 #include <stdio.h>
+#include <openacc.h>
 
 #define N (32*32*32+17)
 int main ()
@@ -20,7 +21,7 @@ int main ()
 #pragma acc loop gang worker vector
     for (unsigned ix = 0; ix < N; ix++)
       {
-	if (__builtin_acc_on_device (5))
+	if (__builtin_acc_on_device (acc_device_nvidia))
 	  {
 	    int g = 0, w = 0, v = 0;
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c
index a8684f95..d1d27b3 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-g-1.c
@@ -3,6 +3,7 @@
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
 
 #include <stdio.h>
+#include <openacc.h>
 
 #define N (32*32*32+17)
 int main ()
@@ -18,7 +19,7 @@ int main ()
       {
 	int val = ix;
 	
-	if (__builtin_acc_on_device (5))
+	if (__builtin_acc_on_device (acc_device_nvidia))
 	  {
 	    int g = 0, w = 0, v = 0;
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c
index 3b104cf..0ebbc63 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c
@@ -3,6 +3,7 @@
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
 
 #include <stdio.h>
+#include <openacc.h>
 
 #define N (32*32*32+17)
 int main ()
@@ -18,7 +19,7 @@ int main ()
       {
 	int val = ix;
 	
-	if (__builtin_acc_on_device (5))
+	if (__builtin_acc_on_device (acc_device_nvidia))
 	  {
 	    int g = 0, w = 0, v = 0;
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c
index b77ae76..1b350e9 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-1.c
@@ -3,6 +3,7 @@
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
 
 #include <stdio.h>
+#include <openacc.h>
 
 #define N (32*32*32+17)
 
@@ -19,7 +20,7 @@ int main ()
       {
 	int val = ix;
 	
-	if (__builtin_acc_on_device (5))
+	if (__builtin_acc_on_device (acc_device_nvidia))
 	  {
 	    int g = 0, w = 0, v = 0;
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c
index 16d8f9f..4b6d835 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c
@@ -3,6 +3,7 @@
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
 
 #include <stdio.h>
+#include <openacc.h>
 
 #define N (32*32*32+17)
 
@@ -21,7 +22,7 @@ int main ()
       {
 	int val = ix;
 	
-	if (__builtin_acc_on_device (5))
+	if (__builtin_acc_on_device (acc_device_nvidia))
 	  {
 	    int g = 0, w = 0, v = 0;
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
index 9cc12b3..44ab546 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-1.c
@@ -3,6 +3,7 @@
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
 
 #include <stdio.h>
+#include <openacc.h>
 
 #define N (32*32*32+17)
 int main ()
@@ -18,7 +19,7 @@ int main ()
       {
 	int val = ix;
 	
-	if (__builtin_acc_on_device (5))
+	if (__builtin_acc_on_device (acc_device_nvidia))
 	  {
 	    int g = 0, w = 0, v = 0;
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c
index f0c9d81..2e3f1e5 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c
@@ -3,6 +3,7 @@
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
 
 #include <stdio.h>
+#include <openacc.h>
 
 #define N (32*32*32+17)
 int main ()
@@ -20,7 +21,7 @@ int main ()
       {
 	int val = ix;
 	
-	if (__builtin_acc_on_device (5))
+	if (__builtin_acc_on_device (acc_device_nvidia))
 	  {
 	    int g = 0, w = 0, v = 0;
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c
index 398b7cc..30f767a 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c
@@ -2,6 +2,7 @@
 /* { dg-additional-options "-O2" } */
 
 #include <stdio.h>
+#include <openacc.h>
 
 #define N (32*32*32+17)
 int main ()
@@ -17,7 +18,7 @@ int main ()
       {
 	int val = ix;
 	
-	if (__builtin_acc_on_device (5))
+	if (__builtin_acc_on_device (acc_device_nvidia))
 	  {
 	    int g = 0, w = 0, v = 0;
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-v-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-v-1.c
index 2974807..7a0d688 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-v-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-v-1.c
@@ -3,6 +3,7 @@
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
 
 #include <stdio.h>
+#include <openacc.h>
 
 #define N (32*32*32+17)
 int main ()
@@ -20,7 +21,7 @@ int main ()
 #pragma acc loop vector
     for (unsigned ix = 0; ix < N; ix++)
       {
-	if (__builtin_acc_on_device (5))
+	if (__builtin_acc_on_device (acc_device_nvidia))
 	  {
 	    int g = 0, w = 0, v = 0;
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c
index 33b6eae..c165f1d 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-w-1.c
@@ -3,6 +3,7 @@
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
 
 #include <stdio.h>
+#include <openacc.h>
 
 #define N (32*32*32+17)
 int main ()
@@ -20,7 +21,7 @@ int main ()
 #pragma acc loop worker
     for (unsigned ix = 0; ix < N; ix++)
       {
-	if (__builtin_acc_on_device (5))
+	if (__builtin_acc_on_device (acc_device_nvidia))
 	  {
 	    int g = 0, w = 0, v = 0;
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c
index 578cfad..70bfa62 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c
@@ -3,6 +3,7 @@
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
 
 #include <stdio.h>
+#include <openacc.h>
 
 #define N (32*32*32+17)
 int main ()
@@ -20,7 +21,7 @@ int main ()
 #pragma acc loop worker vector
     for (unsigned ix = 0; ix < N; ix++)
       {
-	if (__builtin_acc_on_device (5))
+	if (__builtin_acc_on_device (acc_device_nvidia))
 	  {
 	    int g = 0, w = 0, v = 0;
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c
index d6ff44d..02b1f15 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-4.c
@@ -78,7 +78,7 @@ main(int argc, char **argv)
 
 #pragma acc parallel copy (a[0:N])
   {
-#pragma acc loop seq
+#pragma acc loop /* { dg-warning "insufficient partitioning" } */
     for (i = 0; i < N; i++)
       gang (&a[0]);
   }
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-g-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-g-1.c
index 9d14c3b..be80457 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-g-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-g-1.c
@@ -3,6 +3,7 @@
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
 
 #include <stdio.h>
+#include <openacc.h>
 
 #define N (32*32*32+17)
 
@@ -12,7 +13,7 @@ void __attribute__ ((noinline)) gang (int ary[N])
 #pragma acc loop gang
     for (unsigned ix = 0; ix < N; ix++)
       {
-	if (__builtin_acc_on_device (5))
+	if (__builtin_acc_on_device (acc_device_nvidia))
 	  {
 	    int g = 0, w = 0, v = 0;
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c
index ace2f49..b6c689b 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c
@@ -3,6 +3,7 @@
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
 
 #include <stdio.h>
+#include <openacc.h>
 
 #define N (32*32*32+17)
 
@@ -12,7 +13,7 @@ void __attribute__ ((noinline)) gang (int ary[N])
 #pragma acc loop gang worker vector
   for (unsigned ix = 0; ix < N; ix++)
     {
-      if (__builtin_acc_on_device (5))
+      if (__builtin_acc_on_device (acc_device_nvidia))
 	{
 	  int g = 0, w = 0, v = 0;
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-v-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-v-1.c
index 2503e8d..5a73b0b 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-v-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-v-1.c
@@ -3,6 +3,7 @@
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
 
 #include <stdio.h>
+#include <openacc.h>
 
 #define N (32*32*32+17)
 
@@ -12,7 +13,7 @@ void __attribute__ ((noinline)) vector (int ary[N])
 #pragma acc loop vector
   for (unsigned ix = 0; ix < N; ix++)
     {
-      if (__builtin_acc_on_device (5))
+      if (__builtin_acc_on_device (acc_device_nvidia))
 	{
 	  int g = 0, w = 0, v = 0;
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-w-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-w-1.c
index 80cd462..523353a 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-w-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-w-1.c
@@ -3,6 +3,7 @@
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
 
 #include <stdio.h>
+#include <openacc.h>
 
 #define N (32*32*32+17)
 
@@ -12,7 +13,7 @@ void __attribute__ ((noinline)) worker (int ary[N])
 #pragma acc loop worker
   for (unsigned ix = 0; ix < N; ix++)
     {
-      if (__builtin_acc_on_device (5))
+      if (__builtin_acc_on_device (acc_device_nvidia))
 	{
 	  int g = 0, w = 0, v = 0;
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c
index 5e45fad..e92b160 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c
@@ -3,6 +3,7 @@
    { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
 
 #include <stdio.h>
+#include <openacc.h>
 
 #define N (32*32*32+17)
 
@@ -12,7 +13,7 @@ void __attribute__ ((noinline)) worker (int ary[N])
 #pragma acc loop worker vector
   for (unsigned ix = 0; ix < N; ix++)
     {
-      if (__builtin_acc_on_device (5))
+      if (__builtin_acc_on_device (acc_device_nvidia))
 	{
 	  int g = 0, w = 0, v = 0;
 
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-type-1.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-type-1.c
index 5adfcec..6c2b1c2 100644
--- a/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-type-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/vector-type-1.c
@@ -1,3 +1,5 @@
+#pragma acc routine(__builtin_abort) seq
+
 #define vector __attribute__ ((vector_size (4 * sizeof(int))))
 
 int main(void)
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/abort-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/abort-1.f90
index b38303d..48ebc38 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/abort-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/abort-1.f90
@@ -1,5 +1,6 @@
 program main
   implicit none
+  !$acc routine(abort) seq
 
   print *, "CheCKpOInT"
   !$acc parallel
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/abort-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/abort-2.f90
index 2ba2bcb..a80593e 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/abort-2.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/abort-2.f90
@@ -1,5 +1,6 @@
 program main
   implicit none
+  !$acc routine(abort) seq
 
   integer :: argc
   argc = command_argument_count ()
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
index 1a10f32..94e45b3 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
@@ -6,6 +6,7 @@
 
 use openacc
 implicit none
+!$acc routine(abort) seq
 
 ! Host.
 
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
index a19045b..cbd1dd9 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
+++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
@@ -6,6 +6,7 @@
 
       USE OPENACC
       IMPLICIT NONE
+!$ACC ROUTINE(ABORT) SEQ
 
 !Host.
 
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f
index c391776..3e016f4 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f
+++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f
@@ -6,6 +6,7 @@
 
       IMPLICIT NONE
       INCLUDE "openacc_lib.h"
+!$ACC ROUTINE(ABORT) SEQ
 
 !Host.
 
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/nested-function-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/nested-function-1.f90
index fdbca44..2b14159 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/nested-function-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/nested-function-1.f90
@@ -3,6 +3,8 @@
 ! { dg-do run }
 
 program collapse2
+  !$acc routine(abort) seq
+
   call test1
   call test2
 contains
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/routine-7.f90 b/libgomp/testsuite/libgomp.oacc-fortran/routine-7.f90
index 200188e..27cda44 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/routine-7.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/routine-7.f90
@@ -1,121 +1,101 @@
+! Test acc routines.
 
 ! { dg-do run }
-! { dg-additional-options "-cpp" }
 
-#define M 8
-#define N 32
+module size
+  integer, parameter :: N = 32
+end module size
 
 program main
+  use size
+  implicit none
+
   integer :: i
   integer :: a(N)
-  integer :: b(M * N)
 
-  do i = 1, N
-    a(i) = 0
-  end do
-
-  !$acc parallel copy (a)
-  !$acc loop seq
-    do i = 1, N
-      call seq (a)
-    end do
+  !$acc parallel
+  call seq (a)
   !$acc end parallel
 
   do i = 1, N
-    if (a(i) .ne.N) call abort
+    if (a(i) .ne. 4) call abort
   end do
 
-  !$acc parallel copy (a)
-  !$acc loop seq
-    do i = 1, N 
-      call gang (a)
-    end do
+  !$acc parallel
+  call gang (a)
   !$acc end parallel
 
   do i = 1, N
-    if (a(i) .ne. (N + (N * (-1 * i)))) call abort
+    if (a(i) .ne. 3) call abort
   end do
 
-  do i = 1, N
-    b(i) = i
-  end do
-
-  !$acc parallel copy (b)
-  !$acc loop seq
-    do i = 1, N
-      call worker (b)
-    end do
+  !$acc parallel
+  call worker (a)
   !$acc end parallel
 
   do i = 1, N
-    if (b(i) .ne. N + i) call abort
-  end do
-
-  do i = 1, N
-    a(i) = i
+    if (a(i) .ne. 2) call abort
   end do
 
-  !$acc parallel copy (a)
-  !$acc loop seq
-    do i = 1, N
-      call vector (a)
-    end do
+  !$acc parallel
+  call vector (a)
   !$acc end parallel
 
   do i = 1, N
-    if (a(i) .ne. 0) call abort
+    if (a(i) .ne. 1) call abort
   end do
 
 contains
 
 subroutine vector (a)
+  use size
+  implicit none
   !$acc routine vector
   integer, intent (inout) :: a(N)
   integer :: i
 
   !$acc loop vector
   do i = 1, N
-    a(i) = a(i) - a(i) 
+    a(i) = 1
   end do
-
 end subroutine vector
 
-subroutine worker (b)
+subroutine worker (a)
+  use size
+  implicit none
   !$acc routine worker
-  integer, intent (inout) :: b(M*N)
-  integer :: i, j
+  integer, intent (inout) :: a(N)
+  integer :: i
 
   !$acc loop worker
   do i = 1, N
-  !$acc loop vector
-    do j = 1, M
-      b(j + ((i - 1) * M)) = b(j + ((i - 1) * M)) + 1
-    end do
+    a(i) = 2
   end do
-
 end subroutine worker
 
 subroutine gang (a)
+  use size
+  implicit none
   !$acc routine gang
   integer, intent (inout) :: a(N)
   integer :: i
 
   !$acc loop gang
   do i = 1, N
-    a(i) = a(i) - i 
+    a(i) = 3
   end do
-
 end subroutine gang
 
 subroutine seq (a)
+  use size
+  implicit none
   !$acc routine seq
-  integer, intent (inout) :: a(M)
+  integer, intent (inout) :: a(N)
   integer :: i
 
   do i = 1, N
-    a(i) = a(i) + 1
+    a(i) = 4
   end do
-
 end subroutine seq
 
 end program main

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls
  2016-06-23 16:05   ` Cesar Philippidis
@ 2016-06-29 14:11     ` Thomas Schwinge
  2016-06-29 14:35       ` Jakub Jelinek
  2016-06-29 15:31       ` Cesar Philippidis
  0 siblings, 2 replies; 14+ messages in thread
From: Thomas Schwinge @ 2016-06-29 14:11 UTC (permalink / raw)
  To: Cesar Philippidis, Jakub Jelinek
  Cc: gcc-patches, Fortran List, Alexander Monakov, Ilya Verbin

[-- Attachment #1: Type: text/plain, Size: 16736 bytes --]

Hi!

Cesar, I have not yet fully digested this, but do I understand right that
you're really fixing two issues here, that are related (OpenACC routines)
but still can be addressed independently of each other?  Do I understand
right that the first one, the "problems with acc routines [...]
incorrectly permitting 'acc seq' loops to call gang, worker and vector
routines" is just a Fortran front end patch?  If yes, please split that
one out, so as to reduce the volume of remaining changes that remain to
be discussed.

On Thu, 23 Jun 2016 09:05:38 -0700, Cesar Philippidis <cesar@codesourcery.com> wrote:
> On 06/17/2016 07:42 AM, Jakub Jelinek wrote:
> > On Wed, Jun 15, 2016 at 08:12:15PM -0700, Cesar Philippidis wrote:
> >> The second set of changes involves teaching the gimplifier to error when
> >> it detects a function call to an non-acc routines inside an OpenACC
> >> offloaded region.

As I understand, that's the same problem as has been discussed before
(Ilya CCed), and has recently again been filed in
<https://gcc.gnu.org/PR71499> "ICE in LTO1 when attempting NVPTX
offloading (-fopenacc)", and <https://gcc.gnu.org/PR71535> "ICE in LTO1
with -fopenmp offloading" (Alexander CCed).  Some earlier discussion
threads include:
<http://news.gmane.org/find-root.php?message_id=%3C20150109145702.GA45210%40msticlxl57.ims.intel.com%3E>,
<http://news.gmane.org/find-root.php?message_id=%3C20150724152119.GA41292%40msticlxl57.ims.intel.com%3E>,
<http://news.gmane.org/find-root.php?message_id=%3C56269C05.6030502%40acm.org%3E>.

> >> Actually, I relaxed non-acc routines by excluding
> >> calls to builtin functions, including those prefixed with _gfortran_.
> >> Nvptx does have a newlib c library, and it also has a subset of
> >> libgfortran. Still, this solution is probably not optimal.
> > 
> > I don't really like that, hardcoding prefixes or whatever is available
> > (you have quite some subset of libc, libm etc. available too) in the
> > compiler looks very hackish.  What is wrong with complaining during
> > linking of the offloaded code?

ACK.  Jakub, do I understand you correctly, that you basically say that
every function declaration that is in scope inside offloaded regions (for
example, GCC builtin functions, or standard library functions declared in
target compiler's header files) is permitted to be called in offloaded
regions, and the offloading compiler will then either be able to resolve
these (nvptx back end knows about trigonometric functions, for example,
and a lot of functions are available in the nvptx libc), or otherwise
error out during the offloading compilation (during linking), gracefully
without terminating the target compilation (that "gracefully" bit is
currently missing -- that's for another day).  That is, all such
functions are implicitly callable as OpenACC "seq" functions (which means
that they don't internally use gang/worker/vector parallelism).  In
particular, all these functions do *not* need to be marked with an
explicit "#pragma acc routine seq" directive.  (Functions internally
using gang/worker/vector parallelism will need to be marked
appropriately, using a "#pragma acc routine gang/worker/vector"
directive.)  That's how I understand your comment above, and your earlier
comments on this topic, and also is what I think should be done.

> Wouldn't the error get reported multiple times then, i.e. once per
> target? Then again, maybe this error could have been restrained to the
> host compiler.

That's not something I would care about right now.  :-)

> Anyway, this patch now reduces that error to a warning. Furthermore,
> that warning is being thrown in lower_omp_1 instead of
> gimplify_call_expr because the latter is called multiple times and that
> causes duplicate warnings. The only bit of fallout I had with this
> change was with the fortran FE's usage of BUILT_IN_EXPECT in
> gfc_{un}likely. Since these are generated implicitly by the FE, I just
> added an oacc_function attribute to those calls when flag_openacc is set.
> 
> >> Next, I had to modify the openacc header files in libgomp to mark
> >> acc_on_device as an acc routine. Unfortunately, this meant that I had to
> >> build the opeancc.mod module for gfortran with -fopenacc. But doing
> >> that, caused caused gcc to stream offloaded code to the openacc.o object
> >> file. So, I've updated the behavior of flag_generate_offload such that
> >> minus one indicates that the user specified -foffload=disable, and that
> >> will prevent gcc from streaming offloaded lto code. The alternative was
> >> to hack libtool to build libgomp with -foffload=disable.
> > 
> > This also looks wrong.  I'd say the right thing is when loading modules
> > that have OpenACC bits set in it (and also OpenMP bits, I admit I haven't
> > handled this well) into CU with the corresponding flags unset (-fopenacc,
> > -fopenmp, -fopenmp-simd here, depending on which bit it is), then
> > IMHO the module loading code should just ignore it, pretend it wasn't there.
> > Similarly e.g. to how lto1 with -g0 should ignore debug statements that
> > could be in the LTO inputs.

(Also a task for another day, in my opinion.)

> This required two changes. First, I had to teach lto-cgraph.c how to
> report an error rather then fail an assert when partitions are missing
> decls.

Something like that may make sense (conceptually).

> Second, I taught the lto wrapper how to stream offloaded code on
> the absence of -fopen*. The only kink with this approach is that I had
> to build libgomp/openacc.f90 with -frandom-seed=1 to prevent lto related
> bootstrap failures.

Uh.  Hopefully we're not going to need something like that.

> By the way, Thomas, I've added
> 
>  #pragma acc routine(__builtin_acc_on_device) seq
> 
> to openacc.h. Is this OK, or should I just modify the various
> libgomp.oacc-c-c++-common/loop* tests to use that pragma directly? Or
> another option is to have the compiler add that attribute directly. I
> don't think we're really expecting the end user to use
> __builtin_acc_on_device directly since this is a gcc-ism.

As per my reasoning above, all that should not be needed.


A few random comments on the patch:

> --- a/gcc/fortran/gfortran.h
> +++ b/gcc/fortran/gfortran.h
> @@ -303,6 +303,15 @@ enum save_state
>  { SAVE_NONE = 0, SAVE_EXPLICIT, SAVE_IMPLICIT
>  };
>  
> +/* Flags to keep track of ACC routine states.  */
> +enum oacc_function
> +{ OACC_FUNCTION_NONE = 0,
> +  OACC_FUNCTION_SEQ,
> +  OACC_FUNCTION_GANG,
> +  OACC_FUNCTION_WORKER,
> +  OACC_FUNCTION_VECTOR
> +};

What's the purpose of OACC_FUNCTION_NONE?  It's not used anywhere, as far
as I can tell?

> --- a/gcc/fortran/openmp.c
> +++ b/gcc/fortran/openmp.c
> @@ -1664,21 +1664,31 @@ gfc_match_oacc_cache (void)
>  
>  /* Determine the loop level for a routine.   */
>  
> -static int
> +static oacc_function
>  gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
>  {
>    int level = -1;
> +  oacc_function ret = OACC_FUNCTION_SEQ;
>  
>    if (clauses)
>      {
>        unsigned mask = 0;
>  
>        if (clauses->gang)
> -	level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
> +	{
> +	  level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
> +	  ret = OACC_FUNCTION_GANG;
> +	}
>        if (clauses->worker)
> -	level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
> +	{
> +	  level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
> +	  ret = OACC_FUNCTION_WORKER;
> +	}
>        if (clauses->vector)
> -	level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
> +	{
> +	  level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
> +	  ret = OACC_FUNCTION_VECTOR;
> +	}
>        if (clauses->seq)
>  	level = GOMP_DIM_MAX, mask |= GOMP_DIM_MASK (level);
>  

I have not looked in detail, so maybe I'm misunderstanding what is being
done here -- but how do "clauses->seq" and "level = GOMP_DIM_MAX" fit
together?

> @@ -1689,7 +1699,7 @@ gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
>    if (level < 0)
>      level = GOMP_DIM_MAX;
>  
> -  return level;
> +  return ret;
>  }

Just from that last hunk, it seems that the assignment to "level" is a
dead store?

> --- a/gcc/fortran/trans-decl.c
> +++ b/gcc/fortran/trans-decl.c
> @@ -1308,30 +1308,34 @@ gfc_add_assign_aux_vars (gfc_symbol * sym)
>  }
>  
>  
> -static tree
> -add_attributes_to_decl (symbol_attribute sym_attr, tree list)
> +tree
> +add_omp_offloading_attributes (unsigned omp_declare_target,
> +			       enum oacc_function acc_routine, tree list)
>  {
> -  unsigned id;
> -  tree attr;
> -
> -  for (id = 0; id < EXT_ATTR_NUM; id++)
> -    if (sym_attr.ext_attr & (1 << id))
> -      {
> -	attr = build_tree_list (
> -		 get_identifier (ext_attr_list[id].middle_end_name),
> -				 NULL_TREE);
> -	list = chainon (list, attr);
> -      }
> -
> -  if (sym_attr.omp_declare_target)
> +  if (omp_declare_target)
>      list = tree_cons (get_identifier ("omp declare target"),
>  		      NULL_TREE, list);
>  
> -  if (sym_attr.oacc_function)
> +  if (acc_routine)
>      {
>        tree dims = NULL_TREE;
>        int ix;
> -      int level = sym_attr.oacc_function - 1;
> +      int level = GOMP_DIM_MAX;
> +
> +      switch (acc_routine)
> +	{
> +	case OACC_FUNCTION_GANG:
> +	  level = GOMP_DIM_GANG;
> +	  break;
> +	case OACC_FUNCTION_WORKER:
> +	  level = GOMP_DIM_WORKER;
> +	  break;
> +	case OACC_FUNCTION_VECTOR:
> +	  level = GOMP_DIM_VECTOR;
> +	  break;
> +	case OACC_FUNCTION_SEQ:
> +	default:;
> +	}
>  
>        for (ix = GOMP_DIM_MAX; ix--;)
>  	dims = tree_cons (build_int_cst (boolean_type_node, ix >= level),
> @@ -1344,6 +1348,27 @@ add_attributes_to_decl (symbol_attribute sym_attr, tree list)
>    return list;
>  }
>  
> +static tree
> +add_attributes_to_decl (symbol_attribute sym_attr, tree list)
> +{
> +  unsigned id;
> +  tree attr;
> +
> +  for (id = 0; id < EXT_ATTR_NUM; id++)
> +    if (sym_attr.ext_attr & (1 << id))
> +      {
> +	attr = build_tree_list (
> +		 get_identifier (ext_attr_list[id].middle_end_name),
> +				 NULL_TREE);
> +	list = chainon (list, attr);
> +      }
> +
> +  list = add_omp_offloading_attributes (sym_attr.omp_declare_target,
> +					sym_attr.oacc_function, list);
> +
> +  return list;
> +}

Something that I had noticed before, possibly related here: code in
gcc/fortran/ does never call replace_oacc_fn_attrib, but the C and C++
front ends do.  Is that function what you've re-implemented here?

> --- a/gcc/lto-cgraph.c
> +++ b/gcc/lto-cgraph.c
> @@ -1201,9 +1201,11 @@ input_overwrite_node (struct lto_file_decl_data *file_data,
>  				     LDPR_NUM_KNOWN);
>    node->instrumentation_clone = bp_unpack_value (bp, 1);
>    node->split_part = bp_unpack_value (bp, 1);
> -  gcc_assert (flag_ltrans
> -	      || (!node->in_other_partition
> -		  && !node->used_from_other_partition));
> +
> +  int success = flag_ltrans || (!node->in_other_partition
> +				&& !node->used_from_other_partition);
> +  if (!success)
> +    error ("Missing %<%s%>", node->name ());
>  }
>  
>  /* Return string alias is alias of.  */
> @@ -1416,9 +1418,11 @@ input_varpool_node (struct lto_file_decl_data *file_data,
>      node->set_section_for_node (section);
>    node->resolution = streamer_read_enum (ib, ld_plugin_symbol_resolution,
>  					        LDPR_NUM_KNOWN);
> -  gcc_assert (flag_ltrans
> -	      || (!node->in_other_partition
> -		  && !node->used_from_other_partition));
> +
> +  int success = flag_ltrans || (!node->in_other_partition
> +				&& !node->used_from_other_partition);
> +  if (!success)
> +    error ("Missing %<%s%>", node->name ());
>  
>    return node;
>  }

That looks similar to what I remember from earlier, simiar patches, as
referenced above.

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -17114,6 +17114,28 @@ lower_omp_1 (gimple_stmt_iterator *gsi_p, omp_context *ctx)
>  	  default:
>  	    break;
>  	  }
> +      /* Warn if a non-'acc routine' function is called from an OpenACC
> +	 offloaded region.  */
> +      if (fndecl)
> +	{
> +	  omp_context *octx = ctx;
> +	  bool is_oacc_offloaded = false;
> +
> +	  /* Check if the current function is an 'acc routine'.  */
> +	  if (get_oacc_fn_attrib (current_function_decl) != NULL_TREE)
> +	    is_oacc_offloaded = true;
> +
> +	  while (!is_oacc_offloaded && octx)
> +	    {
> +	      if (is_oacc_parallel (octx) || is_oacc_kernels (octx))
> +		is_oacc_offloaded = true;
> +	      octx = octx->outer;
> +	    }
> +
> +	  if (is_oacc_offloaded && get_oacc_fn_attrib (fndecl) == NULL_TREE)
> +	    warning_at (gimple_location (call_stmt), 0,
> +			"%qE is not an %<acc routine%>", fndecl);
> +	}
>        /* FALLTHRU */
>      default:
>        if ((ctx || task_shared_vars)

Per my reasoning above, we should either get a undeclared symbol error
(if the target compiler doesn't know about the routine), or should get a
offloading compiler link-time error, if the -- implicit "seq" -- routine
is missing there.

> @@ -19420,7 +19442,8 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask)
>      {
>        unsigned outermost = this_mask & -this_mask;
>  
> -      if (outermost && outermost <= outer_mask)
> +      if ((outermost && outermost <= outer_mask)
> +	  || (this_mask && (loop->parent->flags & OLF_SEQ)))
>  	{
>  	  if (noisy)
>  	    {

> --- a/gcc/testsuite/c-c++-common/goacc/routine-3.c
> +++ b/gcc/testsuite/c-c++-common/goacc/routine-3.c
> @@ -49,7 +49,7 @@ main ()
>    int red = 0;
>  #pragma acc parallel copy (red)
>    {
> -    /* Independent/seq loop tests.  */
> +    /* Independent loop tests.  */
>  #pragma acc loop reduction (+:red) // { dg-warning "insufficient partitioning" }
>      for (int i = 0; i < 10; i++)
>        red += gang ();
> @@ -62,6 +62,19 @@ main ()
>      for (int i = 0; i < 10; i++)
>        red += vector ();
>  
> +    /* Seq loop tests.  */
> +#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
> +    for (int i = 0; i < 10; i++)
> +      red += gang (); /* { dg-error "incorrectly nested" } */
> +
> +#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
> +    for (int i = 0; i < 10; i++)
> +      red += worker (); /* { dg-error "incorrectly nested" } */
> +
> +#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
> +    for (int i = 0; i < 10; i++)
> +      red += vector (); /* { dg-error "incorrectly nested" } */
> +    
>      /* Gang routine tests.  */
>  #pragma acc loop gang reduction (+:red)  /* { dg-message "containing loop" } */
>      for (int i = 0; i < 10; i++)

Do these test case changes actually relate to any of the compiler changes
discussed above?  Maybe to the oacc_loop_fixed_partitions cited just
above?  Is that a separate issue to fix?  Eh, or is that actually the fix
for your first issue, the "problems with acc routines [...] incorrectly
permitting 'acc seq' loops to call gang, worker and vector routines"?

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
> @@ -1,4 +1,4 @@
>  /* { dg-do run { target lto } } */
> -/* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
> +/* { dg-additional-options "-fipa-pta -flto -flto-partition=max -fno-exceptions" } */
>  
>  #include "data-clauses-kernels.c"

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
> @@ -1,2 +1,4 @@
> +/* { dg-additional-options "-fno-exceptions" }  */
> +
>  #define CONSTRUCT kernels
>  #include "data-clauses.h"

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
> @@ -1,4 +1,4 @@
>  /* { dg-do run { target lto } } */
> -/* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
> +/* { dg-additional-options "-fipa-pta -flto -flto-partition=max -fno-exceptions" } */
>  
>  #include "data-clauses-parallel.c"

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
> @@ -1,2 +1,4 @@
> +/* { dg-additional-options "-fno-exceptions" }  */
> +
>  #define CONSTRUCT parallel
>  #include "data-clauses.h"

Hmm?


Grüße
 Thomas

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 472 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls
  2016-06-29 14:11     ` Thomas Schwinge
@ 2016-06-29 14:35       ` Jakub Jelinek
  2016-06-29 15:31       ` Cesar Philippidis
  1 sibling, 0 replies; 14+ messages in thread
From: Jakub Jelinek @ 2016-06-29 14:35 UTC (permalink / raw)
  To: Thomas Schwinge
  Cc: Cesar Philippidis, gcc-patches, Fortran List, Alexander Monakov,
	Ilya Verbin

On Wed, Jun 29, 2016 at 04:11:31PM +0200, Thomas Schwinge wrote:
> > >> Actually, I relaxed non-acc routines by excluding
> > >> calls to builtin functions, including those prefixed with _gfortran_.
> > >> Nvptx does have a newlib c library, and it also has a subset of
> > >> libgfortran. Still, this solution is probably not optimal.
> > > 
> > > I don't really like that, hardcoding prefixes or whatever is available
> > > (you have quite some subset of libc, libm etc. available too) in the
> > > compiler looks very hackish.  What is wrong with complaining during
> > > linking of the offloaded code?
> 
> ACK.  Jakub, do I understand you correctly, that you basically say that
> every function declaration that is in scope inside offloaded regions (for
> example, GCC builtin functions, or standard library functions declared in
> target compiler's header files) is permitted to be called in offloaded
> regions, and the offloading compiler will then either be able to resolve
> these (nvptx back end knows about trigonometric functions, for example,
> and a lot of functions are available in the nvptx libc), or otherwise
> error out during the offloading compilation (during linking), gracefully
> without terminating the target compilation (that "gracefully" bit is
> currently missing -- that's for another day).  That is, all such
> functions are implicitly callable as OpenACC "seq" functions (which means
> that they don't internally use gang/worker/vector parallelism).  In
> particular, all these functions do *not* need to be marked with an
> explicit "#pragma acc routine seq" directive.  (Functions internally
> using gang/worker/vector parallelism will need to be marked
> appropriately, using a "#pragma acc routine gang/worker/vector"
> directive.)  That's how I understand your comment above, and your earlier
> comments on this topic, and also is what I think should be done.

Yes.  Well, OpenMP doesn't have different kinds of target functions, just
one.  And at least the current spec doesn't require that target regions or
declare target functions only call functions declared target, I guess mainly
because that would require that all the C/C++ headers are OpenMP aware and
declare everything that has the offloading counterpart.
For user code, of course users have to declare their routines, otherwise it
just can't be offloaded, and the implementation runtime is a very fuzzy
thing outside of the standard.

	Jakub

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH,openacc] check for compatible loop parallelism with acc routine calls
  2016-06-29 14:11     ` Thomas Schwinge
  2016-06-29 14:35       ` Jakub Jelinek
@ 2016-06-29 15:31       ` Cesar Philippidis
  1 sibling, 0 replies; 14+ messages in thread
From: Cesar Philippidis @ 2016-06-29 15:31 UTC (permalink / raw)
  To: Thomas Schwinge, Jakub Jelinek
  Cc: gcc-patches, Fortran List, Alexander Monakov, Ilya Verbin

On 06/29/2016 07:11 AM, Thomas Schwinge wrote:

> Cesar, I have not yet fully digested this, but do I understand right that
> you're really fixing two issues here, that are related (OpenACC routines)
> but still can be addressed independently of each other?  Do I understand
> right that the first one, the "problems with acc routines [...]
> incorrectly permitting 'acc seq' loops to call gang, worker and vector
> routines" is just a Fortran front end patch?  If yes, please split that
> one out, so as to reduce the volume of remaining changes that remain to
> be discussed.

This patch addresses the following issues:

 1. Issues warnings when a non-acc routine function is called inside an
    OpenACC offloaded region.

 2. It corrects a bug what was allowing seq loops to call gang, worker
    and vector routines.

 3. It adds supports for acc routines in fortran modules (which I
    noticed was missing when I added 'acc routine seq' to acc_on_device
    in the fortran openacc include files).

I'll split these into separate patches.

> On Thu, 23 Jun 2016 09:05:38 -0700, Cesar Philippidis <cesar@codesourcery.com> wrote:
>> On 06/17/2016 07:42 AM, Jakub Jelinek wrote:
>>> On Wed, Jun 15, 2016 at 08:12:15PM -0700, Cesar Philippidis wrote:
>>>> The second set of changes involves teaching the gimplifier to error when
>>>> it detects a function call to an non-acc routines inside an OpenACC
>>>> offloaded region.
> 
> As I understand, that's the same problem as has been discussed before
> (Ilya CCed), and has recently again been filed in
> <https://gcc.gnu.org/PR71499> "ICE in LTO1 when attempting NVPTX
> offloading (-fopenacc)", and <https://gcc.gnu.org/PR71535> "ICE in LTO1
> with -fopenmp offloading" (Alexander CCed).  Some earlier discussion
> threads include:
> <http://news.gmane.org/find-root.php?message_id=%3C20150109145702.GA45210%40msticlxl57.ims.intel.com%3E>,
> <http://news.gmane.org/find-root.php?message_id=%3C20150724152119.GA41292%40msticlxl57.ims.intel.com%3E>,
> <http://news.gmane.org/find-root.php?message_id=%3C56269C05.6030502%40acm.org%3E>.
> 
>>>> Actually, I relaxed non-acc routines by excluding
>>>> calls to builtin functions, including those prefixed with _gfortran_.
>>>> Nvptx does have a newlib c library, and it also has a subset of
>>>> libgfortran. Still, this solution is probably not optimal.
>>>
>>> I don't really like that, hardcoding prefixes or whatever is available
>>> (you have quite some subset of libc, libm etc. available too) in the
>>> compiler looks very hackish.  What is wrong with complaining during
>>> linking of the offloaded code?
> 
> ACK.  Jakub, do I understand you correctly, that you basically say that
> every function declaration that is in scope inside offloaded regions (for
> example, GCC builtin functions, or standard library functions declared in
> target compiler's header files) is permitted to be called in offloaded
> regions, and the offloading compiler will then either be able to resolve
> these (nvptx back end knows about trigonometric functions, for example,
> and a lot of functions are available in the nvptx libc), or otherwise
> error out during the offloading compilation (during linking), gracefully
> without terminating the target compilation (that "gracefully" bit is
> currently missing -- that's for another day).  That is, all such
> functions are implicitly callable as OpenACC "seq" functions (which means
> that they don't internally use gang/worker/vector parallelism).  In
> particular, all these functions do *not* need to be marked with an
> explicit "#pragma acc routine seq" directive.  (Functions internally
> using gang/worker/vector parallelism will need to be marked
> appropriately, using a "#pragma acc routine gang/worker/vector"
> directive.)  That's how I understand your comment above, and your earlier
> comments on this topic, and also is what I think should be done.

OK. I'll drop the warning changes from my patch set then unless you want
to keep it.

> A few random comments on the patch:
> 
>> --- a/gcc/fortran/gfortran.h
>> +++ b/gcc/fortran/gfortran.h
>> @@ -303,6 +303,15 @@ enum save_state
>>  { SAVE_NONE = 0, SAVE_EXPLICIT, SAVE_IMPLICIT
>>  };
>>  
>> +/* Flags to keep track of ACC routine states.  */
>> +enum oacc_function
>> +{ OACC_FUNCTION_NONE = 0,
>> +  OACC_FUNCTION_SEQ,
>> +  OACC_FUNCTION_GANG,
>> +  OACC_FUNCTION_WORKER,
>> +  OACC_FUNCTION_VECTOR
>> +};
> 
> What's the purpose of OACC_FUNCTION_NONE?  It's not used anywhere, as far
> as I can tell?

It's used by the fortran module code. It controls how parallelism gets
encoded in the .mod files.

>> --- a/gcc/fortran/openmp.c
>> +++ b/gcc/fortran/openmp.c
>> @@ -1664,21 +1664,31 @@ gfc_match_oacc_cache (void)
>>  
>>  /* Determine the loop level for a routine.   */
>>  
>> -static int
>> +static oacc_function
>>  gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
>>  {
>>    int level = -1;
>> +  oacc_function ret = OACC_FUNCTION_SEQ;
>>  
>>    if (clauses)
>>      {
>>        unsigned mask = 0;
>>  
>>        if (clauses->gang)
>> -	level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
>> +	{
>> +	  level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
>> +	  ret = OACC_FUNCTION_GANG;
>> +	}
>>        if (clauses->worker)
>> -	level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
>> +	{
>> +	  level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
>> +	  ret = OACC_FUNCTION_WORKER;
>> +	}
>>        if (clauses->vector)
>> -	level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
>> +	{
>> +	  level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
>> +	  ret = OACC_FUNCTION_VECTOR;
>> +	}
>>        if (clauses->seq)
>>  	level = GOMP_DIM_MAX, mask |= GOMP_DIM_MASK (level);
>>  
> 
> I have not looked in detail, so maybe I'm misunderstanding what is being
> done here -- but how do "clauses->seq" and "level = GOMP_DIM_MAX" fit
> together?

Conceptually, if you take a look at the oacc_function attribute in a
tree dump, you'll see an array with three elements. Basically, each
element in that array represents a gang, worker or vector parallelism.
By definition, a gang loop permits a worker and vector loop to be nested
inside it. So, for a gang routine, the oacc_function attribute is
constructed such that it permits gang, worker and vector level
parallelism. Similarly, for a worker routine, the oacc_function
attribute has the worker and vector level parallelism 'bits' set.

With that in mind, setting seq to GOMP_DIM_MASK allows the loop creating
that oacc_function attribute to mask out any gang, worker and vector
parallelism.

>> @@ -1689,7 +1699,7 @@ gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
>>    if (level < 0)
>>      level = GOMP_DIM_MAX;
>>  
>> -  return level;
>> +  return ret;
>>  }
> 
> Just from that last hunk, it seems that the assignment to "level" is a
> dead store?

I'll need to check this when I split out the patch.

>> +static tree
>> +add_attributes_to_decl (symbol_attribute sym_attr, tree list)
>> +{
>> +  unsigned id;
>> +  tree attr;
>> +
>> +  for (id = 0; id < EXT_ATTR_NUM; id++)
>> +    if (sym_attr.ext_attr & (1 << id))
>> +      {
>> +	attr = build_tree_list (
>> +		 get_identifier (ext_attr_list[id].middle_end_name),
>> +				 NULL_TREE);
>> +	list = chainon (list, attr);
>> +      }
>> +
>> +  list = add_omp_offloading_attributes (sym_attr.omp_declare_target,
>> +					sym_attr.oacc_function, list);
>> +
>> +  return list;
>> +}
> 
> Something that I had noticed before, possibly related here: code in
> gcc/fortran/ does never call replace_oacc_fn_attrib, but the C and C++
> front ends do.  Is that function what you've re-implemented here?

Similar, but I broke this code out from another function to handle
BUILT_IN_EXPECT. But I can revert this change now, since BUILT_IN_EXPECT
will be treated as an implicit SEQ routine.

>> --- a/gcc/lto-cgraph.c
>> +++ b/gcc/lto-cgraph.c
>> @@ -1201,9 +1201,11 @@ input_overwrite_node (struct lto_file_decl_data *file_data,
>>  				     LDPR_NUM_KNOWN);
>>    node->instrumentation_clone = bp_unpack_value (bp, 1);
>>    node->split_part = bp_unpack_value (bp, 1);
>> -  gcc_assert (flag_ltrans
>> -	      || (!node->in_other_partition
>> -		  && !node->used_from_other_partition));
>> +
>> +  int success = flag_ltrans || (!node->in_other_partition
>> +				&& !node->used_from_other_partition);
>> +  if (!success)
>> +    error ("Missing %<%s%>", node->name ());
>>  }
>>  
>>  /* Return string alias is alias of.  */
>> @@ -1416,9 +1418,11 @@ input_varpool_node (struct lto_file_decl_data *file_data,
>>      node->set_section_for_node (section);
>>    node->resolution = streamer_read_enum (ib, ld_plugin_symbol_resolution,
>>  					        LDPR_NUM_KNOWN);
>> -  gcc_assert (flag_ltrans
>> -	      || (!node->in_other_partition
>> -		  && !node->used_from_other_partition));
>> +
>> +  int success = flag_ltrans || (!node->in_other_partition
>> +				&& !node->used_from_other_partition);
>> +  if (!success)
>> +    error ("Missing %<%s%>", node->name ());
>>  
>>    return node;
>>  }
> 
> That looks similar to what I remember from earlier, simiar patches, as
> referenced above.

It is. I never got around to pushing that patch very strongly because I
thought those link failures were legitimate compiler bugs.

>> @@ -19420,7 +19442,8 @@ oacc_loop_fixed_partitions (oacc_loop *loop, unsigned outer_mask)
>>      {
>>        unsigned outermost = this_mask & -this_mask;
>>  
>> -      if (outermost && outermost <= outer_mask)
>> +      if ((outermost && outermost <= outer_mask)
>> +	  || (this_mask && (loop->parent->flags & OLF_SEQ)))
>>  	{
>>  	  if (noisy)
>>  	    {
> 
>> --- a/gcc/testsuite/c-c++-common/goacc/routine-3.c
>> +++ b/gcc/testsuite/c-c++-common/goacc/routine-3.c
>> @@ -49,7 +49,7 @@ main ()
>>    int red = 0;
>>  #pragma acc parallel copy (red)
>>    {
>> -    /* Independent/seq loop tests.  */
>> +    /* Independent loop tests.  */
>>  #pragma acc loop reduction (+:red) // { dg-warning "insufficient partitioning" }
>>      for (int i = 0; i < 10; i++)
>>        red += gang ();
>> @@ -62,6 +62,19 @@ main ()
>>      for (int i = 0; i < 10; i++)
>>        red += vector ();
>>  
>> +    /* Seq loop tests.  */
>> +#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
>> +    for (int i = 0; i < 10; i++)
>> +      red += gang (); /* { dg-error "incorrectly nested" } */
>> +
>> +#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
>> +    for (int i = 0; i < 10; i++)
>> +      red += worker (); /* { dg-error "incorrectly nested" } */
>> +
>> +#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } */
>> +    for (int i = 0; i < 10; i++)
>> +      red += vector (); /* { dg-error "incorrectly nested" } */
>> +    
>>      /* Gang routine tests.  */
>>  #pragma acc loop gang reduction (+:red)  /* { dg-message "containing loop" } */
>>      for (int i = 0; i < 10; i++)
> 
> Do these test case changes actually relate to any of the compiler changes
> discussed above?  Maybe to the oacc_loop_fixed_partitions cited just
> above?  Is that a separate issue to fix?  Eh, or is that actually the fix
> for your first issue, the "problems with acc routines [...] incorrectly
> permitting 'acc seq' loops to call gang, worker and vector routines"?

This is issue 2, and I'll break it out into a separate patch.

>> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
>> @@ -1,4 +1,4 @@
>>  /* { dg-do run { target lto } } */
>> -/* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
>> +/* { dg-additional-options "-fipa-pta -flto -flto-partition=max -fno-exceptions" } */
>>  
>>  #include "data-clauses-kernels.c"
> 
>> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
>> @@ -1,2 +1,4 @@
>> +/* { dg-additional-options "-fno-exceptions" }  */
>> +
>>  #define CONSTRUCT kernels
>>  #include "data-clauses.h"
> 
>> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
>> @@ -1,4 +1,4 @@
>>  /* { dg-do run { target lto } } */
>> -/* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
>> +/* { dg-additional-options "-fipa-pta -flto -flto-partition=max -fno-exceptions" } */
>>  
>>  #include "data-clauses-parallel.c"
> 
>> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
>> @@ -1,2 +1,4 @@
>> +/* { dg-additional-options "-fno-exceptions" }  */
>> +
>>  #define CONSTRUCT parallel
>>  #include "data-clauses.h"
> 
> Hmm?

I'm not sure what happened here either. Maybe adding the 'acc routine'
directive to acc_on_device is preventing that function from expanding to
its builtin function counterpart, which caused gcc to generate exception
code?

Cesar

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PR72741] Encode OpenACC 'routine' directive inside Fortran module files
  2016-06-16  3:12 [PATCH,openacc] check for compatible loop parallelism with acc routine calls Cesar Philippidis
  2016-06-17 14:42 ` Jakub Jelinek
@ 2019-02-28 21:12 ` Thomas Schwinge
  2019-02-28 21:17   ` Jakub Jelinek
  1 sibling, 1 reply; 14+ messages in thread
From: Thomas Schwinge @ 2019-02-28 21:12 UTC (permalink / raw)
  To: gcc-patches, fortran; +Cc: Tobias Burnus, Jakub Jelinek


[-- Attachment #1.1: Type: text/plain, Size: 2202 bytes --]

Hi!

On Wed, 15 Jun 2016 20:12:15 -0700, Cesar Philippidis <cesar@codesourcery.com> wrote:
> [...], this patch updates the way that
> the fortran FE handles the 'acc routine' attribute in modules. Before,
> it only recorded that a function was marked as an acc routine.

(By means of 'OMP_DECLARE_TARGET', that is.)

> With this
> patch, it now records the level of parallelism the routine has. This is
> necessary for the middle end to validate compatible parallelism between
> the loop calling the routine and the routine itself.

This patch has seen a bunch of further revisions later on.  I've now
singled out the changes that are actually relevant for the feature under
discussion here, and added test cases that actually test what they
describe to be testing...  ;-)


The code changes now are actually very simple.  The "problem" is that
we're incrementing the Fortran module version, 'MOD_VERSION', which
breaks binary compatibility with Fortran module files created with
earlier versions of GCC, which is something that is to be avoided, as
I've heard.  Or, is it not that bad actually?

We might be able to resolve this by encoding individual "bits" for the
'gang'/'worker'/'vector'/'seq' clauses, instead of using the values for
'enum oacc_routine_lop' via 'oacc_routine_lop_types'.  The earlier
Fortran module files would simply not have these bits set, and could thus
still be read (given 'MOD_VERSION' not incremented).  Would that be a
solution to this issue?  (You'd then get a parse error when trying to use
with an older version of GCC any Fortran module files created with a
newer version, which seems OK?)

Or, an idea I just had (but not yet verified), guard the stream-out and
stream-in of 'attr->oacc_routine_lop' to just happen if '-fopenacc' is
active (or some such), which thus won't affect that majority of users.
Would that be a solution to this issue?


See attached, any comments?

And, can this still go into trunk now?  (Rationale: I'd like to later fix
this issue on GCC release branches also, because this is quite a
limitation to usage.  This will thus again bring up the 'MOD_VERSION'
issue.)


Grüße
 Thomas



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: 0001-testsuite-Fortran-Provide-dg-compile-aux-modules-in-.patch --]
[-- Type: text/x-diff, Size: 1826 bytes --]

From f2c4627045e70a6f6c52914cf6334392aca75230 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 28 Feb 2019 17:45:13 +0100
Subject: [PATCH 1/2] [testsuite, Fortran] Provide 'dg-compile-aux-modules' in
 'gfortran.dg/goacc/goacc.exp'

..., as yet another copy from 'gfortran.dg/dg.exp', which there are a few
already.

	gcc/testsuite/
	* gfortran.dg/goacc/goacc.exp (dg-compile-aux-modules): New proc.
---
 gcc/testsuite/gfortran.dg/goacc/goacc.exp | 25 +++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/gcc/testsuite/gfortran.dg/goacc/goacc.exp b/gcc/testsuite/gfortran.dg/goacc/goacc.exp
index f1adb186a1e4..409c5fe54003 100644
--- a/gcc/testsuite/gfortran.dg/goacc/goacc.exp
+++ b/gcc/testsuite/gfortran.dg/goacc/goacc.exp
@@ -28,6 +28,31 @@ if ![check_effective_target_fopenacc] {
 # Initialize `dg'.
 dg-init
 
+global gfortran_test_path
+global gfortran_aux_module_flags
+set gfortran_test_path $srcdir/$subdir
+set gfortran_aux_module_flags "-fopenacc"
+proc dg-compile-aux-modules { args } {
+    global gfortran_test_path
+    global gfortran_aux_module_flags
+    if { [llength $args] != 2 } {
+	error "dg-set-target-env-var: needs one argument"
+	return
+    }
+
+    set level [info level]
+    if { [info procs dg-save-unknown] != [list] } {
+	rename dg-save-unknown dg-save-unknown-level-$level
+    }
+
+    dg-test $gfortran_test_path/[lindex $args 1] "" $gfortran_aux_module_flags
+    # cleanup-modules is intentionally not invoked here.
+
+    if { [info procs dg-save-unknown-level-$level] != [list] } {
+	rename dg-save-unknown-level-$level dg-save-unknown
+    }
+}
+
 # Main loop.
 gfortran-dg-runtest [lsort \
        [find $srcdir/$subdir *.\[fF\]{,90,95,03,08} ] ] "" "-fopenacc"
-- 
2.17.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.3: 0002-PR72741-Encode-OpenACC-routine-directive-inside-Fort.patch --]
[-- Type: text/x-diff, Size: 10279 bytes --]

From d947a297d224ced389abd7ad74d3519b4a0e8d32 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 28 Feb 2019 21:58:14 +0100
Subject: [PATCH 2/2] [PR72741] Encode OpenACC 'routine' directive inside
 Fortran module files

	gcc/fortran/
	* gfortran.h (oacc_routine_lop_types): Declare.
	* module.c (MOD_VERSION): Increment
	(oacc_routine_lop): New DECL_MIO_NAME.
	(mio_symbol_attribute): Set the oacc_routine_lop attribute.
	* symbol.c (oacc_routine_lop_types): Define.
	gcc/testsuite/
	* gfortran.dg/goacc/routine-module-1.f90: New file.
	* gfortran.dg/goacc/routine-module-2.f90: Likewise.
	* gfortran.dg/goacc/routine-module-mod-1.f90: Likewise.
---
 gcc/fortran/gfortran.h                        |  1 +
 gcc/fortran/module.c                          |  6 +-
 gcc/fortran/symbol.c                          | 12 +++
 .../gfortran.dg/goacc/routine-module-1.f90    | 47 +++++++++++
 .../gfortran.dg/goacc/routine-module-2.f90    | 31 ++++++++
 .../goacc/routine-module-mod-1.f90            | 79 +++++++++++++++++++
 6 files changed, 175 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/routine-module-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/routine-module-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index 3e0f634c3a8e..0d929d4d0c2c 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -336,6 +336,7 @@ extern const mstring intents[];
 extern const mstring access_types[];
 extern const mstring ifsrc_types[];
 extern const mstring save_status[];
+extern const mstring oacc_routine_lop_types[];
 
 /* Strings for DTIO procedure names.  In symbol.c.  */
 extern const mstring dtio_procs[];
diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 320b30c529ac..ce43997ccf48 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -84,7 +84,7 @@ along with GCC; see the file COPYING3.  If not see
 
 /* Don't put any single quote (') in MOD_VERSION, if you want it to be
    recognized.  */
-#define MOD_VERSION "15"
+#define MOD_VERSION "16"
 
 
 /* Structure that describes a position within a module file.  */
@@ -2126,6 +2126,7 @@ DECL_MIO_NAME (ref_type)
 DECL_MIO_NAME (sym_flavor)
 DECL_MIO_NAME (sym_intent)
 DECL_MIO_NAME (inquiry_type)
+DECL_MIO_NAME (oacc_routine_lop)
 #undef DECL_MIO_NAME
 
 /* Symbol attributes are stored in list with the first three elements
@@ -2147,6 +2148,9 @@ mio_symbol_attribute (symbol_attribute *attr)
   attr->proc = MIO_NAME (procedure_type) (attr->proc, procedures);
   attr->if_source = MIO_NAME (ifsrc) (attr->if_source, ifsrc_types);
   attr->save = MIO_NAME (save_state) (attr->save, save_status);
+  attr->oacc_routine_lop
+    = MIO_NAME (oacc_routine_lop) (attr->oacc_routine_lop,
+				   oacc_routine_lop_types);
 
   ext_attr = attr->ext_attr;
   mio_integer ((int *) &ext_attr);
diff --git a/gcc/fortran/symbol.c b/gcc/fortran/symbol.c
index c8a1f842d353..3c3d8cb22f7b 100644
--- a/gcc/fortran/symbol.c
+++ b/gcc/fortran/symbol.c
@@ -96,6 +96,18 @@ const mstring dtio_procs[] =
     minit ("_dtio_unformatted_write", DTIO_WUF),
 };
 
+const mstring oacc_routine_lop_types[] =
+{
+    minit ("OACC_ROUTINE_LOP_NONE", OACC_ROUTINE_LOP_NONE),
+    minit ("OACC_ROUTINE_LOP_GANG", OACC_ROUTINE_LOP_GANG),
+    minit ("OACC_ROUTINE_LOP_WORKER", OACC_ROUTINE_LOP_WORKER),
+    minit ("OACC_ROUTINE_LOP_VECTOR", OACC_ROUTINE_LOP_VECTOR),
+    minit ("OACC_ROUTINE_LOP_SEQ", OACC_ROUTINE_LOP_SEQ),
+    /* 'OACC_ROUTINE_LOP_ERROR' intentionally ommitted here; it's only unsed
+       internally.  */
+    minit (NULL, -1)
+};
+
 /* This is to make sure the backend generates setup code in the correct
    order.  */
 
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-module-1.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-module-1.f90
new file mode 100644
index 000000000000..4e81f11fec86
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-module-1.f90
@@ -0,0 +1,47 @@
+! Valid use of routines defined inside a Fortran module.
+
+! { dg-compile-aux-modules "routine-module-mod-1.f90" }
+! { dg-additional-options "-fopt-info-optimized-omp" }
+
+program main
+  use routine_module_mod_1
+  implicit none
+
+  integer :: i
+
+  call pl_1
+
+  !$acc parallel loop seq ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+  do i = 1, 10
+     call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call g_1 ! { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" }
+     call w_1 ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+     call v_1 ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop gang ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
+  do i = 1, 10
+     call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call w_1 ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+     call v_1 ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop worker ! { dg-message "optimized: assigned OpenACC worker loop parallelism" }
+  do i = 1, 10
+     call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call v_1 ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop vector ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+  do i = 1, 10
+     call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+  end do
+  !$acc end parallel loop
+end program main
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-module-2.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-module-2.f90
new file mode 100644
index 000000000000..eae0807643c9
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-module-2.f90
@@ -0,0 +1,31 @@
+! Invalid use of routines defined inside a Fortran module.
+
+! { dg-compile-aux-modules "routine-module-mod-1.f90" }
+
+program main
+  use routine_module_mod_1
+  implicit none
+
+  integer :: i
+
+  !$acc parallel loop gang
+  do i = 1, 10
+     call g_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop worker
+  do i = 1, 10
+     call g_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+     call w_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop vector
+  do i = 1, 10
+     call g_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+     call w_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+     call v_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+  end do
+  !$acc end parallel loop
+end program main
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90
new file mode 100644
index 000000000000..8f73db41d523
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90
@@ -0,0 +1,79 @@
+! OpenACC 'routine' directives inside a Fortran module.
+
+! { dg-additional-options "-fopt-info-optimized-omp" }
+
+module routine_module_mod_1
+contains
+  subroutine s_1
+    implicit none
+    !$acc routine
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+    do i = 1, 3
+    end do
+  end subroutine s_1
+
+  subroutine s_2
+    implicit none
+    !$acc routine seq
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+    do i = 1, 3
+    end do
+  end subroutine s_2
+
+  subroutine v_1
+    implicit none
+    !$acc routine vector
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+    do i = 1, 3
+    end do
+  end subroutine v_1
+
+  subroutine w_1
+    implicit none
+    !$acc routine worker
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+    do i = 1, 3
+    end do
+  end subroutine w_1
+
+  subroutine g_1
+    implicit none
+    !$acc routine gang
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC gang vector loop parallelism" }
+    do i = 1, 3
+    end do
+  end subroutine g_1
+
+  subroutine pl_1
+    implicit none
+
+    integer :: i
+
+    !$acc parallel loop ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+    do i = 1, 3
+       call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+       call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+       call g_1 ! { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" }
+       call w_1 ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+       call v_1 ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+    end do
+  end subroutine pl_1
+end module routine_module_mod_1
-- 
2.17.1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 658 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PR72741] Encode OpenACC 'routine' directive inside Fortran module files
  2019-02-28 21:12 ` [PR72741] Encode OpenACC 'routine' directive inside Fortran module files Thomas Schwinge
@ 2019-02-28 21:17   ` Jakub Jelinek
  2019-03-13 17:50     ` [PR72741] Encode OpenACC 'routine' directive's level of parallelism " Thomas Schwinge
  0 siblings, 1 reply; 14+ messages in thread
From: Jakub Jelinek @ 2019-02-28 21:17 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: gcc-patches, fortran, Tobias Burnus

On Thu, Feb 28, 2019 at 10:12:00PM +0100, Thomas Schwinge wrote:
> On Wed, 15 Jun 2016 20:12:15 -0700, Cesar Philippidis <cesar@codesourcery.com> wrote:
> The code changes now are actually very simple.  The "problem" is that
> we're incrementing the Fortran module version, 'MOD_VERSION', which
> breaks binary compatibility with Fortran module files created with
> earlier versions of GCC, which is something that is to be avoided, as
> I've heard.  Or, is it not that bad actually?

It is bad and we certainly shouldn't change it on release branches.
There are many ways to deal with it without bumping MOD_VERSION in a
backwards but not forwards compatible way, so that a newer compiler will be
able to parse old *.mod files, and newer compiler new ones as long as this
problematic stuff doesn't appear in.

	Jakub

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PR72741] Encode OpenACC 'routine' directive's level of parallelism inside Fortran module files
  2019-02-28 21:17   ` Jakub Jelinek
@ 2019-03-13 17:50     ` Thomas Schwinge
  2019-03-13 22:13       ` Thomas Koenig
  2019-03-21 19:47       ` Thomas Schwinge
  0 siblings, 2 replies; 14+ messages in thread
From: Thomas Schwinge @ 2019-03-13 17:50 UTC (permalink / raw)
  To: gcc-patches, fortran, Jakub Jelinek; +Cc: Tobias Burnus

[-- Attachment #1: Type: text/plain, Size: 1603 bytes --]

Hi!

On Thu, 28 Feb 2019 22:17:01 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Feb 28, 2019 at 10:12:00PM +0100, Thomas Schwinge wrote:
> > On Wed, 15 Jun 2016 20:12:15 -0700, Cesar Philippidis <cesar@codesourcery.com> wrote:
> > The code changes now are actually very simple.  The "problem" is that
> > we're incrementing the Fortran module version, 'MOD_VERSION', which
> > breaks binary compatibility with Fortran module files created with
> > earlier versions of GCC, which is something that is to be avoided, as
> > I've heard.  Or, is it not that bad actually?
> 
> It is bad and we certainly shouldn't change it on release branches.

ACK.

> There are many ways to deal with it without bumping MOD_VERSION in a
> backwards but not forwards compatible way, so that a newer compiler will be
> able to parse old *.mod files, and newer compiler new ones as long as this
> problematic stuff doesn't appear in.

Like the attached, actually pretty simple now?

It may seem wasteful to use individual bits for 'gang', 'worker',
'vector', 'seq', but that makes it easy to implement the behavior
described above by Jakub, and I've heard rumors that OpenACC might at
some point allow several of these level of parallelism clauses to be
specified (plus a new 'auto' clause?), and then this will be necessary
anyway (several of these bits can then in fact appear).

If approving this patch, please respond with "Reviewed-by: NAME <EMAIL>"
so that your effort will be recorded in the commit log, see
<https://gcc.gnu.org/wiki/Reviewed-by>.


Grüße
 Thomas



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-PR72741-Encode-OpenACC-routine-directive-s-level-of-.patch --]
[-- Type: text/x-diff, Size: 11631 bytes --]

From b2f200d24d040c6d34b5b4421e4cb1be9786030f Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Wed, 13 Mar 2019 18:39:53 +0100
Subject: [PATCH] [PR72741] Encode OpenACC 'routine' directive's level of
 parallelism inside Fortran module files

If 'use'ing with an old GCC a new module file (with OpenACC 'routine'
directive's level of parallelism encoded), then that expectedly fails as
follows:

    f951: Fatal Error: Reading module 'routine_module_mod_1' at line 27 column 21: find_enum(): Enum not found

If 'use'ing with a new GCC an old module file (without OpenACC 'routine'
directive's level of parallelism encoded), then that (silently) continues to
accept the module file, and will proceed with the previous, erroneous behavior.

These seem to be acceptable compromises, instead of incrementing 'MOD_VERSION'.

	gcc/fortran/
	* module.c (verify_OACC_ROUTINE_LOP_NONE): New function.
	(enum ab_attribute): Add AB_OACC_ROUTINE_LOP_GANG,
	AB_OACC_ROUTINE_LOP_WORKER, AB_OACC_ROUTINE_LOP_VECTOR,
	AB_OACC_ROUTINE_LOP_SEQ.
	(attr_bits): Add these.
	(mio_symbol_attribute): Handle these.
	gcc/testsuite/
	* gfortran.dg/goacc/routine-module-1.f90: New file.
	* gfortran.dg/goacc/routine-module-2.f90: Likewise.
	* gfortran.dg/goacc/routine-module-mod-1.f90: Likewise.
---
 gcc/fortran/module.c                          | 57 ++++++++++++-
 .../gfortran.dg/goacc/routine-module-1.f90    | 47 +++++++++++
 .../gfortran.dg/goacc/routine-module-2.f90    | 31 ++++++++
 .../goacc/routine-module-mod-1.f90            | 79 +++++++++++++++++++
 4 files changed, 213 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/routine-module-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/routine-module-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90

diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 0572b8e02c17..39b420039dff 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -2011,7 +2011,9 @@ enum ab_attribute
   AB_OACC_DECLARE_COPYIN, AB_OACC_DECLARE_DEVICEPTR,
   AB_OACC_DECLARE_DEVICE_RESIDENT, AB_OACC_DECLARE_LINK,
   AB_OMP_DECLARE_TARGET_LINK, AB_PDT_KIND, AB_PDT_LEN, AB_PDT_TYPE,
-  AB_PDT_TEMPLATE, AB_PDT_ARRAY, AB_PDT_STRING
+  AB_PDT_TEMPLATE, AB_PDT_ARRAY, AB_PDT_STRING,
+  AB_OACC_ROUTINE_LOP_GANG, AB_OACC_ROUTINE_LOP_WORKER,
+  AB_OACC_ROUTINE_LOP_VECTOR, AB_OACC_ROUTINE_LOP_SEQ
 };
 
 static const mstring attr_bits[] =
@@ -2081,6 +2083,10 @@ static const mstring attr_bits[] =
     minit ("PDT_TEMPLATE", AB_PDT_TEMPLATE),
     minit ("PDT_ARRAY", AB_PDT_ARRAY),
     minit ("PDT_STRING", AB_PDT_STRING),
+    minit ("OACC_ROUTINE_LOP_GANG", AB_OACC_ROUTINE_LOP_GANG),
+    minit ("OACC_ROUTINE_LOP_WORKER", AB_OACC_ROUTINE_LOP_WORKER),
+    minit ("OACC_ROUTINE_LOP_VECTOR", AB_OACC_ROUTINE_LOP_VECTOR),
+    minit ("OACC_ROUTINE_LOP_SEQ", AB_OACC_ROUTINE_LOP_SEQ),
     minit (NULL, -1)
 };
 
@@ -2128,6 +2134,15 @@ DECL_MIO_NAME (sym_intent)
 DECL_MIO_NAME (inquiry_type)
 #undef DECL_MIO_NAME
 
+/* Verify OACC_ROUTINE_LOP_NONE.  */
+
+static void
+verify_OACC_ROUTINE_LOP_NONE (enum oacc_routine_lop lop)
+{
+  if (lop != OACC_ROUTINE_LOP_NONE)
+    bad_module ("Unsupported: multiple OpenACC 'routine' levels of parallelism");
+}
+
 /* Symbol attributes are stored in list with the first three elements
    being the enumerated fields, while the remaining elements (if any)
    indicate the individual attribute bits.  The access field is not
@@ -2292,6 +2307,30 @@ mio_symbol_attribute (symbol_attribute *attr)
 	MIO_NAME (ab_attribute) (AB_PDT_ARRAY, attr_bits);
       if (attr->pdt_string)
 	MIO_NAME (ab_attribute) (AB_PDT_STRING, attr_bits);
+      switch (attr->oacc_routine_lop)
+	{
+	case OACC_ROUTINE_LOP_NONE:
+	  /* This is the default anyway, and for maintaining compatibility with
+	     the current MOD_VERSION, we're not emitting anything in that
+	     case.  */
+	  break;
+	case OACC_ROUTINE_LOP_GANG:
+	  MIO_NAME (ab_attribute) (AB_OACC_ROUTINE_LOP_GANG, attr_bits);
+	  break;
+	case OACC_ROUTINE_LOP_WORKER:
+	  MIO_NAME (ab_attribute) (AB_OACC_ROUTINE_LOP_WORKER, attr_bits);
+	  break;
+	case OACC_ROUTINE_LOP_VECTOR:
+	  MIO_NAME (ab_attribute) (AB_OACC_ROUTINE_LOP_VECTOR, attr_bits);
+	  break;
+	case OACC_ROUTINE_LOP_SEQ:
+	  MIO_NAME (ab_attribute) (AB_OACC_ROUTINE_LOP_SEQ, attr_bits);
+	  break;
+	case OACC_ROUTINE_LOP_ERROR:
+	  /* ... intentionally omitted here; it's only unsed internally.  */
+	default:
+	  gcc_unreachable ();
+	}
 
       mio_rparen ();
 
@@ -2503,6 +2542,22 @@ mio_symbol_attribute (symbol_attribute *attr)
 	    case AB_PDT_STRING:
 	      attr->pdt_string = 1;
 	      break;
+	    case AB_OACC_ROUTINE_LOP_GANG:
+	      verify_OACC_ROUTINE_LOP_NONE (attr->oacc_routine_lop);
+	      attr->oacc_routine_lop = OACC_ROUTINE_LOP_GANG;
+	      break;
+	    case AB_OACC_ROUTINE_LOP_WORKER:
+	      verify_OACC_ROUTINE_LOP_NONE (attr->oacc_routine_lop);
+	      attr->oacc_routine_lop = OACC_ROUTINE_LOP_WORKER;
+	      break;
+	    case AB_OACC_ROUTINE_LOP_VECTOR:
+	      verify_OACC_ROUTINE_LOP_NONE (attr->oacc_routine_lop);
+	      attr->oacc_routine_lop = OACC_ROUTINE_LOP_VECTOR;
+	      break;
+	    case AB_OACC_ROUTINE_LOP_SEQ:
+	      verify_OACC_ROUTINE_LOP_NONE (attr->oacc_routine_lop);
+	      attr->oacc_routine_lop = OACC_ROUTINE_LOP_SEQ;
+	      break;
 	    }
 	}
     }
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-module-1.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-module-1.f90
new file mode 100644
index 000000000000..4e81f11fec86
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-module-1.f90
@@ -0,0 +1,47 @@
+! Valid use of routines defined inside a Fortran module.
+
+! { dg-compile-aux-modules "routine-module-mod-1.f90" }
+! { dg-additional-options "-fopt-info-optimized-omp" }
+
+program main
+  use routine_module_mod_1
+  implicit none
+
+  integer :: i
+
+  call pl_1
+
+  !$acc parallel loop seq ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+  do i = 1, 10
+     call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call g_1 ! { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" }
+     call w_1 ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+     call v_1 ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop gang ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
+  do i = 1, 10
+     call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call w_1 ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+     call v_1 ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop worker ! { dg-message "optimized: assigned OpenACC worker loop parallelism" }
+  do i = 1, 10
+     call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call v_1 ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop vector ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+  do i = 1, 10
+     call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+  end do
+  !$acc end parallel loop
+end program main
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-module-2.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-module-2.f90
new file mode 100644
index 000000000000..eae0807643c9
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-module-2.f90
@@ -0,0 +1,31 @@
+! Invalid use of routines defined inside a Fortran module.
+
+! { dg-compile-aux-modules "routine-module-mod-1.f90" }
+
+program main
+  use routine_module_mod_1
+  implicit none
+
+  integer :: i
+
+  !$acc parallel loop gang
+  do i = 1, 10
+     call g_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop worker
+  do i = 1, 10
+     call g_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+     call w_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop vector
+  do i = 1, 10
+     call g_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+     call w_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+     call v_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+  end do
+  !$acc end parallel loop
+end program main
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90
new file mode 100644
index 000000000000..3855b8c88596
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90
@@ -0,0 +1,79 @@
+! OpenACC 'routine' directives inside a Fortran module.
+
+! { dg-additional-options "-fopt-info-optimized-omp" }
+
+module routine_module_mod_1
+contains
+  subroutine s_1
+    implicit none
+    !$acc routine
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+    do i = 1, 3
+    end do
+  end subroutine s_1
+
+  subroutine s_2
+    implicit none
+    !$acc routine seq
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+    do i = 1, 3
+    end do
+  end subroutine s_2
+
+  subroutine v_1
+    implicit none
+    !$acc routine vector
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+    do i = 1, 3
+    end do
+  end subroutine v_1
+
+  subroutine w_1
+    implicit none
+    !$acc routine worker
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+    do i = 1, 3
+    end do
+  end subroutine w_1
+
+  subroutine g_1
+    implicit none
+    !$acc routine gang
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC gang vector loop parallelism" }
+    do i = 1, 3
+    end do
+  end subroutine g_1
+
+  subroutine pl_1
+    implicit none
+
+    integer :: i
+
+    !$acc parallel loop ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+    do i = 1, 3
+       call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+       call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+       call v_1 ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+       call w_1 ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+       call g_1 ! { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" }
+    end do
+  end subroutine pl_1
+end module routine_module_mod_1
-- 
2.17.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PR72741] Encode OpenACC 'routine' directive's level of parallelism inside Fortran module files
  2019-03-13 17:50     ` [PR72741] Encode OpenACC 'routine' directive's level of parallelism " Thomas Schwinge
@ 2019-03-13 22:13       ` Thomas Koenig
  2019-03-14  7:38         ` Thomas Schwinge
  2019-03-21 19:47       ` Thomas Schwinge
  1 sibling, 1 reply; 14+ messages in thread
From: Thomas Koenig @ 2019-03-13 22:13 UTC (permalink / raw)
  To: Thomas Schwinge, gcc-patches, fortran, Jakub Jelinek; +Cc: Tobias Burnus

Am 13.03.19 um 18:50 schrieb Thomas Schwinge:
>> There are many ways to deal with it without bumping MOD_VERSION in a
>> backwards but not forwards compatible way, so that a newer compiler will be
>> able to parse old *.mod files, and newer compiler new ones as long as this
>> problematic stuff doesn't appear in.
> Like the attached, actually pretty simple now?

Can you explain a) how this works, and b) how you tested it?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PR72741] Encode OpenACC 'routine' directive's level of parallelism inside Fortran module files
  2019-03-13 22:13       ` Thomas Koenig
@ 2019-03-14  7:38         ` Thomas Schwinge
  2019-03-20 10:09           ` Thomas Schwinge
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Schwinge @ 2019-03-14  7:38 UTC (permalink / raw)
  To: Thomas Koenig; +Cc: Tobias Burnus, gcc-patches, fortran, Jakub Jelinek

Hi Thomas!

On Wed, 13 Mar 2019 23:13:46 +0100, Thomas Koenig <tkoenig@netcologne.de> wrote:
> Am 13.03.19 um 18:50 schrieb Thomas Schwinge:
> >> There are many ways to deal with it without bumping MOD_VERSION in a
> >> backwards but not forwards compatible way, so that a newer compiler will be
> >> able to parse old *.mod files, and newer compiler new ones as long as this
> >> problematic stuff doesn't appear in.
> > Like the attached, actually pretty simple now?
> 
> Can you explain a) how this works

I'll be happy to elaborate, but I'm not sure at which level you'd like me
to explain?

This is basically the very same thing that's being done for other 'struct
symbol_attribute' flag fields, by interpreting the 'enum
oacc_routine_lop' values as individual flags, and with the corollary (to
maintain MOD_VERSION compatibility as best as we can) that we don't
stream out its default value (so doing correspondingly to when a "real"
flag's value is 'false').

> and b) how you tested it?

I had mentioned that in the commit message: with the relevant old/new GCC
combinations, using the included test case.

Happy to explain further, if necessary.


Grüße
 Thomas

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PR72741] Encode OpenACC 'routine' directive's level of parallelism inside Fortran module files
  2019-03-14  7:38         ` Thomas Schwinge
@ 2019-03-20 10:09           ` Thomas Schwinge
  2019-03-21 20:49             ` Thomas Koenig
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Schwinge @ 2019-03-20 10:09 UTC (permalink / raw)
  To: Thomas Koenig, gcc-patches, fortran; +Cc: Tobias Burnus, Jakub Jelinek

Hi!

Are there any further questions, or am I good to commit my patch as
posted?

On Thu, 14 Mar 2019 08:38:30 +0100, I wrote:
> On Wed, 13 Mar 2019 23:13:46 +0100, Thomas Koenig <tkoenig@netcologne.de> wrote:
> > Am 13.03.19 um 18:50 schrieb Thomas Schwinge:
> > >> There are many ways to deal with it without bumping MOD_VERSION in a
> > >> backwards but not forwards compatible way, so that a newer compiler will be
> > >> able to parse old *.mod files, and newer compiler new ones as long as this
> > >> problematic stuff doesn't appear in.
> > > Like the attached, actually pretty simple now?
> > 
> > Can you explain a) how this works
> 
> I'll be happy to elaborate, but I'm not sure at which level you'd like me
> to explain?
> 
> This is basically the very same thing that's being done for other 'struct
> symbol_attribute' flag fields, by interpreting the 'enum
> oacc_routine_lop' values as individual flags, and with the corollary (to
> maintain MOD_VERSION compatibility as best as we can) that we don't
> stream out its default value (so doing correspondingly to when a "real"
> flag's value is 'false').
> 
> > and b) how you tested it?
> 
> I had mentioned that in the commit message: with the relevant old/new GCC
> combinations, using the included test case.
> 
> Happy to explain further, if necessary.


Grüße
 Thomas

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PR72741] Encode OpenACC 'routine' directive's level of parallelism inside Fortran module files
  2019-03-13 17:50     ` [PR72741] Encode OpenACC 'routine' directive's level of parallelism " Thomas Schwinge
  2019-03-13 22:13       ` Thomas Koenig
@ 2019-03-21 19:47       ` Thomas Schwinge
  1 sibling, 0 replies; 14+ messages in thread
From: Thomas Schwinge @ 2019-03-21 19:47 UTC (permalink / raw)
  To: gcc-patches, fortran; +Cc: Tobias Burnus, Thomas Koenig, Jakub Jelinek


[-- Attachment #1.1: Type: text/plain, Size: 1770 bytes --]

Hi!

On Wed, 13 Mar 2019 18:50:38 +0100, I wrote:
> On Thu, 28 Feb 2019 22:17:01 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> > On Thu, Feb 28, 2019 at 10:12:00PM +0100, Thomas Schwinge wrote:
> > > On Wed, 15 Jun 2016 20:12:15 -0700, Cesar Philippidis <cesar@codesourcery.com> wrote:
> > > The code changes now are actually very simple.  The "problem" is that
> > > we're incrementing the Fortran module version, 'MOD_VERSION', which
> > > breaks binary compatibility with Fortran module files created with
> > > earlier versions of GCC, which is something that is to be avoided, as
> > > I've heard.  Or, is it not that bad actually?
> > 
> > It is bad and we certainly shouldn't change it on release branches.
> 
> ACK.
> 
> > There are many ways to deal with it without bumping MOD_VERSION in a
> > backwards but not forwards compatible way, so that a newer compiler will be
> > able to parse old *.mod files, and newer compiler new ones as long as this
> > problematic stuff doesn't appear in.
> 
> Like the attached, actually pretty simple now?
> 
> It may seem wasteful to use individual bits for 'gang', 'worker',
> 'vector', 'seq', but that makes it easy to implement the behavior
> described above by Jakub, and I've heard rumors that OpenACC might at
> some point allow several of these level of parallelism clauses to be
> specified (plus a new 'auto' clause?), and then this will be necessary
> anyway (several of these bits can then in fact appear).

Committed to trunk in r269854 "[testsuite, Fortran] Provide
'dg-compile-aux-modules' in 'gfortran.dg/goacc/goacc.exp'", r269855
"[PR72741] Encode OpenACC 'routine' directive's level of parallelism
inside Fortran module files", see attached.


Grüße
 Thomas



[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: 0001-testsuite-Fortran-Provide-dg-compile-aux-modul.trunk.patch --]
[-- Type: text/x-diff, Size: 2425 bytes --]

From 44ff9fb6a5f3db5c2db521bdd2aed0d2c24e1f83 Mon Sep 17 00:00:00 2001
From: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date: Thu, 21 Mar 2019 19:44:34 +0000
Subject: [PATCH 1/2] [testsuite, Fortran] Provide 'dg-compile-aux-modules' in
 'gfortran.dg/goacc/goacc.exp'

..., as yet another copy from 'gfortran.dg/dg.exp', which there are a few
already.

	gcc/testsuite/
	* gfortran.dg/goacc/goacc.exp (dg-compile-aux-modules): New proc.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@269854 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/testsuite/ChangeLog                   |  2 ++
 gcc/testsuite/gfortran.dg/goacc/goacc.exp | 25 +++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index a5211cca6c6a..1f2f3eba6f40 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,7 @@
 2019-03-21  Thomas Schwinge  <thomas@codesourcery.com>
 
+	* gfortran.dg/goacc/goacc.exp (dg-compile-aux-modules): New proc.
+
 	PR fortran/56408
 	* gcc.target/powerpc/ppc-fortran/ppc-fortran.exp
 	(dg-compile-aux-modules): Fix diagnostic.
diff --git a/gcc/testsuite/gfortran.dg/goacc/goacc.exp b/gcc/testsuite/gfortran.dg/goacc/goacc.exp
index f1adb186a1e4..1b093f6d41f5 100644
--- a/gcc/testsuite/gfortran.dg/goacc/goacc.exp
+++ b/gcc/testsuite/gfortran.dg/goacc/goacc.exp
@@ -28,6 +28,31 @@ if ![check_effective_target_fopenacc] {
 # Initialize `dg'.
 dg-init
 
+global gfortran_test_path
+global gfortran_aux_module_flags
+set gfortran_test_path $srcdir/$subdir
+set gfortran_aux_module_flags "-fopenacc"
+proc dg-compile-aux-modules { args } {
+    global gfortran_test_path
+    global gfortran_aux_module_flags
+    if { [llength $args] != 2 } {
+	error "dg-compile-aux-modules: needs one argument"
+	return
+    }
+
+    set level [info level]
+    if { [info procs dg-save-unknown] != [list] } {
+	rename dg-save-unknown dg-save-unknown-level-$level
+    }
+
+    dg-test $gfortran_test_path/[lindex $args 1] "" $gfortran_aux_module_flags
+    # cleanup-modules is intentionally not invoked here.
+
+    if { [info procs dg-save-unknown-level-$level] != [list] } {
+	rename dg-save-unknown-level-$level dg-save-unknown
+    }
+}
+
 # Main loop.
 gfortran-dg-runtest [lsort \
        [find $srcdir/$subdir *.\[fF\]{,90,95,03,08} ] ] "" "-fopenacc"
-- 
2.17.1


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.3: 0002-PR72741-Encode-OpenACC-routine-directive-s-lev.trunk.patch --]
[-- Type: text/x-diff, Size: 13338 bytes --]

From 44ff4c8d4b40c8d9969066b8c38b3df6b76acf17 Mon Sep 17 00:00:00 2001
From: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date: Thu, 21 Mar 2019 19:44:45 +0000
Subject: [PATCH 2/2] [PR72741] Encode OpenACC 'routine' directive's level of
 parallelism inside Fortran module files

If 'use'ing with an old GCC a new module file (with OpenACC 'routine'
directive's level of parallelism encoded), then that expectedly fails as
follows:

    f951: Fatal Error: Reading module 'routine_module_mod_1' at line 27 column 21: find_enum(): Enum not found

If 'use'ing with a new GCC an old module file (without OpenACC 'routine'
directive's level of parallelism encoded), then that (silently) continues to
accept the module file, and will proceed with the previous, erroneous behavior.

These seem to be acceptable compromises, instead of incrementing 'MOD_VERSION'.

	gcc/fortran/
	PR fortran/72741
	* module.c (verify_OACC_ROUTINE_LOP_NONE): New function.
	(enum ab_attribute): Add AB_OACC_ROUTINE_LOP_GANG,
	AB_OACC_ROUTINE_LOP_WORKER, AB_OACC_ROUTINE_LOP_VECTOR,
	AB_OACC_ROUTINE_LOP_SEQ.
	(attr_bits): Add these.
	(mio_symbol_attribute): Handle these.
	gcc/testsuite/
	PR fortran/72741
	* gfortran.dg/goacc/routine-module-1.f90: New file.
	* gfortran.dg/goacc/routine-module-2.f90: Likewise.
	* gfortran.dg/goacc/routine-module-mod-1.f90: Likewise.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@269855 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/fortran/ChangeLog                         | 10 +++
 gcc/fortran/module.c                          | 57 ++++++++++++-
 gcc/testsuite/ChangeLog                       |  5 ++
 .../gfortran.dg/goacc/routine-module-1.f90    | 47 +++++++++++
 .../gfortran.dg/goacc/routine-module-2.f90    | 31 ++++++++
 .../goacc/routine-module-mod-1.f90            | 79 +++++++++++++++++++
 6 files changed, 228 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/routine-module-1.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/routine-module-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90

diff --git a/gcc/fortran/ChangeLog b/gcc/fortran/ChangeLog
index 2a585dc45f5e..2afab3920bda 100644
--- a/gcc/fortran/ChangeLog
+++ b/gcc/fortran/ChangeLog
@@ -1,3 +1,13 @@
+2019-03-21  Thomas Schwinge  <thomas@codesourcery.com>
+
+	PR fortran/72741
+	* module.c (verify_OACC_ROUTINE_LOP_NONE): New function.
+	(enum ab_attribute): Add AB_OACC_ROUTINE_LOP_GANG,
+	AB_OACC_ROUTINE_LOP_WORKER, AB_OACC_ROUTINE_LOP_VECTOR,
+	AB_OACC_ROUTINE_LOP_SEQ.
+	(attr_bits): Add these.
+	(mio_symbol_attribute): Handle these.
+
 2019-03-20  Janus Weil  <janus@gcc.gnu.org>
 
 	PR fortran/71861
diff --git a/gcc/fortran/module.c b/gcc/fortran/module.c
index 0c2699c12609..3d4b17b599ee 100644
--- a/gcc/fortran/module.c
+++ b/gcc/fortran/module.c
@@ -2011,7 +2011,9 @@ enum ab_attribute
   AB_OACC_DECLARE_COPYIN, AB_OACC_DECLARE_DEVICEPTR,
   AB_OACC_DECLARE_DEVICE_RESIDENT, AB_OACC_DECLARE_LINK,
   AB_OMP_DECLARE_TARGET_LINK, AB_PDT_KIND, AB_PDT_LEN, AB_PDT_TYPE,
-  AB_PDT_TEMPLATE, AB_PDT_ARRAY, AB_PDT_STRING
+  AB_PDT_TEMPLATE, AB_PDT_ARRAY, AB_PDT_STRING,
+  AB_OACC_ROUTINE_LOP_GANG, AB_OACC_ROUTINE_LOP_WORKER,
+  AB_OACC_ROUTINE_LOP_VECTOR, AB_OACC_ROUTINE_LOP_SEQ
 };
 
 static const mstring attr_bits[] =
@@ -2081,6 +2083,10 @@ static const mstring attr_bits[] =
     minit ("PDT_TEMPLATE", AB_PDT_TEMPLATE),
     minit ("PDT_ARRAY", AB_PDT_ARRAY),
     minit ("PDT_STRING", AB_PDT_STRING),
+    minit ("OACC_ROUTINE_LOP_GANG", AB_OACC_ROUTINE_LOP_GANG),
+    minit ("OACC_ROUTINE_LOP_WORKER", AB_OACC_ROUTINE_LOP_WORKER),
+    minit ("OACC_ROUTINE_LOP_VECTOR", AB_OACC_ROUTINE_LOP_VECTOR),
+    minit ("OACC_ROUTINE_LOP_SEQ", AB_OACC_ROUTINE_LOP_SEQ),
     minit (NULL, -1)
 };
 
@@ -2128,6 +2134,15 @@ DECL_MIO_NAME (sym_intent)
 DECL_MIO_NAME (inquiry_type)
 #undef DECL_MIO_NAME
 
+/* Verify OACC_ROUTINE_LOP_NONE.  */
+
+static void
+verify_OACC_ROUTINE_LOP_NONE (enum oacc_routine_lop lop)
+{
+  if (lop != OACC_ROUTINE_LOP_NONE)
+    bad_module ("Unsupported: multiple OpenACC 'routine' levels of parallelism");
+}
+
 /* Symbol attributes are stored in list with the first three elements
    being the enumerated fields, while the remaining elements (if any)
    indicate the individual attribute bits.  The access field is not
@@ -2292,6 +2307,30 @@ mio_symbol_attribute (symbol_attribute *attr)
 	MIO_NAME (ab_attribute) (AB_PDT_ARRAY, attr_bits);
       if (attr->pdt_string)
 	MIO_NAME (ab_attribute) (AB_PDT_STRING, attr_bits);
+      switch (attr->oacc_routine_lop)
+	{
+	case OACC_ROUTINE_LOP_NONE:
+	  /* This is the default anyway, and for maintaining compatibility with
+	     the current MOD_VERSION, we're not emitting anything in that
+	     case.  */
+	  break;
+	case OACC_ROUTINE_LOP_GANG:
+	  MIO_NAME (ab_attribute) (AB_OACC_ROUTINE_LOP_GANG, attr_bits);
+	  break;
+	case OACC_ROUTINE_LOP_WORKER:
+	  MIO_NAME (ab_attribute) (AB_OACC_ROUTINE_LOP_WORKER, attr_bits);
+	  break;
+	case OACC_ROUTINE_LOP_VECTOR:
+	  MIO_NAME (ab_attribute) (AB_OACC_ROUTINE_LOP_VECTOR, attr_bits);
+	  break;
+	case OACC_ROUTINE_LOP_SEQ:
+	  MIO_NAME (ab_attribute) (AB_OACC_ROUTINE_LOP_SEQ, attr_bits);
+	  break;
+	case OACC_ROUTINE_LOP_ERROR:
+	  /* ... intentionally omitted here; it's only unsed internally.  */
+	default:
+	  gcc_unreachable ();
+	}
 
       mio_rparen ();
 
@@ -2503,6 +2542,22 @@ mio_symbol_attribute (symbol_attribute *attr)
 	    case AB_PDT_STRING:
 	      attr->pdt_string = 1;
 	      break;
+	    case AB_OACC_ROUTINE_LOP_GANG:
+	      verify_OACC_ROUTINE_LOP_NONE (attr->oacc_routine_lop);
+	      attr->oacc_routine_lop = OACC_ROUTINE_LOP_GANG;
+	      break;
+	    case AB_OACC_ROUTINE_LOP_WORKER:
+	      verify_OACC_ROUTINE_LOP_NONE (attr->oacc_routine_lop);
+	      attr->oacc_routine_lop = OACC_ROUTINE_LOP_WORKER;
+	      break;
+	    case AB_OACC_ROUTINE_LOP_VECTOR:
+	      verify_OACC_ROUTINE_LOP_NONE (attr->oacc_routine_lop);
+	      attr->oacc_routine_lop = OACC_ROUTINE_LOP_VECTOR;
+	      break;
+	    case AB_OACC_ROUTINE_LOP_SEQ:
+	      verify_OACC_ROUTINE_LOP_NONE (attr->oacc_routine_lop);
+	      attr->oacc_routine_lop = OACC_ROUTINE_LOP_SEQ;
+	      break;
 	    }
 	}
     }
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 1f2f3eba6f40..8afdf3e980e9 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,10 @@
 2019-03-21  Thomas Schwinge  <thomas@codesourcery.com>
 
+	PR fortran/72741
+	* gfortran.dg/goacc/routine-module-1.f90: New file.
+	* gfortran.dg/goacc/routine-module-2.f90: Likewise.
+	* gfortran.dg/goacc/routine-module-mod-1.f90: Likewise.
+
 	* gfortran.dg/goacc/goacc.exp (dg-compile-aux-modules): New proc.
 
 	PR fortran/56408
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-module-1.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-module-1.f90
new file mode 100644
index 000000000000..4e81f11fec86
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-module-1.f90
@@ -0,0 +1,47 @@
+! Valid use of routines defined inside a Fortran module.
+
+! { dg-compile-aux-modules "routine-module-mod-1.f90" }
+! { dg-additional-options "-fopt-info-optimized-omp" }
+
+program main
+  use routine_module_mod_1
+  implicit none
+
+  integer :: i
+
+  call pl_1
+
+  !$acc parallel loop seq ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+  do i = 1, 10
+     call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call g_1 ! { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" }
+     call w_1 ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+     call v_1 ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop gang ! { dg-message "optimized: assigned OpenACC gang loop parallelism" }
+  do i = 1, 10
+     call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call w_1 ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+     call v_1 ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop worker ! { dg-message "optimized: assigned OpenACC worker loop parallelism" }
+  do i = 1, 10
+     call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call v_1 ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop vector ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+  do i = 1, 10
+     call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+     call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+  end do
+  !$acc end parallel loop
+end program main
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-module-2.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-module-2.f90
new file mode 100644
index 000000000000..eae0807643c9
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-module-2.f90
@@ -0,0 +1,31 @@
+! Invalid use of routines defined inside a Fortran module.
+
+! { dg-compile-aux-modules "routine-module-mod-1.f90" }
+
+program main
+  use routine_module_mod_1
+  implicit none
+
+  integer :: i
+
+  !$acc parallel loop gang
+  do i = 1, 10
+     call g_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop worker
+  do i = 1, 10
+     call g_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+     call w_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+  end do
+  !$acc end parallel loop
+
+  !$acc parallel loop vector
+  do i = 1, 10
+     call g_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+     call w_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+     call v_1 ! { dg-error "routine call uses same OpenACC parallelism as containing loop" }
+  end do
+  !$acc end parallel loop
+end program main
diff --git a/gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90 b/gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90
new file mode 100644
index 000000000000..3855b8c88596
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/goacc/routine-module-mod-1.f90
@@ -0,0 +1,79 @@
+! OpenACC 'routine' directives inside a Fortran module.
+
+! { dg-additional-options "-fopt-info-optimized-omp" }
+
+module routine_module_mod_1
+contains
+  subroutine s_1
+    implicit none
+    !$acc routine
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+    do i = 1, 3
+    end do
+  end subroutine s_1
+
+  subroutine s_2
+    implicit none
+    !$acc routine seq
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+    do i = 1, 3
+    end do
+  end subroutine s_2
+
+  subroutine v_1
+    implicit none
+    !$acc routine vector
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+    do i = 1, 3
+    end do
+  end subroutine v_1
+
+  subroutine w_1
+    implicit none
+    !$acc routine worker
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+    do i = 1, 3
+    end do
+  end subroutine w_1
+
+  subroutine g_1
+    implicit none
+    !$acc routine gang
+
+    integer :: i
+
+    !$acc loop ! { dg-message "optimized: assigned OpenACC gang vector loop parallelism" }
+    do i = 1, 3
+    end do
+  end subroutine g_1
+
+  subroutine pl_1
+    implicit none
+
+    integer :: i
+
+    !$acc parallel loop ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+    ! { dg-warning "insufficient partitioning available to parallelize loop" "" { target *-*-* } .-1 }
+    do i = 1, 3
+       call s_1 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+       call s_2 ! { dg-message "optimized: assigned OpenACC seq loop parallelism" }
+       call v_1 ! { dg-message "optimized: assigned OpenACC vector loop parallelism" }
+       call w_1 ! { dg-message "optimized: assigned OpenACC worker vector loop parallelism" }
+       call g_1 ! { dg-message "optimized: assigned OpenACC gang worker vector loop parallelism" }
+    end do
+  end subroutine pl_1
+end module routine_module_mod_1
-- 
2.17.1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 658 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PR72741] Encode OpenACC 'routine' directive's level of parallelism inside Fortran module files
  2019-03-20 10:09           ` Thomas Schwinge
@ 2019-03-21 20:49             ` Thomas Koenig
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Koenig @ 2019-03-21 20:49 UTC (permalink / raw)
  To: Thomas Schwinge, gcc-patches, fortran; +Cc: Tobias Burnus, Jakub Jelinek

Hi Thomas,

> Are there any further questions, or am I good to commit my patch as
> posted?

Problem is, I have never looked into module writing / reading in any
detail, so it will take a few days for me to get up to speed so I can
really review your patch.

If, in the meantime, maybe somebody else with more knowledge in that
area (Janne?) could step in.

Regards

	Thomas

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-03-21 20:49 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-16  3:12 [PATCH,openacc] check for compatible loop parallelism with acc routine calls Cesar Philippidis
2016-06-17 14:42 ` Jakub Jelinek
2016-06-23 16:05   ` Cesar Philippidis
2016-06-29 14:11     ` Thomas Schwinge
2016-06-29 14:35       ` Jakub Jelinek
2016-06-29 15:31       ` Cesar Philippidis
2019-02-28 21:12 ` [PR72741] Encode OpenACC 'routine' directive inside Fortran module files Thomas Schwinge
2019-02-28 21:17   ` Jakub Jelinek
2019-03-13 17:50     ` [PR72741] Encode OpenACC 'routine' directive's level of parallelism " Thomas Schwinge
2019-03-13 22:13       ` Thomas Koenig
2019-03-14  7:38         ` Thomas Schwinge
2019-03-20 10:09           ` Thomas Schwinge
2019-03-21 20:49             ` Thomas Koenig
2019-03-21 19:47       ` Thomas Schwinge

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).